Derek Ferguson — Machine learning in Java from nothing to production in one hour



Thank you very much, and good morning. As was mentioned, my name is Derek Ferguson, I'm head of engineering for JPMorgan Chase's commercial bank, and today we will talk through TensorFlow for Java developers.

Before I get started, just a show of hands: how many people have worked with TensorFlow before? Okay, just a few. How many people have worked with any kind of machine learning? A few more. How many folks have worked with Java? Excellent. That's really the only skill set you need for this presentation.

Let me start by setting the stage. The request: your business wants you to investigate machine learning. They've seen it on television, they've heard it on the radio, they're sure this is the next big thing. They want you to find a suitable problem within your organization for machine learning, and then solve it. The challenge: we assume no prior background in machine learning, so we'll go through very briefly what machine learning is in this session. We also assume no prior background in advanced math or data. When you start to study this area, the vast majority of documentation out there starts from a perspective of advanced math and theory and then works towards the practical. We're going to work in exactly the opposite direction this morning: I'll start from the practical and then we'll get a little bit into the mathy stuff, but not too deep, I promise. So: a very pragmatic quick intro to machine learning, then an equally pragmatic quick intro to TensorFlow.

Why TensorFlow? TensorFlow can be made to work with Java. It requires some effort, but we'll see how to do that. It's becoming a de facto standard in terms of usage out in the community; it originated at Google, it's an open source initiative, so it's got a lot of traction. There are other things out there, obviously, but that's what we'll use in this session. And then we'll do four live Java coding demonstrations, so if you like an element of risk in your presentations, hopefully this will satisfy that need.

So what is machine learning? I like to think about it in the most simplistic terms; this is how I explain it to friends and relatives who aren't computer people. You teach a computer to solve 2 + 2 - 1: simple arithmetic. That's been around since Charles Babbage's analytical engine, in terms of computer capabilities. 2 + fill-in-the-blank - 1 = 4: algebra. I think Babbage's analytical engine was able to do stuff like this too, or maybe it came in a later generation of big-machinery-based computers; either way, that's been around for a long time. Now take 2, 3 and 1 as inputs and ask the computer to give you the 4: a computer being able to go in and actually fill in those operations is where you start to get into machine learning. By providing numerous data examples of inputs tagged with outputs, and having the computer inductively reason "if I see this set of inputs this many times, paired with this set of outputs, I will eventually be able to tell you the pattern," you get the model you need to produce these outputs out of these inputs.

This has a very close relationship with the scientific method. Imagine that the business problem you've come up with is something about customers and the products in which they might be interested.
So you have a theory that, based on a bunch of factors about your customers, you can deduce the products in which they might be interested. Fine; that's your generalized hypothesis. What's the first thing you're going to do to work this out? You're going to want to get your hands on a bunch of data about your customers and the products they've historically shown an interest in. That's your training data.

Now, there's an interesting thing here, a sort of rule of thirds. What you will typically do is take the set of training data you have, which is typically rather large, and divide it into thirds. The first two thirds you put through your inference engine, which will be TensorFlow for us, as we'll see shortly. The inference engine uses that two thirds of your data to devise a proposed model: it keeps seeing all these pairings ("customers of this type seem to be interested in these products, customers of that type are interested in those products," and so on) and concludes, "therefore, I think the model that would explain this behavior is as follows." You've used two thirds of the data at your disposal at this point. Now you're ready for the experiment: you use the last third of your data to say, "all right, if this is what I think my model is, let me pass my last third of data through that model and check it," because in this case you have both the inputs and the outputs, and you can see whether the outputs from your model match reality. The degree to which they do is the accuracy of your model. That's like testing your hypothesis in the scientific method.

I mentioned in passing "a lot of data." Folks who know hardly anything about machine learning have, through osmosis, realized there's some connection between big data and machine learning. Let's talk about why you need that much data. Again, a very simple example; in fact, this has got to be the simplest machine learning model you can possibly have: two operands (two inputs) and one operation. Zero and zero producing zero could be multiplication, could be addition, could be subtraction; we're not sure. We add in a second data point, and by the way I love this one: 1.5 and 3 producing 4.5. Everyone sees it and the first thing you think is "all right, so it's addition," but of course then you think about it for two more seconds and realize, "well no, that could be multiplication also." Okay, so we've eliminated subtraction at this point. Finally we get a 6 and a 7 producing 42, and we've narrowed it down: this simple model seems to be multiplication. But in actuality, even for this case, that's not nearly going to be enough data for us to have a reliable model; we need exponentially more data to really build some certainty that yes, we're really talking about multiplication. That's already an exponential need. Now think about a more real-world example. In the real world this would still be considered a very small model, and I've actually gone through the legwork of checking it: there are tons of ways to produce the -2 at the bottom out of the values on the left. It gets to a point very quickly where, in order to use an inference engine, you need reams and reams of data to really build a reliable model. Hence the relationship between the recent surge in interest in machine learning and the slightly less recent surge in the availability of big data technologies: one is only really feasible because of the other. The fact that we can now tap into huge amounts of data is the reason that machine learning, whose concepts have been around for decades and decades in academia, is now actually practicable and being used by lots of folks in mainstream computing.
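To make that inductive reasoning concrete, here is a minimal plain-Java sketch of the idea just described: it takes labelled examples (two inputs tagged with an output) and eliminates candidate operations that don't fit. The candidate set and the data points mirror the example above, but the code itself is my illustration, not the speaker's.

```java
import java.util.EnumSet;

public class GuessTheOperation {
    enum Op { ADD, SUBTRACT, MULTIPLY }

    static double apply(Op op, double a, double b) {
        switch (op) {
            case ADD:      return a + b;
            case SUBTRACT: return a - b;
            default:       return a * b;
        }
    }

    public static void main(String[] args) {
        // Each row is "inputs tagged with an output": a, b, result.
        double[][] examples = { {0, 0, 0}, {1.5, 3, 4.5}, {6, 7, 42} };

        EnumSet<Op> candidates = EnumSet.allOf(Op.class);
        for (double[] ex : examples) {
            // Drop any candidate operation that doesn't reproduce the tagged output.
            candidates.removeIf(op -> apply(op, ex[0], ex[1]) != ex[2]);
            System.out.println("after " + ex[0] + ", " + ex[1] + " -> " + ex[2] + ": " + candidates);
        }
        // Three examples narrow this down to MULTIPLY, but as the talk notes,
        // a real model needs vastly more data before that conclusion is trustworthy.
    }
}
```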
The next objection I often hear in discussing these things is, "well, the data we're looking at is not numeric." There's really no such thing as non-numeric data. Suppose you were working in the area of health surveys and you suspect there's some relationship between, say, smoking and health issues. You'd probably send out a survey to get people's health behaviors, and then you'd look up their health histories to find out what sorts of conditions they've had. You can take every question on that survey and encode the answers as numbers: 0 equals no, 1 equals yes. You can then encode their health outcomes as numbers also. It's very straightforward.

Images and time data are just matrices of numbers. Let's talk about time data first. Take the example of wanting to predict when an airplane is going to have some sort of serious mechanical issue. There are thousands of sensors in a modern airplane, on every apparatus within the airplane, and they're constantly sending back telemetry. You can take samples from that telemetry at different time intervals; those become the rows of a matrix, and the individual monitors become your columns. That's a matrix. I threw the image slide in for two reasons. I'm going to go out on a limb and say pretty much everyone in the room knows that images are basically matrices too, but besides demonstrating that with an image, this is an example you will see right off the bat when you start looking into TensorFlow in depth, I promise: it's called the MNIST data set. It's 60,000 handwritten digits, each of which has been electronically tagged with what that digit is. It's used all over the documentation as a good starter training set, and the fact that it's 60,000 is kind of useful because it makes the rule of thirds pretty clean: 40,000 for training, 20,000 for testing your predictions afterwards.

All right, so what is TensorFlow? TensorFlow is an open source C++ library. So right off the bat, the fact that it's not a Java library is in and of itself not ideal for our purposes, but it's nothing too stressful: you've got JNI, and there are all sorts of ways to get into C++ code. The wrapper that has been built for it is Python, and I'd say 90 percent plus of the documentation, when you go out and look, is Python, Python, Python. So Java integration requires craftiness, or at the very least it requires being very good with Google and being able to piece together a bunch of fun stack traces, which hopefully this presentation will save you from having to do.

Let's take a step back and talk about how the name relates to the underlying functionality. What are tensors? Tensors were a previously existing concept in mathematics. They have types, they have ranks (which is a fancy word for a number of dimensions), and they have shapes, which are the sizes of each of their ranks, a.k.a. the sizes of each of their dimensions.
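To make the "everything can be numeric" point concrete, here is a minimal plain-Java sketch of encoding yes/no survey answers as a feature matrix, one row per respondent and one column per question. The particular questions and the 0/1 scheme are purely illustrative; they're not from the talk.

```java
public class SurveyEncoding {
    // Encode one respondent's answers: 0 = no, 1 = yes.
    static double[] encode(boolean smokes, boolean exercises, boolean drinks) {
        return new double[] {
            smokes ? 1.0 : 0.0,
            exercises ? 1.0 : 0.0,
            drinks ? 1.0 : 0.0
        };
    }

    public static void main(String[] args) {
        // Rows are respondents, columns are survey questions: a plain numeric matrix,
        // exactly the shape an inference engine expects.
        double[][] features = {
            encode(true,  false, true),
            encode(false, true,  false)
        };
        // Outcomes get encoded the same way (e.g. 1 = developed the condition).
        double[] labels = { 1.0, 0.0 };

        System.out.println(features.length + " examples, "
                + features[0].length + " features each; first label = " + labels[0]);
    }
}
```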
Let's see that with some simple examples. If you've got a rank 0 tensor, that's a scalar: 3, or -7.5, or what have you; a single value. Rank 1 means you have an array, and since you have an array, for the very first time you have to provide a shape to describe it as well: rank 1, shape [5] is a five-element array. Pretty straightforward. Rank 2, shape [3, 3]: you've got a three-by-three matrix. Piece of cake, right?

A little more about tensors. In TensorFlow, tensors may be contained in variables. This is super useful, because if you couldn't change the value of tensors you'd never be able to evolve your model; you'd have to guess it right the first time, and then it wouldn't add any value at all. So variables are important. Those variables can be mutated by operations. The operations are the things that, as you pass data through your model to train it, actually influence how those tensors get changed so the model accurately reflects your situation. Tensors, variables and operations are the three kinds of things in TensorFlow that are contained by a graph, and the graph is basically a database for your tensors, variables and operations.

So why is it called TensorFlow? The technical reason is that this representation is what's known as a stateful dataflow graph. A simpler way of thinking about it: when you create these graphs, which are these databases of your tensors, and you connect to them with a session and start feeding your training data through, it's helpful to think of the data as flowing through your tensors, and as it flows through, they evolve and mutate their values to match the data they're seeing. That gives you a nice model that's truly representative of your underlying data. Tensors train via mutation to variables, and models make predictions after training; that's the whole reason for doing this, right? You don't just create the model and say "oh, that's great." After you have the model, you want to be able to give it to folks, or use it yourself, to make predictions. If we stick with the earlier example of customers and the products they're interested in, it's not so much for historical reasons that I'm asking what products different kinds of customers have been interested in; it's probably because in the future I want to say, "okay, I know you're a customer of this type, therefore I'm probably going to market the following products to you." That's the name of the game.

So what we're going to see in the first live demo is the absolute simplest thing I could think of, just to make sure TensorFlow is set up once you get your hands on it: two tensors, x and y, connected by an addition operation. We're going to connect to TensorFlow from Java, put in two values, ask it to add them and give the result back to us. It can't get much simpler than that. How are we going to make the connection? There's a TensorFlow Java SDK, and it's available as a Maven dependency. This is actually one of the few things where I'll say it just works: you take that entry, you put it in your pom.xml file, and it goes. It's also available in other formats; there's a Gradle version of it. I think TensorFlow Lite also allows basic TensorFlow in Android Java, but I'm not going to speak to that at all in this session; I'm just calling out that it's another variant of Java that has some TensorFlow support.
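For reference, here is a minimal sketch of creating tensors of rank 0, 1 and 2 with the org.tensorflow Java API that the Maven dependency pulls in. This assumes the TensorFlow 1.x Java SDK (the org.tensorflow:tensorflow artifact); the numDimensions() and shape() accessors are from that API.

```java
import java.util.Arrays;
import org.tensorflow.Tensor;

public class TensorShapes {
    public static void main(String[] args) {
        try (Tensor<?> scalar = Tensor.create(3.0f);                         // rank 0: a single value
             Tensor<?> vector = Tensor.create(new float[] {1, 2, 3, 4, 5});  // rank 1, shape [5]
             Tensor<?> matrix = Tensor.create(new float[3][3])) {            // rank 2, shape [3, 3]
            System.out.println(scalar.numDimensions() + " " + Arrays.toString(scalar.shape()));
            System.out.println(vector.numDimensions() + " " + Arrays.toString(vector.shape()));
            System.out.println(matrix.numDimensions() + " " + Arrays.toString(matrix.shape()));
        }
    }
}
```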
So here's the crux of the presentation and what we're going to spend the next 45 minutes on: the Java SDK only wraps about 20% of the TensorFlow classes. The good news is that the 80/20 rule does apply; you can do about 80% of the stuff with that 20% of the classes. But once again, you have to be a little bit crafty with some of this. That leads to the question: what's in the 20%? Basic model creation; that's good, we want to be able to create models. Model training; you can train the models, excellent. In-line predictions, meaning that once you have a model you can load it into your Java process space and say "hey, give me a prediction based on the following inputs" and get back the response. And predictions across the network, so if you have separate infrastructure that you want to run TensorFlow on, but you want to run your Java over here and just make the call to get predictions, that's all supported too. So what could possibly go wrong with any of this, right?

What are the classes we're going to see in demo number one? org.tensorflow.Tensor, a generic type: this is really the underpinning of everything, and it's very nice that they've chosen to express it as a generic type in Java, because it means you can use the same class for any underlying primitive you want. Output: the class that allows us to reach into a running graph and examine the values in it, for example the answer, the prediction we want. Graph, as I already mentioned: think of this as a database of tensors. And Session: if the graph is your database, the session is your connection to the database. Nice analogy.

All right, so it's showtime: demo 1. Note I'm running on VirtualBox; I found that setting up TensorFlow in a development environment is very easy on Ubuntu and a little more complicated on macOS, so if you have the option, I would go with Ubuntu. This is our source code; let's talk through it a little before we run it. I've added the library (that's just the entry in pom.xml) and I import the org.tensorflow.* package. All you really need to do in main is spin up a Graph, so now we've got an empty database, quote-unquote, ready to hold our tensors. I've got two helper operations that we'll look at next. One adds constants, and it takes a reference to the graph, a nice human-friendly name that we'd like to refer to the tensor by, and a value, which in this case will be 2. We call the same thing again, give it the name c2 and a value of 12. Then we add an Add operation: we pass in the graph plus the two tensors we just created, to say "this operation is going to run on these two tensors." Then we spin up a Session against that graph and use the runner method on the session, which is a built-in, to fetch that operation directly out of the graph, run it, and get back the first value it produces, which is going to be 14. I considered making that an audience question to see if anybody could add 2 and 12, but I'll skip that and just say it's going to be 14; if it isn't, there's a real problem with my demo.

Let's take a quick look at the helper methods; we'll go with addConstant first. addConstant starts by calling the create method on the Tensor class, a nice factory method: you pass in the value, it gives you back a tensor, and notice it will even type it for you, so that's great. It's really important to understand that at that line of code you have a tensor, but it's not in a graph yet; on its own it's useless. This is a stumbling point, because folks think "well, I've got a tensor," but it's not associated with a graph yet. After you create the tensor, you have to use an opBuilder call off the graph: you tell it the type of node you want, which in this case is a Const (remember, it could also be a variable), the data type, which you can take directly off the tensor, and the value, which is the tensor itself. You call build on that, which puts the tensor into the graph, and then you call output, which gives you back a handle, which we want because we need to return it. The add operation is very similar, but because it's an operation we don't need to create a tensor for it; we just call the graph's opBuilder with the Add type, give it a friendly name like "the big adder," add the two tensors that were passed in as inputs, build it (meaning put it in the graph), and then output it so we've got a handle to it. So that's all the code; let's go ahead and run it and see what sort of a day I'm going to have.
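Pieced together from that walkthrough, the whole demo looks roughly like the sketch below. This is my reconstruction, not the speaker's exact source: it assumes the TensorFlow 1.x Java API (org.tensorflow), and the helper and tensor names are simply the ones used in the description above.

```java
import org.tensorflow.Graph;
import org.tensorflow.Output;
import org.tensorflow.Session;
import org.tensorflow.Tensor;
import org.tensorflow.Tensors;

public class Demo1 {
    public static void main(String[] args) {
        try (Graph g = new Graph()) {
            Output<Integer> c1 = addConstant(g, "c1", 2);
            Output<Integer> c2 = addConstant(g, "c2", 12);
            Output<Integer> sum = addOperation(g, "bigAdder", c1, c2);

            try (Session s = new Session(g);
                 Tensor<?> result = s.runner().fetch(sum).run().get(0)) {
                System.out.println(result.intValue());   // expect 14
            }
        }
    }

    // Wrap a constant value in a tensor and register it in the graph as a Const node.
    static Output<Integer> addConstant(Graph g, String name, int value) {
        try (Tensor<Integer> t = Tensors.create(value)) {
            return g.opBuilder("Const", name)
                    .setAttr("dtype", t.dataType())
                    .setAttr("value", t)
                    .build()
                    .output(0);
        }
    }

    // Register an Add operation that consumes two tensors already in the graph.
    static Output<Integer> addOperation(Graph g, String name, Output<Integer> a, Output<Integer> b) {
        return g.opBuilder("Add", name)
                .addInput(a)
                .addInput(b)
                .build()
                .output(0);
    }
}
```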
All right, so the good news is we got back 14; it would have been a problem if we hadn't. Let me call your attention to this red text here, because I guarantee you'll see it when you run this. If you remember anything out of this presentation, try to remember this bit, because you will see it and it will panic you at first. All this red is telling us is that when this TensorFlow binary was built, it was not compiled to use some of the special instructions available on my simulated CPU on the development desktop. On a development desktop this could not possibly matter less. If you're going to production, obviously it could be very substantial, particularly in your training environment: you're going to want a binary that's optimized for your CPU infrastructure, because remember, you'll be passing tons of data through this and every millisecond is going to count. But on a development desktop, when you see this red, don't freak out.

All right, let's do something a little more interesting. Imagine you want to do some sort of health study of the relationship between cigarette consumption and CHD deaths per hundred thousand within a certain age group. The first thing you do is try to get your hands on some data about adult cigarette consumption, and then some data about deaths. You plot the thing you know (or the thing you're going to know in the future, when you make your predictions) along the x-axis, and you plot the thing you want to predict on your y-axis. If we do that for this underlying data, which I just pulled off the internet, we can see it's got a nice cloud shape that goes from the lower left-hand corner up to the upper right-hand corner. So we look at this and think to ourselves: what would a model that was able to predict the number of deaths based on cigarette consumption look like? Obviously the answer is we'd want the line that goes through that data, as much in the middle of all the dots as possible. Now, I sort of promised at the outset we wouldn't get into math, and I promise this is the absolute most complicated equation we'll have the entire talk: y = mx + b.
That's how you express a line, with y and x basically being the data points you pass in, and your m and b values being the slope and the y-offset. So the question becomes: how can we get ourselves into a situation where, by poking in all of these different data points and giving them to TensorFlow, TensorFlow can set the line for us? This is known as linear regression. For evolving the model, meaning a class that adjusts the model to fit the data (that's what moves the line into position), TensorFlow provides the gradient descent optimizer. Actually it provides a few classes that do linear regression, but the gradient descent optimizer is, I guess, the big one; it fits 80% of your use cases. So that's great. Of course, it's not in the 20%. Who would have guessed that, right?

So what do we do about this? One of the nice things about TensorFlow is that all of the underlying data structures, everything that's in a graph, everything that goes across the wire, and so on, is stored in protobuf format. I'm not going to get into the details of protobuf, but it's basically a Google binary standard for communication and storage. Typically you have a protobuf definition file that defines your data structure, and there are code generators for all sorts of different languages that can take that file and give you code representing that data structure, which you can use to encode or decode it. So from all sorts of different languages you can put stuff into files, send it across the wire, and so forth. You can download a binary of an untrained gradient descent optimizer model online with no problem; there are many places that have put untrained gradient descent optimizers online, specifically because they realize that folks who aren't using Python don't necessarily have direct access to this. And since everything is stored as protobufs, it doesn't matter where it came from. Just in case everything else were to change between the time I created this deck and the time I'm giving this presentation, I went ahead and wrapped a little Python script for everyone that you're welcome to use: you pass in three values and it gives you back a starter gradient descent optimizer model. You provide the initial slope, you provide the initial y-intercept (you can see those at the end of the URL here), and you provide the learning rate.

The slope is pretty obvious and the y-intercept is pretty obvious; what's the learning rate? The learning rate is closely related to a question I suspect might have occurred to many of you when you realized you had to provide an initial slope and y-intercept, which is: "I thought TensorFlow was going to do that for me." Well, it will; this is just where it starts. Here's the thing: if you were actually going to go out and build a model that related smoking to health issues, chances are you already have some gut instinct about what you think you're going to find. That's the case with pretty much every model; folks don't start their research without any opinions on what they're likely to find. So by choosing a starting point, all you're really doing is potentially (if your guess is correct) reducing the amount of data you need to get your model in line and the amount of time you need to run your training. Of course, if you make a bad guess, it's going to take even longer. And if you don't know, if you're really just fishing out there in the data to find out what sort of patterns might exist, you can start with a flat line in the middle of your data set; that will work.
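To make the slope, y-intercept and learning rate tangible, here is a minimal plain-Java sketch of what a gradient descent optimizer is conceptually doing for y = mx + b. It uses invented idealized data and a per-example squared-error update; it illustrates the idea, not the TensorFlow implementation.

```java
import java.util.Random;

public class GradientDescentSketch {
    public static void main(String[] args) {
        double m = 5.0, b = 3.0;        // initial slope and y-intercept: our starting guess
        double learningRate = 0.01;     // how far the line moves for each data point

        Random rnd = new Random(42);
        for (int i = 0; i < 2500; i++) {
            double x = rnd.nextDouble() * 5;
            double y = 3 * x + 2;                    // idealized data lying along y = 3x + 2
            double error = (m * x + b) - y;          // how far the current line is off
            m -= learningRate * error * x;           // gradient of the squared error w.r.t. m
            b -= learningRate * error;               // gradient of the squared error w.r.t. b
        }
        System.out.printf("m = %.3f, b = %.3f%n", m, b);  // should end up near 3 and 2
    }
}
```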
The learning rate is the thing that determines how much TensorFlow moves the line with each new data point you provide during training, to get a model that matches your data. Folks hear this and the first thing they think is, "well, I'm going to set that fairly large, because I want to get this done quickly and I don't have much data." The thing is, you don't want to set it too large; let me show you why. Imagine the line was originally horizontal and we poke in a data point that's sort of high. The line has jumped from horizontal clear over that data point, because it moves by the amount of the learning rate. Now we poke in a second data point and it sees "oh, all of my data is beneath where my line is, let me adjust downwards by the amount of the learning rate," and now you've got a line that's completely under your data set. Then you put in another data point that's above, and so on. When you set the learning rate too high, you can get into a situation where a linear model can never intersect the data, because the granularity is too large. The fancy name for getting a line that goes through your data, by the way, is having your model converge. So don't set it too large: set it low and have lots of data. And if you don't have lots of data, a way you can cheat is by repeating the data you already have multiple times into the inference engine. That will bring down your reliability overall, just so you know; it's better to actually have lots of data, like any scientific study, but as a practical note, there is a way to cheat if you don't. Now let's see what happens with a low learning rate: a data point goes in, the line moves a little; another data point, it moves a little more; another, a little more. It's a bit like watching ice melt, but once you get enough data, the line goes straight through the middle of it, rather than jumping over and under and sideways.

Let's do demo number two. In this one we're going to take a look at graph importation: remember, we theoretically just went out to the internet (I actually did this before the presentation) and downloaded a model, which I think I set up with a slope of 5 and a y-intercept of 3; we'll see shortly. So we're going to see how to import a graph that we downloaded. We're going to see placeholders, which are special kinds of tensors that don't have initial values; they expect to be fed values as part of training. And we're going to see feeds, which are the way you push data in.

All right, so here I've moved from just using the asterisk to calling out the imported classes specifically; no shocking difference there, I could have continued to use the asterisk. The first thing I do has nothing to do with TensorFlow, other than the fact that the file I'm reading into this byte array is that default gradient descent optimizer model I talked about. Notice that the extension is .pb: it contains protobufs, which is why it has a .pb extension by default. We spin up a Graph just like we did before, we spin up a connection to that graph called a Session, and the first new thing we see is importGraphDef. This takes that default gradient descent optimizer I created and loads it into our graph.
Now, this next line: I know I said that if you remember anything, remember the thing about the red warning we saw before. I'm going to change my mind; that can be the second thing. The number one most important thing to remember is this line here, because without it everything will work fine in development, and then the first time you run this in production, on a server that stays up between training cycles, you'll get bizarre behavior you won't be able to explain. This is TensorFlow-ese for "reset everything to the initial values." My hard-luck story on that: the first time I loaded TensorFlow onto Lambda and tried to do multiple trainings, because Lambda instances will stay up for a little while (you can keep the machines warm), I started getting bizarre behavior. So this is a very important line at the start of any training.

Then we have a helper method called printVariables, so let's take a quick look at that. printVariables goes into the session and fetches two tensors with special names, W_read and b_read: that's your slope and your y-intercept. Then we get the float values out of them and print them. This helper method lets us look into our graph periodically during training and see where our line is.

Then, up here, how are we going to do the training? We're going to do it in five batches of five hundred each. Basically I'm going to randomly create an x value and then compute what the y value should be for it to lie along an idealized line with a slope of 3 and a y-intercept of 2. If you imagine that all of our data points lie along this hypothetical line, we're conveying to TensorFlow, "hey, this is really where you should move the line to be aligned with our data." You'd never find something this idealized in the real world, but for this demo it allows us to see exactly how the line moves, and how fast. There are two other specially named tensors, input and target, which correspond to the x value and the y value. We call the feed operation on the runner; that's how we actually push our training data in. And after every 500 examples we take a peek and see where our line has moved to as part of training. When we're done with training, I call toGraphDef, which takes my graph and puts it back into a byte array, and then I just use standard Java to write that out into another file. So this is how you can take an untrained model, train it up, and put it back into a file to put on other systems or give to co-workers, all in Java. Last thing: remember I talked about the rule of thirds? Well, this isn't a third, we're just going to do it with one value. I'm going to pass in the value 1 and ask for a prediction by fetching this output tensor, and if we feed in a 1 and the model is properly trained, it'll be 1 times 3 plus 2, in other words 5. So let's give this a run and see what we get.

Okay, so my starting line that I downloaded had a W of 5 and a b value of 3. It needed to get to a W value of 3 and a b value of 2 to match what we said up here was the data we were going to poke in. It does a good amount of movement with the first 500 data points, getting to roughly 3.5 and 1.5, and then in the subsequent batches of 500 it refines that position, until finally, after 2,500 data points, if I pass in a 1 I get back a 5, essentially.
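Reconstructed from that walkthrough, the training loop looks roughly like the sketch below. Again, this is my reconstruction against the TensorFlow 1.x Java API, not the speaker's exact source; in particular, the file paths and the operation names "init" and "train" are assumptions, while input, target, output, W_read and b_read are the specially named tensors described in the talk. A graph downloaded from elsewhere may use different names.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Random;
import org.tensorflow.Graph;
import org.tensorflow.Session;
import org.tensorflow.Tensor;
import org.tensorflow.Tensors;

public class Demo2 {
    public static void main(String[] args) throws Exception {
        // The untrained gradient descent optimizer graph, downloaded as a protobuf (.pb) file.
        byte[] graphDef = Files.readAllBytes(Paths.get("gdo_untrained.pb"));   // hypothetical path

        try (Graph g = new Graph(); Session s = new Session(g)) {
            g.importGraphDef(graphDef);
            s.runner().addTarget("init").run();   // "reset everything to the initial values"

            Random rnd = new Random();
            for (int batch = 0; batch < 5; batch++) {
                for (int i = 0; i < 500; i++) {
                    float x = rnd.nextFloat() * 10;
                    float y = 3 * x + 2;          // idealized data along a line with slope 3, intercept 2
                    try (Tensor<Float> in = Tensors.create(x);
                         Tensor<Float> target = Tensors.create(y)) {
                        s.runner().feed("input", in).feed("target", target).addTarget("train").run();
                    }
                }
                printVariables(s);                 // peek at the line after every 500 examples
            }

            // Serialize the now-trained graph so it can be handed to other systems or co-workers.
            Files.write(Paths.get("gdo_trained.pb"), g.toGraphDef());

            // One quick check instead of a full test third: x = 1 should predict roughly 1 * 3 + 2 = 5.
            try (Tensor<Float> in = Tensors.create(1.0f)) {
                System.out.println("prediction for 1 = "
                        + s.runner().feed("input", in).fetch("output").run().get(0).floatValue());
            }
        }
    }

    // Fetch the slope (W_read) and y-intercept (b_read) to see where the line currently sits.
    static void printVariables(Session s) {
        List<Tensor<?>> values = s.runner().fetch("W_read").fetch("b_read").run();
        System.out.println("W = " + values.get(0).floatValue() + ", b = " + values.get(1).floatValue());
    }
}
```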
All right, so that's demo 2. How are we doing on time? 11:06, so we're doing pretty well. The next concern you should probably have is that not everything is a line, and I'll give just one example of something that isn't: participation in the labor market. You start off in school, then you're in the labor market, then you retire; that's clearly not a line, so linear regression is not enough.

Training a model to recognize addition versus subtraction: I tried to think what the next level is that we could take this to, and it seemed like passing in data and having it recognize whether something was addition or subtraction might be the next step, and it does not work with linear regression. Basically, we're going to pass in values a, b and c and have it predict whether a and b were added or subtracted in order to produce c. This is like one of the examples I showed at the start. I've used the example of health surveys many times: if somebody is 22 years old, they drink once a week, they smoke 10 cigarettes a day, and it turns out they've got diabetes or whatever, then feeding in that sort of input data with that sort of outcome means you've got three data points going in, and that's not going to work great with linear regression. Recognizing images: if you're trying to recognize images of birds, for example, how are you going to put that on a line? It just doesn't make sense.

The key to this, and the next thing we almost always study in machine learning, is something called neural networks. So what is a neural network? This is one of those instances where the concept has been around in academia for decades, and it's only fairly recently that we've got the computing power and the data assets to actually do it. It's modeled on the human nervous system. You start off with something like 86 billion neurons in your brain and you end with somewhat fewer, minus whatever you've killed off with lifestyle choices and that sort of thing. What changes over time is the ability of neurons to connect to each other. For example, the neurons in your brain that are specialists in smell (and this is true of all mammals) are very good at making connections with the neurons responsible for things like memory, which is part of the reason everybody associates the sense of scent with one of the things that triggers memories most strongly.

How is this modeled in TensorFlow? In TensorFlow, and in machine learning in general, every neuron winds up being a data point; in TensorFlow it winds up being a tensor. These tensors have sensitivities, weights, for the various inputs. Say you're looking at a ten-question health survey: you might have ten input tensors, each of which is attuned to one of the questions on the survey and the answers to it. To put it in physical terms, you've got nerves in your fingers, and some of them are for heat, some of them are for pressure, and so on; different nerve endings for different input capabilities. Then you have biases, which are constants that are added to that output.
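As a minimal sketch of that idea (weights as input sensitivities, plus a bias), here is a single artificial neuron in plain Java. The sigmoid activation and the specific weights are my own illustrative choices, not something specified in the talk.

```java
public class ArtificialNeuron {
    // One neuron: weighted inputs plus a bias, squashed by an activation function.
    static double fire(double[] inputs, double[] weights, double bias) {
        double sum = bias;
        for (int i = 0; i < inputs.length; i++) {
            sum += inputs[i] * weights[i];   // each input has its own sensitivity (weight)
        }
        return 1.0 / (1.0 + Math.exp(-sum)); // sigmoid activation, one common choice
    }

    public static void main(String[] args) {
        // Three encoded survey answers feeding one neuron; training is what adjusts the weights and bias.
        double[] inputs  = {1, 0, 1};
        double[] weights = {0.8, -0.2, 0.5};
        System.out.println(fire(inputs, weights, -0.3));
    }
}
```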
Once again, to put it in terms of the human body: there are some things, like sharp pain, that take a short circuit to the brain, and there are other things that take a longer passage; those would be more like biases. Graphically, that's what an artificial neuron looks like: it gets a set of inputs, it weights them, and it has a bias that it uses to feed whatever is behind it. Notice I said whatever is behind it, because nothing is ever this simple in the real world. You can have a case where you have a set of neurons for all of your inputs and then, immediately behind that, a set of neurons just for your outputs, which are the possible outcomes. If we continue with the health survey example, think of the outputs as perfect health, diabetes, cancer, obesity; different things that can happen as a real result of different health decisions. In actuality, it turns out that if you have just those two layers, you don't get as much accuracy as you're likely to want. The most common model you see is a convolutional neural network, which has layers of neurons (a.k.a. tensors) in between the inputs and the outputs. What these neurons are responsible for is looking at the data in different kinds of aggregations, and they evolve their aggregations over the course of the training data too. I talked about how you have neurons for pain and for pressure and so on; well, one aggregation of those is your sense of touch. And you've got neurons in your eyes that are specialists in various things, but they aggregate into a sense of vision.

In TensorFlow, the best example is image recognition. You can create a neural network that just has neurons for each of the pixels and then an output that says "this is a mammal" or "this is a fish." That will work, but it won't give you much accuracy. What turns out to work more accurately is an input layer that looks at each pixel, and then some layers behind it that aggregate, and those layers wind up specializing. If we take the example of the handwritten digits, some of those neurons in the middle will look at the upper left-hand corner, some will look at the upper right-hand corner, some will look at the bottom. The ones looking at the upper left-hand corner maybe become specialists in recognizing the number 7, because a 7 has a very distinctive horizontal line in the upper left-hand corner; the ones that look at the middle bottom maybe become specialists in spotting the number 1. So it's basically an evolutionary process.

Setting up all of those tensors by hand, the way we manually poked in the tensors in our first demo, is something you could do in Java; it would take you thousands of lines of code, but you could do it. Maybe that's a challenge for you as listeners, in your free time. But there's a DNNClassifier that does all that work for you, and it does it very, very simply. I don't know if anybody can guess what comes next in this story, but it's not in the 20% either. So in this case, what we typically do is take a pre-trained model, like the output of demo 2, read that in, and that's what we actually wind up using to do our prediction. This fits the common real-world team structure.
That speaks to a question I was actually asked in one of the speaker conversations a couple of days ago, which was: why do I think we've got this 20 percent / 80 percent split; how did they choose the 20 percent? I think the thinking was that the Java piece is going to be used more on the prediction side than on the training side, because typically the training winds up being done by folks in more of a data analyst or data scientist role, and the thinking is that they might do the actual training, then give you the model and ask you to make predictions against it. So this does fit a common real-world team structure, just FYI, in case anybody is interested. The code to actually do the training tends not to be complicated; I've boiled it down for this example as much as is humanly possible, and I can show it to you in the discussion area afterwards. I just didn't want to be pulling up a bunch of Python code at a Java conference. How did I previously train it, in terms of a theoretical model? You pass in input a, input b and input c, where a and b are the two things that were either added or subtracted and c is the output of that, and then I put a label on it. Basically I poked in a million examples of subtraction and a million examples of addition, all of them labeled, pushed that through a neural network, and at the end I said, "okay, you've seen all these examples; now, when I pass in a, b and c, you tell me whether that's addition or subtraction." This is the next-level evolution of our ability to do trainings.

Here we're going to look at a new class called SavedModelBundle. Let me compare and contrast this against demo 2: in demo 2 we loaded in a graph that had a gradient descent optimizer in it and then we trained it within Java. Now we're taking the next step, which is taking a model that was already trained, loading that in, and using it to do a prediction. Having said that, let's check out demo 3. Oh, and I should say all of this code is available in the GitHub repo, which is on the slide at the end of the deck, so if any of you want to grab it and take a closer look, be my guest.

So there we are importing the new class; once again, notice it's in the org.tensorflow package, so nothing difficult there. We use that SavedModelBundle, we call load, and we load in the point in the file system where we've put the model, which in this case has theoretically been given to us by a data scientist co-worker. Then on the runner we call the feed operation again, but something very important to note in this case: you see these names? These are the default names for input values assigned by this DNNClassifier class. I'm passing in 75, 25 and 50, and I'm going to ask it: if you see 75 and 25 and 50, do you think 75 and 25 produced 50 by doing addition or by doing subtraction? That's basically what this is going to be. And this is another special name; this is where you get your answers, and there's a hint in the name here: probabilities. When you get into neural networks, remember, they can have multiple answers. In this case it's addition or subtraction, but I could have made it addition, subtraction, division and multiplication, and in that health survey example I gave, it could be a whole set of possible health outcomes, maybe even in different kinds of aggregates, like "this person got diabetes and cancer" or "this person is completely healthy." You can have different aggregates.
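In code, loading a pre-trained model and asking it for a prediction looks roughly like this sketch. The SavedModelBundle API is the real TensorFlow 1.x Java class; however, the tensor names used below ("a", "b", "c", "probabilities") are placeholders, since the talk only says they are the defaults assigned by the DNNClassifier export. In practice you would read the real names off the exported model, for example with the saved_model_cli tool shown later.

```java
import java.util.Arrays;
import org.tensorflow.SavedModelBundle;
import org.tensorflow.Tensor;
import org.tensorflow.Tensors;

public class Demo3 {
    public static void main(String[] args) {
        // "serve" is the conventional tag for models exported for serving; the path is hypothetical.
        try (SavedModelBundle model = SavedModelBundle.load("/path/to/exported/model", "serve");
             Tensor<Float> a = Tensors.create(75.0f);
             Tensor<Float> b = Tensors.create(25.0f);
             Tensor<Float> c = Tensors.create(50.0f)) {

            // Feed the three inputs and fetch the probabilities tensor.
            // NOTE: the names below are placeholders; use the names your exported model actually declares.
            try (Tensor<?> result = model.session().runner()
                    .feed("a", a)
                    .feed("b", b)
                    .feed("c", c)
                    .fetch("probabilities")
                    .run()
                    .get(0)) {

                // One row of three probabilities: [neither, addition, subtraction].
                float[][] probabilities = new float[1][3];
                result.copyTo(probabilities);
                System.out.println(Arrays.toString(probabilities[0]));
            }
        }
    }
}
```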
What you get out of a neural network, and the DNNClassifier in particular, is an assignment of probabilities for each of those outcomes. We're going to see with this output that the probabilities wind up basically being certainties, but I just wanted to explain why the structure is the way it is. We get back our answers in an array of three, and the reason for those three positions is that position 0 is the likelihood that it's neither addition nor subtraction, 1 is the likelihood that it's addition, and 2 is the likelihood that it's subtraction. If we pass in 75 and 25 with an output of 50, I think we can all agree that's subtraction. Let's go ahead and run this and hopefully see that it assigns the probabilities we'd hope for. Okay, here's our output. Possibility of it being neither addition nor subtraction: a really, really small number; you see the e-38 there, minuscule. Possibility of it being addition: another minuscule number, less minuscule but still nothing. Possibility of it being subtraction: essentially 1.0, and these are all fractions of 1, so 1 means 100 percent. So it has basically said: boom, yep, you've trained me, that's subtraction. Now let's change the 50 to 100, which makes it addition, rerun it, and what we want to see is that that 100 percent moves. Yep: zero percent, a very small number, and 0.9999881, which, more often than not, is as close as you can get a machine learning model to saying that something is a certainty.

All right, so that's demo 3. Let's go ahead and proceed to our final demo, and then hopefully we'll have time for a few questions: deploying trained models. Now let's think about the case where we want to put this in production. Chances are we're not going to run our predictions on the same infrastructure where we ran our trainings; you can, but it's unlikely. Before TensorFlow 1.8, you basically had to delve into Python and use something like a library called Flask; that was your best option. Since TensorFlow 1.8 there's something called TensorFlow Serving, which is just built into TensorFlow, and there are few things in machine learning that could be simpler than this. Show of hands: how many folks have worked with Docker before? That looks like about 60 percent to me, so I'll give a short explanation. What we're doing here is building a very thinned-down virtual machine, and we can build it on the basis of a very simple text file that describes everything we want to bundle into it. It's not really a virtual machine, it's something a little more complicated than that, but let's think of it in those terms. This is the Dockerfile, and truly, these four lines are all you need to deploy a trained model to make predictions. You do FROM tensorflow/serving, which means grab the base TensorFlow Serving image off Docker Hub. You COPY the place in your local file system where you've put your trained model, which is exports/ and then that long number, into the Docker image under /models/model. Then you expose port 8500, which is for protobuf calls, and port 8501, which is for REST calls. Boom, you're done; that's your Dockerfile. To build it, you do docker build, give it a name with -t, and package up just that Dockerfile; that will grab your model and put it in the image. To run it, you run the command shown, and you're up and running with something that can receive protobuf requests on 8500 and REST requests on 8501.
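Pieced together from that description, the Dockerfile looks roughly like this. The exports/1549065496 path is a placeholder for whatever timestamped directory your model export produced, and note that TensorFlow Serving expects numbered version directories under the model directory, so depending on what your export directory contains you may need a destination like /models/model/1 instead.

```dockerfile
FROM tensorflow/serving

# Copy the exported model from the local file system into the image.
# TensorFlow Serving looks for models under /models/<model name> by default.
COPY exports/1549065496 /models/model

# 8500 serves protobuf (gRPC) calls, 8501 serves REST calls.
EXPOSE 8500
EXPOSE 8501
```

Built with something like docker build -t my-predictor . and started with docker run -p 8500:8500 -p 8501:8501 my-predictor (the image name is arbitrary), this is the same container you would then scale out behind a Kubernetes service.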
If you want to run this on more than one machine, it's Kubernetes-friendly out of the box: spin up a number of these images on Kubernetes, put a service in front of them that exposes 8500 and 8501, and now you've got a farm ready to do your predictions. Believe it or not, this is the actual easy part. One thing, and this is the only bit of Python I just had to show as part of this: this line here, saved_model_cli.py show --dir <the directory where you put your model> --all, will give you the exact REST signature you need in order to invoke your model using REST. That is how I got the message that we're passing in in this final demo, which basically uses REST to call our predictor running in Docker.

Okay, final demo. The most important thing to understand in this case is that I don't have any TensorFlow in this application; this is just pure HTTP stuff. I feel bad that I didn't make it asynchronous after having seen the talk on Java 9, 10 and 11 yesterday, but I'll do that in the next version of the demo. All this does is take this class I've created, which has three parameters with names I got out of the model by running that Python command, pass in the same values as before, get back the answer and print it out. I am running TensorFlow Serving in this window here, so let's go ahead and run this. Oh, hang on a second; I was very proud of myself, but then I realized I'd run the wrong demo. All right. So it gives us back a somewhat more elongated JSON message, but if you look at probabilities, this should look familiar: 99.999988 percent.

So in conclusion, what did we learn? We learned how machine learning works, we learned what TensorFlow is, and we learned where to get the required libraries and how to obtain, train and invoke TensorFlow models, all from Java. What comes next? Getting more of the missing 80 percent, hopefully: advanced model creation, and unsupervised learning, which we didn't even touch on but which is the way you can do learning without labeled data. And remember, this whole thing is open source, so if it really bothers you that the gradient descent optimizer isn't there, or the DNNClassifier, it's just C++; you can go in, create JNI calls to wrap that whole thing, and contribute it, no problem. I think we have time for a few questions. I did want to say you've got my email address below, and my GitHub is there as well. Thank you very much.

[Host] Thank you, Derek. I think we have some time for several questions.

[Question, partly inaudible] Can you please tell us something about the projects where you've used machine learning?

I can tell you about projects that I have seen out there that have used machine learning; I'm somewhat under NDA and all that sort of stuff for our own. There are a few examples I gave already, but I'll organize them into broader categories. Recommendation engines are huge users of machine learning. We tend to think about it in terms of "because you're interested in watching this on Netflix, maybe you're interested in watching that," or this song or that song, but it really winds up becoming a source of marketing for a lot of organizations: "we notice that customers of this type tend to be interested in these products." So it becomes a source of marketing.
There's also stuff out there that can automate some financial processes, to figure out that this sort of customer might exhibit these sorts of behaviors, and therefore here's the sort of funding that will best match their needs; I've seen that done out in the industry. Health research is huge for it. And something I didn't touch on at all, but which is another huge use, is sentiment analysis: basically taking bodies of text that have been tagged properly, looking at those pieces of text and saying "all right, this looks like angry text, this looks like happy text." You can do a whole bunch of stuff with automating customer support through that sort of thing, and most organizations have ready access to that data anyhow, because if they're on Twitter, or if they're doing any sort of tracking of call outcomes versus inbound communications, they've got "okay, this customer was angry, this customer was pleased," and so on, so they can do those sorts of trainings. Those are just a few of the things I've seen out in the industry.

[Question] Thank you for the talk. I want to ask whether you think there is demand in the industry for a Java SDK for machine learning, and whether there is any push to get the missing 80 percent to where it should be, or does nobody need it?

It's an excellent question. I think it really has more to do with the way teams are set up right now: the training winds up being done by people who wear the data scientist hat. This is just my take on it, but I think this SDK is looking at it from the perspective that lots of organizations have existing Java code bases which are out there, working, and in no need of replacement. Java is still thriving and still growing and all that sort of stuff, but if you think about the context in which you're going to do a prediction, that's most likely going to be an actual customer interaction or some sort of business process on the back end, and what you really need in that case is the prediction capability. So that's what they've really focused on with this 20 percent; it's not that there's anything against Java there, it's just that the training side wasn't the focus. The one thing I didn't get a chance to show as part of this was doing a protobuf-based invocation across the wire; it didn't seem to me like it was terribly necessary, but there's a lot of stuff in the 20 percent that focuses on how you serialize and deserialize protobuf messages to get predictions from some system that's running elsewhere.
