Developing a Production Model

Codesmith

Lesson Description

The "Developing a Production Model" Lesson is part of the full, Hard Parts of AI: Neural Networks course featured in this preview video. Here's what you'd learn in this lesson:

Will discusses the process and challenges of getting a model into production. Creating a more robust neural network design with increased layers and a refined evaluation framework helps prepare the model. Once in production, the model must be scaled across GPU, provide an API, and include periodic monitoring and retraining.

Join Now

Preview

Transcript from the "Developing a Production Model" Lesson

[00:00:00]
>> Will Sentance: Let's get it to production. And actually, we don't need to dwell too long on this cuz it's much like what we did with our, well, same extension we did with our fraud model. So step one is what we spent much of our time doing here. Data exploration and developing a prototype model, there it is.

[00:00:16]
We played with a small sample. In practice, we might have 100,000 images labeled, and they might be actually not 3 by 4 images, but 1000 by 1000. I'm gonna squeeze it in here, 1000 x 1000 pixel images. Ooh, that's a lot of data. They might be RGB with each red, green, blue having a value 1 through 256.

[00:00:54]
That's a lot of data. They might, as we just heard, be pictures with,
>> Will Sentance: Funny orientations. Step one, understand the nature of the data and get making it easier to work with. Convert our 1000 x 1000. Maybe we don't need all that data. We saw we got beautiful smiles just from our 3 by 4.

[00:01:25]
Convert this 1000 x 1000 into, I don't know, 100 x 100. Maybe we don't need three colors, maybe we get much of the shapes by grayscale. And maybe we only need a 100 different values from dark or shadow to light. And then certainly, we could run an initial stage where we orient all of these faces to all be, I didn't do that very well, [LAUGH] to all be, that really bad [LAUGH].

[00:02:03]
That's really bad, neither of them went to their exact spot. Well, they got better, but to get them all being in the same, that is horrible, sorry, no, all being, it got worse, all being in their same orientation and size, just as Maria highlighted. That's our pre-processing. We did dimensionality reduction.

[00:02:23]
There are techniques to do that. That is reducing the dimensions of the image to make it easier to work with. We did normalizing or augmenting our data to align it each input. Now what that's gonna require is that when we're using this converter, this model in inference for our population as a whole.

[00:02:44]
For when we want to recognize an image in production from a new picture, it better come in at the same orientation. We better as engineers be saying, hey, excuse me, how have you pre-processed the training data? I wanna make sure that the images that I'm passing to the model in production are of the same dimensions as the training data.

[00:03:09]
Does that sound less intimidating a phrase now than it did before? I think that's hopefully more approachable at this point. There's a huge amount of creativity that goes into developing what this neural network or set of multipliers might look like. We can add layers. Adding layers would be the output of these, which was 2, 2.4, or -2.4, -2.3 and 5.2.

[00:03:40]
Those outputs can actually be inserted into a new set of multipliers. In fact, we could have this set of pixels or 12 go through 12 multipliers, and then 12 different multipliers. And then we could take the output of both of those and put them through two multiplying values.

[00:04:01]
And then we can take the output of those and feed them all the way back in again. That's called a recurrent neural network where the output of one layer gets fed back in as the input of another. That's a feedforward neural network where the values only go forward.

[00:04:18]
A recurrent one, particularly useful for what's called sequential data, where the order of input matters, like text. Very well may want to have the output of multiplying all of these values, sorry, multiplying all our pixels by these multipliers. The output, we may wanna feed it back in in some way.

[00:04:37]
Now obviously, we couldn't feed it back indirectly here, because one number can't be fed into a 12-multiplier grid. But suppose, as an example, we had 12 grids of 12 multipliers. So 12, one, another one, another one, another one, another one. And had all 12 of these pixels go through one, output number, go through the next one, different set of multipliers, output number.

[00:05:07]
Next one, different set of multipliers, output number. We've now got 12 numbers as our output. We could have them all go back through or go through another layer of 12 multipliers. This is adding layers and layers of complexity. That's what's behind convolutional neural networks, CNNs there, that are particularly tailored for image recognition.

[00:05:31]
And also transformers that we're gonna see in a moment are profoundly important for large language models. We could also refine as we go through this process our model design via metrics on how they're performing via hyperparameters. Hyperparameters are the numbers that go into the making of the model.

[00:05:51]
Not the actual model itself it gets used to make predictions on new pictures, but rather to come up with these multipliers, like the learning rate. We took a big old step of 1. Do you remember we went, make this one 1, -1, 1, 1, -1, -1? Big old step in one go.

[00:06:11]
That was a huge step towards finding the right conversion, the right weights that maximize converting our pixels into the associated labels. But the learning rate is how big a step we take each time, cuz we don't wanna overshoot and end up missing the best possible set of multiplies for our sample.

[00:06:36]
Then we go into deploying the model in production. And here is where we have to scale up to handle,
>> Will Sentance: Every edge case in our, in this case, smile recognition model or smile recognition neural network. We might have not 100,000 images, but we might have a million or more images.

[00:07:03]
To do this, we have to distribute the steps we took. And remember those key steps, try up and down, up and down, up and down, up and down, up and down, each weight independently. See whether for each of our sample images, does it lead to it being converted better or worse to the target?

[00:07:27]
If it does, then go up for that, go down for this, go up for our weights. Do that all in one go, and then do it all over again for our entire sample again, and then do it all over again for our entire sample again. Maybe we'll be dealing with trillions of different checks that we have to make, trillions of calculations, quadrillions, beyond even calculation, of calculations have to be made.

[00:07:59]
The chips that are perfectly tailored towards doing that are our GPUs. Our originally graphical processing units, they're chips that are tailored to doing lots of tiny, very basic math. Multiply this by this, this by this, this by this, this by this. Then make a little tweak to this one.

[00:08:21]
Try this by this, all of these the same, this by this, this by this. That's very basic math, but so many times you have to do it [LAUGH]. GPUs are perfectly tailored to that because they were designed for building accurate 3D environments, make it feel real. The key thing for that is, where's the light bouncing off???

[00:08:43]
The physical space, something called ray tracing, where you look at, in modeling a 3D space in a game, whatever, where's all those individual beams of light composed together as a light on? But each of those can be considered an individual photon of light. And if you're modeling a space, you're tracking each of those lights photons in a 3D game, and you're going, hey, it lands here, then it's gonna bounce off here.

[00:09:16]
Do a little bit of small calculations, work out which direction it bounces next. Bounces here, bounces here, bounces here. And you do that for all of those rays of light, and you get a 3D space that looks realistic. Little equations, multiply two numbers together, but trillions of them.

[00:09:34]
How lucky was NVIDIA that the same type of math was needed at scale for training neural networks. Little math calculations, but infinite number of them. They use GPUs and tensor processing units, which are Google's kinda proprietary version. It's even more tailored, particularly for working with matrix or similar data structures.

[00:10:01]
Get it production-ready. As before, get this set of multipliers. A trained neural network will be a set of values, 12 in this case. There's your trained neural network, maybe stored as a transferable Python object, a pickle out of Python. And then have the API deployed that's going to take in the pixels that you are ensuring fit the same shape as the training data.

[00:10:30]
So the model's expecting them and return out the prediction, smart or not. And crucially, monitoring and periodically retraining this neural network, because that same idea of the drift that we had for our fraud detection model as our sample and our population fall out of sync. And there are techniques like SHAP, S-H-A-P, Shapley, who is a famous mathematician.

[00:11:03]
Additive explanations that are designed to try to, even though here, even with these 12 weights, it can be a bit of a pain in the ass to try and work out what they're doing. I guess you can see the shape, right? The happy shape has a higher positive numbers, sad shape has not.

[00:11:22]
But for anything more complex than this, not a chance. And to be honest, even here, why do the eyes have like a negative weight? It's hard to know. And it works, imagine any degree of complexity. Shap or S-H-A-P is an effort to improve interpretability of neural networks. If you wanna use a neural network, for example, deciding whether to give someone an insurance claim or not, or claim it's fraud.

[00:11:49]
You will need to be able to track at least some sense what are the inputs, what are the features that mattered in the decision? And it's trying to, even in a varsity complex set of weights, cuz one layer you could work it out. Ten, you can't. There are techniques, and it's a massive part of machine learning research right now, to improve opening up the black box of how the prediction is made.

Learn Straight from the Experts Who Shape the Modern Web

250+
In-depth Courses
Industry Leading Experts
24
Learning Paths
Live Interactive Workshops

Get Unlimited Access Now