Hard Parts of AI: Neural Networks

Extending the Principles of Machine Learning

Will Sentance
Codesmith
Hard Parts of AI: Neural Networks

Lesson Description

The "Extending the Principles of Machine Learning" Lesson is part of the full, Hard Parts of AI: Neural Networks course featured in this preview video. Here's what you'd learn in this lesson:

Will summarizes the key components of building a neural network and extends those principles into further scaling these models beyond what's imaginable. However, at their core, the success of any model depends on its ability to generalize and capture patterns from a sample population and assume that population is representative of the larger world

Preview
Close

Transcript from the "Extending the Principles of Machine Learning" Lesson

[00:00:00]
>> Will Sentance: You've seen the core principles of machine learning and neural networks. We've had probability and at the core making predictions about our larger population of, in this case, three by four images in the previous case, refund requests, making prediction using a model that converted our background data. In this case, pixels into the associated data.

[00:00:34]
The label smile yes, smile no, smile no, smile yes, the tool we use towards the neural network. It's generalizing that pattern we found in the sample that these 12 pixels 100%. These 12 pixels 0% these 12 0% these 12 100% to the population these 12 pixels who knows?

[00:00:59]
We'll get 100 percentage fromit. That presumably would be, helpfully, yes, it would be quite a -1, wouldn't it? If anyone's tried this one, it will be quite a -1. It will come out at probably, I don't know, three or 4%, hopefully. We can apply, someone try, we can apply the model, the converter, to our broader population.

[00:01:20]
And as long as the sample is reflective of the population as a whole, which if it's a decent sized sample, we can feel confident it will be. Then we're able to make decent predictions converting just like we did here, where we wanted to get as many of these converted to the actual label as possible.

[00:01:40]
So, 2 we're able to convert this new 12 pixels values and get a good decision, 1 or -1, small or not. How do we do it? Via the neural network converter, the neural network model that gives layers of multipliers, layers of multipliers. Weights known as the parameters of the model that capture the association between the input values of features of pixels and their label small or not.

[00:02:12]
The class, the label small or not, we then used the training approach of gradient descent to tweak from the 0s initially, which gave us a terrible conversion. Steadily closer and closer, until we were converting our whole sample with an accuracy of 93% and an error of only 37% using gradient descent.

[00:02:34]
Then we refine our model, we could add, we could improve our learning rate, jump faster, but it's a risk. We added an activation function, in this case sigmoid, that not only made things easier to interpret, 100% it's a smile, 0% it's not, and somewhere in between, more or less confident.

[00:02:54]
But also ensured we actually found a set of multipliers that did a decent conversion for our sample. And then finally, integrating all of that into a model pipeline. That's the fancy word for the process by which we take all of that training data, develop our set of multipliers, our model, then, make it available to be used for inferring.

[00:03:20]
Making a bet, making a prediction about a new image. And then, by the way, if we look at that, and over time, realize it's not doing a very good job refining and maybe retraining our model, designing the API to do so. And isn't that a gorgeous little pet dog in the corner there, designed by our wonderful illustrator Kieran.

[00:03:41]
So extending these principles to an unfathomable scale, which is where we move into large language models, all AI and ML models you develop and deploy build on these principles, much of the power, though, comes from the vast scale of training data. We had a sample of four, but we were able to follow every single one of these pixels.

[00:04:02]
In practice, it's beyond comprehension, but it's just an extension of just like programming is just save data, do stuff to it, but my God, there's a lot of data you can save and a lot of stuff you can do with it. Same here, it's only ever take sample, identify pattern, to convert within the sample and apply to population where in the population you don't have all the data, so you don't know if there's a smile or not.

[00:04:30]
If you worked out how to convert it in the sample effectively where you did know, you can use it in the population. But throughout, remember, it all depends on our ability to generalize from a sample to the population by building a model that converts or captures the patterns.

[00:04:44]
These patterns of pixels associated with smile, these with not these with not these with smile in the sample. And assuming that reflects the population as a whole, we can make predictions about our population. It just happens that population in large language models, as we'll see here, is something akin to all possible use of human language, not just three by four images with 0s and 1s, and that has some pretty profound consequences.

Learn Straight from the Experts Who Shape the Modern Web

  • In-depth Courses
  • Industry Leading Experts
  • Learning Paths
  • Live Interactive Workshops
Start a 7-Day Free Trial