Hard Parts of AI: Neural Networks

Applying Gradient Descent

Will Sentance
Codesmith

Lesson Description

The "Applying Gradient Descent" Lesson is part of the full, Hard Parts of AI: Neural Networks course featured in this preview video. Here's what you'd learn in this lesson:

Will applies the sigmoid function to the results and introduces gradient descent, making smaller and smaller adjustments to the weights to improve conversion accuracy. While there is no perfect target result, a pre-sigmoid value of 2 (a sigmoid output of roughly 90%) is a typical goal.


Transcript from the "Applying Gradient Descent" Lesson

[00:00:00]
>> Will Sentance: Let's do so, and just note, people, you might be thinking, well, hey, what do we look for in the population when we run this new converter, followed by the sigmoid function? We look for a percentage that hopefully is close to 100% when it's a smile and close to 0% when it's not.

[00:00:16]
Okay, so, expanding our sample and applying gradient descent, small adjustments to the weights improve the conversion accuracy, but actually we didn't manage to find a perfect match. So instead we're running the output of those weights, which was respectively 1, -1, -1, and 3, through our sigmoid function, one by one, and
>> Will Sentance: Getting probably not immediately a great outcome, at least not for all of them.

[00:00:59]
But we're on a better path, because now there's gonna be a value here and a value here that will get us close to 100%, whereas before there was no route, 3 was never gonna be 1 and -1 was never gonna be 0. So what do we think sigma of 1 was roughly, anyone remember? Jamie, do you remember roughly what sigma of 1 was?

[00:01:18]
>> Jamie: 73.
>> Will Sentance: 73%, it climbs fast, doesn't it, right, which is quite good. Is that 100%? Nah, too far away, but we'll get there, we'll get closer with our next round of tweaks. -1, anyone remember this one, it was the reverse?
>> Jamie: 27.
>> Will Sentance: 27%, spot on. It's not close enough, too far away. -1 again, 27%, too far away. This one was quite good, what's the output of 3, Joe, do you roughly remember what the output of 3 is?

[00:01:47]
>> Joe: Is it 95?
>> Will Sentance: Yeah, spot on, Joe, exactly. So that one actually is the only one that's roughly right; let's just say these ones are also still not close enough, that's the only one that's actually roughly right. So we've actually got an accuracy already of roughly 1 out of 4.
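For reference, here's the sigmoid step as a minimal Python sketch. The images, pixels, and weights themselves aren't shown in the transcript, so only the four pre-sigmoid outputs just discussed are reproduced:

```python
import math

def sigmoid(z):
    # Squashes any pre-sigmoid converter output into the 0-1 range
    return 1 / (1 + math.exp(-z))

# The four pre-sigmoid outputs discussed above
for z in [1, -1, -1, 3]:
    print(f"sigmoid({z:>2}) = {sigmoid(z):.0%}")

# sigmoid( 1) = 73%
# sigmoid(-1) = 27%
# sigmoid(-1) = 27%
# sigmoid( 3) = 95%
```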

[00:02:06]
After one iteration of weight tweaks, we get this output, known as one iteration of training using gradient descent. Gradient descent is, as we say, the ability to tweak weights independently thanks to partial differentiation through calculus, and then do it all at once and know that that was the right direction to go towards more accurate conversion for this model, for this sample.
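A minimal sketch of what one such iteration might look like for a single-neuron converter. The sample, starting weights, and learning rate here are illustrative assumptions, not values from the lesson:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Illustrative sample: each "image" is a few pixel values; label 1 = smile, 0 = not
sample = [([0.9, 0.1, 0.8], 1),
          ([0.2, 0.7, 0.1], 0)]

weights = [0.5, -0.5, 0.5]   # assumed starting multipliers
learning_rate = 0.1          # size of each small adjustment

def gradient_descent_step(weights, sample, learning_rate):
    # Partial derivative of the squared error with respect to each weight,
    # accumulated across the whole sample
    grads = [0.0] * len(weights)
    for pixels, label in sample:
        z = sum(w * p for w, p in zip(weights, pixels))
        out = sigmoid(z)
        for i, p in enumerate(pixels):
            # Chain rule: d(error)/dw_i = 2*(out - label) * sigmoid'(z) * pixel_i
            grads[i] += 2 * (out - label) * out * (1 - out) * p
    # Nudge every weight a small step against its own gradient, all at once
    return [w - learning_rate * g for w, g in zip(weights, grads)]

weights = gradient_descent_step(weights, sample, learning_rate)  # one iteration of training
```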

[00:02:31]
After one iteration, even with our converter, it's all right. Before we used sigmoid it was really good, it was 3 out of 4, but we were never gonna get to 4 out of 4. So we switched to applying sigmoid, which took us back down to 1 out of 4, but I promise you, there's a route now to get this number going up,

[00:02:52]
This one going up, and these two going down, there's a route to get to 4 out of 4. Joe, go ahead.
>> Joe: I was just curious, why did we choose 2, is there a specific reason?
>> Will Sentance: Why did I say I'd be happy with 2? Just because we haven't got to 2 yet, we haven't got those. That's just me saying, if this reached less than -2, and this reached less than -2, then the output will be somewhere below-

[00:03:24]
>> Joe: 10%.
>> Will Sentance: 10%, and at that point-
>> Joe: And that's a [INAUDIBLE] sigma.
>> Will Sentance: Yeah, exactly, it does. At that point, that's roughly good enough within the industry; you're roughly happy getting a 90% conversion, you might wanna go higher. If you know it's not a smile, then you want the output of your converter to be, I don't know, 8%, it doesn't need to be 0, but 8%.

[00:03:54]
In fact, of course, in practice, you may wanna have more like a -5 in there, which would give you an output of, do you remember, Joe? -5 would be an output of about-
>> Joe: 1%
>> Will Sentance: About 1%, right, that would be even better probably, although maybe it gets too perfect for the sample, and might be too suited to the sample but not the overall population.
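For reference, the rough sigmoid values being quoted in this exchange, using the same sigmoid helper as in the earlier sketch:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

for z in [2, -2, -5]:
    print(f"sigmoid({z:>2}) = {sigmoid(z):.1%}")

# sigmoid( 2) = 88.1%  -> roughly the 90% "good enough" mark for a smile
# sigmoid(-2) = 11.9%  -> roughly the 10% mark for a non-smile
# sigmoid(-5) = 0.7%   -> about 1%, possibly too tuned to the sample
```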

[00:04:14]
It'd be overfitted, but yeah, I just know that by the time the weighted sum of these pixels with these multipliers comes to below -2, right now it comes to, what was it? -1, right, yeah, -1. By the time these weights have been tweaked enough for it to come to -2, then we know the output of running it through sigmoid

[00:04:39]
will be less than 10%, and I will be roughly happy with the conversion through that set of weights of this element of the sample. After 100 iterations, I get to look at my numbers: these weights, these multipliers, give me an output of not 1, -1, -1 and 3 but 2, -2.4, -2.3 and 5.2.

[00:05:04]
And that gives me, sigmoid adjusted, really damn good numbers, we're gonna use them in a second. But Joe, I could keep iterating and steadily, steadily, steadily, I'll be getting closer and closer to 99.9%, and 0.01% the other way, that's the trade-off. I could keep tweaking, keep checking, is it converting the sample a bit better?

[00:05:32]
And for a sample of 4 we could probably try and get there more quickly, right? We could work out, just keep increasing these, keep decreasing those, until they're really big numbers, and we would end up with a really good conversion. But for more complex examples, there are gonna be some cases where this one, you keep tweaking it up, and it's always gonna make it better.

[00:05:52]
But this one, if you tweak it down too far, it starts to make it all worse. And so for millions of these weights, which is maybe what we'd have sometimes, you don't know which way to go for all of them. Does it continue in that direction? No, you need to check all of them each time: okay, good, that was the right step, then again.

[00:06:12]
And you've got to run your whole sample through all those weights and make sure that it's better than it was the last time, and then do it again, and then do it again, and then do it again. That is very compute costly, very compute costly, meaning part of the job of a machine learning engineer in this space is to go, do I want to keep the iterations of this training going?
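Sketched as a loop, continuing the illustrative single-neuron example from earlier; the iteration cap and stopping threshold here are assumptions, not values from the lesson:

```python
def average_error(weights, sample):
    # Mean gap between the sigmoid output and the 0/1 label across the whole sample
    total = 0.0
    for pixels, label in sample:
        z = sum(w * p for w, p in zip(weights, pixels))
        total += abs(sigmoid(z) - label)
    return total / len(sample)

max_iterations = 100   # assumed cap: each extra iteration costs more compute
for _ in range(max_iterations):
    weights = gradient_descent_step(weights, sample, learning_rate)  # tweak all weights at once
    error = average_error(weights, sample)   # re-run the whole sample to check it improved
    if error < 0.07:   # assumed "good enough" threshold (~7% average error)
        break          # stop paying for compute once the sample converts well enough
```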

[00:06:37]
I might get closer, but it's costing me a lot of compute time and money. But let's see these ones here: after 100 iterations our weights get close enough, the average error now is only 7%, that's pretty good. We're 1% off on one, we're 9% off on that one, we're 8% off on the second image; we want 0% and 0% for our two respective

[00:06:59]
non-smiles, and you want 100% for our two smiles; we got 89% and 99%, that's pretty good.
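Those closing numbers check out: running the post-training outputs (2, -2.4, -2.3, 5.2) through sigmoid and comparing against the targets gives roughly the quoted errors and the ~7% average. A quick check, assuming the first and last images are the smiles:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Pre-sigmoid outputs after 100 iterations, and the 0/1 targets
# (assumed ordering: images 1 and 4 are the smiles, 2 and 3 are not)
outputs = [2, -2.4, -2.3, 5.2]
targets = [1, 0, 0, 1]

errors = [abs(sigmoid(z) - t) for z, t in zip(outputs, targets)]
print([f"{e:.0%}" for e in errors])        # ['12%', '8%', '9%', '1%']
print(f"{sum(errors) / len(errors):.0%}")  # 7% average error
```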
