Lesson Description

The "Sigmoid Function" Lesson is part of the full, Hard Parts of AI: Neural Networks course featured in this preview video. Here's what you'd learn in this lesson:

Will introduces the sigmoid function, which "squashes" values into percentages that better reflect the overall confidence of the system's prediction. Rather than requiring each prediction to hit an exact target value, the percentages generated by the sigmoid function let samples with a wide range of raw outputs be categorized correctly.


Transcript from the "Sigmoid Function" Lesson

[00:00:00]
>> Will Sentance: So let's actually come to that problem. After one round of change to the multipliers, our accuracy got better. There are our weights and multipliers after one round of improvement, and our accuracy improved. We've got Smile, 1; Smile, -1 as the output; -1; and, uh-oh, 3. But we're moving in the right direction.

[00:00:18]
So keep tweaking until we have a fit. But there's actually no set of multipliers or weights that gets a perfect 1 and -1 output, even for our tiny sample. That fourth image: its converted output is growing way too fast. Even with just the values of 1 here, we all saw this added up to 3, Maria showed us, but this one is still at 1.

[00:00:44]
So this was perfect. So we don't wanna tweak the other weights too much more to try and get this one down to 1, because if we do, it's gonna start affecting all the other ones.
>> Will Sentance: So we could do exactly what Ben said before the break. We could look at how positive, that's my little +ve, shorthand for positive, or negative the converted output is.

[00:01:09]
More positive, maybe we're more confident it's a smile. More negative, maybe we're more confident it's not a smile. But how positive or negative should it be before we're confident?
>> Will Sentance: By the way, let's remind ourselves of the goal: to be able to take a new image of 12 pixels and, using these multipliers, generate a -1 or a 1.

[00:01:36]
A -1 means that this is not a smile, a 1 means that this is a smile. But what does a 3 mean? Does it mean we're really confident it's a smile, or just a bit confident?
>> Speaker 2: Super smiley.
>> Will Sentance: It is super smiley, to be fair. It is super smiley. Introducing the sigmoid function, which is going to enable us to come up with a set of multipliers that actually works for this sample.

[00:02:03]
Just to stress one more time, there is no set of multipliers. You might be like, well, let's bump these ones down a bit, let's try. But there's no way to separate the data, that is, these four images in our sample, into 1s and -1s of output. There's no set of multipliers that will do that.

[00:02:25]
We need to add non-linearity. We need to add a sigmoid function that enables us to have a set of multipliers, a set of weights, that works for our sample in its entirety, not just for three out of four images. Now we look really fancy with our math, with the sigmoid function.

[00:02:48]
The sigmoid function is a function, and a function takes in an input and converts it into some output. So, a multiply-by-2 function takes in the number 7. And, Paul, what's the output of the multiply-by-2 function if the input is 7?
>> Paul: 14?
>> Will Sentance: 14, spot on. The sigmoid function takes in a number, and for all positive numbers, it outputs something between 0.5 and 1.

[00:03:17]
Which is to say, between 50% and 100%. 100% is the number 1, and 0% is the number 0. So the sigmoid function takes in all possible positive numbers up to infinity and returns something between a half and 1. That is to say, something between 50% and 100%. There it is.
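
To make that concrete, here's a minimal sketch of the sigmoid function in Python. The lesson never writes out the formula; the standard definition, sigmoid(x) = 1 / (1 + e^(-x)), is assumed here, and it produces exactly the behavior Will describes:

    import math

    def sigmoid(x):
        # Squash any real number into the open interval (0, 1):
        # big positive inputs approach 1 (100%), big negative inputs
        # approach 0 (0%), and sigmoid(0) is exactly 0.5 (50%).
        return 1 / (1 + math.exp(-x))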

[00:03:39]
It climbs really, really fast; at 5, its output is already at 99%. At an input of a billion, the sigmoid function's output is 99.999999%. It doesn't ever reach 100%. Well, at infinity it might. At 0, it's 50%. So for all positive numbers, the output is greater than 50%, more certain.

[00:04:08]
For all negative numbers, its output is less than 50%, less certain. By -5, it's reached 1%. By negative a billion, it's reached 0.0000001%. But it still never reaches 0%. Yeah, Ben, go ahead.
>> Ben: Just to clarify, you said then, at a 0 value, our confidence is 50%?

[00:04:33]
>> Will Sentance: Yeah, in this case, exactly. The sigmoid function allows us to find multipliers that get us a decent conversion by moving us off the dependence on "it had better be exactly 1, because the target is 1." We're getting closer to Ben's idea: let more positive just mean more confident. Yeah, but how positive?

[00:05:00]
3 or 100? Sigmoid says, don't worry, if your number is greater than 5, it'll be about 99%. If your number is 3, it'll be about 95%. If your number is greater than 2, it'll be about 90%. If your number is less than -2, it'll convert to, well, let's draw it up a bit.

[00:05:22]
Let's make sure we have, I guess we can probably squeeze it in here. This is a bit naughty, getting in the way of our converter, but let's just draw it in here a bit. Yep, there we go. There we go, beautiful. So, there it is. So, its input, let's try a few.

[00:05:54]
So this is the symbol you use for it, and you input, let's say, 1. Sigma of 1, its output, anyone know? [INAUDIBLE]
>> Will Sentance: It's very quickly climbing to 73%. Sigma 0, its output is, do you remember, Ben?
>> Ben: 50%.
>> Will Sentance: 50%, and at 1, it's already up to 73%. At -1, Jamie, fan of symmetry, what do you think -1's output will be?

[00:06:30]
>> Jamie: 100% minus the output for 1?
>> Will Sentance: Yeah, exactly. So, 73% was the output for 1, so what's 100% minus 73%? 27%, people. By 2 it's already at roughly 90%. By 3 it's at 95%. By 5 it's at 99%. And the same in this direction: by -2 it's at, what is that gonna be, people?

[00:07:00]
>> Speaker 7: 10%.
>> Will Sentance: 10%, spot on. By -3 it's at?
>> Speaker 7: 5%.
>> Will Sentance: 5%, sweet. By -5 it's at?
>> Speaker 7: 1.
>> Will Sentance: 1%, exactly. Okay, it's kinda clever.
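
Using the sigmoid sketch from earlier, we can check the values being called out around the room. The percentages quoted in the lesson are rounded; the exact outputs, and Jamie's symmetry observation, look like this:

    # Reusing sigmoid() from the sketch above.
    for x in (-5, -3, -2, -1, 0, 1, 2, 3, 5):
        print(f"sigmoid({x:2d}) = {sigmoid(x):.4f}")

    # sigmoid(-5) = 0.0067   (~1%)
    # sigmoid(-3) = 0.0474   (~5%)
    # sigmoid(-2) = 0.1192   (~10%)
    # sigmoid(-1) = 0.2689   (~27%)
    # sigmoid( 0) = 0.5000   (50%)
    # sigmoid( 1) = 0.7311   (~73%)
    # sigmoid( 2) = 0.8808   (~90%)
    # sigmoid( 3) = 0.9526   (~95%)
    # sigmoid( 5) = 0.9933   (~99%)
    #
    # The symmetry: sigmoid(-x) == 1 - sigmoid(x) for every x.
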
>> Will Sentance: So what does this let us do? In the sample, switch the labels from 1 and -1 to, instead, 100% for smile and 0% for not, then run the sigmoid function on our output number to get percentages.
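
In code, that relabeling step might look like the sketch below. The raw outputs are the four converted outputs walked through in the lesson (1, -1, -1, and 3); everything else, variable names included, is illustrative:

    # The four converted outputs from the lesson's sample.
    raw_outputs = [1, -1, -1, 3]

    # Relabeled targets: 1.0 (100%) for smile, 0.0 (0%) for not a smile,
    # replacing the old 1 / -1 labels.
    targets = [1.0, 0.0, 0.0, 1.0]

    percentages = [sigmoid(z) for z in raw_outputs]
    # percentages ~= [0.73, 0.27, 0.27, 0.95]; each is now comparable
    # to its 0%-or-100% target without having to hit it exactly.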

[00:07:35]
Now the conversion doesn't need to be exact. Now, if we had a 2 on this one, or anything greater than a 2 on this one, then with sigmoid run on it, anything greater than a 2 is gonna be what percentage?

[00:07:54]
>> Speaker 7: 90% or more.
>> Will Sentance: 90% or more. And this one here no longer needs to be exactly -1 if we instead have our target be a percentage. So we relabel our sample: Smile 100% here, Smile 0% here, Smile 0%. I've only got three of my sample here. I need to draw the other one, don't I?

[00:08:20]
Sorry.
>> Will Sentance: Here's the rest of our sample. There it is. There's the rest of our sample.
>> Will Sentance: Smile, that's 100%, right? Plus this one here, 100%. So we change our labels, and we target instead 0% and 100%, or 0 and 1.
>> Will Sentance: It got 0%, and this was 100%.

[00:08:55]
>> Will Sentance: Now our target is not 1 and -1, because whatever we set our target to, we're just gonna make sure we come up with multipliers that convert to it. But by switching to 100% and 0%, by switching to running the output of our conversion through a sigmoid function.

[00:09:25]
We're able to get working multipliers, because we don't need exactly 100%, we only need 90% or so. We don't need exactly 0%, we just need 7%, 8%, 9% or so in here. So what, therefore, would our target converted outputs be to get a roughly good-enough conversion, whereas before we said it had to be -1 to -1, -1 to -1, 1 to 1, or 3 to 1?

[00:10:05]
That's pretty far apart. But if we're converting these numbers through a sigmoid function, they're getting squashed to between 1% and 99%. And almost all of them, beyond this little bit here, end up quite close to the yes or no, the binary of 100% or 0%.

[00:10:25]
So knowing this, roughly, and it's a bit up to us, but typically about 90% is good, knowing that, which number do we really want this input to sigmoid to be? What would we like our initial total output, the one that's gonna be run through sigmoid in a moment, to be, Jamie?

[00:10:50]
What do we think, if we wanna make sure that the output of running it through sigmoid is close to 100%?
>> Speaker 7: 95.
>> Will Sentance: So 95 would be good, so what would we wanna have?
>> Ben: 3.
>> Will Sentance: 3, I'm gonna be even a bit more conservative and just say a 2 is probably good enough, I reckon.

[00:11:04]
And then here, where we want the output of running through sigmoid in a second to be something like 10%, what do we want the sum from our multipliers to be, roughly?
>> Speaker 7: -2.
>> Will Sentance: -2, exactly. And then the same thing here, right? Roughly -2. And then the input of this one, again, -2 or less.

[00:11:27]
Less than -2 or greater than 2, I mean, once it's beyond that, it doesn't really matter. Here, again, maybe just greater than 2. We just want our multipliers to get it there, and we don't care if it's just over 2, or if it's 10, or if it's 100. It's all gonna come out at roughly 100%.
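
That's the key practical point: once the raw total is past roughly ±2, its exact size barely moves the percentage. A quick check, reusing the same sigmoid sketch:

    # Past ~2, sigmoid saturates: bigger totals barely change the output.
    for z in (2, 3, 10, 100):
        print(f"sigmoid({z}) = {sigmoid(z):.6f}")

    # sigmoid(2)   = 0.880797
    # sigmoid(3)   = 0.952574
    # sigmoid(10)  = 0.999955
    # sigmoid(100) = 1.000000  (rounds to 1.0 in floating point,
    #                           though mathematically it never hits 100%)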

[00:11:45]
Whereas before, when we had the output here being 100 and the output here being, I don't know, 1, where our goal was 1, that's not a very good tool for prediction. We only need to match our sample to the labels we gave our sample, so that we can then go into the population, apply the same converter, see if we get 90% out, and, knowing that 100% means smile, be like, okay.

[00:12:11]
But if we got an output of 3 or 10 for one of the images and, I don't know, 1 or 2 for another, they'd seem pretty far apart. The sigmoid function squashes all big numbers, and it therefore allows us to have much more extreme multipliers. Imagine if we'd tried to get this one down in some way; it would have pulled this one down too far.

[00:12:38]
We'd have had to change the weights, but there was no way of getting weights that worked; this one was growing too fast. As soon as we had 1s here, we were always gonna have 1 + 1 + 1 adding up to 3. And you could say, well, there aren't really any other places it could come from.

[00:12:53]
We could have some weights on the eyes, but the eyes are the same in every one of them, happy or sad. So that's not gonna work. So there was no set of weights that was successfully gonna convert 1 to 1, -1 to -1, -1 to -1, and 3 to 1.

[00:13:08]
It was gonna be too big a number. So instead, don't panic; run it through a sigmoid function. Let's say we get the output of this one to be a 2 or a 3, and this one to be a 5.

[00:13:26]
We would have outputs through sigmoid of 95% and 99%. That feels all right. Those are both roughly 100%, smile found, meaning our model, our converter, is successfully converting both to roughly 100%. It may be 95% and 99%, but that's a heck of a lot better than getting a 3 where we're trying to target 1, cuz I don't know what a 3 means.

[00:13:51]
And so, this is absolutely standard practice, especially for classification, which means asking, does it belong to smile or not? Where you have a binary, a yes or no, you use the sigmoid function to squash your output values, because if you want a fit, it's impossible to get them all in exactly the right spot.

[00:14:11]
Some of them get really big, so we squash them all to make sure that, in this case, our smile and our more smiley one, which doesn't have any of the stuff that's characteristic of not smile, these bits here, right?, still both end up at roughly the same output value.

[00:14:30]
Which we'll see will be, well, we'll see in a second. Let's see what it gets to.
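
Pulling the whole lesson together in one small sketch: pixels in, weighted sum through the multipliers, sigmoid, percentage out. The 12 pixel values and 12 weights below are hypothetical placeholders, not the numbers on Will's board:

    # Hypothetical 12-pixel image (1 = dark pixel, 0 = light) and
    # 12 hypothetical multipliers (weights).
    pixels  = [1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0]
    weights = [0.5, 0, 0, 0.5, 0, 0, 0, 0.5, 0.5, 0.5, 0.5, 0]

    raw_total = sum(p * w for p, w in zip(pixels, weights))  # 3.0 here
    confidence = sigmoid(raw_total)                          # ~95%

    # Read anything comfortably above ~90% as "smile" and anything
    # comfortably below ~10% as "not a smile"; exactness doesn't matter.
    print(f"raw total = {raw_total}, confidence = {confidence:.0%}")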
