Transcript from the "Neurons & Neuron Layers" Lesson
>> How we transition from machine learning to deep learning is this next step. So, let's talk about deep learning next, what is deep learning and where this deep or deep is coming from? I'll have to tell you a little bit about the neural networks and neurons first. So let me introduce those concepts and then we will return back to deep learning.
[00:00:29] So in 60s, 70s, well, and actually in 50s as well, people come up with idea that we can probably try to reproduce what's happening in our brains. What's happening with the neurons. And the way how neurons operates, if they have a lot of input signals, right? Some data coming into the neuron.
[00:00:53] Neuron somehow can figure out what is important and what is not. It can be activated or it can just simply ignore the input signals, right? And you have the whole chains of those neurons connected to each other, right? And at the ends, we're somehow processing all of that input data which is coming to us in form of visual signal, for instance, right, or sound.
[00:01:19] And all of those, something happening with those chains activations or de-activations. And we're doing something with this information. And people figured out, we can probably try to replicate this process in the machine, or through the algorithm, right? And let's say our algorithm, our model, and I'm gonna call it neuron.
[00:01:46] First of all, what it's doing, what's kinda the input information? Well, it can be just some channels of data. So let's say, we can call it X1, X2, X3 and X4. In computers, this data will be in form of provided numbers, right? It can be integers or floats.
[00:02:08] Let's say just our floating point numbers. And, for example I will assign 0.1 to X1 and then -4.7 to X2, 0 to X3. I don't know, 500 to X4, something like that. So all of that is our input data, right? Then we need to do something in the neuron to basically aggregate all of the data and maybe ignore some of the channels, maybe amplify some of the channels.
[00:02:40] How can we do it mathematically? Well, we can assign weights to all our channels. So W1, W2, W3 and W4 is gonna be our weights, which we simply associate with the corresponding input channel. And for instance, if we want to completely ignore the signal coming from one channel, what we can do, we can simply multiply X1 by W1, but W1 can be equal to 0.
[00:03:13] So if our weight's equal to 0, it will nullify, it will completely dismiss any data sitting in x1, because anything multiplied by 0 is equal to 0. That's one of the ways to simply ignore any input channels. If, for instance, W2, other weight, is pretty big, let's say, 500, multiplied by X2, it will simply amplify the signal coming in our X2 channel, right?
[00:03:50] And the same thing, for instance, if I want to deamplify the signal, for instance, in our X4, I have 500. I can multiply it by weight, which is really, really small, like 0.001. And in this case, it will give me just only half. So that's the way how we can simply modify the signal amplitude, right?
[00:04:17] But then, all that information coming from all different sources, and I need somehow to shrink it to one information piece, how can I do this? Well, I can simply just add all of those numbers together. I forgot X3 multiplied by W3. Right, so in this case I simply saying which channels I wants to ignore, which channels I want to amplify.
[00:04:49] And adding the signal together, right? And that's becoming one signal, which I can then process. So mathematically, all of that, all of what I just draw, can be easily written as summation symbol Xi multiply by Wi. That's our input signals multiplied by all of those weights, right? And right now we're not talking about how weights should be selected.
[00:05:18] I'm just saying they are there. They describe what our neuron is doing. And also, just for reasons which will be clearer later, we're adding some additional constant, bias to all of this distribution. It will allow us, for instance, if we have constant signal going, we can then normalize it or shift it.
[00:05:48] So basically, I'm adding bias to our neuron. And all of that is one data. So now in the neuron, we have just this one data channel. Which is represented by this simple formula. And I hope it's not too mathy, right? Those concepts are pretty basic. We simply just somehow modified the input signals, right?
[00:06:16] And now we want to do something with it. So what can we do? In the neurons in our brain, simply some neurons are activated, and in this case, they can just send the signal to the next neuron. Or sometimes the signal is just simply ignored, or a neuron is do nothing.
[00:06:39] So mathematically speaking, we can say that if the neuron is activated, it can just output 1. And if it's not activated, it can just output 0. That's it, right? How can we do this in our data, we can have pretty much with all this summation, we can have any numbers from minus infinity to plus infinity, right?
[00:07:05] And we can probably just do something with those by modifying weights and biases. But it's still pretty large distribution. So to kind of do this transformation from minus infinity to infinity into a space of 0s and 1, we can use sigmoid function, sigmoid. It's easier to explain what sigmoid is doing by just drawing it.
[00:07:34] So let's say our axis, or, no, we used axis before, so let's just call it data, or input, input is better. So in our input axis, we have everything from minus infinity to infinity. And on our output, we want to have something between 0 and 1, right? Or sometimes minus -1 and 1, depending.
[00:08:08] These rules, like don't have zero once, it's what we define, right? And basically, sigmoid is the function which is equal to pretty much 0 or something really, really small at infinity and crosses the 0 at 0.5. And then once, again, can go to pretty close to 1 at the infinity.
[00:08:35] So that's what sigmoid function is. And no matter what you provided as the input, it will shrink the distribution of the outputs from 0 to 1.