Transcript from the "Understanding the Model" Lesson
>> Right now, if I just run this codes and rerun the summary, now you can see that's for instance after flatting our two dimensional image into one dimension. So basically it became kind of one layer, right? We then connectied to 256 neurons, each pixel intensity of each pixel will be provided to all of our 256 neurons.
[00:00:26] So each neuron basically will see the whole image, and then some of the neurons will be activated and the last layer will contain only ten neurons but they will be connected to all of our 256 neurons in the previous layer. So that's what's fully connected.
>> Did our simple mention that?
>> Yes, although I'm not gonna be drawing all 256 of them. But sure, I can draw something simple. So let's switch to diagrams, and So as I said, we have three layers, flatten, then dense with 256, and then dense with ten. So basically you can imagine that two dimensional image Sorry, 28 by 28 pixels, right?
[00:01:26] That's our input data provided to the neurons. But our neurons in, actually, we can use even different colors. So D256, that's our second layer, right? So imagine we have kinda 200 neurons like this. So what flattened layer is doing is just simply taking first 28 pixels, and that's gonna be our first 28, Neurons, or actually signals provided to the neurons.
[00:02:08] But you can think about it as the intermediate step, right? And then the next 28 pixels will be flattened or put right next to it. So basically that's your first 28, next 28, and like this another 28 times. So the last layer will go here. And what's happening next just because I said want to have dense layer, it means that all my signals so the information from the first pixel, right, its intensity will be provided as the channel one for all the neurons.
[00:02:50] And yeah, I can basically a lot of lines like this, right? And second pixel will be also connected to all the neurons. And all the Pixels will be connected [LAUGH]. So that's the dense layer, right and then for the lowest layer where we have one, two, three, four, ten pixels, or sorry, ten neurons will have exactly the same thing.
[00:03:18] So first neuron from the previous layer will be connected to all ten neurons, second will be connected to all ten neurons, and so on and so forth. Basically, it means that in our connections, we'll have to have ten multiplied by 256 weights, right? So 2,560 Ws, and then there also should be biases, right?
[00:03:52] So plus 10 biases and that's how many numbers will be used here to preserve the state. So for instance, whenever pixel is activated here, right? So with all of those fully connected layers, we have all these signals about each pixel available to all the neurons in the first layer, right?
[00:04:24] So that's our dense layer 1, and this will be our dense layer 2. So it basically means that, yes, we have to keep track of all of those weights and biases. And for instance, how the training will look like for this scenario, let's say we simply provide its image of 0, handwritten 0 right?
[00:04:56] And we say in okay, you should look at 0, it means that this neuron should give us the highest probability, right? Because it correspond to, 0, 3 so that's going to be neuron, which should be activated when it's looking at 0. So it will simply, if the prediction was done correctly, right?
[00:05:17] So for instance, the intensity of this particular pixel, or let me draw with something else. We took this pixel, right, which had some intensity, just because it was the ink in this pixel and let's say this was provided for all the neurons, right? But remember that we're activating our weights randomly, right?
[00:05:42] So, when we did multiplication of intensity of this particular pixel by corresponding weights, some neurons will not notice it because weights was simply equal to 0, right? And some will actually get really strong signal if the weight was pretty high and they will propagate a base to our next layer, right?
[00:06:06] And for instance, there is a chance that prediction will be done correctly, that we're looking at 0, and that will only kinda Increase those weights even more. If we miss predicted, for instance, although we were looking at 0, and some of the pathways were activated but the prediction was made that we are looking looking at 2, jist because of the randomness of the weights, we'll try to actually get rid of those strong connections by minimizing the weights and reinforcing the weights which should have gave us 0 instead.
[00:06:42] So that's what backpropagation is doing, right? And basically if you do it's if you train intense in our model, if we look at for instance, what can we do? x_train, and let's look at shape, you can see that we actually have 60,000 images. So you need to have pretty a large dataset to train your model.
[00:07:13] And yeah, this 28 by 28 does the resolution of our original handwritten digits. At the beginning, you're starting with the random initialized model, right? But eventually, similar to how we started with this line right, and iteratively got to the solution, we kind of doing the same here. We starting with all of those random weights and biases, but telling our model what answer we should get, we simply modifying those weights and biases to get to the right solution.
[00:07:48] But we need to do a lot of iterations like that, and our data set should be large enough with all different variations. So you can imagine, we have what, ten different classes and 60,000, yeah 60,000 images, it means that we have 6,000 images per each digit, which is pretty huge.