Transcript from the "Comparing Different Models" Lesson

[00:00:00]

>> We can start with a slightly different model, right, by adding those additional layers. So right now, we have less neurons, right? Remember that in the original one we had 256, we try, it's just only one layer with 16. Now there are 3 layers with 16, but it's all about those interconnections.

[00:00:20] Well, let's first try run it, right. So, exactly the same optimizer. And if we train it, so briefly, I think we're getting to almost 97.5% with our 256 neurons. It was about 92%, as far as I remember, for 16 neurons only. Right now we already at 95% So basically, yeah.

[00:00:59] Now it's up to us as the machine learning experts [LAUGH], yeah. We can try different architectures, right? And by architecture, by typology, I mean just try different number of layers or different number of neurons in them. And it still pretty common scenario where you taking relatively big number of neurons in the first layer, then reducing them, right, and getting to the final solution.

[00:01:30] But the thing is, if you have multiple modules like this, right if you trying different number of layers, different number neurons, what is the proper way to compare the accuracies of those solutions? And that brings us to another question how to splint your data, right? And right now, I'm only demonstrating into splitting into two sub-data sets.

[00:01:58] We have our x_train and x_test. But commonly it is split actually into three parts. And it can be something like 70% for your training data set 20% for test data set, and 10% for validation. So what is this validation? It's basically a data sets in our case images, which our models have never seen before.

[00:02:24] So it's not part of the training data set and not part of test data set. And when you're trying different models with different topologies with different number of layers, you will be relying on this final number, validation accuracy, right? And that's basically gives you an approximate value in our case it's like 95%.

[00:02:46] That was the accuracy of the prediction from this split from those, let's say 20%. And you can just have different models and those numbers in one column. And the way how you choose which model is the best between those, is not by just looking at those numbers, but rather using the validation data set.

[00:03:11] This new 10%, our models have never seen before. And providing them to calculate the accuracy, right? And that basically why you need to do this, not to rely on these numbers alone, that prevents from cost contamination, so to speak. So if your test data set is somehow biased, it might affect the accuracy you will see on the model.

[00:03:37] But if you use some new part of the dataset which our model haven't seen before, it will sink well it increase the trust, or so to speak, improve the accuracy of the accuracy prediction. So it's kind of a new layer of obstruction on top of what we already did, and, all right, let's get back to our history and plots.

[00:04:03] Yeah, you can see that although we are not getting to 98%, it's just because our number of neurons is pretty small, I think I can fix it by just putting 64 neurons here 32, for instance, and 16 at the end, and that might give us better model. So once again, I'm kinda recreating this triangle almost.

[00:04:28] So let's recompile, okay? Just out of curiosity, I'm kinda curious what's the final accuracy I'm gonna get there. Almost 95% from the first epoch. At the highest accuracy we've seen so far was 97.5 about, right? Well, we already at 97%, which is good. Yep. So, you can almost imagine that you just taking all of those degrees of freedom 700 of them right think about each pixel almost like independence degree of freedom.

[00:05:14] Right, because our pixel can have the intensity from 0 to 255, right? And we're simply taking this information and trying to reduce it first to 64 degrees of freedom down to 32. But at the end we get into 10 neurons which are activated. There is a chance you can lose some information.

[00:05:40] For instance, if I simply reduce the number of neurons in our intermediate layers to something pretty small like for instance four neurons, there is a chance that this four neurons now will reduce the inflammation so much that we will actually not be able properly recognized between our digits.

[00:06:04] So let's just out of curiosity compile this one and see, by the way final accuracy was, yeah about 97 and a half. So, but if we have only four neurons, so once again 64, 4, 16, 10. Wow, okay, actually, we still can get 92% am I recompile? Yeah, I recompiled it, okay.

[00:06:36] Well technically four neurons probably contain information about ten different classes. Maybe if we reduce it to one neuron [LAUGH] I'm going for something really extreme something to see the big difference. Yeah, 95% not bad. But what if we put only one neuron, in one neuron basically at the end we get activation or not activation right?

[00:07:00] So there's pretty strong chance that we will not be able to recognize between all ten different digits. And the accuracy is 37%, okay. [LAUGH] 36, interesting. The reason why I'm doing this right now, I just want you to think about what information is in this case, right? So we have this 28 pixels by 28 pixels, and that's all of our input channels.

[00:07:39] And then we as well transforming this information, reducing the degrees of freedom so to speak, right? And interestingly enough even having four neurons still is enough to propagate all the useful information. That actually brings us to the next example. But this one I will not be typing. I will just, run the notebook itself.

[00:08:09] It's called

>> Question about that with the four neuron. So if you think about four neurons, either each of them is either activated or not.

>> It's almost like

>> Essentially 2 to the 4th power. So you can see it's impossible values, which is enough to accommodate the 10 digits

[00:08:25]

>> Exactly that's why having, I'm still surprised that we getting almost what 40% accuracy. He said, we've having only one neuron. But it seems that.

>> Yeah, that makes sense to right? Cuz it's about on or off right, with one neuron and so you have about a 50% chance of guessing.

[00:08:46] [LAUGH]

>> But at the same time, we don't have only zeros and ones we have everything in between. and there is a chance that our neural networks somehow recognize that if the signal is somewhere, let's say between 0.4 and 0.6. It should be number eight, something like that.

[00:09:07] And that's the thing, yeah. To some degree, what neuron network is doing. It's almost like a black box. Well, you can try to reason and say what it might do. But what it's actually doing is just modifying those weights and biases and yeah, it's difficult to say what exactly happened there.

[00:09:28] But with a convolution neuron networks the one we'll be using for image recognition, you can do some of the things which will show you how the neuron networks is working and that's actually gonna be an example or demo in a few minutes.