Transcript from the "Diagramming Convolutions & Pooling" Lesson
>> Now let's actually debug our code. NoneType and float, I think I didn't initialize something properly again. Okay, callbacks, callbacks, callbacks we define as, Hm, all right, that should be working. And then callback. Log, accuracy, all right. There's no mistake. One second, okay, what if we, for now, just ignore callbacks.
[00:00:42] Let me just make sure that it's actually callback, which, no, it's not the callback, it's something else. Shape mismatch, the mismatch shape label, I see, I see, I see, okay, so.
>> I don't get the same error if we add back in the flatten on the dense.
>> You're getting the same layers or different one?
[00:01:04] The same?
>> The same, cuz you deleted the two lines.
>> I did not get the same. Because I got the same error initially when I deleted them, and then when I added them back, I no longer got the error.
>> Let's see. So the error happening after the first epoch training is done.
[00:01:36] Or not, let's see. No, okay. So let's see, what we're doing here. So we did convolutions first. We're doing pooling, and we always need to flatten, exactly. So basically, our convolutional operations, right, Conv2D, that's what 2D is standing for. [LAUGH] We have 2D images provided as the input, right, and that's what we even have in the input shapes.
[00:02:12] And our max pooling 2D, also operating on 2D, well, images. But then, to our dense layer, we need to flatten everything, right, to have these connections. So we can probably get rid of this. But we need to leave flatten so that the operations from the pooling can be then fed into our dense layer.
[00:02:36] And this will work. So it will take about three minutes to train our simple model. And quite often, after pooling the operations, you want to have couple of more convolutions, as well. And the idea, as I said, so in the first convolution layer, just because our filters are pretty small, only 3x3 pixels, it will be learning some really, really simple features, which can be represented in those small 3x3 pixels images.
[00:03:14] So, for instance, it can be diagonal line, right, vertical line. Or, for instance, if you're working with different colors, it can be a specific color. So we're paying attention more to red channel, for instance, right? But then, as soon as you do pooling, it means that your image shrink.
[00:03:31] And if you apply a new convolutional layer, even though, for instance, you will be using another layer of 3x3 filters. But now, if you think about it, your each filter will have access to 6x6 original pixels, does make sense? So, during pooling, we simply shrink the image, right?
[00:03:56] But the information, just because we kept the most intensive pixels, we kinda think that it will be propagated. The image will be smaller, but the most important bits and pieces will be preserved. And then the next convolutional layer will have kinda slightly bigger point of view, although it will be processing still those 3x3 pixels.
[00:04:20] So what we can do, let's see, with convolutional neural network, we actually get slightly better accuracy, even on validation data set. Remember that we were getting to about 97% with fully connected neural network. With this one, it's actually significantly smaller. So, for instance, if we just print model, I need to change the Cell to Code and type model.summary.
[00:04:52] Basically, to preserve the information from those 32 filters, we only needed 320 floating point numbers. And our dense layer with 128, so it was, well, significantly larger, right? So with convolutions, we can save space. And also, due to those convolutions, because we kind of working on pixels sitting right next to each other, we're taking into account the information about kinda spatial information.
[00:05:34] So if we are looking at the object remember, with a fully connected neural network we have to flatten our image. It means that we kind of don't know which pixels were sitting right next to each other in the next rows. And convolutional operations, actually taking this into account.
[00:05:54] Just because this filter, 3x3, will be looking at those 3x3 pixels on the original image, if it makes sense. All right, so let's do the following. I already commented out our fully connected neural network. I can actually put several convolutional layers, I don't need input shape anymore. It will be automatically tracked.
[00:06:19] And for fun, let's actually do it once again. So what I will have, I will have convolution pooling, convolution pooling, convolution pooling, layer, okay. And let's see how quickly we can train our model and how big the model will be. It seems, well, it's 60,000 images, hm.
>> So what does it do when you do the convolutions and the poolings multiple times?
>> So, okay, I think I know how to explain, but I need to go back to my drawing board. So that's kind of the basic operation of the convolutions. But if we take a look at the multiple convolutions pooling, convolution pooling, our original image, 28x28, after convolutions and after pooling, will be reduced to something like 14x14, right?
[00:07:29] And then, for instance, if I apply another 3x3 convolutions, it will be now looking at particular features in this area. But this area on the smaller image will correspond to 6x6 original one. So that's after convolutions and pooling. After next convolution and pooling, now the image will be 7x7.
[00:08:02] And if we look at the 3x3, it almost correspond to half the image. So what's happening, the convolutions, weights which correspond to the filters on the lower level, will learn bigger features. So, for instance, if on the smaller scale, kind of on the first level, we learned where we have diagonal lines, where we have vertical lines.
[00:08:33] For instance, on the middle layer, we can learn something, I'm sorry. Filters can, for instance, look at, I know, almost like circles. Right, or maybe not the full circle. Because probably on the last layer, it will learn something like this. So what I'm trying to say is that, imagine how much information or how many useful things you can see from the image if you just divide it in small parts of 3x3 pixels, maybe not so much.
[00:09:13] But still, something interesting can be found there where you have kind of those lines. On the next convolutional layer, you will just combine where you observed those vertical lines, where you saw horizontal lines, and a combination of those features will be propagated. For handwritten digits, it's kind of difficult to explain.
[00:09:35] It's better to explain, for instance, with face recognition. So let's say we have a image or a picture of some person, his face, right? And what can happen on the lower convolutions, we might find, once again, those horizontal lines, vertical lines. And maybe on the next convolutional layer, we will combine those lines and say, okay, having two horizontal lines and maybe a couple of diagonal lines, most probably we're looking at human eye.
[00:10:09] Okay, so probably somewhere on the image, where convolution will happen, that's gonna be one eye, there's gonna be another eye. And then if we have a couple of vertical lines, most probably that's the nose and so on and so forth. And then on the last layer, it will take into account all of those kind of recognized horizontal, vertical lines and say, okay, since we have those two eyes together right next to the nose, we're definitely looking at the face.
[00:10:36] So, think about it like features, those filters are filters. We're looking for something really simple at the lower levels, right. Remember with the example of style. Think about it like this as the simplest styles, like the way how we draw. And then on the next layers of convolutions, we are looking at combinations of those small details into something bigger.
[00:11:10] And then combinations of those things, for instance, like with face, yes, we recognize eye, eye, and nose. So now we can say okay, we have those, then probably we're looking at the face in this area, right, and that will be recognized by the next layer. And then, maybe if you even want to have multiple faces recognized, right, so maybe there will be another layer where we'll find, okay, there are several faces on the image, and so on and so forth.
[00:11:37] So convolutions is basically doing that, it's kind of looking for those small features. Let's see, have we done our, yeah. Now it takes five minutes. We get into 98.6%. And the thing is, our model is not taking that much of a space in the memory, it's really, really small.
[00:12:01] It takes slightly longer to train. So, basically, there's a trade-off between how much memory you're using, right, and how quickly it take to train. Although it's actually supposed to be faster, I think the way how it's implemented. So TensorFlow 2.0 right now have some bugs, and that's been confirmed by Google engineers.
[00:12:26] For instance, if you're using batch training, that's what I'm doing right here, it's actually using not the optimized code from the computational graph, but rather using eager execution. I think I can, if I disable eager, I can make this run significantly faster. Let's actually try it, tf, no, I should do it after, and I, let's see, I don't remember what's the exact command, but I've used it in attention, yeah.
[00:13:06] Mm-hm, right, so basically, remember, I said that in version 1, it was the behavior by default. So to use it, we need to use compatibility module from v1, disable_eager_execution. And our training previously was five minutes, I'm just kinda curious, seriously, I don't know if it will help or not, but we'll see in the second.
[00:13:33] No, I think it's still gonna be running pretty slow, 25. All right, so with convolutions, up here actually, yeah, you can pull up the. Validation, accuracy, and training, as well, and it will look like this. So with training, the more we show the same images, the better accuracy you will get, right.
[00:13:58] But our test training set might actually get better at the beginning. But then most probably is going to be stabilized and most probably is going to show lower accuracy than the training data set. And you need to be pretty accurate, well, you need to be careful with the validation accuracy.
[00:14:20] Because sometimes, instead of just stabilizing and being normal, it might start going down, and that's the indicator of overfitting. It means that our model is so drastically changing weights to recognize the images from training data set, so it completely starts ignoring something new. It means that's what the overfitting is, right?
[00:14:49] It basically, just because you're showing this training data set over and over to your model, it's almost like fine tuning for those specific images, for those specific features. And yes, if you will see, I think I have the example with text, next I will show it to you.
[00:15:08] When this happen, your validation accuracy will stop dropping. And the best way is to basically find where the validation accuracy was the highest, or the same thing you will see with a loss function. Loss function will be decreasing, it can get to the minimum, and then will start increasing back again.
[00:15:25] It means that usually you need to stop your training at the point where the loss function for the validation data set was at the minimum. Just to avoid this, as I said, overfitting for the models.