Transcript from the "Convolutions & Pooling Definitions" Lesson

[00:00:00]

>> In this notebook, MNIST and ConvNet, ConvNet stands for Convolutional Neural Network, and that's a different type of layer. So for instance, in this example, instead of just adding layers to the model one by one, I'm using another way to initialize your model, which is to provide the list of layers when you're creating your model, right?

[00:00:29] So the Sequential model, you initialize it with that list, and I'm also explicitly specifying tf.keras.layers.Conv2D. Conv2D is actually a pretty tricky one. The way you initialize it, you need to specify how many filters, or convolutions, you want to have, the size of the filter, and what the input shape should be.
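
The initialization described here can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the filter count of 32 and the single-channel input shape (28, 28, 1) are assumptions, since the transcript does not state them at this point. Only the 3 by 3 filter size and the 28 by 28 image size come from the lesson.

```python
import tensorflow as tf

# Assumption: the transcript does not give the filter count here.
NUM_FILTERS = 32

# Build the model by passing a list of layers to Sequential,
# instead of calling model.add() once per layer.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(
        NUM_FILTERS,              # how many filters (convolutions) to learn
        (3, 3),                   # size of each filter
        input_shape=(28, 28, 1),  # assumed: 28x28 grayscale, one channel
    ),
])

# With the default "valid" padding, a 3x3 filter shrinks 28x28 to 26x26.
print(model.output_shape)  # (None, 26, 26, 32)
```

The leading `None` in the output shape is the batch dimension, which stays open until data is fed in.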

[00:00:57] In our case, we don't need to flatten our images, we'll be using them as 28 by 28. But on top of that, we will be providing several images simultaneously as the input. That's called mini-batching. And probably before I go really, really deep, I need to explain what convolution is and what max pooling is.
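
As a rough illustration of mini-batching, several unflattened 28 by 28 images can be stacked into a single array. The random images below are hypothetical stand-ins for MNIST digits, the batch size of 8 is arbitrary, and the trailing channel axis is an assumption about the layout a 2D convolution layer typically expects.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for MNIST digits: 8 random 28x28 grayscale images.
images = [
    rng.integers(0, 256, size=(28, 28)).astype(np.float32)
    for _ in range(8)
]

# Stack the images along a new first axis and add a trailing channel axis,
# giving the layout (batch, height, width, channels).
batch = np.stack(images)[..., np.newaxis]

print(batch.shape)  # (8, 28, 28, 1)
```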

[00:01:21] All right, so let's switch to the drawing board again and talk about two things: convolutions and max pooling, or just pooling. Those two layers, or operations, are used in image processing a lot. And AlexNet, the topology, the model which won the ImageNet competition in 2012, introduced those big time.

[00:01:52] So pretty much all of the layers they had were those convolutions and poolings. So what is a convolution? Let's say we have an image, our handwritten digits, 28 by 28 pixels. And in a convolution you can specify a filter, for instance, 3 by 3, right? That's the one I will be using in my notebook, here we go, right?

[00:02:22] So when I'm saying 3 by 3, it's basically telling me what size of filter I should be using. But I need to explain the operation itself, what convolution is doing. So it's just taking the first 3 by 3 pixels from your image, right? And the result of applying the convolution, in mathematical terms, is this.

[00:02:46] Convolution is just taking the intensity of the first pixel and multiplying it by the value of the first pixel of our filter. Then taking the second pixel, multiplying it by the second pixel of the filter, and adding the result to the product we got from the first pixel. So basically, if you think about it, for every pixel in this 3 by 3 piece of the image, we're taking the corresponding pixel from our filter, multiplying them together, and adding the results together.
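
The multiply-and-add step for a single 3 by 3 window can be sketched as below. The pixel and filter values are made up purely for illustration.

```python
import numpy as np

# Made-up 3x3 piece of an image (pixel intensities).
patch = np.array([[10, 20, 30],
                  [40, 50, 60],
                  [70, 80, 90]], dtype=float)

# Made-up 3x3 filter.
kernel = np.array([[0, 1, 0],
                   [0, 1, 0],
                   [0, 1, 0]], dtype=float)

# Multiply each pixel by the corresponding filter value and add everything up.
total = 0.0
for row in range(3):
    for col in range(3):
        total += patch[row, col] * kernel[row, col]

print(total)  # 150.0

# The same result in one vectorized expression.
vectorized_total = float(np.sum(patch * kernel))
```

Only the middle column of the filter is nonzero, so the sum is 20 + 50 + 80.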

[00:03:19] So what that means, and I'm just putting this image here, is that if you have a large number here, say 255, and you have a pretty large number in your filter, this particular pixel will be significantly amplified, right? And if your filter value is equal to 0, it will kill it.

[00:03:49] So it's still the same kind of weights, right? But what is interesting here is that when you do this summation, if, for instance, you have pretty high intensity pixels here and, let's say, here, and you have exactly the same high intensity in the same cells of your filter, the result of the convolution will be pretty large.
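
A small sketch of this effect, using made-up values: the same bright pixels produce a large sum when the filter has its weights in the same cells, a smaller sum when the weights sit elsewhere, and zero when the filter is all zeros. The helper name `response` is hypothetical.

```python
import numpy as np

def response(patch, kernel):
    """Multiply matching cells and add the products (one convolution step)."""
    return float(np.sum(patch * kernel))

# Made-up patch with bright pixels (255) along the main diagonal.
patch = np.array([[255, 0, 0],
                  [0, 255, 0],
                  [0, 0, 255]], dtype=float)

matching = np.eye(3)               # weights in the same cells as the bright pixels
mismatched = np.fliplr(np.eye(3))  # weights along the opposite diagonal
zeros = np.zeros((3, 3))           # a filter that kills everything

print(response(patch, matching))    # 765.0
print(response(patch, mismatched))  # 255.0
print(response(patch, zeros))       # 0.0
```

The mismatched filter still overlaps the patch in the center cell, which is why its response is 255 rather than 0.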

[00:04:19] So it means that your filters can look for some very specific feature in the original image. So when you've done this operation, you simply move to the next 3 by 3 pixels, and you just propagate this window all over the original image. And at the end, for instance, if you had, let's say, a handwritten digit 4, right?
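
Moving the window across the whole image can be sketched as two nested loops. This is an illustrative version, assuming the window moves one pixel at a time and never hangs over the edge (no padding); neither detail is stated in the transcript. The function name `convolve2d_valid` and the test image are hypothetical.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image one pixel at a time, without padding."""
    kernel_h, kernel_w = kernel.shape
    out_h = image.shape[0] - kernel_h + 1
    out_w = image.shape[1] - kernel_w + 1
    output = np.zeros((out_h, out_w))
    for top in range(out_h):
        for left in range(out_w):
            # Current window of the image under the kernel.
            window = image[top:top + kernel_h, left:left + kernel_w]
            # Multiply matching cells and add the products.
            output[top, left] = np.sum(window * kernel)
    return output

# Made-up 28x28 image whose pixel values are simply 0, 1, 2, ...
image = np.arange(28 * 28, dtype=float).reshape(28, 28)
kernel = np.ones((3, 3))

feature_map = convolve2d_valid(image, kernel)
print(feature_map.shape)  # (26, 26)
```

Under these assumptions a 3 by 3 window fits in 26 positions along each side of a 28 by 28 image, so the result is 26 by 26.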

[00:04:50] And your filter looked like this. So it was pretty much only looking for parts of the image where we had vertical lines activated, or, for instance, if we put it like this. So the result of the convolution will have pretty high intensity in those regions where we had, actually, let me write it like this.

[00:05:20] It will have pretty high intensity here, pretty much everywhere we had vertical lines. And it will not be that large on our horizontal lines. Simply remember that all these multiplications and additions will give you smaller numbers for horizontal lines, just because your filter is not aligned with them.
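
To make the vertical versus horizontal comparison concrete, here is a small sketch with synthetic strokes rather than a real digit 4. The 5 by 5 images, the filter values, and the helper name `convolve2d_valid` are all made up for illustration, and the window is assumed to move one pixel at a time without padding.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image one pixel at a time, without padding."""
    kernel_h, kernel_w = kernel.shape
    out_h = image.shape[0] - kernel_h + 1
    out_w = image.shape[1] - kernel_w + 1
    output = np.zeros((out_h, out_w))
    for top in range(out_h):
        for left in range(out_w):
            window = image[top:top + kernel_h, left:left + kernel_w]
            output[top, left] = np.sum(window * kernel)
    return output

# Made-up filter that looks for vertical lines.
vertical_filter = np.array([[0, 1, 0],
                            [0, 1, 0],
                            [0, 1, 0]], dtype=float)

# Synthetic 5x5 images: one vertical stroke, one horizontal stroke.
vertical_stroke = np.zeros((5, 5))
vertical_stroke[:, 2] = 1.0
horizontal_stroke = np.zeros((5, 5))
horizontal_stroke[2, :] = 1.0

vertical_peak = convolve2d_valid(vertical_stroke, vertical_filter).max()
horizontal_peak = convolve2d_valid(horizontal_stroke, vertical_filter).max()

print(vertical_peak)    # 3.0
print(horizontal_peak)  # 1.0
```

When the filter lines up with the vertical stroke, all three of its nonzero cells hit bright pixels; on the horizontal stroke only one cell can ever overlap.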

[00:05:43] So the convolution operation is simply doing a couple of things. Think about a fully connected neural network: all the pixels from the original image were connected to all the neurons in your next layer. With a convolution, you're only connecting the first three, well, actually nine pixels, right, to your convolutional neurons.
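
The difference in connectivity comes down to simple arithmetic, sketched here for the 28 by 28 image and 3 by 3 filter used in the lesson. The variable names are hypothetical.

```python
# Number of input pixels feeding one neuron in each kind of layer.
image_height, image_width = 28, 28
filter_height, filter_width = 3, 3

# Fully connected: every pixel of the image feeds every neuron.
dense_inputs_per_neuron = image_height * image_width

# Convolution: each output value only sees one 3x3 window of pixels.
conv_inputs_per_neuron = filter_height * filter_width

print(dense_inputs_per_neuron)  # 784
print(conv_inputs_per_neuron)   # 9
```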

[00:06:12] So it means that instead of having a fully connected neural network, you will now have a partially connected neural network. But this convolution operation has also got a physical meaning: we're looking for some features in the original image. And the second operation, pooling, is also pretty interesting. Max pooling is the operation where we're just taking, let's say, a 2 by 2 region of the image, and we have different intensities here, I don't know, some numbers.

[00:06:47] We will just simply grab the largest value, and that's gonna be the output of our operation. So we will get rid of all the other pixels, the other three pixels. So if you had, let's say, the original 28 by 28 pixels, by applying a pooling operation you will reduce the dimensionality of your image from 28 by 28 to only 14 by 14, right?

[00:07:14] Because you're getting rid of all of those additional pixels. But the most intense ones will still propagate. So in image processing, it makes a lot of sense and it is very useful.
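
Max pooling over 2 by 2 regions can be sketched as follows. This assumes the regions do not overlap, which is what the 28 to 14 reduction implies, and that the image height and width are even. The function name `max_pool_2x2` and the sample values are made up for illustration.

```python
import numpy as np

def max_pool_2x2(image):
    """Keep the largest value from each non-overlapping 2x2 region.

    Assumes the image height and width are both even.
    """
    height, width = image.shape
    # Split rows and columns into pairs, then take the max over each pair.
    blocks = image.reshape(height // 2, 2, width // 2, 2)
    return blocks.max(axis=(1, 3))

# Made-up 4x4 image so the result is easy to check by eye.
small = np.array([[1, 5, 2, 0],
                  [3, 4, 9, 1],
                  [7, 0, 6, 2],
                  [8, 2, 3, 4]], dtype=float)

pooled = max_pool_2x2(small)
print(pooled)
# [[5. 9.]
#  [8. 6.]]

# A 28x28 image shrinks to 14x14.
pooled_shape = max_pool_2x2(np.zeros((28, 28))).shape
print(pooled_shape)  # (14, 14)
```

Each output value is the brightest of four input pixels, so three out of every four values are dropped.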