Transcript from the "Visualizing Convolutional Operations" Lesson
>> I think next I want to talk a little bit about the explainability of deep learning and convolutional neural networks specifically. So far we played around with fully connected neural networks, right, so we went through this notebook. We talked a little bit about information theory and played around with principal component analysis, looked at the convolutional neural networks.
[00:00:28] So yes, by the way, if you want convolution kernels on pooling probably will explain what convolution is doing better. So, let's actually run this notebook very, very quickly. So let's say we have filter which looks like, This, so actually let's put 1s here and -1s here. Right, so it's gonna positive numbers at the top than just 0s and negative numbers there.
[00:01:05] And in my notebook I just have couple of helper functions which will do the convolutions and will do pooling for me. And I can simply visualize what this particular filter will do with this image, right. So imagine that we just grab some random image. Well, in my case, it was the image of one of our model.
[00:01:31] And. Okay, ignore this for now. So what we will do we'll just take this original image and apply convolution with this filter the one we defined. Can you guess what it's gonna be doing? It actually looking for the vertical lines specifically, right. So remember we had boxes here, but all the horizontal lines simply disappeared and only this bright white color means we have high intensity, so only those vertical lines were highlighted.
[00:02:13] So basically filter defined like this will find all the vertical lines on our image, right. And if we do the pooling, Now you can see previously we had 200 by 200 image now we reduced to 100 by 100 but still, the general idea preserved, right. It's gonna more blur, I guess.
[00:02:43] But we can still find where we had those vertical lines. We can find, for instance, areas the same thing with horizontal lines. Or, for instance, if your filter is defined like this 0s everywhere except one point in the middle. Can you guess what's this will do with our image?
[00:03:09] All right, let's just see. It only will show the areas where we have a lot of color, the same color. So, for instance, those areas they will have the highest, well actually quite the opposite. Those will be ignored and the rest remember that white color is the high intensity, right?
[00:03:30] So basically everywhere where we had white will be highlighted and the black will be black. So it helps us to find basically those two points where we had the black color the most so to speak. So it's gonna really simple demonstration of how kernels actually applied. But once again it's not your job to choose those kernels, those filters.
[00:03:59] They will be automatically created when you do the training of your convolution neural network. So once again, you will be providing the original images and what the labels, right, and then those 3 by 3 filters they will be readjusted. So during the training phase the model will figure out itself what filter should look like to recognize a particular features to get the correct prediction.
[00:04:25] All right, so now the fun parts, and we can experiment with this-
>> Can I interrupt with a question-
>> Yes, please.
>> Real quick? So if we go back to the first convolutional neural network workbook you have there.
>> First? You mean fully connected or convolution?
>> That one.
>> And then go to the layers. Okay, so you have the convolutional 2D-
>> Feeding and max pooling 2D.
>> So the convolutional 2D is identifying a 3 by 3 pattern if you will-
>> 32 of them, mm-hm.
>> 32 of them over the 28 by 28 pixel image or-
>> 32 different-
>> Permutations or whatever of filters for that 3 by 3 pixel.
>> Yes. So basically we will have, As the input we will have this 28 by 28 image, right?
[00:05:30] But then during the convolution we will apply 32 different filters and each of them will be 3 by 3 pixels. And basically some of the filters as I said probably will learn how to distinguish diagonal line. The next one probably vertical line and so on and so forth.
[00:05:49] So or if we will be training on color images-
>> It will learn even to distinguish different colors.
>> So we're essentially looking for the 32 strongest 3 by 3 patterns?
>> Okay, and then feeding that into the pooling 2D, so we're taking each of those 3 by 3, sorry, we're taking the original 3 by 3-
>> Okay, let me stop you here because I think I will probably just Google [LAUGH] for a nice picture of demonstrating convolution. On the first layer we simply recognize those different patterns, right? Like horizontal lines and all of that. But if you combine those particular patterns you can come up with something more complex like a particular facial parts, right.
[00:06:43] So when you're doing convolution this 28 by 28 you're just taking this filter and just going through all of the original image, right. And a convolutional operation, as I said, is just multiplying the values of the pixels and value just adding, so the result will be 9, sum of 9 multiplications.
[00:07:07] But if you filter, you can think about it like if you filter look exactly like these three pixels. Somewhere in the image this will return, so the layer is not gonna be looking like the original picture. It will only highlight the places on the original image where this particular feature was found.
>> [INAUDIBLE] the filters matches strongly?
>> And basically, if you now see that the result of convolution from the first filter showed you that you have one kinda really high intensity in this area but second filter when it went through all the pixels and highlighted this area but that's gonna be different results.
[00:08:02] But we can take the inputs on the next layer and say okay, those areas are close to each other, and the convolution will actually see because they are connected. Yeah, probably just drilling the architecture would be easier to explain. But yeah, highlighting the areas on the higher level it will kind of recognize that we have a particular features recognized, and they are sitting next to each other, so we can progress to more kind of complex recognition if it makes sense.
>> I think so. So I think what you're saying is like the convolutional layer will say okay, right here we have a strong match to filter one-
>> Right next to it we have a strong match to filter two.
>> And then in the pooling, it'll say one and two together that weighted highly, and so we have a feature there.
>> With the pooling so, okay, you recognized a particular feature when you did the convolutional operation, but the main task of pooling is just reduce the dimensionality. But spatially it means that most probably their recognized feature now gonna be really close to each other. So the next convolution layer now can say okay, this particular filter from the previous operation recognized horizontal lines this one recognized vertical lines.
[00:09:38] So having this and that together it looks like something blah, like eye, for instance, right, or a nose. And then once again you kinda okay, we recognize nose here we recognize eye here. We're doing pooling and once again it's gonna shrink the dimensionality of the input data to this next layer.
[00:10:00] It's done to basically reduce the amount of computation you're doing. You can probably just ignore pooling half times of convolutions, but the problem is your filter is pretty small, and we want to keep it that way. But if you're doing pooling, the idea is that you are reducing the size of the image you're processing, right, so you're doing less and less operations.
[00:10:25] But the idea is that the most useful information will be still propagated, right? Because we picking the maximum intensity out of those like four pixels for instance. And it means that your next convolutional layers will have larger and larger view, so to speak, on the original image. So the view of your filter to some degree increases.
[00:10:49] But it's not looking at the original filter it's just looking on already recognized patterns. And that's how you kinda build a complex list. That's how you can interpret what the convolutional neural net is doing. It's looking at something really, really small and really detailed at the beginning but then trying to make sense out of those small details into more larger objects.