# A Practical Guide to Machine Learning with TensorFlow 2.0 & Keras: Principal Component Analysis

Transcript from the "Principal Component Analysis" Lesson

[00:00:00]

>> Let's go back to our notebook called MNIST PCA Compression. PCA stands for Principal Component Analysis, and it's a mathematical technique. Say you have multi-dimensional data, whatever that is. For instance, with our images of 28 by 28 pixels, as I said, each pixel can be treated as an independent dimension.

What PCA does is basically look at all of the provided data, in our case those 60,000 images, and say, okay, for this particular pixel, for this particular degree of freedom, there is a huge change. For instance, it varies all the way from 0 to 255, right?

So we're covering the whole spectrum, and the values are not only 0 and 255, but everything in between. So it's looking at how much a particular degree of freedom, in our case a pixel, varies across the whole data set. And what it's doing is basically saying, okay, for instance, the top right pixel is always white.

So we don't care about that information, and we can probably drop it. Principal Component Analysis does exactly that, or at least can do that. It can see how much useful information is contained per degree of freedom, or per pixel, and can suggest whether you can get rid of that information or not.
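To make that idea concrete, here is a minimal sketch with made-up arrays, not the actual notebook code: a pixel that never changes across the data set has zero variance, so dropping that dimension loses nothing.

```python
import numpy as np

# Stand-in for the 60,000 flattened 28x28 images (random values, for illustration).
images = np.random.randint(0, 256, size=(60000, 784))
images[:, 27] = 255  # pretend the top right pixel is always the same value

per_pixel_variance = images.var(axis=0)
print(per_pixel_variance[27])  # 0.0 -> this degree of freedom carries no information
```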

So let me actually [LAUGH] demonstrate what I mean by this. I will use the Principal Component Analysis implementation from scikit-learn. scikit-learn is an amazing Python package that contains a lot of statistical, let's call them classical, machine learning algorithms, and PCA is one of them. So we will be fetching MNIST once again.

But this time we're fetching it through scikit-learn's data set utilities. Hmm, it's still running, it's probably trying to download the files to my local machine. All right, so we're splitting our set into, yeah, data and target. And this is basically the ratio that will control my demo.
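A sketch of that fetch-and-split step, assuming the notebook uses scikit-learn's OpenML loader for the flattened 784-pixel MNIST set (the exact loader may differ):

```python
from sklearn.datasets import fetch_openml

# Downloads the data set on the first call, then caches it locally.
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist.data, mnist.target  # X: (70000, 784) pixel values, y: digit labels
```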

I will say how much information I want to preserve. For instance, I'd say keep 95% of the information, so that's why the percentage is set to 0.95. I will get back to this in the next iteration. And here, I'm just running Principal Component Analysis, asking: to how many dimensions can our data be reduced while still preserving 95% of the information?

And it's saying that out of 784, that's how many original pixels we have, it's enough to have only 154 dimensions [LAUGH] to preserve 95% of the information about the whole data set.
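In scikit-learn, passing a float between 0 and 1 as n_components tells PCA to keep just enough components to preserve that fraction of the variance. A sketch of the step just described:

```python
from sklearn.decomposition import PCA

percentage = 0.95                   # fraction of the information to preserve
pca = PCA(n_components=percentage)  # float -> pick the dimensionality automatically
X_reduced = pca.fit_transform(X)

print(pca.n_components_)            # -> 154 of the original 784 dimensions
```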

So if I just plot how the preserved information depends on the number of degrees of freedom, or dimensions, you'll see that 100% is reached at 784, right? But if I'm willing to lose some information, for instance if I want to preserve only 50% of the useful information, it means I can shrink, or compress, my data set down to only something like 10 useful degrees of freedom.
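The curve described here is the cumulative explained-variance ratio; a sketch of the plot, assuming matplotlib is available:

```python
import numpy as np
import matplotlib.pyplot as plt

full_pca = PCA().fit(X)  # keep all 784 components this time
cumulative = np.cumsum(full_pca.explained_variance_ratio_)

plt.plot(cumulative)  # climbs steeply at first, reaches 1.0 at 784 dimensions
plt.xlabel('number of dimensions')
plt.ylabel('fraction of information preserved')
plt.show()
```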

What I'm explaining right now is very abstract, but it can be visualized pretty easily. So with PCA, I'm just doing the calculation again, computing the exact percentage of information that will be propagated if I use only that many degrees of freedom, and then doing a reconstruction, a recreation of the digit itself.
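The recreation step maps the compressed representation back to 784 pixels, which in scikit-learn is inverse_transform. A sketch of the reconstruction and the side-by-side plot shown next:

```python
X_recovered = pca.inverse_transform(X_reduced)  # back to shape (70000, 784)

fig, axes = plt.subplots(2, 5, figsize=(8, 4))
for i in range(5):
    axes[0, i].imshow(X[i].reshape(28, 28), cmap='gray')            # original
    axes[1, i].imshow(X_recovered[i].reshape(28, 28), cmap='gray')  # reconstructed
    axes[0, i].axis('off')
    axes[1, i].axis('off')
plt.show()
```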

And now I can plot those digits. There are a couple of things here. Those were the original digits on the left, right? And I said I'm willing to lose 5% of the useful information, have only 95% propagated, and then recreate the digits. As you can see, we can still recognize those handwritten digits, right?

1, 0, but some artifacts appeared, and the background became grey. That's because our PCA model decided the background is not useful information at all and got rid of it for us. It only kept the most interesting part, basically the digits themselves. Now, if we change our percentage to, let's say, 0.5, only 50% of the useful information can be propagated.

You will see that we need only 11 dimensions, or you can think of them as 11 neurons. They still carry the information about the handwritten digits we're plotting. So with that change, let's see how it affects the images. All right, I'm just recalculating the PCA.
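Re-running the same pipeline with the ratio lowered to 0.5 reproduces the 11-dimension result; a sketch:

```python
pca = PCA(n_components=0.5)                  # keep only half the information
X_reduced = pca.fit_transform(X)

print(pca.n_components_)                     # -> 11 dimensions
print(pca.explained_variance_ratio_.sum())   # -> ~0.51 actually propagated
X_recovered = pca.inverse_transform(X_reduced)
```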

Yeah, 51% will actually be propagated. You can see that the images are blurred significantly, and some of them are not that clearly recognizable anymore. For instance, this one was supposed to be a 2, but to me it looks almost like a 9, right? So the reason why I showed Principal Component Analysis to you, once again, is two things.

First of all, think about all of the data you're working with as information, and that's what your neural network does: it takes the available information and, to some degree, compresses it. At the end, we just have those 10 neurons, which will be activated depending on the label, right?

But it can also give you an idea of what the smallest number of neurons should be to actually get useful information propagated through. So once again, let's play around with extreme scenarios. For instance, say I only want 1% of the useful information propagated through the model. With 1%, that means [LAUGH] only 1 neuron will be active.
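The extreme 1% case is the same call with an even smaller ratio; scikit-learn still returns at least one component. A sketch:

```python
pca = PCA(n_components=0.01)   # ask for just 1% of the information
X_reduced = pca.fit_transform(X)

print(pca.n_components_)       # -> 1: a single dimension, or 'neuron'
X_recovered = pca.inverse_transform(X_reduced)
```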

With 1 neuron, let's see. Okay, it's calculating right now, cool. So we can definitely recognize the 0, that's perfect. [LAUGH] The rest is somewhat blurred, right? But this is only a fun example to demonstrate that you can play around. And there's definitely a connection between information theory, machine learning, and deep learning.

To better understand how neural networks operate, you might dig into information theory. And Principal Component Analysis is one of the ways to figure out which channels of information are the most important, and, if you modify them, how that affects the information at the other side of the model, so to speak.