Transcript from the "Training a Model" Lesson
>> Basically, our solver and our model were defined previously. Now we actually need to do the training itself. And the way you do it is by calling model.fit and providing the input data you have. So in our case, that will be just x_train.
[00:00:27] You also need to provide all the labels, right? So our y_train is the array of all those labels, going from zeros and ones up to nines. And what else do we need? You need to specify the number of epochs. So what is an epoch? An epoch is one pass in which you show your model the entire input dataset once.
[00:00:57] So when we say one epoch happened, it means that we went through all the images, in this case 60,000 handwritten digits, right? Now, we could show them just once. But remember, with the learning rate, each pass only gets us slightly closer to the result. So I can simply repeat this process several times.
[00:01:22] For instance, I can say epochs should be equal to 20. And what will also happen is that, as you go through the training dataset, your data can be reshuffled to prevent any biases. So, okay, let's see model.fit. We've got our images, we've got our labels, and we're gonna train for 20 epochs.
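To make "one epoch" and "reshuffling" concrete, here is a minimal NumPy sketch of the loop that a call like model.fit runs internally. It is an illustration, not Keras's actual implementation: the helper name `iterate_epochs` is hypothetical, and the batch size of 32 is an assumption (it matches the Keras default, but the lesson does not state it).

```python
import numpy as np

def iterate_epochs(num_samples, epochs, batch_size=32, seed=0):
    """Yield (epoch, batch_indices); every sample is visited once per epoch."""
    rng = np.random.default_rng(seed)
    for epoch in range(epochs):
        # Reshuffle the sample order before each pass over the data.
        order = rng.permutation(num_samples)
        for start in range(0, num_samples, batch_size):
            yield epoch, order[start:start + batch_size]

num_samples = 60_000  # size of the training set mentioned in the lesson
times_seen = np.zeros(num_samples, dtype=int)
batches_in_first_epoch = 0

for epoch, batch_idx in iterate_epochs(num_samples, epochs=2):
    times_seen[batch_idx] += 1  # a real loop would do one training step here
    if epoch == 0:
        batches_in_first_epoch += 1

print(batches_in_first_epoch)              # 1875 batches of 32 images
print(times_seen.min(), times_seen.max())  # 2 2: each image seen once per epoch
```

After two epochs every image has been shown exactly twice, only in a different order each time.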
[00:01:48] Anything else we need to specify? Well, for now, let's just keep it the way it is. Let's execute and see how the training is happening. So, our first epoch finished. So we went through all the 60,000 images, right, and it took only four seconds. And our loss function, which is sparse categorical cross-entropy, was 0.2 at the end of this epoch.
[00:02:14] And as we can see, it's only decreasing with each epoch. But I also like to look at the accuracy. So even after the first epoch, we get to 93% accuracy, 97 on the second epoch, so that's pretty cool, right? So with 99% accuracy, at least on the training dataset, we're recognizing those numbers from the training dataset.
[00:02:41] But it's not exactly fair when you train your model on the training dataset and then ask, what's the accuracy of the prediction? Of course, it will get to 100%, right? Because your model is trained to recognize those particular images you're using as your training dataset. So let's be fair.
[00:03:02] We have a test dataset. Those are the images our model hasn't seen yet. And we can actually use those to evaluate the true accuracy of our model, right? By the way, we can probably reduce the number of epochs to ten or something like that, right? Because you can see that our model has been around 99% accuracy for a pretty long time now.
[00:03:29] So what we can do, yeah, the easiest way to restart everything is to redefine the model and recompile it with our solver. And now model.fit, that's where the training is happening. And by the way, another really nice feature in Jupyter: if you just type two percent signs and then time, so %%time, at the top of a cell, it will print out the wall time, how long this particular cell took to execute.
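The %%time cell magic only works inside Jupyter/IPython. Outside a notebook, a comparable wall-time measurement can be sketched with the standard library. The summation below is only a placeholder workload standing in for the training call.

```python
import time

start = time.perf_counter()

# Placeholder workload; in the lesson this would be the model.fit call.
total = sum(i * i for i in range(100_000))

elapsed = time.perf_counter() - start
print(f"Wall time: {elapsed:.3f} s")
```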
[00:04:06] And all right, I said that we will reduce the number of epochs to ten. And to provide this test dataset, we can say, where was it? validation_data, yes. So I type validation_data, and then I need to specify x_test, right, because those were our images, the ones our model hasn't seen yet, and y_test.
[00:04:36] So, in the cell below, let's just quickly take a look at what we have in x_test.shape. So in x_test, we have another 10,000 images with the same resolution. So basically what will happen is that our model will use x_train and y_train to be trained, right? But then after each epoch, it will simply go through all of x_test and y_test to estimate how accurately we can now recognize those test images.
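The call described here can be sketched as follows. This is a minimal sketch under several assumptions: it uses random arrays as a small stand-in for the real images so that it runs without a download, and the layer sizes, the relu and softmax activations, and the adam optimizer are placeholders, since the lesson's actual model and solver are defined earlier and are not shown in this transcript.

```python
import numpy as np
import tensorflow as tf

# Random stand-in data with the same layout as the lesson's arrays
# (28x28 images, integer labels 0..9). It is NOT the real dataset.
rng = np.random.default_rng(0)
x_train = rng.random((512, 28, 28), dtype=np.float32)
y_train = rng.integers(0, 10, size=512)
x_test = rng.random((128, 28, 28), dtype=np.float32)
y_test = rng.integers(0, 10, size=128)

# Placeholder model: the architecture here is an assumption.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(
    optimizer="adam",  # assumed solver
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train on the training arrays; after every epoch Keras also evaluates
# loss and accuracy on the (x_test, y_test) pair.
result = model.fit(
    x_train, y_train,
    epochs=3,
    validation_data=(x_test, y_test),
    verbose=0,
)

history = result.history  # dict: metric name -> one value per epoch
print(sorted(history))
```

Note that validation_data takes the images and labels together as one tuple. With random labels the accuracy stays near chance, so only the shape of the call carries over to the real data.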
[00:05:18] All right, so let's actually run the code. So right now we're only printing the loss function and accuracy for the training dataset, but as soon as we're done with the epoch, we calculate the accuracy and loss function for the validation dataset. So the true accuracy of our model is actually higher than the training accuracy, which is pretty interesting.
[00:05:41] But that happens sometimes, right? It may just be that the numbers in our validation dataset are so obvious, for instance, right? Then you can get slightly higher accuracy. But you can see that we're somewhat stabilizing close to 97, I'd say almost 98% accuracy. So only about 2% of our handwritten digits are misrecognized, which is pretty awesome.
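For reference, the accuracy number being read off here is simply the fraction of images whose most probable predicted class matches the label. A minimal NumPy sketch follows; the helper name `accuracy` and the toy three-class probabilities are made up for illustration and are not output from the lesson's model.

```python
import numpy as np

def accuracy(probabilities, labels):
    """Fraction of rows whose highest-probability class equals the label."""
    predicted = np.argmax(probabilities, axis=1)
    return float(np.mean(predicted == labels))

# Toy predictions for four samples and three classes (illustrative only).
toy_probs = np.array([
    [0.8, 0.1, 0.1],  # predicted class 0
    [0.2, 0.7, 0.1],  # predicted class 1
    [0.3, 0.3, 0.4],  # predicted class 2
    [0.6, 0.3, 0.1],  # predicted class 0, but the true label is 1
])
toy_labels = np.array([0, 1, 2, 1])

print(accuracy(toy_probs, toy_labels))  # 0.75: three of four are correct
```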
[00:06:10] But this dataset is kind of a unique dataset. It's considered to be the hello world of computer vision, because it's really simple and it's really difficult to overfit. So there are no different objects overlapping with each other, right? There are no images where a zero and a five are sitting right next to each other.
[00:06:32] It's individual digits, right? So it's really clean, really easy. But still, it gives you an idea of what we should do to train our models. All right, so with all of this, what else can we do? We can now actually access the history. So if you go to model.history and then, once again, history, so model.history.history, what it will print out is how our loss function was changing and how our accuracy was changing.
[00:07:06] That's for the training dataset, right, and how the validation loss and accuracy were changing as well. And that might be quite helpful if you want to plot it. So for instance, what we can do, do I have plt? Yes, I do. I can say, please plot. Let h be our model.history.history.
[00:07:31] So h, if I run this, is just a dictionary. So I can say h['accuracy']. So if I do this, it will plot for me how the accuracy was changing with time, right? And I can do the same thing for, for instance, the validation one, val_accuracy. And now, two plots.
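The two plots can be sketched like this. The dictionary below holds made-up placeholder values, not the numbers from the lesson's run. In the notebook, h would instead be the dictionary stored in model.history.history. The Agg backend line is an assumption so the sketch also runs without a display; it is not needed inside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; unnecessary inside a notebook
import matplotlib.pyplot as plt

# Placeholder values, one entry per epoch (NOT results from the lesson).
h = {
    "accuracy": [0.50, 0.60, 0.70],
    "val_accuracy": [0.55, 0.62, 0.68],
}

fig, ax = plt.subplots()
ax.plot(h["accuracy"], label="accuracy")          # training accuracy
ax.plot(h["val_accuracy"], label="val_accuracy")  # validation accuracy
ax.set_xlabel("epoch")
ax.set_ylabel("accuracy")
ax.legend()
# In a notebook the figure is displayed automatically at the end of the cell.
```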
[00:08:05] So that is really good, right? So basically, we get to a more or less stable accuracy of 97.5% even after, let's say, the third epoch, which is pretty good. And once again, our model is a fully connected neural network. So, let's just play around with it. What if I reduce the number of neurons?
[00:08:35] How many do we want? Let's say 16. I don't know, right? So it basically means that all of these 700-plus pixels will be talking to only 16 neurons. That's the only hidden layer we have, and then the last layer is still our 10 digits. So if we just rerun those, we see that's the number produced here, perfect.
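A rough way to see how small this network is: count its trainable parameters. The sketch below assumes 28x28 input images (784 pixels, consistent with the "700-plus pixels" mentioned) and that each fully connected layer has one bias per neuron. The helper name `dense_params` is hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

pixels = 28 * 28  # 784 inputs, assuming 28x28 images
hidden = 16       # the single hidden layer from the lesson
classes = 10      # one output per digit

hidden_layer = dense_params(pixels, hidden)   # 784 * 16 + 16 = 12560
output_layer = dense_params(hidden, classes)  # 16 * 10 + 10 = 170
total = hidden_layer + output_layer

print(total)  # 12730
```

Almost all of the parameters sit between the pixels and the hidden layer, so shrinking that layer shrinks the model nearly proportionally.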
[00:09:01] We can still use the same optimizer, right, and just do the training. First of all, you can notice instantly that the training is a little bit faster, just because we're playing with a lot fewer weights, basically, right? And let's see, the accuracy kind of dropped, right? So 92%. Remember that before, even after the first epoch, we got to almost 97% instantly, and now it takes a little bit of time to get to even 95%.
[00:09:36] And yeah, we can probably add additional layers. So although we can have fewer neurons overall, just because we have multiple layers like this, you will see in a second that it might actually give us a pretty good solution.