Machine Learning in JavaScript with TensorFlow.js

Building Training & Testing Datasets

The "Building Training & Testing Datasets" lesson is part of the full Machine Learning in JavaScript with TensorFlow.js course. Here's what you'd learn in this lesson:

Charlie finishes the Node.js module, which will build and test the model to detect the drawn image. Methods are created for getting the training and testing data and returning Tensor objects with the images and labels.


Transcript from the "Building Training & Testing Datasets" Lesson

>> So if you go back to your code, you can remove this, we don't need to log it, we just wanted to show you what the buffer data looks like when it comes back. So under your buffer variable here, we're going to create an image tensor out of this buffer data.

So we can create a variable called imageTensor. And because we're in Node here, we are going to call tf.node. And then on top of that, we're going to be able to call some specific methods to decode our data and also resize it. Because at the moment, our drawings are 200 by 200 pixels, but we don't actually need to feed the model data that is that big, so we're gonna resize it to a smaller size.

So usually, it will just go faster, and we really don't need to be looking at every single pixel in a 200 by 200 image. So after tf.node, we're going to call the method called decodeImage and we're passing it the buffer. And then the resize method here that we're going to use is resizeNearestNeighbor.

So there's two different methods to resize, I'm actually not quite sure why one would be better than the other. In this case here, nearest neighbor works for me. And when we're calling it we need to pass the shape that we want our data to be in. So we're basically transforming each of our images from a 200 by 200 to a 28 by 28, you can pick another number later on.

As long as the number that you pick here is the same as the number that you're going to pick when we actually build our model. If you use a different shape here than the one that the model is expecting, then you'll get an error saying the model expected shape, I don't know, 24 by 24, and this image is 28 by 28.

So as long as you keep the shape the same, we should be good. So if you're following along, I picked 28, so maybe use the same value so that it's easier. And I think that we just need to call the expandDims() function as well.
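To make the resize step concrete, here is a plain JavaScript sketch of nearest-neighbor downscaling on a flat grayscale array. It is an illustration of the idea only, not TensorFlow.js's implementation of resizeNearestNeighbor, and the function name and the row-major pixel layout are assumptions made for this example.

```javascript
// Downscale a grayscale image stored as a flat, row-major array.
// For each output pixel, copy the source pixel at the scaled coordinate.
function resizeNearestNeighbor(pixels, srcWidth, srcHeight, dstWidth, dstHeight) {
  const out = new Array(dstWidth * dstHeight);
  for (let y = 0; y < dstHeight; y++) {
    const srcY = Math.min(srcHeight - 1, Math.floor((y * srcHeight) / dstHeight));
    for (let x = 0; x < dstWidth; x++) {
      const srcX = Math.min(srcWidth - 1, Math.floor((x * srcWidth) / dstWidth));
      out[y * dstWidth + x] = pixels[srcY * srcWidth + srcX];
    }
  }
  return out;
}

// A 4x4 image shrunk to 2x2 keeps one pixel out of every 2x2 block.
const source = [
  1, 2, 3, 4,
  5, 6, 7, 8,
  9, 10, 11, 12,
  13, 14, 15, 16,
];
const small = resizeNearestNeighbor(source, 4, 4, 2, 2);
console.log(small); // [ 1, 3, 9, 11 ]
```

Going from 200 by 200 to 28 by 28 works the same way: the output has 784 values per channel instead of 40,000, which is why training on the resized images is faster.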

So this will expand the dimensions of the tensor, because I think I had some errors before where the image had a shape of three numbers and the model was actually expecting four. So I need to double check exactly why. But when you get an error, you run your code and it says the model was expecting a certain shape, and it has another one.

And if you can see that the number of dimensions is different, sometimes you can just call expandDims and it will add a dimension to your tensor. It doesn't really do anything more than that. And now that we have our image tensor, we're gonna push it into our images array, so images.push(imageTensor).
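The effect of expandDims on a shape can be sketched with nested arrays in plain JavaScript: adding a leading dimension of size 1 is the same as wrapping the value in one more array, and none of the values change. The helper names and the single-channel 28 by 28 image below are assumptions for illustration; the real channel count depends on the decoded file.

```javascript
// Compute the shape of a regular nested array, e.g. [[1, 2], [3, 4]] -> [2, 2].
function shapeOf(value) {
  const shape = [];
  while (Array.isArray(value)) {
    shape.push(value.length);
    value = value[0];
  }
  return shape;
}

// Adding a leading dimension is just wrapping the value in one more array.
function expandDimsFront(value) {
  return [value];
}

// Hypothetical 28 x 28 image with 1 channel, filled with zeros.
const image = Array.from({ length: 28 }, () =>
  Array.from({ length: 28 }, () => [0])
);

console.log(shapeOf(image));                  // [ 28, 28, 1 ]
console.log(shapeOf(expandDimsFront(image))); // [ 1, 28, 28, 1 ]
```

The extra leading dimension is commonly the batch dimension, which would explain an error about a shape of three numbers where four were expected.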

So our images array is going to contain all of the tensors that represent our image data. So that is for the images themselves, but we still need to deal with the labels. So under it, the way that I did it, we're going to create a variable.

And because we drew circles and triangles, I'm gonna do const circle. And for each file again, so files[i], just to be sure, turn it to lowercase, in case some of you have written circle with a capital C or anything. So just to really make sure that our data is all formatted the same, just call toLowerCase, and then we're gonna do endsWith.

So we're checking that each of our files ends with, here, `circle.png`. So we're making sure that each file that we're trying to load is either a circle or a triangle. And that's why, when I asked you to name your files this way, 7-triangle, 8-circle, it's to make it easier here.

It's easier to actually figure out if the current file is a circle or a triangle. So we're gonna do the same for the triangle as well. So if you have decided to draw a square instead of a triangle, just make sure to also adapt your code to that.

But here we have variables, and what we need to check now is, well, is the current file a circle or a triangle? So there's probably other ways to do this, but here I just have, okay, so if, not square, if circle is true. So if at the moment the current file that we are going through in our loop is actually a circle, we are going to push into our labels.

So labels.push, and we're going to push 0, so let's say that circles are 0 and triangles will be 1. So then we can say, okay, else if triangle is true, we are going to do labels.push(1). So the way that TensorFlow, or tensors in general, work is that you can't really pass them strings.

First of all, here we're passing an integer, and then we'll also transform that into a tensor as well, so that it will work with the model. So at this point here, we have looped through all the files in our folder, and we've created tensors from images, and we've also pushed a label, either 0 or 1, into our labels array.
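The labeling logic described above can be sketched on its own, without any image decoding. The function name labelFor and the sample file names are assumptions for this example; the 0 for circle and 1 for triangle mapping follows the lesson.

```javascript
// Map a file name to an integer label: 0 for circles, 1 for triangles.
// Returns null for files that match neither pattern so the caller can skip them.
function labelFor(fileName) {
  const name = fileName.toLowerCase();
  if (name.endsWith('circle.png')) return 0;
  if (name.endsWith('triangle.png')) return 1;
  return null;
}

// Hypothetical file names following the "<number>-<shape>.png" convention.
const files = ['7-triangle.png', '8-Circle.PNG', 'notes.txt'];
const labels = [];
for (let i = 0; i < files.length; i++) {
  const label = labelFor(files[i]);
  if (label !== null) labels.push(label);
}
console.log(labels); // [ 1, 0 ]
```

Skipping unmatched files is a choice made in this sketch. In a loop that pushes every image but only labels circles and triangles, a stray file would leave the images and labels arrays with different lengths, so it may be worth skipping the image as well.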

And what we need to do after that is just return our array of images and labels, so you can just return that. But at this point, we have not called our loadImages function, and because we need to call it for both the train images and the test images, let's just create a new function, const loadData, and it's just gonna be an arrow function. And inside there, we're just gonna add a console.log, so that when we run this in the terminal later, it's gonna give us some little hints of what's going on.

So just console.log('Loading images'), and then we're going to assign to the variables that we created here, trainData and testData. So for trainData, we're going to call our loadImages function, and we're passing it the path to the folder, so trainImages here, and do the same for testData.

So for testData, we're calling our loadImages function and passing it the testImages directory. And after that, just to make it a little bit easier for us as well, we are going to just log 'Images loaded successfully', but actually I'm not awaiting, so maybe it won't even matter, I think it might just console.log at the same time.

So maybe it's not that helpful, but let's leave it like this for now. Okay, so we started to put it into arrays, images and labels, and we have our data, but at the moment we kind of need to have two different functions. Because we have loaded our trainData and our testData, but they're not really returned as tensors yet; the labels, at least, are not tensors yet.

So to make it easier in terms of structure, for requiring the data later, we're just gonna have a function that we're gonna call getTrainData to only get the training data, and then getTestData to only get the test data. So if we write const getTrainData, that's an arrow function as well.

And we're just going to return an object here, and we're gonna have our images. For these images, we're gonna concatenate all of our training data, so tf.concat, and that's going to be trainData[0], the first element of it. So if you ever later on decide to console.log what trainData is, what's happening is that the first element of trainData is the bunch of tensors from our images, and the second is actually the labels for these images as well.

So we just want to concat the tensors, and labels here is going to be tf.oneHot. And again, if you remember yesterday, it's kind of turning categorical data into numbers. So we already have 0 and 1, but we want to actually transform that into a tensor as well.

So, okay, tf.tensor1d, one dimensional, we don't need more than one dimension here. And we're going to pass trainData[1], the second element, cuz that will be the labels of our images. And we need to transform that into, so the second argument of tensor1d needs to be either float32 or int32.

So here in this case, actually, I think yesterday I used float32. So let's use float32 and see what happens. The second argument to oneHot is going to be the number of classes, and here we only have two. It's either square, or no, circle or triangle.

So I'm gonna write 2, and I have an issue with, sorry, I forgot the comma here after images, because we're returning an object. If you have been following along and you have the same error as me, don't forget the comma here. Okay, my bad. So float32 needs to be passed in as the second argument of tensor1d, not, yeah, okay, that's it.

Okay, so just to recap, we're calling tf.oneHot to transform whatever is passed inside, and what's passed inside is a one dimensional tensor of the training data labels, transformed to floats. And the second argument is the number of classes. So here I hard-coded 2; yesterday I think we did labels.length, but here I didn't actually have a declared variable labels.

So you could do it both ways, it's up to you. And here this is only to get the training data, but we're gonna create another small function to return the same thing, but for testData. So if you don't want to rewrite everything from scratch, we can just copy the same block and just change the name of the function to getTestData.

And for the images, we want to then concat testData[0], the first element of the test data array. And the labels will be the same, we are just going to change trainData to testData, but otherwise it should be the same. So that means that the way that we're transforming our data into tensors is going to be the same for the training set and the test set.
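The one-hot step applied to both label arrays can be sketched in plain JavaScript. This is not tf.oneHot itself, only an illustration of the encoding it produces for in-range labels; the range check is an extra added for this sketch.

```javascript
// Turn integer class labels into one-hot rows:
// with 2 classes, label 0 becomes [1, 0] and label 1 becomes [0, 1].
function oneHot(labels, numClasses) {
  return labels.map((label) => {
    if (!Number.isInteger(label) || label < 0 || label >= numClasses) {
      throw new RangeError(`Label ${label} is outside 0..${numClasses - 1}`);
    }
    const row = new Array(numClasses).fill(0);
    row[label] = 1;
    return row;
  });
}

// Circle, triangle, triangle with the lesson's 0/1 labels.
console.log(oneHot([0, 1, 1], 2)); // [ [ 1, 0 ], [ 0, 1 ], [ 0, 1 ] ]
```

The second argument plays the same role as the number of classes passed to tf.oneHot: it sets the length of each row, so it has to match the number of distinct labels the model is meant to predict.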

And then because we're in Node, we're just going to do module.exports. And we're going to export the loadData function, cuz we're gonna call it in another file, and our getTestData and getTrainData. Okay, so at this point, everything should be good. So just to recap, we imported the tools that we wanna use.

We have the paths to our training set and our test set, and we have a function to loop through all the files in our directory and transform the images to tensors. Then we're checking if our current image is a circle or a triangle, and pushing an integer representation of our label into our labels array.

Then we have a load function that's going to load both the training set and the test set. And then we just have two util functions to actually return this data as tensors, with images and labels, okay.
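The overall shape of the module recapped above can be sketched as follows. To keep it runnable without TensorFlow.js installed, loadImages is a hypothetical stand-in that returns placeholder values, the folder names are made up, and the tensor conversion (tf.concat and tf.oneHot) is left out. The guard that throws when loadData has not been called yet is also an addition of this sketch.

```javascript
// Hypothetical stand-in for the lesson's loadImages(dir). The real function
// reads PNG files and builds tensors; this one returns fixed placeholders
// in the same [images, labels] layout.
const loadImages = (dir) => [
  [`${dir}/0-circle.png`, `${dir}/1-triangle.png`],
  [0, 1],
];

let trainData;
let testData;

// Fill the module-level variables. Must run before either getter is used.
const loadData = () => {
  console.log('Loading images...');
  trainData = loadImages('data/train'); // hypothetical folder names
  testData = loadImages('data/test');
  console.log('Images loaded successfully.');
};

// Shared helper so both getters shape their result the same way.
const toDataset = (data, name) => {
  if (!data) throw new Error(`Call loadData() before ${name}()`);
  const [images, labels] = data;
  return { images, labels };
};

const getTrainData = () => toDataset(trainData, 'getTrainData');
const getTestData = () => toDataset(testData, 'getTestData');

module.exports = { loadData, getTrainData, getTestData };
```

A file that requires this module would call loadData() once and then getTrainData() and getTestData(). Routing both getters through one helper is an alternative to copying the block, and it keeps the two datasets transformed identically.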
