The "Collecting Shape Training Data" lesson is part of the full Machine Learning in JavaScript with TensorFlow.js course featured in this preview video. Here's what you'd learn in this lesson:

Charlie builds a dataset of images for two different shapes. The dataset is split into images used for training and testing. She also begins coding a Node.js module to load the images and prepare them for processing by TensorFlow.

Transcript from the "Collecting Shape Training Data" Lesson

[00:00:00]
>> So at this point, I think the download is already working. What we're gonna do next is going to be a little painful: you can pick two different shapes. I'm going to draw a triangle and a circle, and we're going to do that 40 times each.

[00:00:19]
So, we're probably going to spend maybe the next 10, 15 minutes drawing, and we're gonna download every picture. And what we'll do is create a data folder now. So let me just check for a second where exactly. So yeah, the Data folder can be outside of the Public folder, that's fine.

[00:00:45]
So here at the root of the project, create an empty Data folder. And for now, I can do one just to show you how to get started. So, let's say I draw a triangle, so I have a triangle like this, and I'm going to download it, and it will be this one.

[00:01:09]
And here I'm going to start by renaming it, and I'm gonna do 0 because this is the first one, and then triangle, like that. And then I'm gonna stop my server and open my folder here. And in my data folder, I'm going to drag this 0-triangle drawing, and I'm going to do the same again and again.

[00:01:33]
So I clear, and then I draw a new triangle, and then I download, and I call it 1-triangle. And I'm gonna do that, and you're gonna do that, 40 times for each label. So if you want to do a square instead of a circle or whatever, that's fine. I'm gonna do triangle and circle, and I think we should be able to do it in 10, 15 minutes.

[00:01:57]
So, if you want to get started now, I'm gonna do it as well. All right, so let's do it. All right, so, hopefully you're done with drawing 40 circles and 40 triangles. If you haven't had the time, or if you don't have 40 yet but you have at least maybe 30, that's all right.

[00:02:18]
If you only have 10, that's not gonna be enough. But at the end, your data folder should be kinda full of files that have a format like 0- and then your label, so the name of your shape. So I have 0-circle.png, 0-triangle.png, and then 1-circle, 1-triangle, all the way to the end.

[00:02:37]
So it looks kind of cool like that. So you should have about 80 files, that would be a good start. In general, obviously, if you look at open source datasets, they have like thousands of images, and we do not have the time for that. So what's gonna be interesting is that as we create our model later on, we'll see that our accuracy is probably not going to be 90% or anything, it's probably gonna be lower

[00:03:00]
because of the number of images that we have. And then we'll be able to talk about how you can improve your accuracy by drawing more triangles. And obviously here we did it by hand, we drew circles and we drew triangles. And in the end, if you get tired, your circles end up looking more like squares.

[00:03:18]
But yes, usually, there are ways you can actually automate some of that work, and some of you might have done it. What I've done in the past, I think it's called either data augmentation or data transformation, is you can write a Python script, and you can take one of your images and transform it in different ways.

[00:03:35]
So you can flip it, for example, or you can reverse it, and you can then end up generating maybe five more images from a single image, and that's like a quick way to end up with a lot more data. So, if at the end of this workshop you realize that your model is not accurate enough and you want more data, then you can go and write a Python script that will take,

[00:03:55]
for example, your first circle, and then rotate it 90 degrees, and then again, again, again. So you have like all the different angles, and then you can also flip it, and that will give you about six images from one circle that you drew. So you end up with 80 drawings times 6.

[00:04:11]
I'm not gonna count here, but you can end up with maybe a couple hundred drawings. So that will hopefully then improve the accuracy of your model. But for now, I wanted to do it the manual way, just to show you how it's done. And now that we have our data in the Data folder, one thing that is good to know when we are working with machine learning is that you need to kind of split your data set between a training set and a test set.
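The rotate-and-flip idea described above can be sketched in a few lines. The speaker mentions a Python script; this sketch uses JavaScript instead to stay consistent with the rest of the lesson. It is only an illustration under simplifying assumptions: an image is modeled as a plain 2D array of pixel values, and the helper names (`rotate90`, `flipHorizontal`, `augment`) are hypothetical, not part of the course code.

```javascript
// Hypothetical helpers: an "image" here is a 2D array of pixel values
// (rows of numbers). Decoding and encoding real PNG files is out of scope.

// Rotate a grid 90 degrees clockwise.
const rotate90 = (grid) =>
  grid[0].map((_, col) => grid.map((row) => row[col]).reverse());

// Mirror a grid left to right.
const flipHorizontal = (grid) => grid.map((row) => [...row].reverse());

// Original + three rotations + two flips = six variants per drawing.
const augment = (grid) => {
  const r90 = rotate90(grid);
  const r180 = rotate90(r90);
  const r270 = rotate90(r180);
  return [grid, r90, r180, r270, flipHorizontal(grid), flipHorizontal(r90)];
};

const sample = [
  [1, 2],
  [3, 4],
];
const variants = augment(sample);
console.log(variants.length); // 6
console.log(rotate90(sample)); // [ [ 3, 1 ], [ 4, 2 ] ]
```

With 80 drawings, six variants each would give 480 images. For a real dataset you would apply the same transforms to decoded pixel data and write the results back out as image files.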

[00:04:41]
So what we can do here is, inside your data folder still, you can create a new folder. So I'm gonna do it outside. So I'm gonna create a folder here that I'm gonna call train, and I'm gonna drag it into my data folder. And I'm also gonna create a new folder called test, and again, I'm dragging it into my data folder.

[00:05:04]
So in the end, your data folder should have all your images and then a test folder and a train folder. And what we're gonna do in here is copy files. Let's say you have 40 drawings of each shape: maybe we're gonna copy 30 of each shape into the train folder and then 10 into the test folder.

[00:05:27]
So you kind of maybe want a split of 80% of your images into a train folder and then maybe 20% into a test folder. So it depends: if you don't have 40 images, see how many you can actually drag in there. So here I will go until, maybe let's do either 25 or 30.

[00:05:49]
I think it should be fine. So if we select until our 30th drawing and we drag it into, okay, well, I can just copy. So we can select them, Copy, go in your Train folder, Paste. You should paste them all, okay? So we should have 0 to 30, and then we are going to maybe, well, let's select from 31 to 40.

[00:06:15]
So actually, it's more gonna be 31 to 39, since we start at 0, but that's fine. So we can copy here and put them in the Test folder, Paste. Okay, so now if you want, you can delete the original images that sit outside those folders. We're gonna keep them, it's fine, we're not going to load them into our app.

[00:06:34]
But it means that if you look at your train folder, you should have a bunch of images, and in your test folder you should have a bit less. So the way this is going to be used is that we're going to use all of the images in the train folder to actually train our model.
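The copy-and-paste split above could also be scripted. The following is a minimal sketch, not something the lesson does: it assumes files are named `<index>-<label>.png` as described earlier, and `splitDataset` is a hypothetical helper name. It copies files whose index is below a cutoff into `train` and the rest into `test`.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: copy "<index>-<label>.png" files from dataDir into
// dataDir/train or dataDir/test depending on their index.
const splitDataset = (dataDir, trainCount) => {
  const trainDir = path.join(dataDir, 'train');
  const testDir = path.join(dataDir, 'test');
  fs.mkdirSync(trainDir, { recursive: true });
  fs.mkdirSync(testDir, { recursive: true });

  const counts = { train: 0, test: 0 };
  for (const name of fs.readdirSync(dataDir)) {
    const match = /^(\d+)-.+\.png$/.exec(name);
    if (!match) continue; // skips the train/test folders and stray files

    const bucket = Number(match[1]) < trainCount ? 'train' : 'test';
    const targetDir = bucket === 'train' ? trainDir : testDir;
    fs.copyFileSync(path.join(dataDir, name), path.join(targetDir, name));
    counts[bucket] += 1;
  }
  return counts;
};

// Example (not run here): with drawings numbered 0 to 39 per shape,
// indices 0 to 29 go to train and 30 to 39 go to test.
// splitDataset('./data', 30);
```

Splitting by index keeps the same proportion of each shape in both folders, which matches what the manual drag-and-drop achieves.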

[00:06:47]
And then at the end of the training step, we'll use the test images, so the images that the model will not have seen before, and we'll be able to check if its predictions match the label that's in the name of the file. So at this point, all we have is data, kind of, hand drawn data.
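Since the label lives in the file name, checking a prediction comes down to parsing that name. Here is a small sketch of that idea; `labelFromFileName` and `isCorrect` are hypothetical names used for illustration, and the lesson may extract labels differently.

```javascript
const path = require('path');

// Hypothetical helper: "12-circle.png" -> "circle".
const labelFromFileName = (fileName) => {
  // Strip any directory and the extension, e.g. "12-circle".
  const base = path.basename(fileName, path.extname(fileName));
  // Keep everything after the first dash.
  return base.slice(base.indexOf('-') + 1);
};

// Compare a predicted label with the one encoded in the file name.
const isCorrect = (fileName, predictedLabel) =>
  labelFromFileName(fileName) === predictedLabel;

console.log(labelFromFileName('0-triangle.png')); // triangle
console.log(isCorrect('data/test/31-circle.png', 'triangle')); // false
```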

[00:07:04]
So the first thing that we need to do is start loading that data in our app. So if we go to the getData file, I kind of split my files into the three steps that we're going to take. So this is going to be written in Node.js.

[00:07:24]
So if you have not really written Node.js before, it's very similar, it just doesn't run directly in the browser. So instead of using import, though, we're going to use const tf, and we're going to require @tensorflow, and we're gonna use node GPU. So, @tensorflow/tfjs-node-gpu, and TensorFlow here is going to be used to create tensors out of our images, so that they can then be fed into the model.

[00:07:59]
But then because we are working with the file system, we also need to require fs. So const fs, and we require fs, the native Node file system module. And we also need to declare a path variable. So require and then path. So, in terms of things that you need to require, that should be it for now.

[00:08:24]
Then we're going to create a variable to indicate the path to our data. So we can do train, images, directory, so trainImagesDir, and this is just going to be ./data and /train. And we need to do the same to also indicate the path to our test files. So you can copy your line, and instead of trainImagesDir, we're going to write testImagesDir, and we also update the path from ./data/train to ./data/test.

[00:09:05]
And then what we're going to do is we're going to just create two arrays to hold our data. So we create, let trainData, it's gonna be an empty array, or, actually, let me just check. I think we can actually just not even give it a value for now, cuz we'll just assign it directly later.

[00:09:29]
So we can have let trainData and testData, so those will contain our tensors with our data later on. So at this point, we just have like a basic setup, but we have not yet loaded the images. So we can create a function called const loadImages, and it's going to take in our data directory as a parameter.
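Put together, the setup described so far might look roughly like this. It is a sketch reconstructed from the narration, not the course's exact file. One deliberate difference: the narration uses a plain `const tf = require(...)`, while this sketch wraps the require in a `try/catch` so it still runs when `@tensorflow/tfjs-node-gpu` is not installed.

```javascript
// The lesson writes: const tf = require('@tensorflow/tfjs-node-gpu');
// The try/catch below is an addition for this sketch only, so the file can
// run on a machine where the package (or its GPU bindings) is missing.
let tf = null;
try {
  tf = require('@tensorflow/tfjs-node-gpu');
} catch (err) {
  console.warn('TensorFlow.js is not available yet:', err.message);
}

// Native Node.js modules for reading files and building paths.
const fs = require('fs');
const path = require('path');

// Paths to the two folders created earlier.
const trainImagesDir = './data/train';
const testImagesDir = './data/test';

// Declared without a value; they get assigned once the images are loaded.
let trainData;
let testData;
```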

[00:09:56]
And we're gonna create, oops, my arrow function is not good here, okay. And we're going to create two arrays, one that's gonna actually contain the images, the image data at least, and one that is going to contain the labels. Now, the next step is to actually fetch the files or read the files from the local folder.

[00:10:20]
So we can have let files, and we're going to use the file system method. So fs.readdirSync. So, careful here, readdir, like, dir doesn't have a capital letter. Sometimes I get confused. So readdir with a lowercase d and Sync with a capital S. And we are passing it the path to our directory.

[00:10:47]
So what this is going to do is read the files that are in the directory, but then we need to loop through these files to actually get the images and the labels. So we can create a for loop, so we have for (let i = 0, and then until the number of files, so i is less than files.length, and i++.
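To see what `fs.readdirSync` hands back before going further, here is a small standalone sketch. It builds a throwaway folder in the system temp directory (an assumption made so the example does not depend on `./data` existing) and loops over the result with the same kind of `for` loop.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Throwaway folder with two empty placeholder files.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'shapes-'));
fs.writeFileSync(path.join(demoDir, '0-circle.png'), '');
fs.writeFileSync(path.join(demoDir, '0-triangle.png'), '');

// readdirSync returns an array of entry names (strings), not full paths.
const files = fs.readdirSync(demoDir);

for (let i = 0; i < files.length; i++) {
  console.log(i, files[i]);
}
```

Two details are worth keeping in mind: the names still need to be joined with the directory to get a usable path, and subfolders are listed as entries too, so pointing this at the root `data` folder would also return `train` and `test`.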

[00:11:11]
So this loop is going to go through all of the files that we have in our folder. And what we're going to do is get the file path. So let filePath, and we're going to use the path module that we required.

[00:11:31]
We're going to join and then we do data directory. And the second argument will be the name of every single file, so files, and then you get the particular file that we're looking through. So this is going to be the path for each individual file, and then we're going to be able to read the data that's from that file.

[00:11:52]
So we can create a buffer, so let buffer as a variable, and we're going to use the file system as well. So fs.readFileSync, and we're passing it our file path. So this is going to return the data that is contained in each of our files, and it's returning a buffer.

[00:12:10]
So even here, if we wanted to look at what's going on, what we could do is console.log the buffer, and then call this function here. So if you take the loadImages function and you call it and pass it the trainImagesDir variable, and then run this, it should show you what the data actually looks like.

[00:12:46]
So once you have called it, in the terminal you write node, and then getData.js, and it should print a bunch of output. Yes, so you can see that for every file, the data that we're getting back is a buffer. So this is important, because then there are some methods from TensorFlow that are going to decode this buffer.
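Assembled from the steps narrated above, the function at this stage might look roughly like the sketch below. Two parts are assumptions rather than something the lesson states: pushing the raw buffers into `images` and returning both arrays are placeholders for the decoding step that comes next, and the `fs.existsSync` guard is added only so the file runs even when `./data/train` is missing.

```javascript
const fs = require('fs');
const path = require('path');

const trainImagesDir = './data/train';

const loadImages = (dataDir) => {
  const images = [];
  const labels = []; // filled in a later step of the lesson

  // Names of every entry in the folder (expected to contain only images).
  const files = fs.readdirSync(dataDir);

  for (let i = 0; i < files.length; i++) {
    // Build the full path, then read the raw bytes of the file.
    const filePath = path.join(dataDir, files[i]);
    const buffer = fs.readFileSync(filePath);

    // For a PNG this prints something like <Buffer 89 50 4e 47 ...>.
    console.log(buffer);

    // Placeholder (assumption): keep the raw buffer until it gets decoded.
    images.push(buffer);
  }

  return { images, labels };
};

// Guard added for this sketch so it does not throw without a data folder.
if (fs.existsSync(trainImagesDir)) {
  loadImages(trainImagesDir);
}
```

Running `node getData.js` with this in place logs one buffer per file in the train folder.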