The "Object Detection with TensorFlow.js" Lesson is part of the full, Machine Learning in JavaScript with TensorFlow.js course featured in this preview video. Here's what you'd learn in this lesson:
Charlie uses the coco-ssd pre-trained model in TensorFlow.js to build a simple application that predicts the content of a selected image. The prediction results can vary widely based on the chosen image and the data set used to train the model.
Transcript from the "Object Detection with TensorFlow.js" Lesson
[00:00:00]
>> We talked about pre-trained models, so let's actually start to do it in JavaScript. If you haven't cloned the repository yet, I'm just gonna mention the URL quickly: github.com/charliegerald/fem-ml-workshop, fem for Frontend Masters. You can clone everything in there. If you then navigate to the folder called Exercises and then Project 1, and I open the folder here, I'm gonna go through the files that are already in there.
[00:00:34]
So I have code that is pre-written, and then we're going to write the parts that are specific to machine learning. When you work with TensorFlow.js in JavaScript, TensorFlow.js is just a library, so the code that you're writing is not entirely machine learning. Here I have a utils file in which I wrote a few JavaScript functions that have nothing to do with TensorFlow.js; they're just there to help with the UI.
[00:01:01]
So here I have a first function that's just showResults(). Because when you're returning the output from the model, you can just log it to the browser console, but sometimes it's nice to see it on the screen. So it's just some DOM manipulation: I'm just getting an element, and then I'm appending the labels to the screen.
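As a rough illustration, a showResults-style helper along those lines might look something like the sketch below. The container id ("results") and the exact markup are assumptions for this example; the util in the course repo may be written differently.

```js
// A minimal sketch of a showResults-style helper, assuming the page has a
// container element with the id "results". Each prediction from coco-ssd has
// a class label and a score between 0 and 1.
const showResults = (predictions) => {
  const container = document.getElementById("results");
  predictions.forEach((prediction) => {
    const label = document.createElement("p");
    label.textContent = `${prediction.class}: ${prediction.score}`;
    container.appendChild(label);
  });
};
```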
[00:01:20]
So this is just a helper function, you don't have to use it. If you just want to see the outputs in the browser console, you can do that. And then I just have a handleFilePicker function, because of what we're gonna do in the first exercise. So go back to the terminal and run npm run watch.
[00:01:37]
It should open, and at the moment if you click, nothing is gonna happen because we didn't write the code. But it's a simple UI with a button, Choose File, and when we click on it, it's gonna open the file picker. So we're gonna choose an image. If you already have an image ready on your desktop, or if you want to go to Google and download an image of a cat, or a dog, or a car, feel free to do this.
[00:02:02]
But we're gonna open the file picker, we're gonna pick an image that we have saved locally on our desktop. And that's going to be given to the machine learning model, and we're gonna get the prediction on screen. So the UI to use the pre-trained model is very simple at the moment.
[00:02:17]
So if we go back to our code in the utils file, the handleFilePicker function is some normal, vanilla JavaScript that uses the FileReader API. So again, that's not directly related to machine learning. If you have implemented something that was using a file picker before, you might have even written this code, but not with machine learning.
[00:02:42]
So just to go through what it does: I'm just getting an element, and onChange, I'm just checking that it's an image and then I select it; otherwise, if it's not, then we return. You could add an error message here telling the user, make sure you select an image and not a PDF, for example.
[00:02:56]
And then here, with the file and the FileReader, onLoad we are gonna create an image element and we're going to load the image that we selected onto it. And this part here is just a callback function, and that's going to be, in TensorFlow.js, our predict function.
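To make that concrete, here is a minimal sketch of what a handleFilePicker util along those lines could look like. The input element id ("file-input") and the exact checks are assumptions for illustration, not necessarily the code in the course repo.

```js
// A sketch of a FileReader-based file picker handler. It takes a callback
// (in this lesson, the predict function) and calls it with the loaded <img>.
const handleFilePicker = (callback) => {
  const input = document.getElementById("file-input");
  input.onchange = (event) => {
    const file = event.target.files[0];
    // Only continue if the selected file is an image (not a PDF, for example).
    if (!file || !file.type.startsWith("image/")) return;

    const reader = new FileReader();
    reader.onload = (e) => {
      // Load the picked file into an image element, then hand it to the callback.
      const img = new Image();
      img.onload = () => callback(img);
      img.src = e.target.result;
    };
    reader.readAsDataURL(file);
  };
};
```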
[00:03:18]
So those are just the two utils functions that we're gonna use. But let's write some TensorFlow. Okay, so if you open the part1.js file, there is nothing in it right now, and we are going to add our image detection logic. So the first thing that we need to do is import TensorFlow.js.
[00:03:39]
So we import, oops, tensorflow, and we're gonna just import the whole TensorFlow.js library. And then we also need to actually import the model. So the model that we're going to use here is called CocoSSD. It's an object detection model. So we import everything from coco-ssd.
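Those two imports would look something like this, assuming the standard npm packages, @tensorflow/tfjs and @tensorflow-models/coco-ssd:

```js
// Import the whole TensorFlow.js library, plus the coco-ssd pre-trained model.
import * as tf from "@tensorflow/tfjs";
import * as cocoSsd from "@tensorflow-models/coco-ssd";
```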
[00:04:06]
So I forgot to mention I had already done it before, but if you have not run npm install in the terminal, oops, not here, in the terminal here, before running npm run watch, you need to do that to be able to actually use the packages that we just imported.
[00:04:26]
So just to show you here, I had written them before, so this one we'll use later, but first, the coco-ssd one is the one we need here. And so if you run npm install, it is going to install them and we're gonna be able to use them in the code. So at the moment, we just imported TensorFlow.js, and we just imported the pre-trained model.
[00:04:45]
So what we can do now is just create a model variable. And we're going to use it in two different places. So we just declare it here, and then we're gonna use it in the rest of the functions. So before you can actually use the model to predict something, you actually have to load that model.
[00:05:01]
It's kind of like the way you instantiate an object. So we're gonna just write a function, init. That's gonna be an async function. Oops, and yeah, so the first step, we need to load it. So the way that you can do that is, on CocoSSD and on a lot of TensorFlow.js models, you can call a load method.
[00:05:24]
Sometimes there's a bit of a difference and there are other methods that you can call as well. But for CocoSSD, you're gonna load the model and we store it into our model variable. And what we do next, because we want to use this model when the image from the file picker is loaded, is we actually need to call our handleFilePicker util method.
[00:05:46]
So handleFilePicker, and if you use VS Code it probably has been imported directly. Otherwise, you write it here, so you import the function handleFilePicker from the utils. And if we go back quickly to what this does, as I was mentioning before, it takes in a callback function, because this callback function is going to be used when the image is loaded.
[00:06:10]
So once we've selected it from the file picker, we're directly going to feed that image to the predict function of the model. So here we're just gonna pass it a function we have not written yet, but it's gonna be called predict, and we're going to write it right now.
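Put together, the setup described so far could look roughly like this; the "./utils" import path and the exact shape of init are assumptions based on the transcript, not a copy of the repo code:

```js
// Declare the model variable so both init and predict can use it, load the
// coco-ssd model, then wire the file picker to the predict function (written next).
import { handleFilePicker } from "./utils";

let model;

const init = async () => {
  model = await cocoSsd.load();
  handleFilePicker(predict);
};
```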
[00:06:25]
So now we write the function predict, also an async function, and we pass it the image element. When we call the detect method on the model, it'll produce predictions, so we're gonna have a variable called predictions. And because we have the model available, we can do model dot, oops, detect, which is the method that you can call on the model.
[00:06:50]
And we're going to pass it the image element. And at this point, you can either console log the predictions, or you can also, if you want, import the showResults function that's in the utils as well. So here, again, if you use VS Code, it might have done it automatically; otherwise you just need to add showResults to the import line at the top.
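A minimal sketch of that predict function, assuming the model variable loaded in init above and the showResults helper from the utils file:

```js
// Run object detection on the picked image and surface the predictions,
// either in the console or on screen via the showResults helper.
import { showResults } from "./utils";

const predict = async (img) => {
  const predictions = await model.detect(img);
  console.log(predictions);
  showResults(predictions);
};
```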
[00:07:15]
And at this point, all we need to do then is actually call our init function. And now, if you have installed the packages and you've written this code, and you go back to the browser and refresh, what should happen is, when you click Choose File, it opens the file picker.
[00:07:34]
And then here I have an image of a little cat. I have prediction cat and probability 0.98. So the predictions or the probabilities come back as a number between 0 and 1. And 0.98 just means that the model is 98% confident that what's in the image is a cat.
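For reference, coco-ssd's detect() resolves to an array of detected objects, each with a class label, a score between 0 and 1, and a bounding box; the values below are made up just to show the shape:

```js
// Example shape of console.log(predictions) for a cat photo
// (bbox is [x, y, width, height] in pixels; numbers here are illustrative).
[
  { bbox: [12.4, 30.1, 250.7, 180.2], class: "cat", score: 0.98 }
];
```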
[00:07:54]
So in this [LAUGH] case, that is basically all the code that you need to just use a pre-trained model in TensorFlow.js. And in general, even if you use other tools, that's the real purpose or the real benefit of using a pre-trained model. It really reduces the amount of code that you have to write yourself to actually start and get a result and get predictions.
[00:08:16]
So as I was saying before, using this model here, I have a picture of a cat and it was right, but there are a lot of cats on the Internet. So if this model was trained with Google images, then it's pretty nice that it can recognize the cat. But if you have other images on your desktop and the model wasn't trained with that, you might end up noticing that the label is wrong, and the probability might be lower as well.
[00:08:40]
And this is what I was mentioning before, is that I did not find what CocoSSD was actually trained with. I could have maybe done more research, but I think at some point, I don't have the image here, but I had a cooking class with my work team and I took a picture of the, it was a risotto.
[00:08:59]
We made a risotto and I just tried it with it, and I think it detected that it was a truck or something, which is not at all what it is. So it means that, obviously, CocoSSD was not trained with images of risotto. So it just tried to find whatever was closest to it in its training data.
[00:09:16]
So I don't know how risotto can be close to a truck. But maybe all the rest of the images were so far from that, that it thought that it was a truck. So again, depending on what you decide to build, if what you want to do is build an app that detects animals, then you should be fine.
[00:09:37]
Or, you have a question?
>> Yeah, so the CocoSSD is 7K? And I'm assuming if you wanted to get more accurate models, it would be far, far larger, cuz that's a pretty small model.
>> Yes, so it's definitely possible, but again, it depends on what you want to do.
[00:09:59]
Because then if you make your own and it's more tuned to the images you want to detect, then you might not need that much data. So it really depends on what you wanna do. And with pre-trained models, what's difficult is that, obviously, if you want to do your own custom thing, it might be very different from the data that was used.
[00:10:16]
So, of course, more data is better. But again, the model is just the output of the training data. So the model does not contain all of the images. The images were fed into the algorithm that then updated a model, and some models are heavier than others, but it's not going to be as big as the gigabytes of data that you trained it with.
[00:10:40]
So it's possible that maybe it's the size, but it's also optimized to be used in the browser. So it really depends on what you want to do. Cuz there's another model called MobileNet, and I don't know how much bigger it is, but I actually felt that when I was using MobileNet, I had lower accuracy than with CocoSSD.
[00:11:03]
So it could have been that the data used to train MobileNet was even further from what I needed; maybe there were no cats in MobileNet, or no risotto or whatever. But honestly, I didn't actually look at the size of MobileNet. So maybe it is related, but also, I mean, yeah, gzipped it's 2.8.
[00:11:23]
So I would say it depends as a classic senior engineer. [LAUGH]
>> It's just amazing to me that it's that small that you could easily put this in a web app.
>> That's the whole point of TensorFlow.js, to be able to actually build machine learning applications in JavaScript.
[00:11:42]
So again, as I said, if you use an image and the model was tuned to that kind of thing, then it's great. But again, it will depend on what you actually want to predict. Maybe sometimes it might not know what a pin is, or a glass, or a phone. So it's gonna be a bit of trial and error to see what works for you.
[00:12:01]
But in terms of steps that you have to take, you import it, then you load the model. Then here I have some utils, but you don't even have to write a file picker. If you wanted, you could just add an image tag here with the source to a local image or a remote image hosted somewhere, and all you would need to do is get that.
[00:12:25]
Don't write that exactly, because it's not gonna work as is, but if you just do a document.getElementsByTagName with img, something like that, then you would be able to pass that in and that would be all you need. So I just tried to be a bit more interactive so that you can let the user pick an image that they want to detect.
[00:12:42]
But it could be just as quick as having a tag with an image, then getting it in JavaScript and feeding it to the model.
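As a hedged sketch of that simpler setup, assuming the HTML already contains an image element (for example <img src="cat.jpg">), it could be as short as this:

```js
// Skip the file picker: grab an existing <img> from the page, load coco-ssd,
// and run detection on it directly.
import * as cocoSsd from "@tensorflow-models/coco-ssd";

const run = async () => {
  const model = await cocoSsd.load();
  const img = document.getElementsByTagName("img")[0];
  const predictions = await model.detect(img);
  console.log(predictions);
};

run();
```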