The "Using Webcam as Input" Lesson is part of the full, Machine Learning in JavaScript with TensorFlow.js course featured in this preview video. Here's what you'd learn in this lesson:

Charlie expands the object detection project to use the webcam as the input source. Buttons are used to start the camera, capture an image, and run the prediction code.


Transcript from the "Using Webcam as Input" Lesson

[00:00:00]
>> So that's it for part one of the first project. But then I wanted to show you another little twist on it as well. Let's say, for example, that your user does not have an image on their desktop, but they want something a bit more dynamic.

[00:00:15]
So what you can do is use the webcam. So what we're gonna do here, if you just comment out part 1, below it there are parts 2 and 3. And in part 3, we're gonna use a different model, just to show you a little bit how that works.

[00:00:34]
But here in this section, in part 2 of the first project, we're going to use image content from the webcam. And then we need to change part1.js to part2.js. Some of the code is going to be similar, but I thought that just having a new file might be better, so that you can keep everything that you've worked on.

[00:00:56]
So now we might be able to copy some lines. Let's open part 2, which will be an empty file. We're going to use COCO-SSD as well, so you can copy your import of TensorFlow.js and of the model. And then let's think a little bit about our logic.
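For reference, the imports at the top of part2.js would look something like this, assuming the project pulls in the npm packages the same way part 1 did:

```js
// Same imports as part 1: TensorFlow.js itself plus the COCO-SSD model.
import "@tensorflow/tfjs";
import * as cocoSsd from "@tensorflow-models/coco-ssd";
```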

[00:01:16]
So again, if we go to the utils file and scroll down a little bit, I have some helper functions to cover the parts of the code that are not related to TensorFlow.js. You might have used this before if you've built a web app that had access to the camera: we're using the getUserMedia web API.

[00:01:36]
And the way that it works is you call navigator.mediaDevices.getUserMedia. Then there are some parameters that you can pass. We don't need the audio in this case, so audio is false. And for video, we just pass it a width and height for the size of our webcam feed on the screen.

[00:01:51]
And then when it's ready, we have a stream and we can get the different tracks. We're just gonna get the first one, and when the video is loaded, we're going to actually play it in the browser, so you're gonna be able to see yourself in the camera. Now, the webcam feed is a video element, but the COCO-SSD model does image detection.
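As a rough sketch, the webcam helper described here could look like the following; the name startWebcam and the 640x480 feed size are assumptions, not the course's exact utils code:

```js
// Minimal sketch of a webcam helper built on getUserMedia.
export async function startWebcam(videoElement) {
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: false,                       // no audio needed for object detection
    video: { width: 640, height: 480 }, // size of the webcam feed on screen
  });
  // The stream exposes its tracks; only the first video track matters here,
  // e.g. if you want to stop the camera later.
  videoElement.srcObject = stream;
  videoElement.onloadedmetadata = () => videoElement.play(); // show the live feed
}
```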

[00:02:13]
So what we're gonna do: basically, a video is like a sequence of pictures. What we're gonna be able to do is click a button, and it's gonna take a picture of the current live feed. And then we're going to give that to the machine learning model.

[00:02:26]
So here, I just wrote a takePicture helper function as well. All it basically does is: there is a predict button in the HTML here, along with webcam and capture buttons. The webcam button just starts the webcam, pause just pauses it and captures, and predict is going to actually predict what's on the screen.

[00:02:55]
The video element is going to be the webcam, and the canvas is actually going to display the image that I captured from the camera. Once we call that function to take a picture, we're going to draw it on the canvas. And when we click the predict button, I'm gonna pass a callback with the actual image from that canvas, and it's going to call the machine learning model.
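A sketch of that takePicture helper might look like this; the element lookups and the body are assumptions based on the description above, while the (video, callback) signature is the one described in the lesson:

```js
// Draw the current video frame onto a canvas, then let the predict
// button hand that still image to a callback (the predict function).
export function takePicture(videoElement, onCapture) {
  const canvas = document.querySelector("canvas");
  const context = canvas.getContext("2d");
  canvas.width = videoElement.videoWidth;
  canvas.height = videoElement.videoHeight;
  context.drawImage(videoElement, 0, 0, canvas.width, canvas.height);

  // Assumed "predict" button ID; clicking it runs the callback with the
  // captured frame.
  document.getElementById("predict").onclick = () => onCapture(canvas);
}
```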

[00:03:19]
So we're gonna write that. This is just a helper function; it will make more sense once we actually write some code. As I said before, we have imported TensorFlow.js and the COCO-SSD model. So first of all, because we need to actually enable the camera, we're going to get the camera button.

[00:03:36]
So if we write webcamButton, we're just going to get the element by ID "webcam". So here we have our webcam button, and at the same time, let's also get the capture button. I gave it the ID of "pause", but maybe calling it capture would have made more sense; it basically pauses the video and takes a capture.

[00:04:02]
So we have our webcam button, we have our capture button, and we are going to need our video element as well for the webcam. So we have const video, and then document.getElement... actually, it's probably easier to do querySelector('video'), so that's gonna be our video element.
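So far, the element lookups in part2.js amount to something like this (the "webcam" and "pause" IDs come from the course's HTML and are assumed here):

```js
const webcamButton = document.getElementById("webcam");
const captureButton = document.getElementById("pause"); // "capture" would have been a clearer ID
const video = document.querySelector("video");
```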

[00:04:24]
And as we did before, before we can use the model, we need to load it. So we can recreate a function called init to kind of get everything started. And here, let's redeclare a model variable as well. That's very similar to what we did before.

[00:04:43]
The point is to show you that if you get started with one way of doing things, you can actually update your code with minimal changes, and I'll show you that as well in part 3. But here, when we call that init function, the model is going to load, but at the moment we have no interaction.
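A minimal version of that init function, mirroring part 1, might be:

```js
let model;

// Load COCO-SSD once up front, before any interaction is wired up.
async function init() {
  model = await cocoSsd.load();
}
```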

[00:05:06]
So the first thing that we need to do is actually enable the camera on the screen. We have a webcam button, so we can have an onclick event. So webcamButton, then onclick, we're gonna have a function, and we are gonna call the startWebcam util and pass it our video element.

[00:05:26]
And this time it did not auto-import, so I'm gonna import it from my utils file. Okay, so when we click the webcam button, it's gonna start the camera. Maybe at this point we can even check in the browser that it works fine. So if I refresh, you shouldn't have to rerun the terminal; it should be live reloading.
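Wiring the webcam button up would look roughly like this, assuming the helper is exported as startWebcam from the utils file:

```js
import { startWebcam } from "./utils";

// Clicking the webcam button starts the camera in the video element.
webcamButton.onclick = () => startWebcam(video);
```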

[00:05:49]
So if I click on the webcam button, it might ask you to block or allow the camera, so you have to allow it, and then you get the camera. Okay, but at the moment there's no event associated with capture. So for now we should just have the webcam turn on when you click the webcam button.

[00:06:10]
So I'm just going to go back to the code. What we wanna do next, as I said: the image detection model takes in an image as input, so we need to actually capture that image. So we can add an onclick event to the capture button as well.

[00:06:25]
And then we have onclick. If you remember the takePicture helper function, we call takePicture, and we're gonna import it from utils. If we go back to the utils file, this takePicture function gets the video element and our callback function, which is going to be the predict function.

[00:06:47]
So we have our video element, the same way that we did it in the previous exercise. We're passing predict, but we have not written it yet. So here, we can now move on to writing our predict function. It's gonna be very similar, actually; if you want to go faster, you can even go back, copy the predict function from part 1, and paste it over into part 2, and that's the exact same code.
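Putting that together, the capture wiring and the predict function could look roughly like this; showResult is the course's display helper from the utils file, and its exact name and signature are assumed:

```js
import { takePicture, showResult } from "./utils";

// Pause/capture: grab a still from the video feed and pass it to predict.
captureButton.onclick = () => takePicture(video, predict);

// Same idea as in part 1: COCO-SSD can run detection on an image,
// canvas, or video element.
async function predict(input) {
  const predictions = await model.detect(input);
  showResult(predictions); // e.g. label + probability rendered on the page
}
```

With that in place, clicking capture draws the current frame, and clicking predict runs detection on that still.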

[00:07:16]
So all we need to do at this point is call our init function here. Actually, we might not have necessarily needed the init function here; I just kind of kept the same structure as I had before. I wrote showResult, but I did not import it, which is why the editor is flagging it.
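The only setup call left is kicking off the model load:

```js
// showResult also needs to be added to the import from "./utils",
// which is what the editor was flagging.
init();
```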

[00:07:34]
Okay, so I turn on the webcam again, capture, and if I click predict, I should be a person? Yes: prediction, person, and then probability 0.91. So it's 91% sure that what's on the screen is a person. So then you can have a bit of fun with trying to predict something else.

[00:07:54]
So I do not have much with me right now, but I have a phone. Maybe I can try to put the phone on camera and see if it will detect that it's a phone. If I move myself out and then I put my phone in, [LAUGH] I don't know what's going to come out of here.

[00:08:10]
So, a remote. I mean, it's not too bad. A phone can look like a remote, and the probability is 0.92. So technically, it kind of looks like it could be a TV remote. But that's basically how you can use a webcam feed to take a still of the video and give it to a machine learning model, the exact same machine learning model, and find different ways of interacting with it as well.

[00:08:34]
>> There are claps in the chat.
>> Really? Well, thank you. [LAUGH] I'm just gonna stop the camera here. So, if you remember, this is what I used for this project; it's what I showed you at the beginning, it's a website. I turn on the camera, I go around and I take pictures, and it gives me a label.

[00:08:53]
And then it's kind of asking me, what is it made of? And I'd say, well, it's plastic. So then it will tell me plastic goes in this bin. But again, it depends on which country you're in, because of the bins. I think when I was in Australia at the time, the yellow bin was the recycling bin.

[00:09:08]
But here I think colors are different.
>> And what model did you use for that one?
>> It was either MobileNet or COCO-SSD. Actually, I have both here. I may have tried with both of them and then picked one. But if we look, I built that years ago, it's COCO-SSD.

[00:09:27]
Okay, so I built it exactly the same way. And here, I don't know what to look at; this code is very old. [LAUGH] So maybe let's not look at six- or seven-year-old code. But that's the type of thing that you can do with images. So, right now, you might not be thinking right away of what you could build with this.

[00:09:44]
But there was another application that someone else built where, for example, you had a tool to generate alt tags for images kind of automatically. So it could do a first pass for you, and then as a human it's probably better to still check if the alt tag that is generated is correct and maybe add some details, but that's one idea of a tool that was built.

[00:10:07]
And at some point there was a tool that was built that was called nsfw.js, like not safe for work. I will not Google that right now, but it was a tool basically to kind of tell you if an image that was uploaded somewhere was not safe for work.

[00:10:23]
When I stumbled upon this project, I thought it would actually have been useful in my previous life when I was in advertising. Sometimes you have brands that run competitions online and they tell users, post pictures and win soap or whatever. And obviously people post pictures, but sometimes you can't control what they're going to upload.

[00:10:42]
You then have to do the check manually later. A tool like that could have done a first pass as well, so that a human does not necessarily have to see pictures that they don't want to see. And what was interesting with that model is that it wasn't necessarily telling you a label about what was in the picture, but it was telling you more of, is it obscene?

[00:11:01]
Is it neutral? Is it fine, or something like that? And you could build your application so that if what comes out of the model is obscene or nudity, then you could have a little flag in your application that says maybe this image isn't quite what we want.
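As a hedged sketch of that idea, an upload check built on nsfwjs could look something like this; the load() and classify() calls and the class names are based on the nsfwjs docs and should be treated as assumptions, not the tool's exact usage:

```js
import * as nsfwjs from "nsfwjs";

// Returns true when the top-scoring class looks safe; class names like
// "Neutral" and "Drawing" come from nsfwjs and may differ by version.
async function isProbablySafe(imageElement) {
  const model = await nsfwjs.load();
  const predictions = await model.classify(imageElement);
  const top = [...predictions].sort((a, b) => b.probability - a.probability)[0];
  return top.className === "Neutral" || top.className === "Drawing";
}
```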

[00:11:18]
Yes, question?
>> Kristen in the chat said he got it working and took a picture of his cat. And it was 72% sure it was a cat.
>> Maybe it's only half a cat, I don't know. [LAUGH] Well, maybe it was, because what happens is that it sometimes depends on how much space the cat, or the person, or whatever object is taking up in the picture. If you're closer, then technically in terms of pixels you're taking up more space in the picture.

[00:11:48]
If I was a person but I was far away, then in terms of the way that the model looks at the picture, it's really just a long array of pixel data. And if you're very far away, then it might be more like, okay, what's the biggest thing in the picture, and what is that?

[00:12:04]
So it depends on where the cat was, or maybe it just didn't like that particular cat. But it was still, you know, 70%, not smaller than 0.5?
