Check out a free preview of the full Machine Learning in JavaScript with TensorFlow.js course

The "Using a Face Detection Model" Lesson is part of the full, Machine Learning in JavaScript with TensorFlow.js course featured in this preview video. Here's what you'd learn in this lesson:

Charlie uses a model to detect faces from the captured webcam image. When faces are recognized, the model provides an array of data for each face. The data includes the coordinate locations for the face and other key0 points like the eyes, nose, and mouth.


Transcript from the "Using a Face Detection Model" Lesson

>> So for this part 1 and part 2, we use the same model, so cocoSsd, but with different inputs. And now, I want to move on to part 3, where the piece of the code that we're going to keep is we're going to keep the webcam, but we're gonna change model.

And instead of an object detection model, we are going to use a face detection model so that then you can see that with the same input of an image, you can actually also start to build applications around seeing where a person's face is on the screen. I don't know if it does multi-people detection, but we're going to do at least one.

And I use face detection, because actually I think it can detect multiple faces. And we're gonna see what comes out of this model as well and see what we can do on screen. But in terms of HTML, the only thing we're going to change is move from part two to part 3.

So in terms of what we import, we start again. We have TensorFlow.js, but then the rest is going to be a bit different, so @ensorflow, and tensorglow.js. And now, because we're going to use a different model, it's already in the package.json, so don't worry about installing it, it should have already been installed.

That is going to be called mediapipe/face_detection. So mediapipe is a kind of suite of tools. It's still part of TensorFlow.js and it's still developed by Google, but they do a lot of detection of body parts, so it could be face. They also do the pose detection model for more of key points around your body.

I think one is specifically for hand movement. So I think when I built the plugin for a Figma, it was using the hand detection model, and it's pretty good. You have a lot of key points about different parts of your hands and stuff, and then you can build cool things.

But for now, we're just gonna keep it simple, just use face detection. And when you're using the models in the MediaPipe kinda bundle of tools, you also need to import more TensorFlow.js stuff. So we have tensorflow and then tensorflowjs-core, and we also needed to import backend for that.

So we have tensorflow.js, and I think this is backend webgl. So there's a lot of imports going on, ooh, actually give me a second. I might have been wrong about that one. Let's comment it up for now, because I think I tried a couple of different ways. The way we're going to import the model instead is everything as faceDetection cuz if changed, oops, from, and then that's TensorFlow.js models.

They've kind of changed a little bit sometimes the way that they put things together, it's under MediaPipe or TensorFlow model. We're gonna see which one works, I forgot. I think it might be that one instead. But it's the same thing, Tensorflow and MediaPipes are kind of, Did just work the same way.

And then to make it easier, we can go to part 2 and copy our Webcam Capture and Video button, because in terms of interaction from the user, it's the same thing. We're gonna still click on the Webcam button to turn on the webcam. We're gonna still have the Capture button to take a still picture, and we still need our video element to pass it to the start webcam function.

Okay, so again, to keep it consistent, we can have s init async function, and just gonna declare a model variable. And in here, we are going to essentiate our model. And you'll see here that the way we instantiate or we load the face detection model is a bit different from the syntax for coco Ssd.

So for this one, it's faceDetection.SupportedModels.MediaPipeFaceDetector. So I think that's probably why they've changed the syntax. Maybe before I was importing it like this, and now they change, they move the [INAUDIBLE] js models. So the way to start, oops, yes, so at first, we haven't really loaded yet because there's a few parameters that we need to pass.

So here, we can do, okay, I'm gonna declare a variable called detector. So they just have a different syntax for this model. And here, we're going to do detector and we're going to await the faceDetection createDetector to detect what's in the face, and then we're gonna pass it the model.

And then we have some configs, and it's just going to be an object with the runtime is going to be, yeah, tfjs as a string. So I know that in terms of the syntax, this is a little bit different. So depending on the model that you're using, make sure to check the docs are what's in the README.

I wish that you could just do faceDetection.load, but I guess that maybe it's just because it's like a MediaPipe model that they do a little bit differently from the other ones. And if you start to use maybe the post-detection model, I think that's also a MediaPipe model, so you might have to also create detectors and things like that.

And do the syntax is a little bit different, but it's not that much more lines of code. So we have the buttons and we have our init function, but we did not actually start the webcam or take the picture. So as we already wrote these functions, we can copy them here.

Okay, so we have our onClick to start the video again, the webcam, and then we have our function to take a picture. And here, again, we're passing the predict as the callback, but we did not declare that function. So this 20 is going to be a little bit different, so we're gonna write it.

So we have a predict function, predict. It's going to be async function, and we're going to pass it an argument called photo, or image, or whatever you want to call it. And as I was telling you before, that once we're actually calling either predict, or detect, or anything that is detected from the model, we get back a prediction.

So here I called it faces, because that's what we're predicting when you could call it predictions. And we're going to await, oops, the detector, and we're going to call the method estimateFaces. So instead of being more general as predict or detect as this model is particularly to detect faces, then they have a method called estimateFaces.

And we're passing the argument that we passed into the predict function. So here, it was photo, but there's a little bit more. I think that by default it might flip the image horizontally, so we just need to pass it or flipHorizontal, false, just to make sure. And at this point, you can either, oops, log the faces.

Or one util function that is probably gonna be, okay, let's start with that just to show you what's actually in what's returned from the model and then we'll actually look at the util function to draw around your face on the screen. So the last step that we need is to actually call the init function.

So if I recap, we imported the tools that are gonna be necessary. Then we have our elements from the UI to be able to click on them. So we have a Webcam button that's gonna start the webcam feed, then we have a Capture button that takes a picture, and then here we have an init function that loads the model.

And whenever we capture the webcam, we have a predict callback that is going to call estimateFaces that's gonna return some faces and we can look at what's gonna be shown on the screen. So if I go back here, so don't worry about the pink border. So if I enable the webcam, start, I forgot to actually import from the utils, so apologies for that.

If you have picked up on that before, then it should work for you. But we are importing startWebcam and take picture, cuz that's the two functions from the details that we're gonna need. So now, if I refresh, that should work. So if I click on Webcam, cool, and then I'm going to capture.

Okay, so [LAUGH] not a good capture. Anyway, so this is resized, because in general, I mean, I could look at this. But I think the model might have been trained with pictures that were more like square, so you just resize it. In the UI, it doesn't look great, but the model doesn't really care if the proportions are not right.

Then if I predict, yes, okay, it took a little bit of time. But what we get back here is an array of faces and it has recognized only one face. I haven't actually tried it with more people, but if you are building things with more people, It would be kind of fun to see if it detects multiple people.

But what you get inside here in the output of the model, only if I can show, so you have a box here that is basically the coordinates of the face on the screen. So if you want to draw a box that we're gonna do right after, it basically tells you that the height and width and the xMax in terms of x and y coordinates of where your face is in the picture taken.

And then in terms of key points, you also have access to different key points. So here in the following part of the exercise, I'm just gonna draw a box that if you wanted to put things specifically where the rightEye or where the leftEye is, then you also have access to the x and y coordinates of this as well.

Learn Straight from the Experts Who Shape the Modern Web

  • In-depth Courses
  • Industry Leading Experts
  • Learning Paths
  • Live Interactive Workshops
Get Unlimited Access Now