A Tour of Web Capabilities

Shape & Face Detection

Maximiliano Firtman


Independent Consultant

The "Shape & Face Detection" lesson is part of the full A Tour of Web Capabilities course. Here's what you'd learn in this lesson:

Max demonstrates the Shape Detection API, which provides methods for detecting barcodes, text, or faces using a device's camera. The Advanced Control Camera API gives the application control over pan, tilt, and zoom and can be used to adjust the input stream when live streaming or taking photos.


Transcript from the "Shape & Face Detection" Lesson

>> Okay, so we've seen then how to use the Web Speech API for both recognizing what the user is saying and to let the web app speak. So now it's time to talk a little bit about the camera and visual. And we're going to talk about shape detection.

The Shape Detection API today is, let's say, between yellow and red. One of its parts is light green, and it's Chromium-based only. The idea is to detect things with the camera without opening the camera and analyzing the image ourselves. So for example, we have a BarcodeDetector.

It works with 2D barcodes or QR codes, on Chrome for Android only, if you wanna try this. It's actually pretty simple: three lines, and you have a QR code reader.

It's not just barcodes; we also have FaceDetector. That's the API that I was mentioning that you have today. In this case, the object that you receive will give you the coordinates of the face within the image as a rectangle, plus each part: the nose, the mouth, and each eye. And you will get x and y, width, and height for where the eye is. That's all, okay, that's the face detector.

And also we have a TextDetector that is basically OCR. So it takes text that you have in an image, and it will give you the text representation as a string. It's actually pretty simple to use. The problem is that it's not on iOS or desktop, it's just on Android. And how do we know which part is available or not?
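Editor's sketch, not code from the lesson: the barcode and face detectors described above could be used roughly like this. The helper names `readBarcodes` and `describeFaces` are made up for illustration, and the detector is passed in as a parameter because `BarcodeDetector` and `FaceDetector` only exist as globals in supporting Chromium browsers. The field names (`rawValue`, `boundingBox`, `landmarks`, `type`) follow the draft Shape Detection specification as far as I know, so treat them as assumptions to verify.

```javascript
// Returns the decoded string of every barcode found in the image source.
async function readBarcodes(detector, imageSource) {
  const barcodes = await detector.detect(imageSource);
  return barcodes.map((barcode) => barcode.rawValue);
}

// Returns the face rectangle and the landmark types (eye, mouth, nose) per face.
async function describeFaces(detector, imageSource) {
  const faces = await detector.detect(imageSource);
  return faces.map((face) => ({
    box: {
      x: face.boundingBox.x,
      y: face.boundingBox.y,
      width: face.boundingBox.width,
      height: face.boundingBox.height,
    },
    // Landmarks may be missing on some platforms, so default to an empty list.
    landmarks: (face.landmarks ?? []).map((landmark) => landmark.type),
  }));
}

// Assumed usage in a supporting browser, with `img` being an <img> element:
//   const codes = await readBarcodes(new BarcodeDetector({ formats: ['qr_code'] }), img);
//   const faces = await describeFaces(new FaceDetector(), img);
```

Passing the detector in keeps the helpers independent of the browser, which also makes it possible to swap in a library-based detector where the native one is missing.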

Well, the API is called Shape Detection. So let's, for example, go to Chrome Status and look for the feature, Shape. And here we have the Face Detection API, the Text Detection API, and then Barcode. If you go to Barcode, it says enabled by default. So that is shipped, so it's ready to use.
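Editor's sketch: Chrome Status shows what has shipped, and a page can additionally check at runtime which detectors the current browser exposes. The function names below are hypothetical. `BarcodeDetector.getSupportedFormats()` is, to my knowledge, the static method the draft specification defines for listing decodable formats.

```javascript
// Reports which Shape Detection constructors exist on the given global object.
function detectShapeSupport(scope = globalThis) {
  return {
    barcode: 'BarcodeDetector' in scope,
    face: 'FaceDetector' in scope,
    text: 'TextDetector' in scope,
  };
}

// Asks the browser which barcode formats it can decode.
// Resolves to an empty list when BarcodeDetector is not available at all.
async function supportedBarcodeFormats(scope = globalThis) {
  if (!('BarcodeDetector' in scope)) return [];
  return scope.BarcodeDetector.getSupportedFormats();
}
```

A page could call `detectShapeSupport()` once at startup and load a fallback library only for the detectors that come back `false`.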

Text Detection is in developer trial, so we need to enable the flag. And Face Detection is also in developer trial. So in this case, if you wanna use them, it's pretty cool. I used that to create a quick web app for a kid. It's actually a kid who is not reading yet who can point to a text, and using speech synthesis, it can read aloud what's in there. It's, whoa. And it's just five, six lines of JavaScript, that's all you need. But in this case, it's a feature that you need to enable, okay, makes sense? So it's not going to work for the rest.

For the rest, I mean, there are a lot of open source libraries that can open a camera or take an image and do the analysis for QR code reading, face detection, or OCR. But it will take more CPU, and you will need to use a library, okay?

So here we have an example, the one that reads the text, I think. Look at that, PAN AM, so hopefully... That's Google Flights a few decades ago, [LAUGH] and it will give you the text immediately, almost, on Android.
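Editor's sketch of the read-aloud idea described above, assuming the draft `TextDetector` interface (each detected block exposes `rawValue`) and the Web Speech API's `speechSynthesis.speak()`. This is not the speaker's app. The names `readTextFromImage` and `readAloud` are hypothetical, and the `scope` parameter exists only so the sketch can run outside a browser.

```javascript
// Joins the text blocks found in an image into one string.
async function readTextFromImage(detector, imageSource) {
  const blocks = await detector.detect(imageSource);
  return blocks.map((block) => block.rawValue).join(' ');
}

// Detects text in the image and, if any was found, speaks it aloud.
// `scope` defaults to the page's global object.
async function readAloud(imageSource, scope = globalThis) {
  if (!('TextDetector' in scope)) {
    // Flag not enabled or unsupported browser: this is where an OCR library would step in.
    throw new Error('TextDetector is not available');
  }
  const text = await readTextFromImage(new scope.TextDetector(), imageSource);
  if (text) {
    scope.speechSynthesis.speak(new scope.SpeechSynthesisUtterance(text));
  }
  return text;
}
```

In a browser with the flag enabled, `readAloud(videoElement)` would be one way to point the camera at a sign and hear it, since a video element is an acceptable image source for `detect()`.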

So Media Devices has to do with the camera. I'm not going to spend too much time talking about this, because it has been around for a while. There is a lot of content around how to open the camera. But Media Devices was known as getUserMedia for a long time, because that was the name of the function, okay?

But now it's officially Media Devices. It will let you open the microphone and/or a camera, and get the stream of that. You can use the stream that is coming from the camera to, for example, feed a video element. Or you can take each frame and parse it with JavaScript and do something.
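Editor's sketch of feeding a video element from the camera with `navigator.mediaDevices.getUserMedia()`. The function names are hypothetical, and the `mediaDevices` parameter is there only so the code can be exercised with a stand-in object.

```javascript
// Opens the camera and attaches the resulting stream to a <video> element.
async function startCameraPreview(videoElement, mediaDevices = globalThis.navigator?.mediaDevices) {
  if (!mediaDevices?.getUserMedia) {
    throw new Error('Media Devices API is not available');
  }
  // Camera only; pass audio: true as well to open the microphone.
  const stream = await mediaDevices.getUserMedia({ video: true, audio: false });
  videoElement.srcObject = stream;
  await videoElement.play();
  return stream;
}

// Stops every track so the browser releases the camera.
function stopCamera(stream) {
  stream.getTracks().forEach((track) => track.stop());
}
```

The browser asks the user for permission when `getUserMedia()` is called, so the returned promise can reject if the user declines.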

And that's all, the API ends there. The thing is, what do you wanna do with the image? Do you want to take a picture? Do you wanna analyze something? Something that is kind of new is another API that is Chromium only. It's called Advanced Control Camera, also known as PTZ: pan, tilt, and zoom.

That will let you manipulate pan, tilt, and zoom for a camera. So then you can adjust the input stream; you can have some controls on your page to adjust that. For example, Zoom is using that, or StreamYard, these web apps that let you stream over Twitch, YouTube, Facebook Live, or Instagram.

It's useful when you mix it with UI controls, where the user can actually zoom in. It's difficult to use this from your own code, because you, as developers, would have to decide which value to use. Typically, the user will pick the right value.
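Editor's sketch of wiring a user-chosen zoom value to the camera, assuming the Chromium behavior where `getUserMedia()` accepts `pan`, `tilt`, and `zoom` video constraints and the video track reports a `zoom` range through `getCapabilities()`. The function names `openPtzCamera` and `setZoom` are hypothetical.

```javascript
// Requests a camera stream together with permission to control pan, tilt, and zoom.
async function openPtzCamera(mediaDevices = globalThis.navigator?.mediaDevices) {
  return mediaDevices.getUserMedia({ video: { pan: true, tilt: true, zoom: true } });
}

// Clamps the requested zoom to the range the camera reports, then applies it.
// Resolves to the applied value, or null when the camera exposes no zoom control.
async function setZoom(videoTrack, requestedZoom) {
  const capabilities = videoTrack.getCapabilities();
  if (!capabilities.zoom) return null;
  const { min, max } = capabilities.zoom;
  const zoom = Math.min(max, Math.max(min, requestedZoom));
  await videoTrack.applyConstraints({ advanced: [{ zoom }] });
  return zoom;
}

// Assumed usage with a range input whose min/max were set from the capabilities:
//   const [track] = (await openPtzCamera()).getVideoTracks();
//   slider.oninput = () => setZoom(track, Number(slider.value));
```

Letting a slider drive `setZoom` matches the point above: the user picks the value, and the code only keeps it inside what the hardware supports.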
