This course has been updated! We now recommend you take the Complete Intro to Real-Time course.
Transcript from the "RTCPeerConnection, Signaling, and DataChannels" Lesson
[00:00:00]
>> [MUSIC]
[00:00:04]
>> Kyle Simpson: It's not as interesting if all we can do is have that stream on our self. It really starts to get interesting when we can share that stream with others. And so the next piece of Web RTC is what we call RTCPeerConnection. This is an object that you instantiate inside of your browser.
[00:00:20] And in somebody else's browser, they also instantiate an object, and these objects have a special behavior that they're able to create a connection between each other. And that's what allows this peer-to-peer connection between browsers. Now what's we have the RTCPeerConnection established, which I'll explain how that it gets established in a moment.
[00:00:39] But once it's been established, we very simply just attach the video and audio stream to the object. We literally just say attach a video stream. And magically those two objects figure out how to transmit that video in real time between each other. So I can be seeing your video and you can be seeing mine and that's how we have real time conferencing in the web.
[00:01:00] Totally peer to peer, no middleman doesn't cost anything whatsoever, same thing with audio. [COUGH] So how do we establish this magical peer connection objects, how do we create those? More high level architecture diagrams. We start out [COUGH] we start out by having this webcam access and this audio access, we get those streams.
[00:01:21] We can optionally add them to a video or an audio element as we've already talked about but we don't even have to do that because really we wanna attach the audio and video streams to the RTCPeerConnection object. And then in somebody else's browser on the other side of the Internet they have an RTCPeerConnection object that doesn't yet have an audio or video stream.
[00:01:42] But as soon as you connect to them they magically have an audio and video stream that shows up and they on that side can take those stream objects and attach them directly to videos or audio elements and do all the same things in terms of writing them to canvas and saving them out.
[00:01:59] And the magic happens inside of the cloud. So let's talk about what's that magical cloud like. That's everybody's favorite part of any diagram is when you put the little cloud up there. Everything happens inside of the cloud, all right? So what's gonna need to happen, is this process that we call signaling.
[00:02:15] So my best metaphor for this is before you are married. If you're not married and some friend of yours wants to set you up on a blind date. So they ask you and they ask the other person. They say would you like to show up at xyz, Olive garden or whatever restaurant at 7 PM on Tuesday night?
[00:02:35] So they ask you and you say yes. Then they go to the other person and say, would you like to show up at Olive Garden at 7 PM, Tuesday night? Yes, okay, both of you need to show up there. Now your friend, if there are any kind of a good friend at all, they're not gonna actually show up on the blind date with you, right?
[00:02:51] All they did was do the scheduling, they did the signaling between the two of you, they were acting as a go between as a middleman. But they don't show up on the date, it's just the two of you that show up on the day and that's much like how Web RTC gets established.
[00:03:06] We need a signaling server. We need a server that is able to know about both you and me so that our laptops are our web clients, our browsers can be introduced. And then once they've been introduced our browsers establish a direct connection and we cut the middleman out.
[00:03:26] There are some experiments into what would the web architecture look like for what we call a server lists web RTC. In other other words without requiring any kind of signaling how could two browsers discover each other out on the open Internet? And there are some very sophisticated protocols that people are experimenting with.
[00:03:45] But right now what we know about web RTC is that it requires an introduction through this signaling process. So I'm gonna walk you through real quickly how that signaling process happens but then don't worry cuz it's gonna feel a little complex but don't worry because you don't actually have to do most of the work.
[00:04:01] There's just a little bit that you have to do, okay? The first thing that you do have to do it is really important but it's actually really easy for you to do. Is that you need to distinguish two different browsers that are connected to your application like you have a child application or whatever, you need to distinguish one of them is the caller and one of them is the receiver.
[00:04:25] And there's no requirement how you make that decision. But they do different things in a different order so it's important that you distinguish between caller and receiver. The most likely thing is if it's a game, the first person into the game is the caller and the second person in the game is the receiver.
[00:04:43] Or if it's a chat application, whoever does the double-clicking on the other person's name is the caller, and the person that got clicked on is the receiver, or whatever. It's totally up to you in your own business logic how you distinguish, but you have to pick a caller and a receiver.
[00:04:58] And you need to indicate to each of them which role they're gonna play in the signaling process. Now you need a server, and by the way almost always people implement, not always but almost always people implement the signaling server, the communications here through web sockets. So it's a nice addition on to what we've already been talking about.
[00:05:21] We already know how web sockets work so now we can send these signaling messages in very fast and efficient manner to a signaling server using web socket technology. So what's gonna happen is, the caller is going to create what's called an offer. Creates an offer SDP, it's just this packet of information that indicates who he is and what kind of browser he is?
[00:05:43] And what sort of video codecs he works on? What his network looks like that sort of thing? And he sends that information through the signalling server so he sends it as a web socket message to the signalling server who then doesn't care whatsoever what the messages. He completely ignores the type of message and then he turns around and sends it right back out.
[00:06:04] Yes.
>> Speaker 2: This may be jumping ahead a little bit but is asking will this work with multiple users or how would it work?
>> Kyle Simpson: Web RTC is fundamentally a peer to peer technology and that you can create a daisy chain peer to peer, peer to peer, peer to peer create a circular thing.
[00:06:24] You can have everybody connected to everybody else. In reality, what we've seen in terms of testing is that most of the time you need to employ a more sophisticated network topology if you want to have lots and lots of people peer connected. Most of the time products like Skype and those other VoIP communications that do these peer to peer connections.
[00:06:49] They have built into them these sophisticated algorithms where they elect broadcast peers. So you may not know that your system is acting as a broadcast peer but rather than everybody being connected to everybody else. Everybody connects to you and your system does the broadcasting to everybody else. So those sorts of sophisticated things are usually required when you're gonna have any more than three or four people in a peer to peer connection.
[00:07:15] Because the amount of connections that are created and the amount of network traffic that's happening just takes down a network whenever you have that. So when you go beyond three or four connections, you are gonna need a more sophisticated management mechanism for it. But they're working on that sort of thing already.
[00:07:33]
>> Speaker 2: So like electing to be the host of a multiplayer game [CROSSTALK].
>> Kyle Simpson: Yes, it's exactly the same analog, exactly.
>> Speaker 2: Sweet.
>> Kyle Simpson: [COUGH] All right so we send the offer SDP over [COUGH] the signalling server just as a dumb relay client. He literally just sends it on to the receiver, the receiver gets that offer SDP and sets as it is a local thing and then there is as remote SDP and then he creates an answer SDP, which has all of his information in it.
[00:08:01] What kind of browser he has? What sort of video and audio codecs he has available to him? What his network looks like? And he sends that back, which just gets dumb relayed back to the caller. So they've now done a handshake where they've exchanged business cards and each one knows about the other one.
[00:08:17] And there may be some further negotiations that they have to make but for the most part they now know everything they need to know about each other in terms of which compression protocols to use and what encryption keys to use and all of that stuff. It's all completely handled entirely inside of the SDP.
[00:08:33] And for the most part you don't need to worry about any of those details. There's another thing that you should be wondering if you know anything at all about networks. What happens if the caller is behind a fire wall and the receivers behind a totally different firewall. How on earth could they possibly establish a direct network connection?
[00:08:51] While there's a solution built into this, it's a fantastically complex solution but there is a solution built into Web RTC which for the most part has been successful at creating direct connections even behind firewalls. There are some firewalls out there which are just so secure. They're never gonna get a direct connection and so we end up doing these media relays.
[00:09:11] But for the most part what you end up doing is this thing called, there's a turn server, and a stun server. And the stun servers the one that gets used first. What it attempts to do on the color side is poke a hole in the firewall nodding and establish an outbound connection to a public stuns server.
[00:09:33] There are currently two out there right now. One is run by Mozilla alone one is run by Google. There will be lots more, they'll be like the route DNS servers eventually but currently there are two out there. So your client is automatically configured to punch a hole out to that public stunned server, make some sort of connection with it and then figure out everything he needs to know about his Internet location and his network routing and all of that stuff.
[00:09:57] He figures all of that out and that's part of what he sends along as part of his offer SDP. And the receiver does the exact same thing, you context a stun server figures out all his network and sends it along. Sometimes depending upon your network conditions you might need to do several rounds of negotiation in terms of how you communicate over the network.
[00:10:19] And that is handled by this third Item, which is called ICE. These ICE candidates are additional pieces of information that are discovered about the network protocols that are involved in communicating directly. And so your signaling server also needs to just opaquely just pass those messages on through. Now that sounds like a fantastic amount of complexity.
[00:10:42] You don't have to do hardly any of it. You just have to spin up the objects and make sure the messages go through. And once you've established that connection, now the audio and video streams will go automatically they'll just go directly peer to peer. But that would not be interesting if that was all web RTC had to do because there's just only so many video and audio conferencing solutions that we gonna be able to build.
[00:11:09] What's really interesting, what's really gonna push the envelope is peer to peer DataChannel. In the same way that we can send arbitrary bits of data over web sockets, we will be able to establish an arbitrary two way data channel between two peers and they can arbitrarily transmit information to each other.
[00:11:27] Like files for instance or moves in your game if you're playing your games. There's all thousands and upon millions of different ideas that we might be able to use with DataChannels. But how do we use a DataChannel peer to peer. This is the most recently evolving part of the spec but it's actually working already now.
[00:11:46] So the way audio and video streams work is their single flex. There just one direction. So I can be listening to your audio and video. You don't necessarily have my audio and video although you can. But with data channels they're built in duplex, they're built in encrypted and they're built in compressed.
[00:12:03] So all that's already taken care of, you don't need to do anything about that. You just establish a data channel and the other guy hears about the data channel through the RTCPeerConnection object and bam they're able to transfer their data. Now here's some code that should look really crazy and ugly and scary to you but it's actually not all that bad at all because most of this ugliness has to do with all the vendor prefixing going on.
[00:12:28] Vendor prefixes suck, this is the sort of thing that needs to be hidden behind facade so you don't need to worry about it. But I just wanted to show you it's as simple as create a peer connection object, create a ICE candidate, create a session description. And then hear down here is the most important part.
[00:12:43] All you need to do is be able to signal to a web socket server, you just need to do init and on. We already learned how to do web socket communication so that should look incredibly simple to you.