C Fundamentals

Open Socket & Listen for Connections

C Fundamentals

Lesson Description

The "Open Socket & Listen for Connections" Lesson is part of the full, C Fundamentals course featured in this preview video. Here's what you'd learn in this lesson:

Richard discusses how to open a socket with C. The socket is bound to a port by passing the file descriptor, address structure, and size of the address. Once a socket is open, the program can listen for connections with the listen function, specifying the number of requests to queue before an error is thrown.

Preview
Close

Transcript from the "Open Socket & Listen for Connections" Lesson

[00:00:00]
>> Richard Feldman: So we've just done some file I/O, and now the final thing that we need for our web server to possibly be able to work is we need to be able to do network I/O. In our case, we're only gonna be going over localhost, but the same basic concepts apply.

[00:00:12]
Okay, so to do network I/O in C with no third party libraries, we're gonna first open a socket, then we're gonna listen for connections on that socket. We're gonna read from the socket, and then, finally, write to a socket. So all of these have different interesting considerations around things that we haven't seen so far in C.

[00:00:27]
Okay, the first thing that I should mention is that LIBC, which is the C standard library, loves file descriptors. [LAUGH] We've seen them be used for files. But let me tell you, if C can solve a problem with a file descriptor, it is very happy to do that.

[00:00:40]
When we saw the write function earlier on, that was a file descriptor for a standard out as file descriptor. We use file descriptors for files, and we also use file descriptors for sockets. This is one of the cases where Windows is a little bit different. Windows uses a very similar API, but it actually doesn't use literal file descriptors for its sockets.

[00:00:58]
It uses a thing that looks and works almost exactly like a file descriptor, except that it's technically a different type. And they kind of have their own integer space for those. But basically, for our purposes, for focusing on the UNIX/POSIX flavor of C. Basically, when we call this function called socket, which, these arguments we'll get into this in a sec.

[00:01:19]
It gives us back a socket file descriptor. And this file descriptor is just an integer, just like the file descriptors we were using for open and fstat and read and all that. This first constant here is basically just saying, hey, I wanna do IPv4 socket. So you actually have to open a different socket type if you wanna handle IPv4 addresses versus IPv6 IP addresses.

[00:01:43]
We're only gonna deal with IPv4 for this web server. But if you did want to handle IPv6, this would be the place to start, is to give a different constant there. Basically, we're saying TCP instead of UDP. So UDP is a protocol that's commonly used for game servers, certain really high performance networking things.

[00:02:00]
But it has a downside of, if you send the packets out, you don't get a response. You don't know if they made it to their destination or not, you're just kind of spamming them into the void. Which, for games, is often actually what you want. But for our use case, we definitely want TCP to talk to a browser.

[00:02:13]
And then finally, there's a protocol number, we're not really gonna get into what that means. Also, there's kind of some more boilerplate that we need to do here. This is basically just an interesting demonstration, well, A, of some C boilerplate, this is kind of what boilerplate in C looks like.

[00:02:28]
But B, it's also a slightly different C API pattern that we haven't quite seen before, so we're just gonna talk through it very briefly. So we're calling this function called setsockopt, which is basically, I'm gonna configure this socket that we've opened. So the first argument is the socket file descriptor of the socket that we want to work with.

[00:02:45]
And we're gonna talk about these constants. This is the interesting part here, this &opt. So basically, what this &opt is doing is that, remember, we talked earlier, opt is an integer. But anything that you want in C, if you wanna get the address of this variable, you can just always say &.

[00:03:03]
And so what this is doing is it's saying, I wanna pass in the address of this integer, and then setsockopt is actually going to mutate that integer. So this is the same kind of pattern as what we saw earlier with memcopy. Where we were like, hey, I'm gonna give you the address of these bytes, I want you to stomp all over them.

[00:03:18]
Or what we saw later on with fsat, where we're like, hey, here's this struct. I'm gonna give you the address, I want you to stomp all over it. Now we're doing that same thing, but we're doing it with an individual integer. So there's really no limit to how much C [LAUGH] libraries will use this pattern.

[00:03:30]
Once again, this is C standard library, ships with the operating system. And it's like, yeah, give me the address of this integer, I'm going to stomp all over it when you call me, that's how these things work. Or, sorry, it potentially will stomp all over it, it doesn't necessarily have to.

[00:03:44]
And then finally, sizeof(opt), because that actually wants to know how big of a thing is in there, for reasons we'll discuss later. So why do we need to do this setsockopt thing? So if you want, you can omit this. And I considered leaving it out of the course, but I decided to put it in for two reasons.

[00:03:58]
One is, it's an example of boilerplate, it's an example of this sort of unusual pattern here. But two, it's kind of an ergonomics thing if you're just trying to run the web server locally. If you don't do this, I don't actually know the exact reason for this, I haven't dug into it all the way.

[00:04:12]
But the symptom that you see is that when you're stopping and restarting the server a lot, which I was doing as I was trying it out while building the course. Sometimes you get an address in use error, even when the server isn't running. So it's like, hey, port 8080?

[00:04:24]
That's in use, I'm like, no, it's not, I stopped the server, what are you talking about? Then you just wait for a couple seconds and run it again, it's like, yeah, it's not in use anymore. So there's something about the address can sometimes kind of linger, unless you set this thing which says reuse address.

[00:04:38]
Which I guess means, I don't know, don't keep it in use longer than my process. I'm sure that someone could explain the design decisions that led this to not be the default behavior. But at any rate, this is some boilerplate that you need to do [LAUGH] if you want to get it so that when you stop the local server and restart it, as long as you're not actually using that port for something else, you don't get an error.

[00:04:59]
Okay, so we talked about this, and [COUGH] basically, yeah, so I forgot about this. Opt is an integer, but it's always gonna be either a 1 or a 0. And then yeah, sizeof returns, basically, how many bytes are in whatever variable you give it. So this is statically known, so because I gave it an opt, this is not actually passing in the number 1 here.

[00:05:23]
This is actually a language keyword, and what goes inside the parentheses is the name of a variable. And then what sizeof returns is how many bytes in memory does that variable take up. So this is kind of like typeof in JavaScript, it's a fancy separate thing, it's not actually looking at the value itself.

[00:05:41]
Okay, yeah, so this is just kind of a weird API for how they decided to do that. [COUGH] Okay, moving on. So the next thing we're gonna specify is essentially, what is the actual port that we want to bind on, so when you start up the local server.

[00:06:00]
I chose localhost:8080, you can choose whatever you want. Some of them use like 3000 or 1337 or 8000 or something like that, just went with 8080, it's one of the common ones. Basically, we're making this struct. Once again, we are initially declaring this struct as just empty plain old bytes here.

[00:06:19]
So this is all initialized memory, and then we're just gonna go through and initialize that. We're gonna say address.sin_family = AF_INET, this is IPv4 again. [COUGH] Then, within this thing, we're setting the s_addr. Worth noting that this thing right here is another struct that's uninitialized, we're only setting one field in that struct.

[00:06:38]
Again, [LAUGH] this is C, sometimes the APIs are kind of strange. And this is basically saying, I don't care, I just want any address, which means the other fields are just gonna be ignored. So we don't even bother to initialize them. And then finally we have the port.

[00:06:50]
And this is the actual interesting part where we're saying, okay, this is port 8080. And then that's htons, does some sort of conversion to convert this from the integer 8080 into whatever the type is that sin_port wants. Okay, so again, none of this is necessarily anything that we haven't seen before, other than initializing a nested struct [LAUGH].

[00:07:11]
[LAUGH] This memory is just all total memory garbage. But this is pretty much just boilerplate to set up the socket with the parameters we want. Next thing we do is we call bind. So now that we've essentially said, okay, open the socket, we got our socket file descriptor.

[00:07:26]
We called setsockopt to configure that socket to don't say address is in use spuriously. We made this address, we set it up, and then finally, we're calling bind, passing on the socket file descriptor and then the address. So this is actually gonna say, okay, now port 8080 is sort of bound to that socket.

[00:07:48]
So the socket before we did this bind is essentially the sort of communications channel. We're not gonna get too deep into how the operating system deals with sockets. But the basic idea is just that until we've done this bind operation right here, this is just sort of a communications channel we can send data to and receive data from using manual other read and write type stuff.

[00:08:09]
But when we do bind, now we're actually connecting this to the actual port 8080. And now we're saying, okay, whenever we receive data on port 8080, I want it to go to this socket. And then we have that same pattern before, we have the ampersand of the address.

[00:08:25]
So it's giving him the address of the address, or, I guess, a pointer to the address, [LAUGH] I don't know, the word address makes it hard to say this out loud. But the ampersand, as usual, is like, give me the memory address of this thing. We do need to actually cast this in this case.

[00:08:38]
Because, yeah, of some weird API decisions that they made, where, basically, bind will accept one of several different structs. And you're supposed to make whichever struct is appropriate for your use case, whether it's IPv4 or IPv6. But whichever one you choose, they do have a couple of fields in common.

[00:08:56]
And so bind wants you to essentially say, hey, tell me about the actual subset of the fields in this struct that I care about. And the rest you can just store however you want. And then sizeof(address) is basically saying, bind also wants to know, okay, what's the full size of the struct that you're passing me, even though I'm only gonna look at a subset.

[00:09:14]
So again, a little bit of a strange API there, but nothing that we haven't seen before. It's just might feel a little bit strange and kind of unergonomic. Yeah, so this is just the the casting to say, okay, we're treating this memory address as a sockaddr address, rather than a sockaddr_in address.

[00:09:31]
As opposed to sockaddr, I think it's in6 is the IPv6 version of that, in6, that's what it is, right? So basically, what this is doing is saying, okay, we have the first two fields of sockaddr_in, which is sa_family_t, and uint16_t for the port. Same kind of basic thing right here, in6 has these same two first fields.

[00:09:51]
And sockaddr, the sort of generic struct that is like, I'm sort of compatible with either of these. The first field is the same, so this is the subset that we care about. The sa_family refers to either the sin6 family or the sin family, depending on it's IPv6 or IPv4.

[00:10:08]
And then this thing right here is essentially just [COUGH] covering the bytes for multiple fields. So it's basically just sort of like a padding, something like that. And so, whereas these both have 2 bytes, and this is just, yeah, there's some more stuff going on here. So because this is the address of an array, this is gonna be actual 8 bytes instead of these 2 bytes.

[00:10:31]
So yeah, this is covering those 2 bytes and then some, and so reserving space for, I guess, potential future things that might be in these structs. Okay, so again, we don't need to spend a ton of time on this, just if you ever see this pattern, you're wondering what's going on here, why it's asking me to cast this?

[00:10:48]
The reason is that it wants you to say, okay, I know that you locally are gonna have one of these two addresses and I wanna just get a subset of these things. To be honest, I'm not totally sure why they didn't just say like give me the sa_family_t, since that's the only thing it's actually using.

[00:11:00]
Maybe it reads this data for some reason, I'd be kind of surprised about that, but maybe it is. But at any rate, this is how you sort of deal with that sort of [LAUGH] strange API that they've got here. And this cast is basically sort of confirming that I'm saying, look, I know that this memory address is addressing memory that's got this shape to it.

[00:11:23]
But trust me, I know what I'm doing, I want you to [COUGH] treat it as a memory address of one of these. And that's gonna be fine, it's gonna work out, trust me. This is sort of a general theme of casting in C, when you do this sort of parentheses in front of the thing, is you're basically saying, I know that there's these 1's and 0's here.

[00:11:40]
I know that you think, based on what I declare them as, that they should be treated as these types of 1's and 0's. But trust me, it's gonna be fine if you treat them as meaning something else. The same 1's and 0's are meaning something else. It's gonna be okay, even though it looks like it's gonna be a mistake.

[00:11:54]
I am promising that it's not gonna be a mistake, just trust me and go ahead and use them in that way. Okay, and then, yeah, right, so then the last thing here is basically saying, okay, well, I don't know how many actual bytes I giving you because I'm casting it to something else.

[00:12:08]
So bind wants to know how many actual bytes does this address point to. How far can I go if I want to read into this thing? Which I assume is so that, because this thing covers multiple bytes, is it wants to know, well, did you give me something that was too short, and I need to not even attempt to read this whole thing.

[00:12:25]
Because then I would be reading off the end of whatever you gave me, and then reading into memory that I'm not responsible for. Okay, so that's bind [LAUGH]. So now we've our socket set up. We've bounded to port 8080 using this kind of strange API. And next, we're gonna call listen.

[00:12:43]
So listen basically takes one argument that we care about, which is basically how many different connections can we sort of stack up before we have to start refusing to accept new ones. So the way that you can imagine this is, I've said, okay, I've got my socket, I am by binding it to port 8080.

[00:12:59]
And whenever something comes in on port 8080, of course I'm gonna run some code that's gonna be like, okay, let me read that data and handle it. And that takes some time, and I have to actually wait for running the processing on the data that's coming in. But what happens if in the operating system I'm getting stuff in, it's like, okay, here's some more.

[00:13:17]
Hey, I'm processing, just chill out, I'm processing this, it's like more requests is coming in, they're starting to stack up. What this number says is how many requests are allowed to stack up before they start erroring out. And whoever's sending the request, whether it's the browser or whoever else, just starts getting an error saying, like, yeah, I'm sorry, the server is too busy right now.

[00:13:34]
Now you might say just, I don't know, why would you ever want that to error out? And the answer is, well, the higher you make this number, the more memory it's gonna use. So this is just like it's reserving space for, yeah, stack up a bunch of these.

[00:13:44]
So if you set this to a billion, then it's like, yeah, no problem, we can have 4 gigabytes worth of requests stacking up. It's like, well, we're never gonna get through them all in a timely fashion. It's just gonna get worse and worse and worse over time, we're gonna get further and further behind.

[00:13:56]
So you wanna pick some number here that is based on what kind of traffic patterns you anticipate and how fast you think you can handle the requests, that sort of thing. And finally, once we've done all that stuff, and remember [LAUGH] how many slides worth of code we are omitting with these dot, dot dots.

[00:14:13]
Now we can finally print, hey, we're listening on localhost port 8080. And now we can actually start to get to the interesting part, which is processing these things.

Learn Straight from the Experts Who Shape the Modern Web

  • In-depth Courses
  • Industry Leading Experts
  • Learning Paths
  • Live Interactive Workshops
Get Unlimited Access Now