String Length & Null Termination

zed.dev

Lesson Description

The "String Length & Null Termination" Lesson is part of the full, C Fundamentals course featured in this preview video. Here's what you'd learn in this lesson:

Richard creates a valid HTTP response message that contains the version of HTTP, status code, and message. A byte array containing the response message is created, and the strlen method calculates the length. Null terminators are also discussed in this lesson.

Join Now or Start a Free 5-Day Trial

Preview

Transcript from the "String Length & Null Termination" Lesson

[00:00:00]
>> Richard Feldman: Which brings us to actual HTTP responses. So now that we learned all the fundations of what we need to build an HTTP response, let's actually build one. So let's say that I wanna hit https://frontendmasters.com. What's going to come back from that server, the server is going to send back a response that begins with a string HTTP/1.1 200 OK.

[00:00:22]
I'm assuming that we're using the HTTP/1.1 protocol, which is extremely widely supported. HTTP/2 is much fancier. We are not gonna support HTTP/2 in our static web server, number one, because we don't need to. And number two, because it's way more complicated and would take way longer to explain.

[00:00:36]
So we're just gonna stick with classic HTTP/1.1. The next thing it's gonna do is it's gonna have a blank line. And then it's gonna say doctype html, blah, blah, blah, basically the entire context of the index side html there. This is literally what comes back from the server.

[00:00:49]
It's this string, new line, this string. It's all one gigantic string, that's it. It's not JSON or anything like that, it's just this format. If we were to represent this header right here, HTTP/1.1 200 OK in bytes in memory, again, it would be like, okay, well, H is 72 in ASCII, which means it's also in UTF-8, which is what web servers use.

[00:01:09]
T is 84, P is 80, blah, blah, blah, blah, blah. These correspond to 1s and 0s. And then there's gonna be an extra 0 at the end here. Now, when I say the extra 0 at the end, what I mean is that it's the byte 0. It's not one of these 0s, cuz these 0s are actually 48, 0 in ASCII/UTF-8 is the number 48.

[00:01:27]
But what I mean is the actual byte is 0. This is what's called a null terminator. And basically, what it means, is that it's what C strings use to mark the end of a string. So I can just give that memory address to whatever function I want. And if that function wants to know, conceptually, where does that string end, it can just keep reading, reading, reading, reading, reading.

[00:01:46]
And then as soon as it hits the 0, it knows, I found the end of the string, that's where it's supposed to be. And it's a way that you can have your functions work just in terms of the memory address here, without needing to also separately pass around the length of the string, like write took the explicit length as a separate argument.

[00:02:01]
But it's a way that the function can sort of figure that out for itself. If we wanted to name this and give it to a variable, this is the type that we would get. [LAUGH] It's short for character. Some people call it car, because it's C-H-A-R, looks like it might be pronounced car.

[00:02:17]
Some people will call it char. Character is like, maybe you could call it care, but I don't hear that as much. I usually hear either char or car. I guess I've heard care before, but I personally always say char star, because it rhymes. But whatever, you can pronounce this however you want.

[00:02:34]
But char basically means byte. For historical reasons, 1972, they called it char for character, even though in Unicode terms it doesn't actually necessarily correspond to a character. And then the star there basically means the memory address. So this thing right here, again, it's 1972 design decisions, they decided to make the star be next to the word rather than next to the thing.

[00:02:56]
This is actually the type of header, is like the char and the star both are part of the type of header. And what this is saying is basically, this is the address of a byte, or you might hear people say the pointer to a byte. So char means byte, basically, and the star means it's not that header itself is a single byte, but rather that header is the memory address of the this initial byte right here.

[00:03:21]
And yeah, basically, so in memory, if you were to look at the 1s and 0s in memory, this right here would actually be an integer. It would be the actual memory address of the beginning of this string, because that's what like char star means. It means the address of a particular byte, namely the first byte of this string.

[00:03:37]
So in memory, that's what we would see right here. So when I call write, passing one and this header, if I wanted to do it this way, like previously, we had the Hello World, we had the actual string here. But if I wanted to just pass in this variable instead, what would actually be happening at runtime is that this integer is gonna be passed in here cuz write takes three integers.

[00:03:59]
If I didn't wanna hard code the 15 here, which is a totally reasonable thing, I can call the strlen function. That will basically go, it takes the header, which is, again, an integer, and it walks through the array just boop, boop, boop, boop, boop, until it finds one of these 0 bytes and it's like, that's the end.

[00:04:13]
However many we pass along the way, that's the length of the string. Now, obviously, that's pretty inefficient, walking through the entire every single byte in the string just to count up the total number of bytes. But basically, that is essentially what strlen is doing. Fortunately, optimizations will mean that if you're calling strlen on a hard coded thing like this, where this, at compile time, is gonna get translated to this number.

[00:04:37]
And then the optimizer can say, well, wait a minute, I know that this is just working on a hardcoded memory address. I'm just gonna optimize that away for you. You're not actually gonna pay a runtime cost for this. But if you're doing this based on something dynamic that's coming in from the user and not hardcoded, then you actually are paying for walking through the entire string at runtime.

[00:04:55]
Another way you can see this happen is, instead of saying char star, you can say this basically is the symbol for an array. This is another thing you can do, is basically define it to be this way. And this allows you to actually literally put the address there if you want.

[00:05:12]
Not saying you should do it that way. Probably the way that you should do it is with the char * style. We're gonna talk a little bit more about arrays later. But basically, just sort of be aware, if you do see this square bracket syntax, it essentially means this is an array of unknown length, which works out at runtime to be exactly the same thing.

[00:05:28]
It's an integer representing the memory address of the beginning of that array. So both of these two are essentially equivalent. Okay, and yeah, the memory addresses of these, once again, will just sort of increase monotonically, like that. Okay, so, yeah, yeah. So the 0 at the end here, the 0 byte, this is what's referred to as a NULL-terminated string.

[00:05:53]
So NULL meaning 0 in this case, not to be confused with null in JavaScript sense, which is its own semantic thing. Here NULL just means 0 byte, as opposed to 0 character. So NULL-terminated just means this is a string that has a 0 byte at the very end of it.

[00:06:10]
And yeah, you might be saying, isn't strlen incredibly slow for something like C? Well, yes, so why don't you destroy the length up front? Then you wouldn't have to deal with this fancy like, well, performance optimization only works if it's known at compile time. At runtime, if we got this value and the string came in from a user, then we'd have to march over the whole thing.

[00:06:31]
Yeah, it would be more efficient in terms of getting the length of a string to store the length up front. But again, you've got to remember, 1972 design decisions, right? When you're this limited on memory, it can potentially make sense to say, well, instead of storing a 32 or 64 bit length with every single string that you're passing around, maybe we wanna just store one byte.

[00:06:50]
And yeah, it's gonna take longer to traverse the string to ask about its length is. But at least you are reducing your overall memory usage compared to passing around the length every single time, or storing it alongside the string. So it's sort of like pre-computing the length. Having said that, I also have heard people say that even in 1972 this was the wrong trade off to make, but I wasn't there, I wasn't even born yet, so who am I to say?

[00:07:14]
But that's kind of the reasoning for this. When you encounter C APIs in the wild, even the node, JSC API, it's still doing stuff like expecting NULL-terminated strings and stuff. So even though this design decision hasn't really aged very well in terms of what's performant in a good trade off in terms of CPU cycles and memory.

[00:07:32]
It's still like a whole lot of C APIs will use NULL-terminated strings and say, you gotta give me a NULL-terminated string right here. Okay, yes, but at least in this case, where we're using static hard coded thing, at least it will get optimized away. So that will just become a 15.

[00:07:48]
Okay, so to sort of like flesh out the rest of this example, the import unistd.h for the right, we also need to include string.h if we wanna call strlen, because that's not available unless you import that library. And finally, we can basically say, okay, here's our int main, and then return 0.

[00:08:06]
And now, instead of Hello World, we basically have HTTP/1.1 200 OK. If we wanna do a little bit fancier and use the printf formatting here, we can do %s, which we talked about a little bit in the previous section, instead of %d. Worth noting though that, again, remember, this is a number, right?

[00:08:25]
At runtime, this right here is the memory address of the beginning of this string. Which means that %s is just going to be like, okay, I assume that this number that you're giving me is the memory address of the beginning of that string. How does it know where the end of the string is?

[00:08:40]
Again, it expects it to be NULL-terminated. So what printf is gonna do is it's gonna say, okay, %s means literally, take this memory address, start traversing it, and start sending the bytes one at a time to standard out until I get to a 0 byte, at which point I'm done.

[00:08:52]
And that's the end of that string, and that's what %s does in printf. As we saw earlier, write doesn't do that. So this is one of the conveniences that printf does for you, being a wrapper around write, is it the %s will do that NULL-terminated aware byte traversal for you.

[00:09:08]
And then, yeah, if you wanna print it as a 32-bit integer, you can do that, as we saw the exercises, but that's just gonna show you what is the actual memory address, or at least the first 32-bits of it. If you wanna print the entire memory address, I don't know why, I'm sure there's some reason that it's zud instead of d to print it as a 64-bit integer.

[00:09:28]
But this being a 64-bit laptop, and pretty much every machine that everyone in this course is gonna use is gonna be on a 64-bit machine, this is gonna be a 64-bit integer. So if you wanna literally see what is the actual runtime, the actual memory address that's being sent to printf, you can do that.

[00:09:46]
And if you do it like this, it's gonna be like, cool, here it is, here is that integer. Again, memory is basically a gigantic array of bytes. And if you want, C will let you see what is the actual index into that memory at runtime. Cool, all right, so, summary of all the things we talked about.

[00:10:03]
We talked about numbers in memory, like the the actual 0s and 1s and how we can interpret them ourselves. The critical takeaway there was that we get to decide how memory is interpreted. As we saw in that very last example there, you can decide that the giant array of bytes in memory is 1s and 0s.

[00:10:20]
You can decide that it's integers that are 8-bits, 16-bits, 32-bits, 64-bits. Or you can decide that there's strings that could be encoded as ASCII, UTF-8. It's all up to us as the programmer to decide what those 1s and 0s mean. And if we make mistakes and one programmer thinks they mean this thing and the other one thinks they mean this thing, at least in the C world, we're gonna get some unhappy results.

[00:10:40]
But it's not gonna give us a nice error message, that's for sure, it's just gonna interpret them incorrectly. We talked about byte arrays, so we can have just back-to-back bytes, and then we can index into those, just like the global array of memory. Talked about null-terminated strings, where the 0 byte at the end of the string, not to be confused with 0 the character, which is not a 0 byte.

[00:10:58]
But 0 terminated strings are essentially a way to be able to just pass this memory address to something and have it actually without any other information, nowhere the string ends. And then finally how getting a string's length is a little bit inefficient. Maybe a design decision that hasn't aged well since 1972, but if it's a hard coded string, it will get inlined.

[00:11:19]
And if worse comes to worse, you can also just explicitly pass around the length separately. Then some functions, such as write will say, I'm not gonna traverse and go look for the null-terminated. You just give me the length and I'll just assume that's how many you want.

Learn Straight from the Experts Who Shape the Modern Web

In-depth Courses
Industry Leading Experts
Learning Paths
Live Interactive Workshops

Start a 5-Day Free Trial