Copying Memory

Lesson Description

The "Copying Memory" lesson is part of the full C Fundamentals course featured in this preview video. Here's what you'd learn in this lesson:

Richard introduces the memcpy function, which copies a block of memory from one location to another. The memcpy function can introduce exploits or segfaults because it can write to memory beyond the end of the request. A check can be added to ensure the request's memory block is long enough to hold the file path being written into it.

Transcript from the "Copying Memory" Lesson

[00:00:00]
>> Richard Feldman: The next thing we're gonna do is we're going to write the actual index.html followed by a null terminator after that slash. So if you look up here, we can see, okay, we want to end up with blog followed by /index.html. So we've got our slash here, but we don't have the index.html and also we don't have the null terminator there.

[00:00:17]
As this thing continues on, it's got the HTTP headers and all that stuff in there. There is no null terminator after the l in html like we want. So we need to actually copy all the bytes for index.html followed by a null byte after this slash right here.

[00:00:33]
So here's how we're gonna do that. We're gonna call memcpy, which everybody pronounces "mem copy". This is another C built-in function that comes in the standard library. And I'm gonna pass in end + 1 and then "index.html" and 11. So let's kind of break down what this is doing.

[00:00:49]
So, end + 1: through all the stuff we've been doing before, we've arranged it so that the memory address in end is this trailing slash that we now have guaranteed to be right here. The + 1 is going to say, okay, move it over one slot after that.

[00:01:08]
So now we've got the memory address of whatever comes right after the slash. This is the destination. I don't know why memcpy takes the destination first and then the source, but it does. So basically we're saying, look, start copying these bytes to right here.

[00:01:22]
And then the source thing is index.html, okay. And this is going to basically copy it, basically saying like, take all of the bytes in this memory right here and put them right here. And then 11 is how many bytes we want to copy from the source to the destination.

[00:01:41]
So very similarly to the write function, this is once again a function that's taking three integers. First we have a 64-bit integer which is the address of the destination, and then we have a 64-bit integer which is the address of the source. And then we have how many bytes you want to copy from the source into the destination.
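
The call described above can be sketched roughly as follows. This is a minimal illustration, not the lesson's exact code: the helper name `append_index_html` and its parameters are assumptions, and it presumes `end` already points at the trailing slash with at least 11 writable bytes after it.

```c
#include <string.h>

/* Hypothetical helper (name and parameters are assumptions).
   `start` points at the first byte of the path, `end` at its trailing '/'. */
char *append_index_html(char *start, char *end) {
    /* Destination first, then source, then the byte count:
       the 10 characters of "index.html" plus its null terminator. */
    memcpy(end + 1, "index.html", 11);
    return start;
}
```

Given a buffer holding `/blog/ HTTP/1.1` followed by headers, and `end` pointing at the second slash, the buffer would then begin with the null-terminated string `/blog/index.html`.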

[00:01:55]
Now if we look at the characters here, index, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. Why are we passing 11 when there's only 10 bytes in this string? Does anyone know? Why would we pass one more byte to memcpy? And so we wanna copy one more byte than the length of this string, which is, there's 10 bytes here.

[00:02:13]
We wanna copy 11 bytes over. The null zero. Exactly, we want to copy the null terminator. So essentially, we want to write index.html, and we also want to write that null terminator over there. Now, because this is C, there actually is a null terminator in there. When I write this string literal like that, C does that for me automatically, so I don't need to put in the explicit backslash zero here.

[00:02:33]
I can just copy 11 bytes, and we know that C is going to have null-terminated this for me anyway. There's gonna be a 0 byte right after that. All I need to do is specify that I want to copy it. So this is a little bit of using the thing that we did in the first exercise for bad purposes, to try to break things: passing in a length to an operation that's actually longer than the string.

[00:02:55]
But here we're actually doing it for justice because this is actually a good idea. Like we actually do want to copy that extra null byte there. And it's a little bit more efficient. If we did put a backslash zero here, it might be a little bit more explicit, but then it's a waste of a byte because then we've got two null bytes back to back when we could just copy in the null byte we've already got, okay.
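
The 10-versus-11 distinction can be checked directly. The sketch below uses hypothetical function names; it relies only on the standard behavior that `strlen` counts characters before the terminator, while `sizeof` applied to a string literal counts the terminator too.

```c
#include <stddef.h>
#include <string.h>

/* Characters before the terminator: 10. */
size_t visible_length(void) { return strlen("index.html"); }

/* Bytes the compiler stores for the literal, terminator included: 11. */
size_t stored_size(void) { return sizeof "index.html"; }

/* The byte just past the final 'l' is the implicit null terminator. */
char byte_after_last_char(void) { return "index.html"[10]; }

/* With an explicit \0 the literal holds two null bytes back to back: 12. */
size_t stored_size_with_explicit_nul(void) { return sizeof "index.html\0"; }
```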

[00:03:12]
So in summary, what this function call is going to do is it's going to take our original string, which now has /blog/ guaranteed whether or not there was a slash there originally. And then it's going to copy in this index.html. It's just going to stomp right over the HTTP/1.1 that's right there, followed by the beginning of a header after this.

[00:03:30]
So once we've completed this operation, this original request is useless. This is no longer a valid HTTP request. We've just slammed this /blog/index.html right into the middle of this thing. And then what we're gonna do is return the initial address there. Of note, one thing we can do to make this a little bit nicer is we can actually define a variable here called something like default_file, which is gonna be "index.html".

[00:03:56]
And then we can make it a little bit clearer that we are intentionally doing one more than the length of it because we want the null terminator. We can do strlen of default_file, + 1, with the understanding that since this is a constant, a string literal, this strlen is just gonna get optimized away anyway.

[00:04:11]
And this is all going to compile down to just an 11 there anyway. But at least if we write it this way, it's a little bit more obvious that we are intentionally getting one more byte than what's actually in the string's length. Also of note, normally if I were doing this, I wouldn't make this be a local variable like this.
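
A minimal sketch of that variant, under the same assumptions as before (the function name and parameters are hypothetical, and the destination is presumed large enough):

```c
#include <string.h>

/* Hypothetical helper: same copy as before, but the length is spelled out
   as "string length plus one" instead of a bare 11. */
char *append_default_file(char *start, char *end) {
    const char *default_file = "index.html";
    /* + 1 so the null terminator is copied along with the 10 characters. */
    memcpy(end + 1, default_file, strlen(default_file) + 1);
    return start;
}
```

Whether the `strlen` call is folded into a constant depends on the compiler and optimization level; the result is the same either way.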

[00:04:27]
I would actually just bring it up to the top and say const char *default_file equals, voilà. Const is essentially a special keyword that you use when you're doing things at the top level, sort of where the functions are defined. I think you can also do it inline if you want.

[00:04:44]
That basically just says, yeah, this is something that I'm not able to mutate. These other things, like start and end, we can mutate; we can change them as we go along. But this is actually saying, yeah, no, I'm just defining this once, and then it's always gonna be that memory address.

[00:04:57]
I'm never going to go in and change this. Also, there is a convention that if something is a constant, you write it in all-caps snake case, also known as screaming snake case, whereas with local variables the convention is to just have them be snake case. Okay, so here's our final memcpy with this constant, and this is gonna get us the index.html that we want.
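
As a rough sketch of the top-level constant, assuming the name `DEFAULT_FILE` and a hypothetical helper `to_path`: in C, `const char *` makes the characters read-only but still allows the pointer to be reassigned, so this version adds a second `const` (an addition here, not something shown in the lesson) to also pin the address.

```c
#include <string.h>

/* Top-level constant in screaming snake case.
   First const: the characters cannot be modified through this pointer.
   Second const (an assumption beyond the lesson): the pointer itself
   cannot be reassigned to a different address. */
const char *const DEFAULT_FILE = "index.html";

/* Hypothetical helper; `end` must point at the trailing '/' and the
   buffer must have room for the file name plus its terminator. */
char *to_path(char *start, char *end) {
    memcpy(end + 1, DEFAULT_FILE, strlen(DEFAULT_FILE) + 1);
    return start;
}
```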

[00:05:20]
Cool, okay, and then the final thing that we're gonna do is return the actual start here. So this is going to get us back the actual /blog/index.html that we want here. Now there is a problem here, which is: what if the request is too short?

[00:05:38]
This is gonna come up in the exercise. So in other words, let's suppose I'm doing this memcpy and I'm copying index.html in right here. We can see, okay, there's HTTP/1.1. But what if there wasn't that HTTP/1.1? Or what if the string ended right here, and there wasn't quite enough space for that l in there, or maybe for the null terminator?

[00:05:58]
What happens if we get an HTTP request like that? Well, first of all, this shouldn't come up in the wild in theory in the sense that like any browser is just always gonna send you some headers here. So you definitely have some more string to work with. But in practice, that's something that like you really would rather not leave to chance.

[00:06:14]
You really don't wanna try to make an assumption about that. When you have a third party that's sending you this data, that's how exploits happen. And this is a good example with the approach that we're taking right now. First of all, it's pretty unusual, and it's not what I would do in most programming languages, even though in C it is a defensible choice.

[00:06:32]
But it does have security downsides if I don't explicitly go outta my way to check for that case where I'm doing this memcpy onto a destination that doesn't have enough bytes, semantically, for what I'm trying to put here. Again, in C, there's no guardrails here. C is really just gonna be, like, okay, I got three integers.

[00:06:51]
I've got, like, here's where I'm gonna write to, here's the memory address that I'm gonna write from, and then here's the number of bytes. If that number of bytes writes off the end of the string, C does not care at all. It's like, yeah, I don't know, whatever memory is there, I'm just gonna stomp all over it.

[00:07:04]
So we really want to make sure to handle that case. We want to explicitly go out of our way to check for this. But this is yet again another example of memory unsafety in C. If we don't remember to or don't think to go out of our way to check for that case and we just write this code as is, this could be an exploit vector.

[00:07:21]
If any user agent doesn't send us a full string, or intentionally omits that, this could do some very bad things when this memcpy stomps all over whatever bytes happen to be in memory there. It could segfault; any number of things could happen.
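
One way such a check might look is sketched below. Everything here is an assumption for illustration: the function name, the extra `buf_end` parameter (a pointer one past the last writable byte of the request buffer), and the choice to return NULL on failure. The lesson's own exercise may handle this differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bounds-checked variant. `end` points at the trailing '/'
   inside the buffer, and `buf_end` points one past the buffer's last
   writable byte. Returns NULL if the file name would not fit. */
char *append_default_file_checked(char *start, char *end, const char *buf_end) {
    const char *default_file = "index.html";
    size_t needed = strlen(default_file) + 1;          /* 10 chars + terminator */
    size_t available = (size_t)(buf_end - (end + 1));  /* bytes after the slash */

    if (available < needed) {
        return NULL; /* too short: refuse rather than write past the buffer */
    }
    memcpy(end + 1, default_file, needed);
    return start;
}
```

With a request such as `/blog/ HT` that ends early, this version leaves the buffer untouched and reports failure instead of writing past its end.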
