Hello World in C

zed.dev

Lesson Description

The "Hello World in C" Lesson is part of the full, C Fundamentals course featured in this preview video. Here's what you'd learn in this lesson:

Richard walks through a basic "hello world" example with C. Preprocessor directives, return types, and functions are introduced. The write method from the unistd.h library is called to print the string to the console. Richard also explains what happens at the hardware level when a C program is run.

Join Now or Start a Free 5-Day Trial

Preview

Transcript from the "Hello World in C" Lesson

[00:00:00]
>> Richard Feldman: All right, let's plow on ahead to part 1, Hello, Metal! This is gonna be like Hello World, but close to the metal, or a lot closer to the metal than we normally would be able to in something like JavaScript. So, this is essentially just gonna be Hello World but in C.

[00:00:14]
And what we're gonna talk about in this section is not just, here's how to implement Hello World in C, which of course is very quick. But also we're gonna get down into the details of what's actually going on in Hello World in C. Because there actually are so few steps between what C is doing and what the hardware is actually seeing that we can actually just go all the way down to that level.

[00:00:34]
We're not gonna go all the way down to the hardware level and all the other sections, but I did want you to get a sense of, here is how close we are to the metal. We can actually talk about the actual instructions that the hardware is seeing from Hello, World.

[00:00:44]
We're gonna talk about the performance implications of being this close to the hardware. And finally, we're gonna talk about what can we do to improve the ergonomics on top of the basic Hello, World that we're gonna do. At first, we're gonna make a slightly nicer version of Hello, World a little bit later.

[00:00:58]
That's a little bit more ergonomic and maybe more idiomatic. Okay, so we got our nice metal font here. Any Iron Maiden fans in the house? So this is basically the close to the metal Hello, World in C. So let's kind of go through the different parts of what we got here.

[00:01:14]
So essentially, this pound include here, we're gonna talk about what the pound sign means later on, this is a preprocessor directive. But basically this just means import the contents of this library here. So this is unistd.h, I don't know, universal standard, or something like that. And this is what gives us this right function that we're using right there.

[00:01:34]
Main is essentially a function that returns nothing. So, C is a type check language, you do need to actually explicitly spell out every single type that you're saying here. So you do always need to start by saying, what is the type of my function's return value? So in the case of main, we're saying void because we're basically saying main is not returning anything.

[00:01:57]
If we wanted main to return an integer or something, then instead of void, we would say [INAUDIBLE] here. Yeah, then we have the list of arguments. So in this case, we're saying main doesn't take any arguments. And then, basically, this write function, which is our 1 single function call here, the number 1 here means we're gonna write to standard out.

[00:02:14]
If you gave it the number 2, that would mean write to standard error. So these are hard-coded constants, you could import a file to actually use a named constant instead of a hardcoded integer here. But this one is so hard-coded that it's actually pretty universal that you can just write 1.

[00:02:31]
And also, one of the things that we're gonna see throughout this course is that actually in C, you don't always see in idiomatic C code necessarily people using constants as much as you might in other languages. For example, it is actually kind of common to see people use the number 1 instead of true.

[00:02:47]
C is maybe the original language that embraces truthiness, among other things. And they actually do not have, or at least the original versions of CF. I think there might be some C compilers that support this today. But it's still very common to see people just instead of writing out T-R-U-E, they'll just write 1, cuz it's more concise and C treats them essentially the same way.

[00:03:08]
So then we, of course, have the string Hello, World. And the 13 here is actually the length of the string. So when you're calling the right function, you need to not only give it the string, but you also need to tell it what the length of the string is, or at least how much of the string you want to print.

[00:03:20]
So in this case, when we're doing a hard-coded string like this, of course, the length is really obvious. But if we were using a variable there, that last argument would allow us to say, well, I don't wanna print the entire string, I wanna print a subset of the string.

[00:03:32]
So that's just part of what write does. Also, as we're gonna see, the way that you detect how long a string is in C, it's like one of those things that feels like this is kind of a 1972 thing. Today this is not really the choice that people make for how to design your strings.

[00:03:47]
Getting the length of a string in C actually requires going through every single byte and counting up all the characters in it. So that's why, for efficiency reasons, it's actually just saying, yeah, just please tell me what the length is if you already know it cuz I don't wanna have to go through and count all of that manually.

[00:04:03]
Okay, so next let's talk about what the actual hardware sees. So, this is gonna be the only section of the course where we get into assembly language. We're not normally gonna be talking about assembly language, but we are, for Hello, World purposes, gonna talk a little bit about assembly.

[00:04:15]
This will give you an idea of what people say when they mean portable assembly. It's much more verbose than C and also kind of strange. So don't worry if you don't completely understand everything we're doing here. The main purpose of this section is just to give you a sense of, here's what is actually going on with the hardware, not like, hey, let's actually learn assembly language right now.

[00:04:34]
So this right here, this is essentially a section of the final compiled binary. So, when you actually run your C compiler and end up with a binary executable on your disk, there's gonna be different sections of it. One of the sections is gonna have a whole lot of hard-coded constants.

[00:04:49]
So, if you ever imagine, if you compile a single standalone binary that runs Hello, World, at some point in those bytes has to be the actual string Hello, World. It needs to know how to actually print out that string. And so those actually do get stored in a particular section of the binary.

[00:05:03]
So that's one section. The C compiler is gonna compile down the equivalent assembly, it'll take care of putting that in a different section for you. Even though, when we actually wrote our C code, we wrote that string Hello, World in the middle of the main function. The compiler is gonna say, nah, I know that that goes in a separate sort of constant section.

[00:05:21]
LC0 is basically local constant number 0, [LAUGH] that's what that abbreviation is for. So again, the compiler is sort of gonna generate automatic names for things in the actual binary. This offset, flat thing is basically saying, at this point in the program, we wanna go load from that local constant, that section of the binary.

[00:05:44]
This part right here, this is the actual body of the main function. And this basically corresponds to the exact call to write(1, "Hello, World!", 13). So this one function call actually compiles to four different assembly instructions. And actually, believe it or not, I have alighted some other instructions.

[00:06:04]
If you actually look at the real assembly for main, there's some other boilerplate on either side of this for setting up meeting itself. But basically JMP, right, essentially means there is going to be, somewhere else in your binary, a function called write, which is gonna be compiled in from that unit STD library that we imported.

[00:06:23]
And JMP is basically, jump over there, which means it's basically like a go to. It's go over there and continue running those assembly instructions right there. Before we jump over there, assembly actually will do a little bit of prep work with these other things. So you can see the 1 here where mov to edi, that 1, this offset flat thing was the Hello, World, and then of course there's the 13.

[00:06:44]
So essentially what these four instructions are doing, there's one per argument, is each of them is sort of preparing the state of the processor for executing the right function. So you can think of these as sort of like they're writing to global variables. And then when you jump to this other function, the processor's gonna be, cool, I expected these global variables to have already been populated, I'm just gonna go ahead and read them.

[00:07:04]
So these mov instructions are essentially saying, set these CPU memory slots, these are called registers, it's the lowest level of memory. So every CPU has a hard-coded number of registers. They're essentially, imagine if you had a certain fixed hard-coded number of global mutable variables that you could use to have your functions talk to each other.

[00:07:23]
I know, assembly language is read, they're almost great, that's kind of what registers are. So basically, this right function which comes from that unit STD library, that's actually provided by the operating system. So I don't need to mpm install anything or an equivalent to that. These libraries are basically just kind of built into the OS in the same way that, in JavaScript you have the math is built-in.

[00:07:46]
You can say math.random, you don't need to go install a library to get that.
>> Speaker 2: If there's a second constant, where would it be stored?
>> Richard Feldman: If there's a second constant, it would also be stored in a similar way. Basically what you would see is, you would see a second section like this.

[00:08:00]
So basically, in the binary, there's a whole contiguous chunk of the binary that is gonna be for constants. Each of them has their own label. So, you'd see .LC0, and then below that you'd see .LC1. LC1 might not be a string, maybe it's an array or something like that.

[00:08:16]
But all of them have their own sort of labels. And really all these labels mean, it's like, from this index in the 1s and 0s in the binary to this index, here's where the data is. And this is really just one of the very few abstractions that assembly language gives you.

[00:08:33]
Is instead of having to have that actual number here, it says, okay, this refers to wherever that's going to end up being in the compiled binary. So you don't have to memorize the actual individual [LAUGH] number of the location within the binary, yeah.
>> Speaker 3: So let me know if I'm getting off track, but those turn into global variables, even though they are scoped to the main function?

[00:08:54]
>> Richard Feldman: In assembly, scope is not a thing, everything's a global. At the CPU level, it's like the CPU has this many registers, that's what everything is. Scope is the thing we make up in software to make our own lives easier. The hardware has no idea what scope is.

[00:09:11]
For that matter, the hardware has no idea what a function is. So actually back in the day, you may have heard of procedural programming as a paradigm. That was because, at some point, somebody came up with the idea of, hey, what if we wrote our programs around this idea of procedures, this concept we made up, even though the hardware doesn't know what it is?

[00:09:28]
And it's like, yeah, it's actually a really nice way to program. Now today, we think of this as just, yeah, that's just how programming works, but it's not how the hardware works. So way back in the day, back in the punch card era, people weren't necessarily programming with procedures at all, because that was a thing we made up.

[00:09:42]
They might have just been like, yeah, I don't know, global variables, start running the program, jumping around, just go tos everywhere, that's all you need. So yeah, at the lowest level, you have no scope. That's one of the many conveniences that C introduces on top of assembly and everything's a global, [LAUGH] yeah.

[00:10:01]
>> Speaker 4: So the right function that we're importing, that's tapping into an OS API and not a C standard library?
>> Richard Feldman: Yeah, so the C standard library, generally, is your way of accessing OS level APIs. There's a little bit of a distinction here between Linux and macOS. I don't wanna get too far in the weeds on this, but there's this thing called syscalls.

[00:10:25]
So one of the assembly instructions you can do is you can say, run this syscall. And syscall essentially means, go tell the operating system, hey, this program wants to do this particular operation. That is sort of a low-level operating system primitive, and it's, once again, gonna have some sort of hard-coded number.

[00:10:43]
It's like, hey, go to syscall number 17, which corresponds to writing just standard out or whatever. The reason that, for example, in the case of macOS, these are not actually exposed to the programmer. Technically, if you want, I mean, you can make whatever bytes you want go in the binary.

[00:10:59]
You can be like, macOS uses 17 to mean, right, to stand it out. But macOs actually officially says, do not do that. Don't compile to hard-coded syscalls because we're gonna change what those numbers mean. So you have to, if you wanna write a correct future proof macOS binary, that's gonna work with future releases of macOS.

[00:11:16]
They require you to use the C standard library, which they feel free to change every time they release a new version of macOS. There's also dynamic linking, which we're not gonna get into either. But basically the way that they do that is they can say, okay, if you are calling, if you're using this as the lowest level primitive where you call right, and that's all there is to it.

[00:11:36]
You're not actually hard-coding the syscalls, then they can feel free to change what those numbers mean and sort of mess around with those. Linux actually does guarantee that. So if this were a course on Linux C, we could actually go one step further and say, at this point we don't do a jump to write, we actually do even more instructions ending in a syscall to do the write directly.

[00:11:56]
Because Linux does have a hardcore guarantee of, yeah, if we said 17 means that, 17 means that forever. And you can guarantee that, because we're gonna promise not to break that in future releases. MacOS and Windows do not do that. So typically, the way that people do it is they'll use the C standard library, which includes things like unit, STD.h, in order to provide functions like write, which are for our purposes as close as we're gonna get to the metal in this course [LAUGH] at least.

[00:12:20]
Because if you wanna write something that's portable across operating systems, that is really as low level as you can get. Okay, so one question you might have is, how fast does this program run? Let's benchmark it. We don't really need to benchmark it, [LAUGH] because the answer is, at least unlike macOS, it's literally as fast as the hardware can possibly run it.

[00:12:38]
You cannot make Hello, World run faster than this. What instructions would you remove to make it go faster? If you're using the operating system abstraction of write, which is, again, what you're required to do on macOS and Windows, there's just the bare minimum, it's all there is. You gotta, say, call the write function in order to actually print to the standard out.

[00:13:00]
And you need to give the write function the arguments that it needs. These are literally the hardware instructions to do that. Then, of course, you need to string Hello, World itself. There's nowhere to go, you can't make the C program any faster on macOS and Windows, because this is all there is.

[00:13:17]
So this is a really cool example, I think, of the types of strategies that we use to get maximum performance out of C. It's not that we're doing anything fancy, it's almost the opposite. It's that C is taking your code and generating the absolute bare minimum amount of assembly instructions.

[00:13:33]
Cuz remember, PDP 11, right? [LAUGH] 120, whatever, kilobytes of RAM. 120 kilohertz processor, single core, yeah, you didn't wanna waste any cycles back then on things like garbage collection and objects and whatnot. So it is literally unbeatably fast. You can make a programming language that's more ergonomic than C, but you cannot make a programming language that, at least for this use case, is faster than C, it's literally not possible.

[00:14:01]
So now we are as close to the middle as you can get, congratulations. All right, yeah, and also, the right function could be doing stuff that could get faster. But again, if the operating system, as macOS and Windows do, is saying, this is what we're providing you. We don't have as programmers the ability to make that go faster unless you wanna fork the operating system kernel, then you can go in and rebuild the operating system.

[00:14:25]
Okay, fine, but we're not generally doing that, [LAUGH] we're making stuff that runs in the operating system.

Learn Straight from the Experts Who Shape the Modern Web

In-depth Courses
Industry Leading Experts
Learning Paths
Live Interactive Workshops

Start a 5-Day Free Trial