
Lesson Description
The "Introduction" Lesson is part of the full, Write a Compiler That Understands Types course featured in this preview video. Here's what you'd learn in this lesson:
Richard Feldman introduces the course by giving historical context for compilers and explaining how compilers transform code into machine instructions, covering interpreters, JITs, transpilers, and bytecode. He also provides a brief look at WebAssembly, which the course’s final compiler will target directly.
Transcript from the "Introduction" Lesson
[00:00:00]
>> Richard Feldman: I'm Richard, I am the creator of the Roc Programming Language, and I'm a full-time rust programmer for Zed, make an awesome code editor. And today, we're going to talk all about compilers. We got some instructions here to get set up for the workshop, it's very simple. It's basically clone the repo and you need to have Node.js installed, that's it.
[00:00:14]
So, doesn't need to be a big rush, you can probably just kind of do it right before we start the exercises. So back in the day, way, way back in the day, before the time of compilers, people used to program computers like this. These are people who are actually programming that computer physically by moving wires around.
[00:00:31]
This was considered to be less ergonomic than what we do today by a considerable margin, but it worked, I mean, it was a way that you could convince the computer. Convince the machine to make the ones and zeros go in the place that you want it to by physically moving these cables around.
[00:00:45]
This is the 1946 ENIAC computer or ENIAC, I've heard it pronounced both ways, I like ENIAC better, at University of Pennsylvania, which is actually walking distance from my house. Although, I haven't seen the ENIAC in there [LAUGH]. And ENIAC stands for the electronic, numerical, integrator, and computer. So one of the big things that people used to use these things for was just like mathematical calculations, that was kind of the bread and butter of them.
[00:01:10]
And importantly, there was this concept then, and there still is this concept now of machine instructions. So the way that these programmers were giving the machine its instructions were by moving things around. And the way that we typically give machines their instructions these days is not so manual, but rather we have a compiler generate those machine instructions, sometimes with several layers of indirection in between.
[00:01:31]
So if you zoom all the way out of what is a compiler's job, basically what people typically mean when they say compiler is you've got your source code representation and you've got your compiler in between that. And then the machine instructions come out the other side. You run the compiler on the source, and what you get out the other side are ones and zeros, machine instructions, and that is much more ergonomic than the 1946 way of doing it.
[00:01:51]
C is a great classic example of a compiler, we just did a workshop on C, if you wanna learn about C, this is not a course about C, though. Rust is another, good example, famous of compilers, I also do the Rust [LAUGH] course for FrontendMasters. And assembly also is a thing that you could consider to be a compiler, but most people actually would put that in a separate category of assembler.
[00:02:14]
It's a little unclear, maybe it's kind of a historical reason for this, but I mean, fundamentally, it is taking a source code representation and turning it into machine instructions. So I would personally put it in the category of compiler, but some people would say, no, no, that's a different thing.
[00:02:25]
Debates maybe had. Now, these are all things that people would call ahead of time compilers, which essentially means that you turn the source representation into the machine instructions before you are even running your program. It's like that is the head of time, ahead of run time, transformation that these compilers all do.
[00:02:41]
You also have sort of a runtime transformation, where you're like, as you're running your program, it is on the fly turning your source code or some intermediate representation of that into machine instructions while the program is still running. And this is usually what people mean when they say interpreter.
[00:02:56]
That's basically the fundamental difference between a compiler or interpreter is, whether a compiler is ahead of time turning your source code into machine instructions. Or some other intermediate representation, whereas an interpreter is doing that same thing at runtime. So shell scripts, classic example of runtime, like interpreted languages.
[00:03:13]
JavaScript is an interpreted language, but there's a little bit of a wrinkle there. So JavaScript, in addition to having a runtime interpreter has a just in-time compiler, or JIT compiler, for short. So a lot of interpreted languages do this. And basically, the reason for it is essentially performance.
[00:03:28]
If you've got a loop that's running in JavaScript, last I checked the cut off in V8 was like, if the loop is running 4,000 iterations or more, it sort of counts how many iterations it's doing in the runtime interpreter. And once it crosses the, I think it's 4,000 iteration mark, 4,001 it's like, okay this seems like a very long loop.
[00:03:43]
Maybe we should optimize this thing and turn it into machine instructions, rather than having to do the interpreter thing every single iteration of this loop. And they're kind of betting that the extra work that they take to do that translation at runtime into machine instructions will pay off.
[00:03:56]
Because the remaining iterations of the loop, they figure, if there's 4,000 there's probably another couple 1,000. Then, okay, that'll pay for itself, and it'll end up being faster than if you just continue the entire loop of the runtime interpreter. So just in time, compilers have really made a big difference in the performance of a lot of interpreted languages, or languages that originally just interpreted.
[00:04:14]
To the point where pretty much every popular runtime interpreted language these days, maybe, except for bash [LAUGH] now has a JIT. So JavaScript does, Pearl does, Ruby does, PHP, Python, as far as I know, all of those have JITs now. Maybe not the main implementation, but it certainly, you can find an implementation that has a JIT.
[00:04:34]
WebAssembly is an interesting example of a language that only has the just-in-time compiler. So basically, WebAssembly is designed to be sort of streamed, like have streaming compilations. So the idea is like your browser wants to be able to execute this, partially WebAssembly has sandboxing features. But also it's like a language where it was designed from the get-go to be like, you've got bytes coming in over the network and you're able to actually just-in-time go.
[00:04:59]
Compile those to machine code as they are coming in off the network before you have the entire program. So everything has to be organized in a way where you never need to wait and be like, hang on, hang on, I can't continue until I get some bytes at the end of the message, it's like, no, no, streaming.
[00:05:12]
As it comes in, you're able to do this just-in-time compilation. That was one of the design goals WebAssembly from Yego. Now you also find compilers that will compile from a source representation to machine instructions by way of some intermediate representation. So one example of this is transpiler. This is the term that people use when the source representation and the intermediate representation are both sort of intended to be human readable languages.
[00:05:34]
So typescript compiler, tsc, would be a perfect example of this that's going from a human-readable, human consumption language, namely TypeScript, to compiling to JavaScript. You will note, by the way, that tsc is short for TypeScript compiler. So it doesn't call itself tst, it's not calling itself a transpiler, it's calling itself a compiler.
[00:05:54]
This is another one of those terms that is embattled. Some people will say that, no transpilers don't count as compilers cuz they don't go to machine code or other reasons, just like people will say assemblers don't count as compilers. I'm not taking a stance on this, I'm just making you aware of the controversy [LAUGH] in case you see people argue about this.
[00:06:10]
The enemy representation in this case is JavaScript. And you can also have an IR interpreter. So basically, this is a browser Netscape, for example. You can also have an IR compiler, which was either a JIT one like V8, or you could have an ahead of time compiler that would take JavaScript code and ahead of time compile it all the way to machine code.
[00:06:29]
I'm not aware of any popular tools that do this, but I have seen some floating around GitHub that will take JavaScript and give you a statically compiled binary of ones and zeros directly executable machine code. But obviously, the interpreter for that Netscape being an old school example, obviously Chrome and Firefox, SpiderMonkey, all those, being more modern ones.
[00:06:49]
Now you also find bytecode compilers. So, bytecode compilers are different than transpilers, because bytecode is an intermediate representation that is not intended to be human-readable, it's intended to be only machine readable. So Java would be probably the most popular example of really embracing this. So Java has a source code format.java files, and then it also has a bytecode format, and that bytecode format is intended to be portable.
[00:07:12]
So you can take your Java code, compile it down to Java bytecode, and you can take that Java bytecode and run it on any different machine that has the Java virtual machine on it. This was what Java's original selling point of write once run anywhere meant. It was kind of, compiler once run anywhere cuz, I mean, you can write run Python anywhere as the as long as I got Python installed.
[00:07:29]
It was more about the separating out that compilation step, where you don't have to give your Java source code to everyone. You can compile it down to this bytecode and give them that, and then they don't have to do the work of compiling it on their machine. Maybe there was also some, like, this was back at a time when proprietary wasn't the dirty word that it is now, but maybe they'd like, we can hide our source code, that's great.
[00:07:46]
So JVM bytecode, another famous IR, and then the IR interpreter, in that case would be the Java virtual machine. And an example of a just in time compiler would be hotspot. This is a famous Java compiler, that was Bach and his team developed to make Java basically jittered and a lot faster.
[00:08:05]
And actually, that same team, because they did such a good job on Hotspot, Google hired them to go work on V8, and they made V8 way faster, which ended up sort of kicking off a JavaScript performance war. So that all came down to basically just in time compilation of Java bytecode.
[00:08:20]
We are gonna do some WebAssembly bytecode in this course. And so, an example of a WebAssembly, well, so WebAssembly bytecode is like JVM bytecode. It is just like ones and zeros. It's not intended to be human-readable, as we're gonna see in the last section of this course. It's basically just like, hey, we got some bytes, they're arbitrary integers, it was magic constants.
[00:08:41]
And we're gonna write them to a specific part. And then magically, the browser or in our case, Node.js is going to actually know what to do with them and be able to run them. Now, in this case, let's say you wanted to run WebAssembly in the browser, you could do that same thing, but you do still need some sort of source language to compile to that.
[00:08:59]
So in the case of Java bytecode, Java would be an obvious language that compiles the Java bytecode, we talked about Kotlin earlier. Kotlin also compiles the Java bytecode. Lots of languages compile to WebAssembly bytecode. So C, in addition to going to machine code, has the option of compiling to WebAssembly bytecode.
[00:09:14]
Rust is the same way, you can take your Rust source code and get out WebAssembly bytecode. And in this course, we are going to make something, this spits out WebAssembly bytecode directly. We're also not gonna do any libraries, so everything we're doing in this course is just like, you get to see how the entire sausage is made [LAUGH].
[00:09:29]
And in this case, that is what we're compiling to is WebAssembly bytecode. Of note, once you understand compiling to WebAssembly bytecode, I can say from experience that, the next step to machine code is kind of just about learning a different flavor of a very similar thing. There definitely are differences between WebAssembly bytecode and machine code.
[00:09:47]
But fundamentally, there's a whole lot of looking up magic integer constants and figuring out where they go in the right order to make the machine, or in this case, the WebAssembly interpreter understand what you're saying.
Learn Straight from the Experts Who Shape the Modern Web
- In-depth Courses
- Industry Leading Experts
- Learning Paths
- Live Interactive Workshops