Wrapping Up

zed.dev

Lesson Description

The "Wrapping Up" Lesson is part of the full, Write a Compiler That Understands Types course featured in this preview video. Here's what you'd learn in this lesson:

Richard provides a comprehensive overview of the course content and recommends classic compiler books and modern alternatives. He also answers student questions regarding how recursive types are handled, LLVM as a compiler backend, and the limitations of type inference.

Join Now

Preview

Transcript from the "Wrapping Up" Lesson

[00:00:00]
>> Richard Feldman: Okay, so I did a little recap as part of our wrap up and then basically just do a little bit of further learning if there's other resources that you're interested in on these topics. So at the very beginning, we talked about how basically the fundamental job of a compiler, whether it's ahead of time or runtime interpreter, is to go from source representation to machine instructions.

[00:00:20]
Talked about how there's different arguments about terminology is an assembler really a compiler or is that a different thing? I'm not gonna take any stances on that, but the main thing is just like head of time compiler runtime interpreter. Pretty well understood what those mean. We also talked about just-in-time compilers, such as WebAssembly, which we actually got to build something with, as well as an optimization for interpreters, which is the most common way that they're used, such as JavaScript, PHP, Ruby, etc.

[00:00:43]
We talked about these different terms, and sort of like assembler is low level to low level, you have a really low abstraction. It is human readable, but it is very, very close to the middle language that's compiling to something that's definitely not human readable transpiler being something that's going from a high level language that is designed to be human readable to another high level language that's designed to be human readable.

[00:01:04]
I talked about interpreters, which are generating the machine code at runtime. Compilers are basically taking the input and generating the output at compile time. We talked about some compiler passes and we saw a variety of them. We talked about tokenizing, aka Lexing that feeds into parsing, followed by name resolution or canonicalization, type Inference, and then finally code generation.

[00:01:23]
We talked about basically if you're generating code as a string, that's essentially formatting. So you're going from source string over to tokens via Lexing, then tokens to parse tree, that's parsing, name resolution, type inference, and finally code gen. And if what you end up at the end of all that is a string, we call that formatting.

[00:01:41]
But if we're not compiling to that and everything else is exactly the same, then now we call it, it's not formatting, it's a compiler or something like that. And that's where we ended up doing all of these different stages, spending the most time on type inference, granted, but still getting a little bit of a taste of lexing and parsing at the beginning, name resolution.

[00:01:59]
And then at the very end, some code gen to WebAssembly. Really the most classic book in all the history of compilers is this book. It's known as the dragon book. I actually don't know what its name is. I'm looking at it and I still don't know what its name is.

[00:02:11]
It's just the dragon book, that's what that's what it is. When I wanted to find an image for this, I just Googled dragon book. This book is very famous and also very old and out of date, so I hesitate to recommend it and except in the sense that it's a classic.

[00:02:24]
So I'm not saying you should read this and then be like, this is the best way to build compilers. But it is maybe worth your while to read it just because of its classic status. A book that I have not read, but which I have heard recommended as an alternative to the dragon book is this one, it's called Engineering A Compiler.

[00:02:41]
I can't first, recommend it cuz I haven't read it, but I have heard people say that if you want something that's more focused on, hey, this is a practical modern thing for people who are application developers, I'm assuming most of us are. Then this is like kind of a more reasonable book to get more things.

[00:03:00]
This is also not something that I have personally read, but I have a very strong recommendation from one of the best compiler authors I know, Ayaz Hafiz. He was like, this is the book if you wanna do get really, really hardcore into type inference. I'm embarrassed to say I bought it and have read two pages of it, so I can't endorse it yet.

[00:03:17]
But I mean, so far the two pages were stellar so. But yeah, if you wanna get really, really deep into all the nuances of different type inference implementations and ways you can go about optimizing different things. This apparently is a really great book for that. And finally, if I could only leave you with one thing to take away from this is this theme of traverse the input, generate the output, and then traverse the thing that the previous step Gib is really powerful.

[00:03:44]
It's not just a compiler's thing. I mean, it is powerful in compilers, of course, and we got a lot of value out of it. But it is something that comes up even outside compilers, in all sorts of programming. And Steve Yegge had a famous blog post where he's talked about, once you've done some compiler stuff, you start to see compilers everywhere.

[00:03:59]
And I think this is the main thing that he's talking about. [LAUGH] There's so many problems in software that just break down into like, can I get this into a more useful representation? And then can I get from that useful representation into an even more useful representation? Really at the end of the day, that's what we've been doing all day, is we've started with a representation that's nice to read but doesn't actually run all the way in through something where we've analyzed it, we got outputs in the form of naming errors and type mismatches.

[00:04:25]
We did type inference and then finally ended up with something that could actually run and actually run faster than the source code that we put in on the first place. WebAssembly, generally speaking, runs faster than Javascript. So yeah, this idea, powerful idea, and if there's one thing you can take away from today, I would hope it would be this.

[00:04:41]
>> Speaker 2: Some final, Questions here.
>> Richard Feldman: Yeah, sure.
>> Speaker 2: How are recursive types handled by type inference?
>> Richard Feldman: [LAUGH] How are recursive types handled by type inference? That was the first thing I'm like, we're definitely not doing that in the workshop. I mean, the short answer is complicatedly. It absolutely works, but that's where you get all sorts of gotchas around, I mentioned the occurs check earlier.

[00:05:04]
That's really important for recursive types. Also, when you get to code generation, they're a total nightmare. And a lot of cases, especially if you're doing monomorphization, the short answer is, there's just a lot of complexity there. I don't have a concise answer the question of, how do you do other than just, how much time you got?

[00:05:21]
[LAUGH] Yeah.
>> Speaker 3: So I don't know a whole lot about LLVM, but how hard would it be to take that kind of intermediate representation we made and plug that into LLVM, and once you have that is that support generating WebAssembly.
>> Richard Feldman: It does, yeah, okay, so LLVM is also a deep topic.

[00:05:40]
I can give you some brief points on that and then also we can talk more after if you want. So the question was, so LLVM is a compiler back end that its job is essentially you give it an intermediate representation that it controls. It says, hey, instead of generating web assembly, you generate web LLVM IR.

[00:05:59]
And then from there you can tell LLVM it's a library. Basically say, hey, generate me some WebAssembly, literally, LLVM will do that for you. Also you can say, generate me x86-64 machine code, or generate me ARM64 machine code, or RISC-V. And then they have really long lists of generation targets.

[00:06:16]
On top of which the thing that LLVM is most known for, the second most thing is known for is running really slowly. But the thing is most known for is its optimization passes. So in addition to giving it the code and saying, hey, generate me this code, first, you can say, really, really optimize this thing.

[00:06:33]
And LLVM is basically the gold standard in terms of doing the best job of cranking on your code and getting it to be so that what the machine code that comes out the other side runs as fast as you can get from pretty much anything. Unfortunately, the downside is that LLVM takes a long time to give you that code even if you're not optimizing anything.

[00:06:51]
So it's because of architectural problems in LLVM rather than, they're just working really hard on your optimizations. No, you can say zero optimization and it's still very slow. But, yes, you can totally do that for WebAssembly. How hard is it is a different question. So, yeah, the ergonomics are not amazing.

[00:07:07]
It's a similar level of abstraction, I would say, overall to WebAssembly in terms of they're really not giving you a lot of conveniences compared to if you were compiling to C or compiling to Javascript or something like that, like a higher level language. It's quite low level, there are some conviniences but, yeah, you still have to do basically pre-similar stuff to what we just said with WebAssembly.

[00:07:31]
The difference being that then you have to bring in the LLVM independency and we first along compile times and dealing with its' api and stuff like that. So my preference would be kind of just go straight to WebAssembly for something like this. But if you wanted to, you absolutely could in theory.

[00:07:48]
And also I didn't wanna wire up no js to LLVM [LAUGH] that doesn't sound fun, but in theory, yeah, you absolutely. If you wanted to, you could take this compiler we just built, take the exact representations we made and translate them into LLVM IR instead into WebAssembly, that would totally work.

[00:08:05]
Yeah, other questions? Was there one online? Do I remember that, right?
>> Speaker 4: Yeah, could you talk about the limitations of type inference?
>> Richard Feldman: The limitations of type inference, I would say it depends on the type inference algorithm. Limitations of Hindley-Milner? I mean, I guess it's mainly opportunity costs, just in the sense of, if you choose Henley Miller for your type system, you get type inference that I don't know if it would be hard pressed to think of something where I could say that Hindley-Milner's type inference has its limitations.

[00:08:38]
Well, it's principal decidable type inference, I don't know what else you want [INAUDIBLE]. It doesn't get much better than like it always correctly infers 100% of your types and it always correctly infers the most general type. I'm at a loss for what you could do better on the specific side of type inference there.

[00:08:55]
Mind reading type inference might be better, yeah, yeah.
>> Speaker 5: I would personally argue that resolving into generic types is a suboptimal idea and maybe that's an angle.
>> Richard Feldman: Okay, but what would you do instead?
>> Speaker 5: Yeah, well you have to design, right? There's a design choice in there, you can't fully genericize, but yeah.

[00:09:12]
>> Richard Feldman: Okay, I mean, yeah, so if you don't like it being so general, yes, then I guess that maybe you would want something else. But to me, I would think that the trade-offs are more, at least to me personally, I would say that the type inference capabilities of Hindley-Millner are sort of an almost an unalloyed good, that's just a big selling point.

[00:09:32]
The downsides would be more that, well, yes, but if you're buying into this system, there are these other things that are harder, that are outside the remote type inference, subtyping being the biggest one. And there are lots of nice benefits to subtyping, also some downsides like covariance and contravariance are become things you have to worry about.

[00:09:50]
But yeah, in general, so I don't know if this is what the comment was referencing, but anything that I have heard people say is, okay, I don't like type inference because if I don't have types anywhere, then it's hard for me to understand what things are going on, or I get spooky action at a distance where I change one type over here and it affects some type that's very distant to that.

[00:10:11]
The problem with that to in my mind is that it basically is saying, okay, so if you have the capability to choose to annotate or not annotate your types as much or as little as you want, and you make bad choices, the thing that allows you to make those bad choices is bad.

[00:10:26]
I don't know, in some senses that logic makes sense to me when I'm trying to make good choices, but it's error prone. So again, we've taught a workshop on C it's pretty hard to get right a lot of the time. But the very, very simple heuristic of just always annotate your top level types is something that pretty much all professionals I've worked with, do in languages that support this.

[00:10:48]
But then if you do that, you're not any worse off when it comes to top level types than languages that require it. So I don't see how that's a downside, really. But then if you're writing, a small script where I actually would prefer not to do it here, then it's convenient that I don't have to.

[00:11:02]
So I don't know, I have a tough time buying the argument that it's a downside, that it gives you that much flexibility, because the amount of discipline and carefulness you need to use to not have that downside materialize is zero. So anyway, [LAUGH] I don't know if that was what the question was getting at, but that's my answer to it.

[00:11:22]
>> Speaker 2: What do you think about the SCIP book.
>> Richard Feldman: SCIP, so SCIP structure and interpretation of computer programs. I mean, it doesn't really have anything to do with compilers, really. It's been a long time since I don't even remember if I read all of it. I don't know, a lot of people talk about it as it changed the way they looked at programming, they really loved it.

[00:11:44]
It wasn't really for me, they talk a lot about, I don't know, the elegance and beauty of code and I don't know, when I read stuff like that, I just see slow programs that I don't wanna use. [LAUGH] And that was kind of the vibe for me. But I appreciate that a lot of other people have a very different vibe from it.

[00:12:04]
And I don't think there's any type checking in that book, but I assume that there is based on the interpretation part of the title, they do get into interpreters. But there's a number of good, I don't know, highly respected interpreters books right now. Was it grokking interpreters? I'm gonna blank on the name, but anyway, there's some good interpreters book that a lot of people seem to like by Bob Nystrom, crafting interpreters, that's one, yeah, thanks, right?

[00:12:30]
Grokking simplicity, an unrelated book, yeah, Eric Normand, yeah. Also, I hit him on the podcast. Other questions, I think there was, yeah.
>> Speaker 6: Yeah, I just kind of wanna go back to the kind of comparison you made about kind of formatters, right? And how they're kind of comparable to compilers in a sense, or there's a lot of overlap.

[00:12:48]
And I guess another thing that comes to mind is like linters, right? I guess, I'm curious to know what are some other maybe underutilized kind of maybe not necessarily, I don't know the right term for comparisons or comparable use cases for just parsing and things that are adjacent to compiling, that you could do with code more generally,

[00:13:04]
>> Richard Feldman: Yeah, it's a good question, I mean, I haven't sat down and made a list. I have to think about that, I don't know, I thought about these, because I think about compilers a lot. [LAUGH] But as far as other things, yeah, I don't have a list off the top of my head.

[00:13:20]
I'll think about it. Thanks very much.
>> [APPLAUSE]

Learn Straight from the Experts Who Shape the Modern Web

250+
In-depth Courses
Industry Leading Experts
24
Learning Paths
Live Interactive Workshops

Get Unlimited Access Now