Lesson Description

The "Polymorphism Exercise" Lesson is part of the full, Write a Compiler That Understands Types course featured in this preview video. Here's what you'd learn in this lesson:

Richard walks through the polymorphic type checking exercises by implementing type inference for arrays. He explains how the type checker ensures that all elements in a homogeneous array unify to the same type, using a simple approach of comparing each element to the first.

Preview
Close

Transcript from the "Polymorphism Exercise" Lesson

[00:00:00]
>> Richard Feldman: All right, so we are on part six out of seven, we're in the home stretch. So this is gonna be the last time that we do this on typecheck.js, the next one's gonna be on wasm.js. All right, so here we are, typecheck.js, scrolling down. And now at this point, right, we're closer to the bottom here.

[00:00:20]
I think that's where all the things are clustered, yeah, they are. Okay, great, right. So here's what we're doing here. We are now actually doing some type checking on an array. So this is our first polymorphic type checking that we've done in an exercise. So array types, they have one type parameter.

[00:00:37]
If the array is empty, then this is actually very straightforward, like when we visited it's, well, if it's an empty array then no problem. There's definitely nothing to infer about the element type, it's definitely gonna be unbound, and so you can just early return and just do that fun.

[00:00:50]
Create concrete type array, which is going to end up with an array that has null for its type parameter. So that part, pretty straight forward, but the more interesting part, of course, is when you actually do have elements in there. Now, the cool thing about arrays is that, some languages have what are called heterogeneous arrays, where you can have a mix of different types of the arrays, you can get a string and a number, and this and that.

[00:01:14]
Lots of other languages including this one, require homogeneous arrays, where you're allowed to have as many different elements as you want, but they all have to be the same type. Or, to be more precise about it, all of their types have to unify. We're not really gonna get into the implications of that quite right here.

[00:01:29]
But something that does fall out of this is that, you can have an array of arrays, and some of those arrays are more strict or permissive in terms of what their type parameter is. Cuz you don't have to just have a type parameters and all or nothing, it's not just a concrete type string or a null, it could also be another array type, for example, you have an array of arrays.

[00:01:53]
And as you nest these, just the fact that they're all getting unified together means that you can end up with different arrays getting restricted more and more as they get used in. And sort of, you end up with sort of the lowest common denominator of flexibility as all of those get unified together.

[00:02:11]
So, basically, what we're gonna do is we need to ensure that every single element in the array has the same type. And we're gonna do this in pretty much the most basic way there is, which is just to march down all the elements, [LAUGH] and just be like, okay, you all gotta be the same.

[00:02:23]
There is one thing we do have going for us, though, which is, the nice thing about the whole unification system is that, it means that, if the if we knew that they're all going to get unified together, we don't need to do the matrix of all possibilities. What we can do instead is we can just say, okay, just grab the first element out of there, and just everybody else needs to be equal to that.

[00:02:40]
So that's exactly what we're gonna do, we're gonna unify every single element in the array with the first element in the array, and then just say, whatever that thing's type is, that's what everybody else's type is. And as soon as we encounter somebody that's a mismatch, it's a mismatch game over.

[00:02:52]
But otherwise, all the symlinks will end up with, just everything is essentially equivalent to everything else, and we don't really need to do any other work for that. Okay, so that's what the purpose of this first element type is. Note that we'll be handy later is we're doing the visitNode to get the actual type out of this and all the right implications for that.

[00:03:11]
So this is just the first element of the array, and we already handled the case where the elements list was empty. So we know that we already have at least one here. So this is definitely gonna end up with something. Then, we have this first element concrete, which is our, the thing that we used in a previous exercise just to be like, a convenience like this will either return null or else it will return the thing that's inside the concrete field of that record.

[00:03:33]
So it's sort of unwrapping it or just giving you back a null. And then, we're doing node.elements.slice1, which is basically just skip the first one, cuz we already got the first one over here. And this is the part where we start to get into the stuff we're gonna change.

[00:03:50]
So basically, the procedure for unifying these all with one another, this is gonna be our our first from-scratch unification of polymorphic types. Get the element's type ID and then basically get concrete type name just like we did there, in order to just pull out the concrete type if there is one or it might end up being null.

[00:04:08]
And then we need to check for type mismatch. So if it has a concrete type, in other words, if this is non-null, and so does the first element, and those are different, then type mismatch, but all of those conditions have to be met. Otherwise, it's okay, and we let it through.

[00:04:22]
If that's not the case, sorry, if we didn't have a type mismatch, then basically we just wanna unify the two. So first we check to see if there's a type mismatch, and then afterwards, then if there was not a type mismatch, then we go ahead and unify. Great, let's take five minutes and work through it.

[00:04:44]
>> Richard Feldman: All right, so this initial case, not a problem, we're already done with that. Nothing to do there, getting the first element type and the first element concrete, also not a problem. The interesting part is in the body of the loop. So in the body of the loop, we're gonna go through this and basically say, okay, go ahead and first we get the element type.

[00:05:01]
So this is actually basically a carbon copy of what we did up here. The only difference is that we're actually, I forgot that I refactored that. So basically, just go visit that node, instead of visiting the first node, we're just going to do what we did over there.

[00:05:17]
And then once you've got the element type, you can use that to get the element concrete type, I don't need that anymore. Yeah, there's a funny example of JavaScript, or guess, TypeScript telling that rule that we talked about earlier, right? Now we know how to implement this if you want, right?

[00:05:31]
This is whether you search the entire scope and say, disallow duplicates based on the entire scope, or based on current scope only or allow it, right? So here we're seeing, they are not allowing it in this case, but now we know how to do all three if we want, from part 2.

[00:05:49]
Okay, so there's that one, and then basically, here we have this conditional I mentioned, all of these conditions have to be true in order for us to report the error. So, the first element needs to be concrete and the second element needs to be concrete. The first element concrete needs to not be null is what I should say, so this is the nullability check and the same thing there, and then basically just, yeah, are these actually not equal to one another.

[00:06:12]
So if they are both concrete and they are not equal to each other, then we have the classic type is matched scenario, both are concrete, they're not the same, therefore, we report an error, so do that. And then afterwards, if we, yeah, that's a little bit annoying to break out before each loop.

[00:06:32]
So return will actually do what we want, I guess that's a good argument for not refactoring this the way I did, but whatever. Point being, we're doing the reporting, that's the critical thing here. And then basically after that, we can then try to unify the rating one. So this is where we essentially say, okay, the first element type needs to be the same as every single one of the other element types.

[00:06:51]
So all we have to do is essentially say, since that's sort of fixed outside the loop, we're in the middle of the loop, take this element type and just unify it with that. And then, this once again, did the refactor there, elements, or, sorry, element, and now we have unified these two.

[00:07:09]
And unify does actually return a Boolean as to whether or not the unification was successful. If they didn't actually unify, then, okay, type mismatch, right here, another one, and there we go. So now we rerun it, is not defined a missed one, yeah, that one.
>> Richard Feldman: It just looks nicer without the old school for loop style.

[00:07:33]
Although, shout out to the C course that we did the other day. [LAUGH] Definitely do some old school for loops in that. All right, any questions about arrays, polymorphism? Last call for types of questions. Actually no, you can ask some of the next section if you want, but we're after this, we're moving on to WebAssembly.

[00:07:49]
So now is the best time, I guess, to ask any type related questions that you got.
>> Speaker 2: I forget the name of the type system we're using, but in languages like C and stuff, that's a different type system, and I'm assuming the type annotations there are necessary versus the one we're using?

[00:08:10]
>> Richard Feldman: In a lot of cases, yeah. So quite a lot of languages have either no or very minimal type inference, and so they rely very heavily on type annotations. Honestly, they look very different than this, they'll still have the same basic concepts of types and how they relate.

[00:08:27]
And of course, they find type mismatches too, but the way that they're going about updating their database is different in many ways, from what we're doing. So I would say there's some overlap at a foundational level, at a basics level. But yeah, if you get into the details, you're gonna find quite a lot different between a C-type system, for example, and Henley Milner, who's [LAUGH] the one we're doing.

[00:08:52]
>> Speaker 2: Do you know what a system that TypeScript is built off of?
>> Richard Feldman: I mean, TypeScript is a beast, I mean, it's a really, it's among languages that are top 10, top 20, most popular languages, I would say TypeScript is almost certainly the most complicated type system there.

[00:09:10]
The only other contender would be C++, and I'm not actually sure if C++ type system is more complicated or if C++ lets you make a bigger mess in this type system.
>> [LAUGH]
>> Richard Feldman: But those would be the two top contenders, [LAUGH] really. So, yeah, I don't know what they're using, but it's doesn't fit neatly into a bucket because it's so complicated, yeah

[00:09:32]
>> Speaker 3: Why do you think TypeScript doesn't have function parameter type inference?
>> Richard Feldman: Why do I think, first of all, I didn't know that I don't use TypeScript much [LAUGH] so this is news to me that TypeScript doesn't have function parameter type inference. I don't know offhand. I mean, it's actually pretty common for a lot of languages that they will say things like, if you're doing it for like an inline function, like an anonymous lambda or something like that, we'll do type inference on the arguments.

[00:10:01]
But if it's at the top level, you have to spell out every single type, there's no type inference whatsoever. Obviously, that's not true of what we're doing here, but I think that hoisting might be one of the reasons. So that is an example of a case where a top level function is just sort of special and it works differently, in the case of hoisting, it's working differently from a name resolution perspective.

[00:10:23]
But I wouldn't be surprised if, in those languages, if you really dig under the surface. You can find multiple differences, and not just the like name resolution between what it means if you have a top-level function versus if you have an anonymous lambda. And some of those languages may not even support anonymous Lambdas.

[00:10:40]
So I guess, I don't know, and I don't really have a particular guess other than just the general, there's multiple reasons that many languages don't allow type inference for top-level functions.
>> Speaker 3: In functional languages like Elixir, is there something like unify, but not for types, but for values like pattern matching?

[00:11:00]
>> Richard Feldman: So the way that pattern matching works is, I would say, pretty unrelated to type unification. You basically build up this big matrix and then, yeah, it's just kind of its own thing, I wouldn't put those in the same group, really.
>> Speaker 4: What happens to type inference when you partially apply a function?

[00:11:19]
>> Richard Feldman: What happens to type inference when you partially apply a function? So a piece of trivia, I guess, about most of these languages is that, more of them than not are curried, meaning that they're partially applying non-stop. So the way that it works in those languages, so the way that we have functions working is, I think arrow function was how I solved this, yeah.

[00:11:42]
See if you say arrow function, now you have something to grep for that's not function which has a million matches it, I guess I could also use the outline. Yeah, there it is. Okay, so in our case, we have multiple params here, right? But if you look at the type checker for like Elm or Haskell or ML, or something like that, actually rock is not curried, but all the other ones are, rock is very unusual in not being curried.

[00:12:02]
What the language being curried means is that, technically, in Elm and Haskell, technically, every language they have has exactly one parameter, always, one argument. But syntactically, they disguise that fact by basically making it so that both the syntax for declaring a function looks like you're defining multiple arguments.

[00:12:22]
And then also the syntax for calling a function looks like you're passing multiple arguments. And the only time the distinction comes up, is if you omit an argument when you're calling the function. And in most languages, that's where the compiler says, hey, you forgot one, that's error. But in those languages, says, that's fine, I'm just gonna return a partially applied function that has applied all the arguments you did provide, and then just says, a lambda, basically that takes the final argument.

[00:12:46]
And then if you provide that to the thing that it's returning, then it goes back and runs the whole thing, without going on a huge tangent about currying. So what does that mean for type inference? What that basically means is that, as long as you've got a language that's already set up to have all the functions be curried, it's actually an easier job than what we're doing.

[00:13:04]
Because basically you're just, instead of having to iterate over here you're just, cool, node.param. It's the one param I just get rid of, I just delete some for loops, basically [LAUGH]. And then there you go, you already have the partial application and all that. Where partial application becomes more of a pain for a compiler author is actually in the step we're about to get to, which is code generation.

[00:13:26]
If you do the naive code generation and you just actually, literally create all those lambdas every single time, that's tends to be bad for performance. And overcoming that and making it so that you sort of undo the querying in the very, very common case where you're not partially applying it and applying everything is more work for the compiler author.

[00:13:45]
So that's the trade off, really from a type perspective, it's actually more convenient to curry everything and partially apply everything. And if you're partially applying manually where you're just like, I'm just like returning a lambda that I wrote that is calling this other one with arguments, that doesn't even need any special handling.

[00:14:00]
It's just plain old ordinary function calling and type unification. So, that would already work using everything we've already done here, we don't even need to lift a finger [LAUGH] and support that.

Learn Straight from the Experts Who Shape the Modern Web

  • In-depth Courses
  • Industry Leading Experts
  • Learning Paths
  • Live Interactive Workshops
Get Unlimited Access Now