Lesson Description

The "Polymorphic Functions" Lesson is part of the full, Write a Compiler That Understands Types course featured in this preview video. Here's what you'd learn in this lesson:

Richard explains the function for concatenating arrays, highlighting the need to handle different types without mismatches. Richard emphasizes the process of unification, the role of type annotations as constraints, and the principles of decidable and principle decidable type inference in Hindley-Milner type systems.

Preview
Close

Transcript from the "Polymorphic Functions" Lesson

[00:00:00]
>> Richard Feldman: At any rate, we've now arrived at a point where we have successfully unified everything. We've ended up with basically achieving our goal. We had our goal of two successful calls, two identity function, no type mismatches, and yet full type inference occurred, and we've achieved that goal. We can now ask what is the return type of this thing and it'll go and say, okay, that's in slot 4.

[00:00:20]
Let's go look at number 7. It's a number. Same thing over here, we can go ask what's in slot 8. It's 11 and that's a string. Great, we've successfully inferred the correct types and all without any type mismatches or with messing up the identity function, great success. Okay, now let's move on and talk about arrays.

[00:00:40]
So here we have a function that we're gonna define named concat. So this is gonna be your classic staple two arrays together function. We're not gonna write out the type annotations for this, because, as previously noted, the type annotation is going to be the very last section of this part.

[00:00:55]
But you can just imagine the implementation, right, we put two arrays together. The relevant part for our purposes is like, I wanna be able to call concat passing an array of numbers like this. And I also wanna be able to call concat passing an array of strings. So this has some things in common with the identity function example in that I want to be able to call it with two different flavors of types and not have a type mismatch here.

[00:01:18]
But also it has some things that are different, namely that it's not like we're just taking some variable like a, and just like not using it. There are going to be some constraints here, because we are actually using these as arrays. So, once again, we're gonna select, okay, 0, 1, 2, 3, 4, etc, initialize our database, just kind of just blasting through these things we've seen a million times now.

[00:01:42]
This is assembling to 1. This is gonna be our function here. And then we have for these, we're gonna end up with Array <5>, Array <5>, Array <5>. Just to pause on this for a second, the basic idea here is that each of these three things we have ended up unifying to basically say, I don't know what my arrays element type is.

[00:02:03]
But I do know that these all have the same array element type, which is gonna be whatever ends up being in slot 5 over here. So taking a step further from this, we're gonna again, just a lighting numbers 2 through 3 for space reasons. We've got our function up here.

[00:02:21]
We've got our array <five>. We still don't know anything about 5 at this point. It's one of those variables that we've created and assigned all these array types to. But it has not been further constrained. So at this point, it is still totally polymorphic. And then we're gonna start doing the blue ones, the sort of fake instantiated on the fly, just for the purposes of call once.

[00:02:41]
So this call to concat, we're gonna make a fake instantiation of the concat function. So you can see over here we've got two, three, and four. So in the identity function, these were all the same numbers. Here we have three different types, like the two argument types and the return type, two, three, and four.

[00:02:57]
They are actually different types or potentially can be. They're different type variables, if you will. They're different type Ids, so when we copy it, we're also going to give it three different ones. We're not gonna like make them be the same. And then when we get into these things, instead of the three array bracket fives up there that we had, we have array bracket 10, which is going to be unified in this case to a number.

[00:03:19]
So once again, the similarity here to what we saw with the identity function was that we ended up with this thing that's basically saying, yeah, there's an element of polymorphism here. Now, in that case, the element of polymorphism was the argument itself. In this case, it's something that actually doesn't really appear in the implementation.

[00:03:35]
It's more of a behind the scenes thing. It's like, I know that this is an array, and I know that this is an array. And I know that they return an array, and I know that all those arrays have to have the same element type, but the element type itself, at least in our implementation.

[00:03:47]
Okay, maybe it would appear in here if we actually written out the implementation. But we don't see that, we didn't write a five on here, because it doesn't correspond to any one of these elements out here. The five is about, this is the type parameter of that array of.

[00:03:59]
Of these two arguments and also that return value, and it's exactly the same thing over here, except this is gonna end up unifying with number, because that's what we're passing in here. One cool implication about this is that the process that we use to arrive at this is not actually different than the process we used for the identity function It like we already had this problem.

[00:04:20]
We needed identity function to work, and so we came up with this cloning the weird fresh thing, and going and making a copy, but it's like the same shape, but it's like different type IDs. Then from there, you can just proceed normally with unification. This is what I meant earlier in Henley Miller, you kind of are priced into slash, get for free generics.

[00:04:38]
Cuz it's sort of, I don't know, we didn't actually do anything different here than what we did on the previous section with the identity function. If you want identity function to work, which of course, you do, you just already have to deal with polymorphism on a way that happens to work already perfectly well for generics, like parameterized types array.

[00:04:57]
So if you're in one of these languages, the price in part is essentialized, well, if you're not gonna support you there that means that this isn't gonna work. You're not even gonna be to have identity function, of course, you wanna be able to have identity function. But of course, if you've already done that work to implement that then you're basically already all the way here other than maybe having a, [LAUGH] syntax for having type annotations around it.

[00:05:19]
Now I don't want to completely hand wave this away and say, this is 0% different. There are some differences. The unified function does get a little bit more complicated, because, much like with function, array is also one of those concrete types where you have an additional consideration, and you can't just say, these two arrays, they're both an array.

[00:05:34]
So I guess that's the end of it. You have to descend into it and compare their element types as well. There is a little bit more work there. But fundamentally, the overall approach is it's all the heavy lifting is already done. There's just a little bit of detail work left over.

[00:05:48]
But fundamentally, it's really not any different to support having arrays with polymorphic type variables in them than it is to just support the basic identity function works at all in. So that's why you tend to see like ML, Elm, Haskell, like they all have parametric polymorphism. They've all way before Java and them did because they were like, well, yeah, we just need it on a basic level anyway.

[00:06:12]
[LAUGH] So anyway, so to sum up like everything we talked about when it comes to polymorphic type checking before we move on to type annotations. Basically whenever you're calling a polymorphic function, that is to say a function that behaves differently, which might be a behind the scenes runtime thing based on its type, what we do is we just make fresh copies of the function's type, like the entire structure of the function.

[00:06:37]
The new copies, importantly, have new unbound variables. So there are different numbers, and then also those numbers are initialized to null, and then they'll, of course, immediately get overwritten based on the actual call. Because what we do is we unify the function call stuff with the copies instead of the originals, thereby making it so that the originals are unaffected, only the copies are affected.

[00:06:57]
Something I'm not really getting into here is that you. Can mix and match a little bit. So it is possible to have array.concat both arguments are polymorphic. You can also imagine array.concat that has a third argument that's just like a concrete type. And then in that case, when you're doing the copies, you actually would intentionally copy the original type, because you do want a type mismatch if somebody gives the wrong type for that.

[00:07:16]
So it's not like the polymorphism is just like an all or nothing thing, it's really just like the arguments that are polymorphic work this way. And if you have both polymorphic and non-polymorphic arguments, it is actually important that you maintain some of those sim linky kind of relationships with the copies so that they are actually unifying against the originals and breaking when they're supposed to break.

[00:07:37]
>> Speaker 2: I had a quick question just for my understanding with the identity function, right?
>> Richard Feldman: Yeah
>> Speaker 2: We have just like that, the argument type is the same as return type. But say we had another polymorphic function where the return type is different. The only difference there is that we would, instead of having the same database ID, we'd have another ID for the return type.

[00:07:56]
>> Richard Feldman: Yep.
>> Speaker 2: That would also be null. Is that correct?
>> Richard Feldman: Yeah, so let's say it would be kind of silly. But what if you were just return zero, Yeah. It's like, what's that argument doing? But, yeah, so like, if you did that, then, right, exactly. So what would happen is that way back at this stage, these would not both be two, Yeah.

[00:08:12]
And so then later, when we're doing the unification, the whole reason that we were unifying, and this is a place where it's maybe a little bit easier to understand if you don't do the short cutting and cutting to the chase, cutting out the middleman kind of a thing.

[00:08:27]
If we actually had different type variables here, it would be more obvious that this one was for the argument. This one was for the return type.
>> Speaker 2: Yeah.
>> Richard Feldman: But yeah. So this is the point where it's like, okay, you would get something different here because the return type was not identical to the.

[00:08:40]
>> Speaker 2: Right. And it could be different, maybe you return maybe a Boolean or a string, depending on what value you get for a.
>> Richard Feldman: Well, so that's a more advanced type system feature. If you have the possibility of depending on runtime conditions, like a conditional, it can either return a bool or a string, now you need union types in the language.

[00:09:02]
>> Speaker 2: Yeah, yeah.
>> Richard Feldman: Or at least some types or something that can distinguish a runtime between those. But in this type system, that would just be a type mismatch if the two branches were different things, going back to our conditionals, ternaries discussion. But yeah, so when you got to this point, these would not both be 7 in that world, where you were returning 0.

[00:09:20]
One of them would be 7, the other would be, I don't know, something else.

Learn Straight from the Experts Who Shape the Modern Web

  • In-depth Courses
  • Industry Leading Experts
  • Learning Paths
  • Live Interactive Workshops
Get Unlimited Access Now