Inferring Function Declarations

zed.dev

Lesson Description

The "Inferring Function Declarations" Lesson is part of the full, Write a Compiler That Understands Types course featured in this preview video. Here's what you'd learn in this lesson:

Richard explains inferring functions by breaking down the components of a function, such as arguments, return types, and expressions. He also demonstrates unifying different types within a function, like linking arguments to numbers and handling return types.

Join Now or Start a Free 5-Day Trial

Preview

Transcript from the "Inferring Function Declarations" Lesson

[00:00:00]
>> Richard Feldman: That segues us into inferring functions, where we have a lot more interesting moving pieces to deal with. So here we have our increment function from earlier. So, we got x's, that's one argument. We're returning x +1. Again, we're just gonna go through and like very quickly assign some things here.

[00:00:14]
So, we're gonna have 0 for the name of the increment. I'm just gonna use one to be the entire function body. It's kind of a squiggly line there, but basically this is, like, it's not, we're not talking about the argument, we're not talking about the return type. We're talking about the function, like the entire lambda.

[00:00:30]
And, in JavaScript, of course, as in the other languages, you can pass an entire function, like as an argument to something. You can put it in an array, and, of course, you can assign it to a constant. So that's what that type is referring to that. That one, two, we're going to give two, the argument of that function.

[00:00:45]
Three is gonna be the return type of the function. So those are all distinct pieces that all have different type checking going on with them. And then we have four being the actual expression here of like x + 1. And then within that we have two is what we're going to assign here.

[00:01:00]
Here we are doing the cut to the chase thing, right? Where we're saying, like, okay, I see it's a two in scope. So, like, don't bother some linking. Just like, that's a two. We know it's a two, like, it resolves directly to that, and then one is gonna be its own type, which is just gonna end up being unified to a number.

[00:01:17]
Okay, so everything's null. Yeah, question, right?
>> Speaker 2: Why are you able to just so, in like, the past example, you had a const, and then you reference that same variable down there. Why did you have a different ID in the database in that case? But here you're able to,

[00:01:35]
>> Richard Feldman: Gotcha, just because I was doing the like, demonstrating that this is sort of like the. This whole thing is following the like maximum, like use variables for everything thing. You don't like, you don't need to do that. You definitely can just like cut to the chase and be like, yeah.

[00:01:49]
Like we know that x is 2. So, like, we don't need to like make a new variable and simply get to say that they're the same like that. The point in both cases is that like, if you have knowledge for sure that these two things are going to be identical, you have a choice.

[00:02:01]
Like, either you can, like, assign the thing and then, like, immediately turn around and link them together. Or you can just say like no, these are gonna be identical because the whole point of like the sim linking is that it's supposed to mean that these two are so connected that it doesn't matter which one of them gets changed.

[00:02:18]
And it's gonna affect the other one anyway, so either way should work equally well, right? So, we've got all our variables in place, and now we're gonna start going through and inferring things. Now, keep in mind that at this point we have no call sites, nobody is calling this function.

[00:02:31]
All we're trying to do right now is we're trying to use the information that we have about this function, namely that it returns x +1, to learn things about what are the different pieces that go into this function? Like, what can we say about that function just using the information within the function itself?

[00:02:45]
And then in the next section, we're gonna get into callers 'cause they can also influence the function types. So here we have unify(0, 1). So this is basically our classic, like unify this with this. As previously noted, you could shortcut that if you want to. We're gonna really keep that consistent with how we've been doing it.

[00:03:02]
I'm keeping a little running tab over here, of what is the reason for these unifications? Because we're gonna end up with a quite a little stack of them here. Next one is gonna be we're gonna unify four with number. So, the reason to do that is that basically, whenever you have plus, much like when you have multiplication in this language.

[00:03:18]
We're choosing to say that it always returns a number. Obviously, there are plenty of languages where it could be a number or string, or something like that. But probably this is like, yeah, any time you see a plus just like any time you see a multiplication, that means we immediately know that thing is a number, no further analysis needed.

[00:03:34]
Next, we unify 2 with number. So 2 is x over here and that's once again, gonna be for exactly the same rule, except applied to the operands. So this is saying, just like with multiplication, if you're on either side of the plus, you must be a number, which leads us directly to 5.

[00:03:50]
There's actually multiple reasons that 5 needs to be a number. One is it's a number literal. And even if it weren't a number literal, the plus would cause us to do the same unification here. So basically, like we're doing two unifications, two unifies here, and both of them are exactly the same thing.

[00:04:05]
They're gonna have the same result. It's always harmless to redo the same unification multiple times. It's like a waste of performance. But point being that one of them is because of plus, and plus says, hey, unify both operators was number, and the other one is because of number literal.

[00:04:21]
I don't know of any precedent for trying to be fancy and, like, elide something like this and try to say, like, hey, this is a waste of time. Don't do it. There might be a way to do that. I'm not sure. It probably wouldn't apply in all scenarios, but there's probably some way you could be a little bit strategic about like, hey, I'm about to go unify this.

[00:04:38]
Let me go see if that's a number literal, because, if so, we already did, like, when you're still at the parse node level. But yeah, I said, I'm not aware of people like trying to get that fancy with unification yeah.
>> Speaker 3: The algorithm is called worst case optimal join.

[00:04:52]
You can run that.
>> Richard Feldman: Worst-case optimal join okay?
>> Speaker 3: Yeah.
>> Richard Feldman: I guess there was a way to do it. Cool, good to know. Okay, and then finally we have a unifying 4 and 3 and that's because of the return. So, what 4 and 3 are basically saying is like, okay.

[00:05:05]
So this is the entire expression that's being given to this return statement, and then 3 is the actual return type of the function. Now, the reason it's important to connect these is that return is statement that can appear multiple times in the function. So, similarly to the ternary that we just learned about where we had the two different branches and the only rule was they have to be connected.

[00:05:24]
That's exactly the same relationship that we have between return and the function's return type, except instead of it being two branches, it's one or more [LAUGH] branches or one or more returns. So we could have a bunch of conditionals up in here, and each of them is doing its own early return, for example.

[00:05:40]
And if we had that, every single one of those return statements would need to get unified with this return value, which in turn means they all need to agree. So in other words, if I have five different return statements in this function, they all better be numbers or whatever.

[00:05:54]
But one of them are numbers, and one of them is a string, then we got a problem. That's gonna be a type mismatch. Doing this unification one time for every single return we encounter is the way that we both do that, while also inferring what is the actual return type of the function.

[00:06:07]
Was there a question?
>> Speaker 2: You might go through this, but I'm struggling to understand, when you unify 4 and number, how you're inferring that number type without direct. I would think you'd have to go to the + is where that comes from.
>> Richard Feldman: Yeah, so basically, the rule for + is the same as the rule that we, for this language, we're choosing to have the rule for + be the same as the rule that we chose to multiply, which is basically just that.

[00:06:34]
+ means, whenever we see +, we're always gonna unify this with number. And we're always gonna unify this with whatever's there, because we're saying plus takes two numbers, and so I have no idea. Without even looking at this, I'm like, whatever this thing's type ID is, it's getting unified with number.

[00:06:49]
Whatever this thing's type ID is, it's getting unified with a number. And also, whatever my entire return or return value is the wrong term, like, whatever I evaluate to that four is also getting unified with numbers. So, x is getting it, because it's one of the operands, one is getting it because it's one of the other operands, and then separately it's getting it again, because it's a number literal.

[00:07:10]
And then the entire thing that is to say, the four is getting it, because that's what the plus returns, so to speak. Does that make sense?
>> Speaker 2: Yeah.
>> Richard Feldman: Okay.
>> Speaker 3: In another way of saying this, we set some rules that the plus operator has a fixed definition.

[00:07:26]
>> Richard Feldman: Yeah, so that's the language design thing, right? Every language has some set of hard-coded rules that in some languages, have more than others, but the hard-coded rule that we have decided to use for this language, that this is how plus works. We could make it have totally different semantics, but these are pretty reasonable semantics, and also, give you a good intuition for what's going on here without having gotten into function calling.

[00:07:49]
Which is an even fancier version of what we're doing here. All right, so just going through the actual unification. So, this one basically says, all right, simply get to 1. Next one is a unify 4 with number. So 4 was null, now it's number. Same thing with 2, now it's number.

[00:08:05]
5, again, was number. These two, it's like it already is number, so we're fine, we'll leave that alone. That one's basically a no-op or wasted work. And then finally, unify 4 with 3, and so that basically results in 3 being symlinks to four, which means it yet again is going to end up being a number.

[00:08:20]
Now, we may notice that we finished all of our unifications, and at this point we still have somebody being null. Now, as previously noted, in future examples, we will actually have functions that like where that stays null for real. But in this case, we actually do have some, sorry, that's the wrong one.

[00:08:38]
I was referring to one, right? So one is the type of the entire function. So, the type of the entire function, yeah, let's talk about that. Basically, what we have here is we have a concrete type that has more information than what we've seen in a concrete type before.

[00:08:54]
So previously our concrete types are just like, I am a concrete string. The end, goodbye or I'm a concrete number. The end, goodbye. Here we have, I am a concrete function, and I have more information. Now this more information is very important for our type checker implementation, because we have to take this stuff into account.

[00:09:11]
For example, when we were saying if I'm unifying this string and this number, all I need to look at, if this field is different, it's a type mismatch, game over. That's still true here. If you look at this field, and you're this is a function and that's a string, type mismatch.

[00:09:27]
This is a function, that's a number, type mismatch. But if they're both function, it's actually not sufficient anymore to just conclude that they automatically are unified and everything, it's fine because there's more information here. So, if with strings and numbers, if it's like they have the same concrete type, we're in that branch of unify, where we're like, yeah, we're done moving on.

[00:09:48]
But if it's specifically two functions, there's more work to do, because these functions could have different argument types. They could have different return types. Basically, what we need to do at this point is say, well, before we can answer the question of like, what's going on with these functions, we essentially need to recursively unify.

[00:10:05]
We basically need to say like, now that we've gotten to a point in unify where we have two functions, the next step is to go through and unify all their arguments and unify their return value. And any one of those could produce a tightness match absolutely. But basically, the reason that we have to do that is because otherwise we are just completely disregarding these things.

[00:10:24]
They're sort of sneaking by. That's how you get type unsoundness is where you have types that are participating in type unification. But you're not actually constraining them the way that they're supposed to be constrained. We've just like missed some constraints. So this is one of several complications that are going to end up being added on to our unify function and that are gonna make it ultimately more complicated than the, well, it still took up like an entire screen.

[00:10:48]
But the simpler version that we talked about two sections ago, back when we didn't have any functions to think about. So, the way that I'm gonna represent this in the database is like this. So, this is basically just like a little rotation I made up for, this is the argument.

[00:11:02]
The argument is a sim link to 2 in this case. And then the return value is essentially a sim link to 3. This is really just me trying to fit something concise on the slide, but in the actual compiler, this is how we're gonna represent it. We're gonna say you have a concrete function.

[00:11:18]
Here are your ARGs, and then here is what it returns. And I said returns, not because it's portable, because return is a keyword in JavaScript, so I can't call it return, but you can also say ret, like is short for return. But I figured, if you're not used to seeing ret all over the place, like I have a compiler code bases, returns is probably reasonable.

[00:11:37]
Okay, so actually, sorry, before we move on to calling. Any questions on defining functions, yeah?
>> Speaker 4: I guess I'm curious. Like, when you represent the function instead of the database, I mean, obviously, you have args is to like a concrete value, but like, would you want that to be?

[00:11:55]
Like you said, like, number, right? And then returns would be also number as well. Or would you just stuff the actual, I guess, like node in the database itself.
>> Richard Feldman: I didn't quite follow. So which thing are we stuffing in the database or not?
>> Speaker 4: The function definition, if we have like.

[00:12:14]
>> Richard Feldman: Yes.
>> Speaker 4: So maybe I'm misunderstanding it a bit. Is like on the bottom left, that object with concrete function?
>> Richard Feldman: Yeah.
>> Speaker 4: Is that the exact value that would be stuffed?
>> Richard Feldman: Yes, yeah, right. So, in the same way that when we say number in the table over here, what we're actually meaning is concrete colon quote number.

[00:12:31]
>> Speaker 4: Yeah.
>> Richard Feldman: It's the same exact thing, except that it's concrete function, and then there's more information than there is in the case of number.
>> Speaker 4: Gotcha, so I guess what I'm wondering, and sorry if maybe I misunderstood you like confused. It's like for like args we have two, like the value two, but would that be?

[00:12:46]
>> Richard Feldman: That's like this.
>> Speaker 4: I see.
>> Richard Feldman: Type ID two yeah.
>> Speaker 4: I'm sorry, that makes sense. That was like the value too.
>> Richard Feldman: Yeah.
>> Speaker 4: Okay.
>> Richard Feldman: Yeah, so I like, I don't know. Maybe I should have represented differently here. I just feel like I said.
>> Speaker 4: I think I have a citation.

[00:12:58]
>> Richard Feldman: But it's the type ID and then usually the way we refer to. It's whatever that type of years like the sublink notation here, but yeah, this is the source of truth. This is what's actually the compiler [LAUGH].
>> Speaker 4: I was evaluating it and you put x as 2 and term value is actually 3 sub, that's why I think there's a bit of confusion on my part.

[00:13:16]
>> Richard Feldman: Yeah, so 2 is the argument, 3 is our term value, yeah?
>> Speaker 4: Correct, but is you saying that's a value in the database or the key, the ID?
>> Richard Feldman: It's the ID, yeah.
>> Speaker 4: Yeah, I was actually saying the actual return value.
>> Richard Feldman: I see.
>> Speaker 4: So that was just my confusion.

[00:13:30]
>> Richard Feldman: Okay, got it.
>> Speaker 4: Yeah, exactly.
>> Richard Feldman: Okay, cool, yeah, and it is worth noting that much like the symlinks, these can change over time, as other things affect them outside the definition of the function, such as function calls [LAUGH]. So let's talk about those.

Learn Straight from the Experts Who Shape the Modern Web

In-depth Courses
Industry Leading Experts
Learning Paths
Live Interactive Workshops

Start a 5-Day Free Trial