Inferring Conditionals

zed.dev

Lesson Description

The "Inferring Conditionals" Lesson is part of the full, Write a Compiler That Understands Types course featured in this preview video. Here's what you'd learn in this lesson:

Richard discusses inferring functions, starting with conditionals, which serve as an introduction to inferring functions. He also explains the process of inferring function declarations and function calls, emphasizing the need for type consistency within conditionals and the unification of types in programming.

Join Now or Start a Free 5-Day Trial

Preview

Transcript from the "Inferring Conditionals" Lesson

[00:00:00]
>> Richard Feldman: All right, let's move on to part five. Let's talk about actually inferring functions. This is a big part of programming and unsurprisingly a big part of type checking. So let's get into it. First off, we're gonna start off with conditionals, which are not functions, but have a good number of things in common with inferring functions.

[00:00:15]
So, since we wanna learn how to infer conditionals anyway, they're kind of a nice little on-ramp. We're about to talk about next, which is in first inferring function declaration. So, this is we're actually writing out, this is the body of the function. These are the arguments that states, etc.

[00:00:27]
And then finally, actually inferring function calls. So, calling functions, as it turns out, is actually a little bit of a different beast than declaring them when it comes to type checking. Okay, so we're gonna start with conditionals as sort of the nice little onward ramp here. So again, we've got our const an equal hi, this time we're gonna, instead of the classic B equals A, or something like that, we're actually gonna define a variable called Cond, which is gonna be the condition of our conditional.

[00:00:50]
So this is like the if, it would be what comes in the parentheses right after if. But since we're doing only ternaries here, for reasons that we'll talk about later, we're just gonna say Cond equals true, and that's what's gonna be our condition in our ternary. And then finally, we actually have the ternary.

[00:01:05]
We're saying const b equals and then con parentheses a colon other. The reason we're doing ternaries here is that basically if you know how to do ternaries when it comes to type inference and type unification, if it's very simple. It's basically just like this part isn't allowed, so there's just strictly less to think to think about, especially if you've already got a concept of statements, which we do in this language.

[00:01:25]
So, yeah, if you want to think about it as if you just can, but ternaries are more interesting because they evaluate to a value, and so we get to talk about that aspect of how we unify all that. All right, in this language, we are not gonna get into truthiness.

[00:01:40]
Somebody asked earlier about scenarios where you have like plus can accept either strings or numbers. Truthiness is in a similar boat where like if you're trying to infer the type of con and you're like, well, it could be this or it could be that or it could be that.

[00:01:54]
Truthiness, I guess, can theoretically go as far as to say like it could be anything and we might accept it there. We're not doing that, we're just gonna say this has to be a Boolean, that's the rule. Cuz if you're going all the way, then you could actually just say like, there is no constraint.

[00:02:08]
It's just literally you can put whatever you want there and it'll type check. I guess that would work, but then that's not very reliable at runtime. Pretty much every language that uses Hindley Miller has the rule that it's like, no, if you're doing a conditional, the condition has to be a Boolean.

[00:02:20]
True or False, that's it. So that's what we're gonna do here. And then the other rule that we have here is that we have two branches for the condition. But unlike with the multiplication scenario, we don't know what those are ahead of time. We don't know these are both numbers.

[00:02:35]
We just know they got to be the same. So why do they have to be the same? Well, we're trying to assign them to be. So imagine if you're like, okay, this one is a string, but this other thing, this isn't a string, this might be a number.

[00:02:46]
So if you take this branch, you get a number. You take this branch, you get a string. Well, then okay, fine. So what's the type of B? It's like string or number? I mean, yeah, that's maybe nice. You see why Typescript has union types as a language feature, but we're not gonna get that fancy.

[00:03:00]
We're gonna say, no, no, no, it's If you're gonna have a condition with two branches, and either those branches can end up being assigned to something here cuz it's a ternary then guess what? I don't know what type they are. You can put whatever type you want there, but it's got to be the same type.

[00:03:12]
They have to be able to unify with one another, otherwise type mismatch. So, this is gonna be our first example of a type mismatch that can come without actually having a hard coded type involved at all. We're just saying, hey, if these are two different concrete types, I don't care what the concrete types were, if they were different, type mismatch.

[00:03:29]
Okay, so really, what this sort of reduces down to is, essentially, it's like, okay, these three things all have to be the same type. Whatever type they are, they're all gonna end up being the same thing. These two have to be the same type because that's the rule of the branches.

[00:03:43]
And then this one has to be the same type as those because it's being assigned to them. So, it's either going to be assigned to a or it's gonna be assigned to other, depending on whether Cond is true or false. And so because of that, we basically can just simplify that down to just saying, like, look, all three of these things have to be the same type.

[00:03:58]
They all need to get unified together with one another. Okay, so let's go and do that. So as we do more and more of these unifications, I'm gonna kinda speed run a little bit setting up the Ids that we're gonna put in our database. So here we go this is basically the same kinda stuff we've seen before, a is 0, a is symlinked to 1is a string, 2 is symlinked to 3, and 3 in this case is a Boolean.

[00:04:23]
I'm gonna use the term Boole rather than Boolean. I'm not gonna lie mainly because I didn't wanna have it run off the side of the slide. But there was a little bit of interesting history there, which is that booleans are invented by the logician George Boole, which is spelled B-O-O-L-E.

[00:04:37]
We totally have room for an E there, but I'm just gonna follow programming tradition and just mess with that guy's name. For no reason, just like, chop off the E and then some languages go as far as to make it longer. They go like, B-O-O-L-E-A-N, like Boolean and then he still not using his name.

[00:04:51]
So really, for whatever reason, the programming rules has decided that this logician, we will never use his exact name. We will only either truncate it or extend it, who knows? Anyway, okay, so, now at this point, we've got our basic types here. We've got our types database populated, and what we want to do is we want to start setting up the relationships between the conditional and its two branches and this return type that it's getting assigned to.

[00:05:16]
So, we're just gonna go ahead and assign some numbers to those, and then we need to add one more here, which is for the entire expression itself, like the ternary, like what it evaluates to. This is spoiler alert going to get similar to B, but it's important that we do have that as a concept because in not all cases are we going to be like immediately assigning this to be like a named thing.

[00:05:37]
You can also have a ternary like in the middle of a function call or something like that or like an Array element. So, it's pretty important that we do have some type in general, like in the general case, to track, okay, what is the inferred type of this whole thing, so that we can unify that with these other things.

[00:05:53]
Okay, at first we initialize these to null. So, let's just start by unifying this one, just to sorta get that one out of the way. So, it's like, alright, initially B, we've got an idea for initialize it to null, and then we're gonna unify 4 and 8. Why are we doing that?

[00:06:07]
Because just like what we always do with with conditionals and equals, it's like you got the thing to left the equals, that's your the ID, and then you have equals, and then you have what's whatever's on the other side of the equals. And there we've conveniently made this entire eight thing here, so we're just gonna similarly do that.

[00:06:23]
It's exactly the same thing as what we did in the first two consts. Okay, so now let's move on to five here, which is the condition. This is another easy one because we've said that the rule is that anything that is in a ternary condition has to be bool in this language.

[00:06:39]
So we can basically just pretty directly say, yeah, okay. Like, we know that cond is similar to two because, or, sorry, not simply to two. It's it resolves the two, because we're saying, like, yeah, that's, that's just what that, that look up means. And then we can take that one more step and say, like, okay, this is a bool because it's in that position.

[00:06:57]
So, in exactly the same way that when we were doing the multiplication thing, we said, both things on either side of the multiplication operator have to be numbers. With ternaries, it's exactly the same thing, except instead of it being on either side of the operator, it's just like this particular thing has to be a Bool.

[00:07:12]
And then, yeah, we do the resolving it thing. And sure enough, we say, cool, that does end up resolving to a Bool. So, all is well, the fact that 2 was assigned to a Bool and also, we have the constraint of that thing being a Bool. They're both Bools, so we're all set.

[00:07:27]
If in contrast, we had said that this is a string, so the string not a Bool in this case. Then if you go through the same unification steps, you end up with a string attempting to unify with a Boolean and then you're just gonna get a tight mismatch.

[00:07:40]
So, this is basically just like how the types are flowing through to end up with us arriving at a type mismatch for this condition. Okay, but so let's put it back to true, because that's actually what we want to do with this scenario. So then we get to the interesting position of and I'm kind of scrolling the table down here a little bit.

[00:07:59]
So, iDS 01, and two are kind of off the screen there, but they're still there, I have limited room on the slide. So at this point we're saying okay, now we wanna deal with 6, 7 and 8. And these are the things where we start to get into the scenario of saying, it's not that we're trying to unify them with a concrete type like bool, like we just did.

[00:08:18]
Now we're just trying to unify them with each other. So initially they're all gonna be null. And then we're gonna do our unify of 6 and 0 to basically say like okay, this you know a is lined up with that thing which we get out of the scope.

[00:08:32]
7 is unified with string because it's a string literal. We just get that from the parse tree node id like, yep, that's a string literal. So that's just gonna be a string. And then finally, we unify 6 and 7 because we're saying these two have to be the same.

[00:08:44]
Whatever types they have, they are connected in that way. Of note, we do not subsequently go and unify seven and six, in part because we don't want infinite loops, but also because anytime you're passing these two arguments to unify, we've already established the relationship like we talked about in the previous section.

[00:09:01]
So, there's no need to do the symmetrical one where you unify 7 and 6 as well. It's totally sufficient to just pick some ordering and just make sure that those two are connected. Okay, and then yeah, sure enough, that checks out because we unified 6 with string. So when you unify 6 and 7, it's all good, it's just a string.

[00:09:20]
Now, if, in contrast We had done it like this, where 6 is a string from high, back up here, but seven is a number. Then, once again, just as we had with the Boolean scenario, we would get a type mismatch, not because there was any hard coded type in here, but just because those two have to agree, and now they don't, so put it back to a string.

[00:09:41]
Okay, and then finally, we have the last one, which is just basically saying, like, this whole thing needs to unify with one of them. So, I just picked the first branch at random. We could have also picked like the other branch, but again, the point is that the entire conditional is always going to evaluate to either one branch or the other branch.

[00:09:58]
But since we said like, these things are connected and we can just pick a branch and say, yeah, you have to be the same as one of those two branches worth noting, by the way, that there are other ways you can represent this. So, I've kind of done for this section, the maximally symlinky kind of way of doing things, if you want, I guess I haven't tried this, but conceptually, there's no reason it couldn't work, that you could sort of take a shortcut and say, like, do we really need this extra type of variable here?

[00:10:24]
Like, why don't we just have this like, all right, look like, when I'm Sim linking B to something, I'm just gonna say, like, yeah, just simply get to a just, just go straight to just go figure out what's in the first branch. Go take a little peek at the structure there, and just say, yeah, I see what's going on over there when I'm still back in the node stage.

[00:10:40]
And say, like, I'm not gonna bother with this whole like. Formality of having an extra expression that's just gonna get symbolized in this anyway, I know that I'm going to link all these things together. So, I can just delete some of these and just get them out of there and just like cut right to the chase.

[00:10:53]
There are probably trade-offs there in terms of like, I don't know, how easy is your code to read? How obvious is it like what it's doing? You're saving a little bit of memory if you don't do that. But I can't think of any reason why either wouldn't work.

[00:11:07]
But at least the way I've learned it is sort of like, just as a matter of course, it's like, yeah, just give all these things variables and then make sure you get all the relationships right, that's the important thing. There is probably a corner you could cut here if you wanted to squeeze out a little tiny bit of performance in terms of just, sort of cutting out the intermediary, just going direct to the source.

[00:11:25]
Okay, yeah and then there was that question, hey, do we need to do the symmetrical one? And just like before, the answer is no, we don't. If we've already unified eight with 6, there's no reason to unify with 7. All we ever need to do if we're ever unifying with one branch, since we already unified the one branch with the other branch, there's just no point in.

[00:11:43]
It's overkill to unify it with the other branch as well, because that branch is already unified with the other branch. Okay, finally, there's path compression happening here. So, the only other piece to all this is that basically it's gonna go through and change all these things around, and eventually we're gonna end up with a perfectly well inferred conditional, where we have successfully said that this thing is a Boolean.

[00:12:09]
This thing is also a Boolean. We verified that those line up, this is a string. This is also a string, those line up, the whole thing is a string, because both of their branches are strings. And then finally, if anyone was curious, b must also be a string.

Learn Straight from the Experts Who Shape the Modern Web

In-depth Courses
Industry Leading Experts
Learning Paths
Live Interactive Workshops

Start a 5-Day Free Trial