Building Your Own Programming Language Parsing Solution
Transcript from the "Parsing Solution" Lesson
>> Steve: So I'm gonna un-skip them both,
>> Steve: We haven't implemented any recursion yet, so we shouldn't have any headaches just yet.
>> Steve: npm test, and we'll say parse.test.
>> Steve: I'll try it out and they fail which is what we would expect which is nothing gets returned out that function because we never find them return anything, right?
[00:00:32] [LAUGH] Sometimes it would be more worrying if that test passed, right. So all I'm gonna do is I'm gonna just grab this,
>> Steve: And we'll say if it's string, give me that 'Stringliteral'. And if we look closely at the test for the,
>> Steve: Identifier, you can see there's actually a name.
[00:01:13] So that we would get the hang of using the visitor pattern to transform the AST, but not have to do so much work that we spent bogged down in the ways. We've learned the concept and be able to do it but not have to like drilled into the ground.
[00:01:29] So I'm gonna kinda follow the path and that one, I'm gonna use the name there as well, awesome. So I'll go ahead and I've got the string and now we'll get the identifier. So it's a token type of name, we'll go ahead and we'll do with the identifier.
>> Steve: And this will be the name property.
>> Steve: Awesome, so we've got those two in place, now we wanna a single call expression where I just add 2 plus 2 or add 2 and 2, right, it'll do the plus part eventually. And then the idea of having nested function calls like what happens if you have a function inside of a function, right?
[00:02:16] There's a great opportunity for an exhibit meme that did not put in the slides, we can edit that at in post. So we need to go ahead and start to figure out how to implement that. Which is like that's where the ecurssion part, right now, we just have the ends.
[00:02:31] We have the places where the recurssion will end, right? When you get to one of the single values that can't have anything nested, we have those test passing, so how might we do this? Well, there's two things that we wanna think about, like we were gonna cheat a little bit.
[00:02:45] And we're gonna use the fact that we have a race to be kind of a data structure for us that really makes sense and allows you some pretty cool things later on. So we've got this ability to parenthesize, I'm not gonna say that word again out loud. Which is the ability to kind of go through, and this is using the parentheses to guide the structure of our data, right.
[00:03:09] Which is you know, that if you see a parentheses you've entered a function, but if you see another one before you see the closing parenthesis, you've entered a nested function. Right, and so we can kinda of create ourselves a data structure that we can use in order to make sense of that.
[00:03:23] So I'm gonna write a version that is not going to work. [LAUGH] I will tell you why it doesn't work before I ran the test to see why it doesn't work, right but see if you can spot it as I write it. So before we use that cursor, and I think Jason asked me question of like can you use arrays, and now I have an array of tokens so yeah.
[00:03:42] This is where we will be able to use peak and pop again now, so I'll say const token is go ahead and pop off the action of the tokens. Can someone be on the lookout for when I forget to put an S so I put an extra S on token at one point or another it's going to happen at one point, I promise you.
[00:04:02] It's not like a hidden thing and we do on purpose. So if it's an opening parenthesis,
>> Steve: The token of value, now we know that we've entered one, right, we've entered an expression in our case. So we're in there I'm gonna say okay, I'm in an expression, I'm not a number, a string, or an identifier.
[00:04:23] I need to go in and I need to find one so I'm gonna start a new expression.
>> Steve: And we're gonna say cool, and now until we come across a closing parenthesis, we know that we're going need to put things onto that expression.
>> Steve: So we'll go ahead and we'll say,
>> Steve: While it's, get that on the right side there, while it's not is closing parenthesis. We'll go ahead and say for the token value.
>> Steve: All right, in that case, no, it's not anything I ever wanted. We'll go in and say, okay, then you'd go in onto the expression.
>> Steve: Cool, when you put the token on. This isn't gonna work that doesn't really matter, and then we'll pop one off.
>> Steve: Cuz when we get to the closing parenthesis we need to, we don't need it. Basically when we when you see me pop, or iterate the cursor and not do anything with it, it's because I've reached something that I wish to throw into the garbage can.
[00:05:37] I don't need the closing, I didn't do anything with the opening brace, I don't need anything with the closing brace either, so we'll do that.
>> Steve: And then we'll return the expression and that'll be the process of parenthesising it. So, what's gonna happen with that? Well, what's gonna happen with my nested expressions, right, or anything along those lines, like we need to also continue to do this until we hit anyone those edge nodes.
[00:06:10] So it means, we need our good friend recursion, right? And recursion will say like, okay, look at the next node, are you also a call statement? Well then I need to dig in and do the same thing for you. So this will be where we,
>> Steve: Parentheses eyes, all the tokens again and keep iterating through them until you get to the end, right.
[00:06:32] Like when you get to something where this is not true at all like where's this token, and then we will return all the tokens. This eventually becomes a no op, right, when there's no more as either edge knows that there's nothing left in the array, this condition will never happen.
[00:06:50] It will be like, okay, we've popped the last closing brace off, here we go. And then we're ready to rock and roll, so we do need to like kinda pull these together a little bit. So we've got this ability to go through but now we have these data structures and we need to actually work with them here.
[00:07:12] Like this, this parse function has no concept of what we did up here, right? It's like, cool, I know about strings, and I know about identifiers, and I know about numbers. I don't know about what you just did in the thing you did in the function that leads up to this one.
[00:07:27] So, we can go ahead and we can implement that, which is we know that if it's an expression, what data structure did we make it?
>> Steve: Right here, it's an array, right? And so right in here, we don't have any concept of that. So we need to say all right, cool, what we'll say is, this is actually gonna handle the pop into the tokens for us now.
[00:07:50] We'll say,
>> Steve: If,
>> Steve: If Array.isArray, cuz we're gonna pop tokens here now.
>> Steve: If Array.isArray for the tokens, then get me the first one, cuz that's gonna be the function that we're calling. And on a list, you can have as many arguments as you want, right?
[00:08:24] They kinda basically reduce them all down, right? So if you give add two arguments, it will add those two numbers together. If you give add 17 arguments, it will add those 17 numbers together, right? So, we'll say first and we'll grab all of the rest of them.
>> Steve: This is effectually also like popping one off cuz you're grabbing the first one and then you're grabbing all of the rest of them as well.
[00:08:49] And then in that case we're gonna return,
>> Steve: A type call expression.
>> Steve: The name, cuz we know the function name is that first one, is going to be that first token's value.
>> Steve: Now, if we want to be super compliant with Babble, it would actually use a full identifier in this case.
[00:09:14] And we would parse it like that, but that's really verbose. It all made the unit test more confusing, so I opted not to do that. Luckily, we have a way to rectify that later. So the arguments are all the rest of them, but those arguments also could be any of these as well.
[00:09:33] They could be tokens at this point, so they need to be handled recursively as well.
>> Steve: All right, so here we're going through, and we're looking for parentheses. We're trying to build up all our arrays, and it's finding all of the arrays for us, it's finding all of the function calls.
[00:09:52] But then when we go to parse it that's when we have to get those leaf nodes and turn them into the identifier node, the string node, right. So go ahead find me all of my functions in this first one. Once we've found the kind of boundaries of the functions, go ahead and turn the rest of those tokens into the actual literal nodes on my tree.
>> Steve: All right, let's see if those tests passed. I wrote a lot of code which always makes me nervous that I've made some terrible mistake.
>> Steve: What's great is it's been a little sad right now that, cuz we're still figuring out the semantics of the language, we can't use our language yet, right.
[00:10:34] But we got to the point where we can identify all the tokens. And should these tests pass, we've got another point where we can figure out their semantic meaning. So you can kinda guess what the next phase is. We know what the intent of the program is, now we gotta do something with it, so let's give it a shot.
[00:10:49] So I had like a few likes small errors that I need to rectify, I made a joke before I said I was gonna mess up pluralization at one point, I did. [LAUGH] Right, which is two things that I made a mistake on was. Opening parenthesis, we're checking if it's the only thing in parentheses.
[00:11:07] Then we were also checking the same value again, right, and you can't actually be both, right? And so what we wanna do is actually look at the next value. Is the next value the closing parenthesis? Then we're done, if not, I need to actually do stuff, right? So, the two things I needed to fix were to peek ahead at the next token, to see if it was the closing parentheses.
[00:11:28] And anything is not the closing parentheses we wanna put in right to look again, there is not gonna work. Also, when I said it should be a no op, because we want to get that token and return that token. I was returning the tokens plural, which means I was returning the argument again, which was not working, right.
[00:11:48] And so, minor mishap on the on the pluralization there, so let's kinda talk through it again. We'll get all the tokens, get me the first one on there, right if it's an opening parenthesis, okay, we're starting. Keep looking at the next one, if the next one isn't there, is that the case?
[00:12:06] We're gonna recursively keep going through, when you find that closing token, throw it in the garbage. We're done with our expression, we give you that back. If what we were dealing with was not an opening parentheses at all. Then, this was the kind of end of our recursion, right do we hit the leaf node.
[00:12:28] So we kind of call this tokens floral, but we know that a single token can come out of here, if there was no opening practices. Either an array is coming out, cuz we found opening parentheses, we started an array, we collected them all, or we didn't find opening parenthesis.
[00:12:46] Which means it was a single node, right, so within which point we just rename it token at that point. And we go through and figure out which token it actually was, that make sense? As much as like recursion can?
>> Steve: It's like the joke I made earlier, which is recursion when it works makes complete sense and recursion when your tests are failing it's I don't understand anything.
[00:13:08] [LAUGH] What is gravity, what's up?
>> Steve: No further, you can push on to it if it's you can mutate constant. You could not replace it with a new expression. So if I wanted to use array.concat or use the spread operator and places a brand new preferentially different array.
>> Speaker 2: Thank you for
>> Steve: Yeah, yeah.