The "Lexing Numbers & Letters" lesson is part of the full Building Your Own Programming Language course featured in this preview video. Here's what you'd learn in this lesson:

Steve goes over the first exercise, live codes a test that looks for numbers and letters in order to tokenize them, and explains how lexers tokenize numbers.


Transcript from the "Lexing Numbers & Letters" Lesson

>> Steve Kinney: Someone unskipped these tests, I have not done the work yet, I'm gonna unskip them and watch the world burn.
>> Steve Kinney: Just verify that that in fact happens.
>> Steve Kinney: Cool, it ran, but it crashed.
>> Steve Kinney: This is never a good sign.
>> Speaker 2: [INAUDIBLE]
>> Steve Kinney: Yep, we have two things that we might consider doing, right?

[00:00:34] One is we can go into that tokenize function and add a cursor++ at the bottom of the while loop. The other is that if we come across something we don't know about, we might wanna just throw an error, right, and bust ourselves out of that loop that way, right?

[00:00:50] So we do something like,
>> Steve Kinney: You can say,
>> Steve Kinney: Character is not valid, or something along those lines, right? Cuz it's probably, at that point, if it has fallen through all the things we know to check for it is probably something that we can't parse later as well.

[00:01:17] So we might as well kinda throw there early on. There's lots of things that we can and should do, right? When we get to syntactic analysis, I believe it was a pretty common coding test which is check to make sure you have as many closing braces as you have opening ones, right?

[00:01:34] Stuff like that we're gonna skip but that's also stuff that you should totally consider doing. All right, so now it blew up. Which is fine cuz I unskipped the test but they kinda fell through. So yeah, you can either cursor plus plus and just deal with the fact that the test will fail.

[00:01:52] Or we can throw there, which is probably safer if we make mistakes later. So I'm gonna keep the throw in there cuz I think it's probably the right thing to do. But yeah, we got to see what it feels like when you don't end that loop.
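A minimal sketch of the "throw if nothing matched" idea described above. The token shape, the `isParenthesis` helper, and the exact error message are assumptions for illustration, not necessarily what the course repository uses.

```javascript
// Hypothetical helper standing in for whatever checks already exist.
const isParenthesis = (character) => character === '(' || character === ')';

const tokenize = (input) => {
  const tokens = [];
  let cursor = 0;

  while (cursor < input.length) {
    const character = input[cursor];

    if (isParenthesis(character)) {
      tokens.push({ type: 'Parenthesis', value: character });
      cursor++;
      continue;
    }

    // Nothing above recognized the character. Throwing here ends the loop;
    // without it (or a cursor++), the while loop would never finish.
    throw new Error(`${character} is not valid`);
  }

  return tokens;
};
```

Calling `tokenize('()')` returns two tokens, while `tokenize('?')` throws instead of hanging.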

[00:02:07] All right, super cool. So this is gonna be pretty straightforward. If we look at the test, there's a type of number and a type of name, right? Because strings will have quotes, but names are kinda like function names or variable names or stuff along those lines. We can go ahead and use isNumber, which is a nice helper. We saw it before, it's great.

[00:02:35] If isNumber for the character, right? What we'll do is we'll take, let me grab this.
>> Steve Kinney: Type number, and then we do need to parseInt the number, or you can do the little plus sign, whatever makes you happier. And you can put a ten in as the radix if you want.

[00:02:55] And then for the,
>> Steve Kinney: If it's a letter,
>> Steve Kinney: What we'll do is we'll paste this in again. We'll say name, character, cursor, continue. This should pass my test. Right, all right, we are halfway there. So we've got that in place.
>> Steve Kinney: Let me see, we might do the skip, let me check out the tests really quick.
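The two branches described here might look roughly like the sketch below. The regex-based `isNumber` and `isLetter` helpers and the `{ type, value }` token shape are assumptions; the course's own helpers may be written differently.

```javascript
// Assumed helpers; the lesson refers to isNumber but does not show its body.
const isNumber = (character) => /[0-9]/.test(character);
const isLetter = (character) => /[a-zA-Z]/.test(character);

const tokenize = (input) => {
  const tokens = [];
  let cursor = 0;

  while (cursor < input.length) {
    const character = input[cursor];

    if (isNumber(character)) {
      // parseInt with a radix of 10; unary plus (+character) would also work.
      tokens.push({ type: 'Number', value: parseInt(character, 10) });
      cursor++;
      continue;
    }

    if (isLetter(character)) {
      tokens.push({ type: 'Name', value: character });
      cursor++;
      continue;
    }

    throw new Error(`${character} is not valid`);
  }

  return tokens;
};
```

With this version each character becomes its own token, so `tokenize('2')` yields a single Number token and `tokenize('a')` a single Name token.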

[00:03:28] So it's got the single number, let me go ahead, I also did the letters as well, so we can undo that one. We're gonna see a problem with this next one, which is not surprising. All right, so let's look at our multi-digit numbers, right? We wrote the code, we have a sense of what's about to happen here, yeah.

>> Speaker 2: A student online commented that you could do npx jest --watch and then tokenize.
>> Steve Kinney: Yeah, my problem is if I am hitting save and I forget to iterate the loop or something like that, I'm going to basically lock everything up in a very sad place.

[00:04:13] So I'm mostly doing it this way cuz if that happens, my fan starts spinning and all that fun stuff. But yeah, I thought about that. I tried that early on, paid the price [LAUGH]. Right, if you don't make mistakes, go for it. If you are both talking and coding at the same time, not great.

[00:04:31] [LAUGH]
>> Speaker 2: Good to know.
>> Steve Kinney: Yeah. Cool, so this fails, and for a reason that's not super surprising, right? Instead of getting 22, or 11 and 22, we're getting one, one, two, and two, all right? Because we don't have the ability to look ahead and see, okay, is there more to this number or not, right?

[00:04:57] So we're breaking it up and we're seeing each one of those. So what we need to do is say, okay, if we come across a number, keep looking. Look ahead, is the next one a number? Cool, it's part of the same number. It's not a number? Right, okay, neat, right?

[00:05:12] We can do something else with it, right? But we wanna make sure that we kinda are looking ahead and seeing what's going on next. So we'll go ahead and I will implement that. So we do it for numbers. And so what I need to do is I'll start with the one that I got, right?

[00:05:34] So I'm gonna start with a new number. And right now it's still a string, so this is great cuz I can effectively use the fact that I can add the strings really easily. I can say number is the first character I came across cuz we know it's a number, all right?

[00:05:47] And then we're gonna say while isNumber, I'll look at the next one. So input, and this is where we're gonna iterate that cursor.
>> Steve Kinney: Right, look, we have whitespace breaking stuff up, theoretically. There are a lot of little edge cases to our syntax, and if we wanted to spend the entire day writing a lexer, we could totally cover every last one of them.

[00:06:09] And you would be bored out of your mind. We're gonna assume that the number is over cuz it has hit whitespace. So that's when we iterate the cursor anyway. The ++ as a prefix says, bump the number and then use it, right? So this will get us the next one in this case.
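A small standalone illustration of prefix versus postfix `++` when indexing into a string. The variable names here are made up for the example.

```javascript
const input = '42';

// Prefix: the cursor is incremented first, then used as the index.
let prefixCursor = 0;
const next = input[++prefixCursor]; // reads input[1]

// Postfix: the current value is used as the index, then incremented.
let postfixCursor = 0;
const current = input[postfixCursor++]; // reads input[0]

console.log(next, prefixCursor); // 2 1
console.log(current, postfixCursor); // 4 1
```

Both cursors end at 1; the only difference is which character was read along the way.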

[00:06:33] So if the next one is also a number we'll kinda pop it on there.
>> Steve Kinney: Right, it's not the characters that got set up there. And so we'll keep going until we come across something that's not a number which we assume is gonna be some white space. And then we will go ahead and take this new number that we have, and push that on instead.

[00:07:01] And cool, we shouldn't need to iterate cursor at this point, famous last words.
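Putting the look-ahead together, the number branch might be sketched as follows. As before, `isNumber` and the token shape are assumptions, and whitespace handling is deliberately left out of this sketch.

```javascript
// Assumed helper. /[0-9]/.test(undefined) is false, so running off the
// end of the input also ends the inner loop.
const isNumber = (character) => /[0-9]/.test(character);

const tokenize = (input) => {
  const tokens = [];
  let cursor = 0;

  while (cursor < input.length) {
    const character = input[cursor];

    if (isNumber(character)) {
      // Start with the digit we already have; it is still a string,
      // so further digits can be appended with +=.
      let number = character;

      // Prefix ++ moves the cursor forward, then reads that character.
      while (isNumber(input[++cursor])) {
        number += input[cursor];
      }

      tokens.push({ type: 'Number', value: parseInt(number, 10) });
      // The cursor already points at the first non-digit, so no cursor++ here.
      continue;
    }

    throw new Error(`${character} is not valid`);
  }

  return tokens;
};

console.log(tokenize('22')); // [ { type: 'Number', value: 22 } ]
```

Because the inner loop advances the cursor itself, the outer loop resumes at the character that ended the number.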
>> Steve Kinney: All right, that one's passing. So I did numbers, you're on the hook for doing strings, just to get the muscle memory in place. And then we're gonna unskip "it should ignore whitespace," which is just gonna be whitespace mixed in with other characters.

[00:07:26] And since we ignore all whitespace, we should be able to get that one for free. And then we'll just quickly do strings and we'll be done with our lexer at that point.