Check out a free preview of the full Building Your Own Programming Language course:
The "Lexing: Lexical Analysis" Lesson is part of the full, Building Your Own Programming Language course featured in this preview video. Here's what you'd learn in this lesson:

Steve defines both lexical analysis, or lexing, and syntactic analysis, which are both a form of parsing, and then dives into how lexing works. - http://static.frontendmasters.com/resources/2019-05-31-build-your-own-programming-language/programming-language.pdf

Get Unlimited Access Now

Transcript from the "Lexing: Lexical Analysis" Lesson

[00:00:00]
>> Steve Kinney: So we're gonna start with parsing, which is kind of the first part. We have to figure out what we're working with, right? And parsing we can kind of break into two pieces, lexical analysis and syntactic analysis. One is like, we don't just want every character, we want the kind of like pieces of it, right?

[00:00:18] We want the given keywords. We want an entire string as a piece of our code and then giving an entire multi digit number as a piece of our code, right? We need all of those things kind of separated out first, like we might not care about white space.

[00:00:32] We might throw the semicolons in the garbage can, right? Any other things are fair game, but we need to figure out what are the pieces that are important and kind of break them into those meaningful chunks. Once we have those meaningful chunks, we can figure out, okay, now that I know the words, what does this sentence actually mean, right?

[00:00:47] And I can begin to figure that stuff out. They're technically both parsing, right? But we're going to kind of split them out into two because it makes a little more sense. So let's first talk about lexing, which is how cool kids say lexical analysis. Basically, like I said, it's taking the string and turning it into what we call tokens.

[00:01:03] So what are the tokens? The given parentheses are tokens. The keyword 'add' is a token. The numbers 1 and 2, and 6 and 3 are tokens, right? These are the pieces. I don't want every I don't want a d d space 2. I want these pieces I can start to think about.

[00:01:24] So how might this work, right? We would basically have a string of code. The way we're gonna do it is we're gonna have a cursor, right? Which is just the position of where we are in that giant string of code, right? And as we will kind of iterate that cursor through, kind of looking for things, right?

[00:01:52] We came across a word, great. Let's start scanning that word until we come to the end of the word. Cool, that's a token. Put it in the token bag, right? It came across a number. How many digits does it have? We need to collect them all. There's a quote?

[00:02:05] This must be a string. Go until you find the next quote. Put it in the token bag, right? It's a parentheses? I need that. Go to the list. I need all the parentheses. Give me each and every one of them. White space? I don't care about white space, right?

[00:02:17] Other than the fact that it breaks up numbers and words, once I've used it as a boundary, it doesn't matter to me what happens, right? And we just keep putting everything into the array of tokens. And then we'll go back through that array once we've gotten everything broken up into chunks.

[00:02:31] And that is when we'll be able to kind of figure out what it all kind of comes together and means.