This course has been updated! We now recommend you take the Complete Intro to Computer Science course.
Transcript from the "Tries" Lesson
[00:00:02]
>> Brian Holt: Tries, tries are trees. That's not a joke. I said two words. This is how you say this. Trie, and they're implemented as trees. Someone was being really clever, so I had to look up. Cuz why the hell do they call tries the way that they do. And the reason why is because they make it easy to retrieve data.
[00:00:28] And so that's why they named this data structure a trie. So, I'm sorry, I would try and disambiguate this, but please stop me if I'm being ambiguous.
>> Brian Holt: I've heard people call them tris even though obviously that's not the word for it. I don't know if it's obviously, it's obviously weird.
[00:00:52] But yeah, just communicate. Communicate as best as you possibly can. So good luck, I even wrote it there, so good luck to you. So what is a tree? It's a tree, LOL.
>> Brian Holt: It's specifically, a sort of tree that is meant to make doing things like auto complete, very quickly.
[00:01:19] The technical term of what it's doing is optimize for prefix searching. So pretend that we have a whole list of all of the biggest cities in the US, because that's exactly what you're gonna do. And if I type SAN, what are the cities that I should get back?
[00:01:37] Well, if I type in SAN, I'm gonna expect to see Francisco, Jose, Diego, all the cities, Santa Barbara or Santa Monica, all those cities that begin with SAN. This is what tries are specifically designed to address. So a trie starts out with a root node. That doesn't represent necessary anything, it's not gonna to have any specific value associated with it.
[00:02:06]
>> Brian Holt: Often given the value of empty string. It's going to have a bunch of children node, you don't actually know how many. In the case of the English language, assuming everything is lowercase which you should assume everything is lower case. It could have up to 26 children, right?
[00:02:23] Well, you don't know that yet, but I assure you it's up to 26.
>> Brian Holt: So the first root node doesn't represent any particular thing, but the second node will represent the first letter of what you could be searching for. So you could be searching for Boston, or BO as I have down here.
[00:02:43] The first node there is going to be B, right? And then, what you're gonna do from here is we're going to branch based on all these different letter combinations. So using this example, if I type in b, I'm gonna get all the suggestions for BA, I'm gonna get all the suggestions for BO, all the suggestions for BI.
[00:03:05] And it's gonna go out and basically search for all of these potential things. But there's no city in America that starts with BZ, as far as I know. Certainly not amongst the biggest ones, so that's not going to exist. So that's how you're going to build these tries is that you're gonna go through and you're going to basically separate everything out by letters.
[00:03:26] And you're gonna build these tries build on these branching logics. Why this is advantageous to you is if I'm searching for a city, right, and I type SAN. I just go down, I drill down into the trie at that particular part, and then I just do a depth-first traversal.
[00:03:40] And I find the top three results, and I just return those as the auto-complete suggestions, right? Does that make sense? So if I type BO here, I'm going to do a depth traversal on both of the children. I'm gonna get Boston, and I'm gonna get Boise, and then I'm done.
[00:03:59] Because those are the only two possibilities.
>> Brian Holt: Sometimes there are chains contained with other chains. There's whole words contained in other ones. A good example is there's a city called Salina, and there's a city called Salinas, right? I think this is one of the extra credit examples, but I just wanted to bring it up with you.
[00:04:21] The way you handle that is, pretend that BOST was a city, you would just mark T as a complete word. So that would get added to the trie, and then it would just continue with its depth-first traversal as well. That's all you gotta do to handle that.
>> Brian Holt: If someone is searching for something that has a space in it like San Francisco, you just represent that space as a letter.
[00:04:48] There's nothing special about spaces is what I'm trying to tell you. So just insert another note in there that's a space. People often get confused about that.
>> Brian Holt: There are more complicated ways to go about this. Like if I type Francisco, you might reasonably expect that auto complete would be San Fransisco as well.
[00:05:09] The way you would do that is you would just insert Fransisco and then once you get to the O element you'd say actually the complete word is San Fransisco right? So, that works as well. We're not doing that today though we're just doing prefix. Nothing more complicated than that.
[00:05:28]
>> Brian Holt: Lastly, you can add waits to something, right? So what's the best way of thinking about that?
>> Brian Holt: Sometimes the order in which things will complete in is not necessarily the order that you want to auto complete in. So you want to do a certain amount of edge waiting.
[00:05:47] So if I type in S, I might not want to do all the San Francisco, San Diego, San Jose. I might also want to like branch into other Ss to auto-complete those, cuz those might be better suggestions. You would have to do some sort of waiting or queueing.
[00:06:02] We're not gonna do that again today, but I just wanted to bring that up as well with you.
>> Brian Holt: Okay.
>> Speaker 2: So like get the cities closest to the user who is using the auto-complete, or the ones with the largest population.
>> Brian Holt: Yeah, largest population is one I was thinking.
[00:06:16] I guess you could do it with geo-targeting. That's just a whole other level. [LAUGH] But I would love to see it. That would be really cool.
>> Brian Holt: Typically these are designed to be fast and dumb. You just wanna suggest something. Usually you'll get the first suggestion, right? Cuz usually the order that you add things in is what's important, right?
[00:06:37] So usually look at the top result, right? And that's usually what you care about, and you just throw two other ones on there to make you feel like you're doing something. I don't know, but. Yeah, you want these to be fast and usually not necessarily complete, if you understand what I mean, so, cool.