Transcript from the "Trie Tree Data Structure" Lesson
>> As I mentioned at the top of this exercise discussion, we have different data structures that we can choose that end up leading us down different algorithmic approaches. So sometimes, the real question is not, can I come up with a different algorithm? It's actually, can I come up with a better suited data structure?
[00:00:19] And a lot of times, the data structure it's front loading, the processing work, if you will, to get the data into a format that will make it easier for us to work with. So there are data structures that are pretty well known in these words space. And I wanna talk about a choice of using one of those because it seems like a pretty natural choice for us.
[00:00:44] The data structures referred to as a Trie, a T-r-i-e. There's a little bit of debate as to how to pronounce this because a lot of people pronounce it as Trie. Because if you say Trie TrIe, that's really confusing. I think prefer to pronounce it as Trie Trie. But the guy who invented this invented the name of this from the middle characters of the word retrieval, so he wanted it to be pronounced Trie.
[00:01:13] So if you're sticking to the original definition, this is a gift versus gift thing. You're supposed to call it a Trie Trie, but most people like to say Trie Trie, whichever one you prefer. What's the notion of this data structure? Well, obviously it is a Trie data structure.
[00:01:30] You'll notice that we start out with a node and the values inside of the node are not the important part. It's the linkages between nodes where the meaning is assigned. So, we're always start out with a root node, and it has, in this case, for this dictionary, two children, one labeled C and one labeled E.
[00:01:50] That represents of all the words in our dictionary, what are the unique first characters of all the words that we have. So we have CAN, CAP, CAPE, CAR, CARE, and CLAP. Those are all C. And then EACH and EAR I'll start with E. That's why we have a C and an E.
[00:02:10] And you'll notice that it's a recursively defined Trie so the same is true that underneath C we have A and L because of our words. The only first two letters are C-A and C-L. And then we have C-A-N, C-A-P, C-A-R. Those are the only first three letters. We have C-L-A-P.
[00:02:34] So, you can see how this dictionary structure, or this Tri-Trie structure, is representing all of the words in our Trie one letter at a time. Here's where this data structure is most commonly cited. One of the most common usages of it is for autocomplete. When somebody has typed in the first several letters of something, like for example in a search box, and you wanna return all of the possible suggestions to make to them, you match on the first few characters.
[00:03:05] And hopefully visually you can see why that would be useful. If you had a data structure like this representing all the possible suggestions, and somebody typed in the first three letters, you just simply go down, you follow what they've typed down in the Trie one at a time down, and then the rest of the words that can be made in that Trie, are the suggestions that you give back to them.
[00:03:26] You don't need to consider the whole rest of the structure, only that sub-Trie. So that's one of the most common usages of this data structure. I'm showing it to you because when I started this problem, I kind of had this intuition, maybe this data structure would also be useful for our case, which is a little bit like autocomplete but not quite.
[00:03:49] And so, it was an exploration that I tried to see whether or not this Trie Trie would work. The only other thing to point out, it should be obvious here, but the blue coding that I have there, we have to indicate which ones of these nodes represent actual end of words where they're valid.
[00:04:07] So the black ones don't represent an end of a word. CA in our dictionary is not a word, but C-A-P is a word and C-A-P-E is also a word. So, it is not the case that only leaf nodes can be the end of words. We have to track an additional boolean on each node that says is this the end of a word or not?
[00:04:27] And we track that as we build up the Trie Tree. So we see E-A-R is a word and that's why that's marked as blue. But E-A-C is not a word and that's why that one is still black. Another thing that we know just because this is a well known data structure, we know that traversal of a Trie Tree is done in linear time.
[00:04:48] It's actually K times M, K is the length of the dictionary, like the number of words. And I'm sorry, K is the average length of words in the dictionary, and M is the number of words in the dictionary. So K times M, that's sort of a fixed, known thing in your dictionary size.
[00:05:10] You can treverse it in linear time. Make sure I'm saying that correctly. I think I said it incorrectly. The average length of words represents the average height of the tree. So that's what K is, is the height of the Trie and M is the number of words in the dictionary.
[00:05:35] I think I'm saying that correctly. Anyway, because it's a known data structure, we just know it has that known property we need to go and re-prove to ourselves that math. That's just a known stated property. If we construct a data structure like that, we know that we'll be able to traverse it in linear time.
[00:05:51] Let's switch back over to our code editor. And we're gonna start again with our load words. Instead of making dictionary an array, we're now gonna make an object. A Trie, it doesn't need to be anything more than simply linking objects together. Having an object with one or more properties under that point to other objects, that's all it really takes to make a tree.
[00:06:20] So we're just gonna start with a plain old empty object that will represent our root node. And instead of loading this thing up into a copy of an array, we're gonna just create this nested series of objects as we load the words from the word list array. And then we need to keep track of how many we've done, So I'll have that count here and we'll return that.
[00:07:11] It's just a Unique primitive value could think of it like a number or a string, but it's guaranteed to be unique. All right, so what we're going to do, firstly, let's just make sure, because we know that our app can call load words multiple times, we need to reset the dictionary each time we pick a different drop-down.
[00:07:36] So, I'm just gonna check to see whether or not our dictionary has anything in it. And I'll reset it to an empty object in case. So we'll just throw away the old Trie and rebuild it. All right, we're gonna go one at a time through the words in the word list array that was given to us from the JSON file.
[00:08:08] And there's no recursion involved here, which is kind of nice. We're just going to do an iterative traversal down this Trie. And the way we do an iterative traversal down the Trie, is we just keep track of a reference, a pointer if you will, a reference to the current level of Trie that we're at.
[00:08:29] So we start that reference at the root node and this property, I mean, this variable code node will be our reference pointer to our current root or our sub-root as we go down our Trie. And then, we need to loop through all of the letters in this particular dictionary word.
[00:09:19] Because we're gonna see the same starting letter from a whole bunch of words, and we don't need to recreate the object over and over. So we only need to create it the first time. So we can just say if node being our pointer to whatever our current portion of the Trie is, if not node of letter, meaning we have not yet created that child, then we need to go ahead and create that child And that object I said that the objects were empty or that they didn't carry any particular data payload, but they do need to have that Boolean flag on them that tells us whether or not they're a word or not, whether they represent the end of a word.
[00:09:57] So we're gonna include that symbol, the isword symbol as our property name and we'll initially set each node the Boolean to false. So we're saying the property of that symbol name value is false as we add in a node. We do need to keep track of the count.
[00:10:20] So this is a good place for us to increment our counter so that we know how many of these nodes we have created Actually, let's do that inside the F because that's where we actually created it After we have added a letter or at least found an existing one, whether we added it or found an existing one, now we can advance the pointer.
[00:10:47] So we can say node = node of letter. So we're simply taking that running reference that was started out at the root, and now it's pointing to this child node whether we just created it or had already created it. So we're basically just walking our way down for every word in the dictionary.
[00:11:04] We're walking our way down to the place in the Trie, where that word would have needed to be finally represented. And at any place where we haven't yet gotten the nodes for the letters left in that word, we just add them. So we're just progressively filling out more and more of the Trie with more and more of these nodes.
[00:11:21] At the end of the four loop where we have added all of the letters for a word, we know that the node reference, the node pointer there, that variable node that is pointing at the last letter of the word, and therefore, we should take that one and set its symbol to true.
[00:11:40] Because that does represent the end of a word. So we're gonna say node, of is word = true. Those 20 or so lines of code are all that it takes to construct a basic data structure that behaves like a Trie tree. Now, there are well known implementations of data structures like Trie trees that you can find on NPM.
[00:12:13] And some of you have probably already Googled for this is there an implementation of a Trie tree? They are more sophisticated. They have API methods. They're written as instances of classes and all of this other fanfare. And those very well might be the kind of code that you would like to just use in a production app.
[00:12:31] But I like to do the bare minimum that is necessary to get the job done. So, if it only took me 15 or 20 lines of code to write a Trie tree implementation, like I've done here, that's what I would almost certainly do, rather than going and getting a 400 line nicely packaged, Trie Tree.
[00:12:49] There is a dividing line where the complexity of doing your own implementation, and all of the potential performance gotchas where you might not have an optimized solution, it becomes actually way better for you to use a known well battle tested implementation. So this one's only 15 or 20 lines of code, it's conceptually pretty straightforward.
[00:13:12] there's not a lot of places where we could have made a mistake or performance foot gun or something like that. So, I think this code is the right code. This is the code I would use in production. But in our next option, we're gonna start looking at data structures that are much more complicated and that we definitely don't wanna reinvent ourselves.
[00:13:30] And we'll use packages that exist for them. Okay, so you do have to make those judgement calls, just be wary of where's that balance line of how many lines of codes and how many potential pitfalls have I introduced.