Practical Problem Solving with Algorithms

Finding Words in a Trie Tree

Kyle Simpson

Kyle Simpson

You Don't Know JS
Practical Problem Solving with Algorithms

Check out a free preview of the full Practical Problem Solving with Algorithms course

The "Finding Words in a Trie Tree" Lesson is part of the full, Practical Problem Solving with Algorithms course featured in this preview video. Here's what you'd learn in this lesson:

Kyle refactors the findWords function to utilize the trie tree data structure. If the current node in the tree is a word, the word is pushed into the words array. Otherwise, the function is recursively called with the remaining letters from the input.


Transcript from the "Finding Words in a Trie Tree" Lesson

>> This is only part of the problem, but it is at least useful that we've constructed ourselves a trie tree data structure and just a few lines of code. We're now gonna need to adjust our find words utility to know how to search through such a data structure.

So, what I'm going to do is delete, our find words and our check word, function. And we'll start over. So, remember, in the autocomplete sense, what we really need to do is be able to get to a part of the tree to know if we've found a match or have any other matches that we can go to.

So we're gonna use a very similar concept, which is we're going to, again recursively step down our trie tree until we have convinced ourselves that there are no more matches and we're gonna stop. But if along the way, if we find any notes where we've got an is word marked then we know we found a matching word that's our broad general overall strategy.

So we're just gonna keep traverse for every word in your dictionary. Keep traversing the dictionionary, sorry for every one of our input permutation. We're gonna keep traversing our tree and whatever words pop out, those will be the words that we spit back as our result. That'll be our general strategy.

Okay, the first thing that we need to do is check if we're already, I'm sorry, we also need to change the signature of find words. So we're gonna still pass in the input but I'm gonna this is another technique where I'm not creating an inner function to do my recursion I'm just changing the signature of the function itself.

In this particular case I'm adding on two extra parameters that have default values. So from the perspective of the place in the code that calls find words, it still only passes in one argument. But I've got two other parameters that have default values and so I'm able to do this with the same function.

I just wanted to illustrate to you that we saw using a sub-function for our recursiveness. We've seen using separate functions and now here we're just gonna adapt the existing function to our needs as recursion. Each one of those is techniques that you can use. All right, so the second parameter will be the prefix starting at the empty string, as we probably saw before.

And the current node that we're at in the tree, again, this being like a node pointer, a node reference. So, we'll start that, we'll default that to the root dictionary. All right, so the question that we need to ask is, are we already at a word? If so, we need to go ahead and add that into our words list.

So we can say, if node of is word, have we found a node that is marked as the end of a word? Then we need to push in and this is why we're passing down prefix. We don't want to go then have to walk back up the tree to reconstruct our word remember we got to this node letter by letter letter by letter.

But we're pushing down with each level of recursion, the progressively built up string that represents that word. So at this iteration, if we're at a node that says it's a word, the thing that's in prefix is the full word as a string. We push that down, that'll save us some work basically.

So we can just push this Into the list of words that we can return. And now we start to go through whatever's left in the input characters. And again, kind of do our, basically our permutation recursively down through our tree. So we're gonna say for( let i= 0:i< input.length, i++).

Let's pull off the current letter off of the input. This looks a lot like when we were pulling the letter out of the remaining or whatever it's the same concept here. So I'll call this current letter. Input of AI. It'll start out with the first one and then the next iteration will take the second one and so forth.

The question we want to ask is if the node that we're currently pointed out wherever that is in the tree, if the node that we're currently pointed at has a node for this letter. If not, there's nothing else for us to do in this part of the tree, right?

But if it does have a node for that, then we can keep going down. So this is the big if statement. We need to check that. Now, what we need to do is get the set of the remaining letters and this is, again, similar to doing it with Slice.

I'm just gonna show you a slightly different style of doing it. We're gonna say that the remaining letters Is an array. Rather than keeping that as a string, I'm just going to keep it as an array. Arrays and strings are very similar, but I'm going to keep it as an array.

So I'm just gonna spread out input.slice from 0 up to i, and then spread out input.slice starting at i plus 1. That should look familiar because that's the same way we kind of picked out a single one when we were doing the permutations. So I'm going to call find words with remaining letters as the input we just computed remaining letters so I'm going to pass in remaining letters as the input.

The prefix is going to be whatever the prefix was passed to us plus whatever our current letter is. So if it was the empty string now it's the first letter if it was first letter, now it's the second letter. And the node is the node of the current letter.

So now we're walking down in the tree one, one layer of node reference.
>> So to make that more readable, I'll put that on to sip A few separate lines. That's our recursive call and what does that return back to us? It returns back an array of a partial list of matching words potentially.

It'll at least always pass us back an empty array but it'll pass us back any words that it found in its part of the sub-tree. You with me? So what we want to do is add that into the words array. So I'm gonna do words.push, and whatever this array is, I wanna spread that out as arguments to the push call Might have returned best back an empty array in which case it found none and we're just spreading out an empty array and doing nothing pushed in.

It might have found thousand words and we'll put all those in as well
>> They're a little confused on why you're used the symbol for is word. And where that could possibly get mixed up?
>> It could not get mixed up. I used the symbol because it's the best semantic coding choice.

This is a special property on the node. It's not a child property that represents a letter and a word. I could have picked some other thing like 1 2 3 but the best way of doing this this is what symbols are for. Is to create special meta properties special properties on objects so I'm just using the JavaScript feature for what it is for, which is to make a special set apart character.

But our particular usage of these objects doesn't enumerate over the children in some way where it could get confused. It just would make more confusing code to pick some other arbitrary value for that property name. One last little thing to fix up which is that at the end of this for loop It may not be obvious, but if the inputs had had any repeat characters.

In other words, if anybody had typed in the p letter twice in the input, we would end up with duplicate words. Because we would have ended up visiting that same part of the tree multiple times and got all of its sub matched words multiple times. So our list might annoyingly to the user end up with some duplicates in it.

That's one of the implications of this algorithm. And so to fix that up, I'm going to use the JavaScript set data structure. I'm going to push our array into a set and then pull it right back out as an array. So I'll do that. New set of words, and then spread it back out into an array.

So this is just making sure that our list of words that we return is unique. Then what's checked in the code repository, this is how the code looks, but I want to point out something, that this is a recursive function. And because I'm not using an adapter function, this function is the actual one that recurses on itself.

Which means that we are reuniquing this at every single level before we return back up. Which is a whole lot of unnecessary work. So we really only wanna do this line at the last level, right before we return back to the calling code. After all the recursion is done, and we've gotten a big old list of all the possible words with potentially a bunch of duplicates in it.

Then we just wannna do one last deduplication if we had an adapter function that had called this one we could put that line of code in the adapter function and not in the recursive function. A way of approximating that same thing is that we know that when node is back at the dictionary level we know that's our last level out because we always start with the root.

So we can just simply, this is just a little, it's tiny performance, it probably wouldn't matter, but if just as we're thinking about things. You can say if node is equal to the dictionary, that let's us know that we're at the last level, so that's the only place that we need to do duplicate.

I see the question I'll get to it in just a moment I just want to finish typing this. So what was the question?
>> Can you please clarify why we have what we have in remaining letters?
>> Okay, so we are going through the input, which is the set of characters that we can basically permute on, let's whatever the user typed into the input box, that's a array or string of letters.

It starts out as a string and we end up turning it into an array, but the way we use it, it doesn't matter. And so we're looking at that set of those letters and we're saying I wanna take each one of them as the next letter. But I want to do that one at a time.

So I'm gonna start with again going back to the PC, And A example, I would take P as my current letter. And then remaining letters is going to be the array that holds C, E,R and A and then on the next iteration of line 43 Waterloo. Instead of taking P, now I'm gonna look at C.

So I wanna take C as the letter that I'm fixed on as my next letter in a permutation. And I wanna pull out a remaining set that has P, E, R, A. So remaining letters becomes those four. Then on the next loop of line 43, I take the e as my next letter that I'm fixed on and I get a set of remaining letters, that is P,C, R and A and then I keep going.

So it's just pulling out one letter from the set of possible letters and getting everything else into another set. That's what we're doing.
>> Can we just remove duplicate letters from the input first
>> So if we were to remove duplicate letters first, then it would prevent us matching words like that we want the duplicates in.

Remember that we constrained our problem, such that you have to pass in PP and AA if you want the word appear to match. If we deduplicated that down, then you would get to the part where you did A and P, and you'd be looking for another P in our algorithm, and it wouldn't be in the input set,.

So therefore we couldn't keep going down that part of the sub-tree. So unfortunately, we can't deduplicate the input letters, we have to deduplicate the output. This is just a side effect, if you will, of the algorithmic choice that we made here. That may or may not end up actually, it won't end up being what we like about this particular choice.

But we are gonna see the performance implications that we, little hint, this is gonna be significantly more performant. Because this is a linear algorithm as compared to the factorial algorithm that we tried before. All right, so let's go ahead and switch on over to our browser. Now, with our new implementation in place, pay attention now to these benchmarks for our data structure for a 2400 word dictionary.

We ended up creating 5705 nodes. Each node is only represented by a single character, but we are creating about It looks like they're just about a little more than twice about two and a half times as many entries in the data structure. And for treating entries in a data structure as units of memory we could sort of approximate that this is using a little bit more memory to represent this data structure.

Looking at the timing of it, 8.6 milliseconds, I don't remember exactly what the short dictionary one was, but I do remember that this one ended up at something like 20 milliseconds. So let's see how long it takes. So that took 210 milliseconds to construct the tri-tree. That was about ten times as much time to construct the try tree as it was to just copy the array.

Notice also that for 370,000 words, we've got a little bit over a million entries in our trie tree. So, yes, we are using both more memory and more startup time to construct our trie tree data structure as opposed to simply copying the array. What's that gonna buy us?

What's that, we're paying the more upfront memory and more upfront performance not horribly, I mean 0.200 milliseconds is not that terrible for the startup time. And a 370,000 word dictionary is a pretty large dictionary to represent. So seems like 200 milliseconds is not that big of a penalty to pay, but whatever that penalty is that we are paying, what's that buying us?

What's our runtime performance? Let's go back to our example. I'm gonna go back to the 2400 word dictionary because I remember those timings a little better. And let's go back to our eight character input America. That's 0.8 of a millisecond. Let's see what happens when we go to nine inputs.

It went down actually. What that doesn't mean is not that it was faster in any way. What it means is that it didn't grow by 5 or 6x like it did before. Let's try it again. Let's go to 10. Went to 1.4 milliseconds. When we did 10 before, you remember it was like multiple seconds remember when I tried meaningful and that one ended up at like 36 seconds.

Let's try it 0.8 milliseconds. So now we have a linear search algorithm and that's a huge payoff win for us compared to the relatively I would say small cost that we had to pay to construct a better data structure. It's a very good chance that at this point, you would look at the memory and performance profile of this solution compared to relatively not that many lines of code that you needed to write.

And you would say, I'm done ship it. Because this is a pretty good balance. It is taking up a bit more memory. It is still taking up a bit more of our time upfront but it's ending up with a pretty good overall balanced solution and nobody would fault you if this is the one that you said ship it for.

But as we transition into our next set of discussions there are a few other things that I want us to consider about our implementation.

Learn Straight from the Experts Who Shape the Modern Web

  • In-depth Courses
  • Industry Leading Experts
  • Learning Paths
  • Live Interactive Workshops
Get Unlimited Access Now