Practical Problem Solving with Algorithms Divide & Conquer Solution
Transcript from the "Divide & Conquer Solution" Lesson
>> Let's talk about implementing the loadWords function in the most basic way that we can possibly imagine. If we load in the dictionary from a JSON file that just has an array, we can just stick it in another array, right? We can just stick it in an array, and that's kinda the most basic form of data structure that we will start with.
[00:00:19] So I will make a dictionary variable. I'll have it be an object, I'm sorry, we'll have it be an array. Actually, we don't even need to assign it to an array, but we'll just assign that. Because in our loadWords function, all we're going to do is assign it to a copy of what was passed in on wordList.
[00:00:43] We're only making a copy of it, because we're not actually gonna mutate this array in any way. But just as a kind of a good step, it's important to be clean with what we receive. We don't wanna receive things and then potentially cause mutations of them. The other thing that the to-do comment mentions is that the loadWords function needs to return a value that represents how many entries are in the data structure for whatever that means for the data structure.
[00:01:12] In this case, it's whatever the length of the array is. But we'll be doing different data structure shortly, and those will have different representations of their quote, unquote size. So here, we'll just return dictionary.length. But in other cases, we'll run a return a different value. The reason why we're returning that value, by the way, we're treating that as a unit of memory.
[00:01:35] We're not actually getting down to the low level and trying to count bytes or do any kind of memory profiling. Again, that was asked earlier. There are great courses on here about how to use the deep developer tools and do real memory profiling and heap snapshots. We're just approximating here with units of memory.
[00:01:53] So we'll just treat each string as kind of an equally sized unit of memory. And so if we have 1,000 strings, that's 1,000 units of memory. If we have 10,000, then that's 10 times as much, so we'll just treat it kind of that way. So it's an approximation of our memory usage that's why we're doing that.
[00:02:11] The next thing that we need to do is implement the algorithm that we described, the permutation algorithm that we described. So I'll take out, let me give us some room down here at the bottom. Take out that to-do comment. And we do want a list of words that we will fill in the array of words, and that does end up being what we wanna return.
[00:02:36] So we know that we have an array of dictionary words to consult, so we're definitely gonna need to loop over the entire data structure of words. So we'll say, let word of dictionary, And for each one of those, we can do some very basic filtering, which is if the input.length, meaning the number of characters that were passed in for us to match against.
[00:03:05] If the input.length is at least as long as the word.length, then we know we should look at it. Otherwise, if the word is longer than our input characters, there's no reason to even possibly consider it, because it's definitely too long. So that's one of our quick filters. And then we're gonna call out to a function we're gonna write.
[00:03:26] We're gonna call out a checkWord function, that's gonna be our recursive function that we'll check this. And it needs to check the word that we are looking at and the input. And if that works, then we want to push that into our words already. So we'll say words.push(word).
[00:03:43] So if it returns as true, then we'll push it, otherwise, we'll just keep going. So all the real heavy lifting we'll do in a function that we called checkWord. I'm going to for just simply purposes of creating a slightly cleaner API, I'm going to use an inner function that actually does its recursion.
[00:04:08] We could have reworked it a little bit so that the outer checkWord does this, but I'm just gonna to keep things a little bit cleaner. So what we will do is call the inner function that will name permute. We're gonna start with an empty string and the input.
[00:04:24] So effectively, what's gonna happen is that we're gonna take a character off of the input and add it to this word that we're building up little by little by little until we are able to find a permutation that matches the word. So that's what this empty string is.
[00:04:42] It's gonna go character by character through the possible tree of permutations adding a character, and then adding all the other possible characters after it, and so forth. So that first parameter to permute is the thing that's gonna build up progressively as we take characters off of input. So let's then, Write our inner function called permute.
[00:05:11] We'll call that prefix because, again, that's the leading characters of a permutation that we're hoping will match the word basically, that's what prefix is. And then we'll call the letters passed in, we'll call them remaining. So we have a prefix which starts out as an empty string, but then on the next recursive call, it'll have a single character in it.
[00:05:40] And we've got the remaining characters, cuz we've taken that one out of the set, get the remaining characters left. So we need to look at all of the remaining characters to create our next level, our next layer of our permutation. So we will say, for let i =0; i < remaining.length, and i++.
[00:06:01] So this is just a loop to go through all of, I don't know why I called that remaining, all of the remaining characters. Let's pull off the current character. I'm sorry, this will be kind of the current word, but it's not really a word yet, because we haven't matched it to the dictionary.
[00:06:23] But we'll just call it current, and that is the catenation of whatever prefix is. Again, starts out at empty string, and then builds up whatever prefix is plus remaining(i). So what you can see here is that whatever that prefix is, let's say, that prefix is AC, and now our remaining characters are P, and E, and T.
[00:06:45] So we're gonna have one loop iteration, which is ACP, ACT, ACE, so far, follow where I'm going with this? All right, now, what we can ask is if we made a match. If the current== word, then we know we can just exit out quickly. We'll say, return true.
[00:07:09] If we have more characters to consider, so remaining.length has to be greater 1, not 0. Cuz we already took the first, or one of the items out of remaining. So we just need to make sure that remaining has more than one other character, at least one other character besides the one that we're looking at.
[00:07:32] And we need to make sure that we haven't gone over the length of it, so word.length. We already have a check here that says, don't even try dictionary words that are too long. Well, this is just making sure that we stop the permutation whenever we reach the length of the word.
[00:07:52] So remember, I said if we had a three letter word in the dictionary, but we had 15 input letters, we would stop at 15 times 14 times 13. So that's what we're doing here as we're saying, as long as the current.length that we built up is less than the word.length that we're considering, let's go ahead and stop.
[00:08:14] And then we're gonna check to see if we make our recursive call. We're going to pass in current. Remember, current is kind of the temporary, partial permutation is the best word for it. We're gonna pass in that partial permutation. And we need to pass in the remaining characters minus whichever one it is that we just pulled out.
[00:08:42] We're pulling them out from different positions as we go along. So we're gonna need to do a slice on the remaining, From 0 up to whatever i is, but not including i. And then a slice starting at i, i think, starting at i+1 Okay, just so we are clear here.
[00:09:13] You see that remaining of slice, remaining.slice0 up to i is whatever characters are earlier in the remaining string that we aren't considering right now. Those are in the set of further remaining. And then whatever is after the characters, the one character that we're looking at, that's also in the set.
[00:09:35] So take the two slices on either side of whichever character we're looking at, put them together, and that makes a new smaller set of remaining characters. Because we've already taken off one of those characters and put it into current. All right, so if the permute was able to find that by going down its recursive tree however far and it bubbled us back up to true, then we can just simply return true.
[00:09:58] But if it somehow failed way down deep in its tree, and we need to bubble back up, we don't wanna bubble back up a false. Because that doesn't mean that we haven't found it, it just means that we didn't find it in that part of the tree. So we just need to keep going.
[00:10:13] So that's why we're not just returning the result of permute, we're checking to see if that was true. And if so, then bubbling back up the true. So what this is going to do is keep looping through and keep checking these different words, or these different partial permutations until we find a match or until we know that we're never gonna find a match and then it just exits out of this loop.
[00:10:40] And so at the end of the loop, if we never returned true, then we know for sure that we can return false. Let's make sure we understand from a theory perspective how this is recursion, how this is divide and conquer. What we're saying is that to permute a set of characters, we start with the first character and then we take the permutations of all of the remaining characters, right?
[00:11:08] So we're saying all these remaining characters, if I started with the five characters, the P, C, E, R, and A, let's start with P, and let's get all of the permutations of C, E, R, and A. And then, let's start with C, and get all the permutations of P, E, R, and A.
[00:11:26] And then let's start with E, and get all the permutations of P, C,R, and A, and so on. So we're starting with the first letter and then, again, we're breaking down to permute C, E, R, and A. C, E, R, A, to permute those four characters, we could start with C plus all the permutations of E, R, and A, you follow that?
[00:11:51] That's our recursive definition or divide and conquer definition for a permutation. And we are absolutely taking advantage of the short-circuiting, which is, as soon as find a match, exit out now. Don't keep going through the entire tree. So if you think back to our discussion about traversal, here, in some ways, the order kinda does matter.
[00:12:18] Because we could go out this from the wrong order and ended up going through a much longer portion of the tree before we ended up at the right thing. That's why we're picking the permutation where it matches in the first position as opposed to waiting until we got a permutation where it matched in the last position.
[00:12:36] We would had go through a whole lot more of the tree to find that. So we're optimizing to quit as early as we know that we found a match, and we only exhaust all possible options if we never find a match. Let's save that, switch back over to our browser And we should have a working solution if not a particularly non-performance solution.
[00:13:08] So on the short dictionary, which is the one that comes up by default, I know that there's the tiny and the long, but I'm just gonna stick with the short dictionary. But just so you can check, if we load in the 80 word dictionary, it loads in so quick that it's not even a 10th of a millisecond.
[00:13:26] But the 370,000 words dictionary, it takes 23 milliseconds to copy that size of an array. So we can see even in this very small first example a little bit of benchmarking of the performance time. We see VA it's now kinda caching some of that work, but you can see a little bit of that benchmarking.
[00:13:48] And we will see later when we try other data structures that those times significantly change. The other thing to be looking at is the size of the entries. When there's 80 words in the dictionary, there's 80 entries. When there's 2,400, there's 2,400, right? So we will also keep an eye on that, because some data structures will have more entries than words, and some data structures will have less entries than words.
[00:14:10] So that's our approximation of the memory usage, and our approximation of the processing time for the startup of the app before we've tried to do any searching at all. And we'll keep an eye on that as we try our different options as we finish this exercise. So because I'm here in the United States, I'm gonna go ahead and try a set of characters that I know works.
[00:14:35] I'm gonna try an eight letter input, a, m, e, r, i, c, and a, and let's see what words come back from our short dictionary matching though. Okay, so we got acre and air, we got America like we'd expect arc, are, area, arm. You'll notice that the order that we're getting these in is alphabetized, because we went through an alphabetized dictionary and we preserved the order in the way that we went through them.
[00:15:04] So that's why those words are ending up in alphabetized order. That won't necessarily always be true of any algorithm that you pick. So that's another thing that we would just wanna keep an eye on is, did I accidentally end up selecting other words in a different order? And then maybe I need to go back and sort my words, because users generally want to see them presented in dictionary order.
[00:15:29] Okay, so that took 0.2 of a second. It took 200 milliseconds on 8 input. Let's try american, which is nine letters and let's watch the timing of that. It went from 200 milliseconds up to 1100 milliseconds. So that was about a five or six X, looks like. Let's go up to ten characters.
[00:16:11] It's still going, so we went up to 6.36 seconds. So we see it starting to significantly grow by way more than just double each time, right? So if I did this word, which is 11 characters if I'm counting correctly, is that? No, that's still ten, but should be the same as Americans.
[00:16:39] I just wanna run that one and see what it does. So that took 39.25 seconds to run that one. So you can see that this is growing at what we predicted, which was approximately factorial or in that neighborhood. It's growing at a significantly long time. And the more characters that we put in, it would lock up for even longer periods of time.
[00:17:07] So we know we have generally working solution. We've kind of done some some visual checking against this, but we've only been checking against the 2400 character dictionary. What happens if we try go back and try the long dictionary, and try it with our eight character and let's just see what that does.
[00:17:31] So before, that was 200 milliseconds, And it's still running, and it's still running. It's kinda funny that when I do this, sometimes it does what I'm expecting, but sometimes it's significantly longer. I can't control what background processes might be running on my computer or whatever. But yeah, 14 seconds just to do the eight input.
[00:17:58] I'm not even gonna try the nine input, because we'd be here for another minute or two waiting on. But you can see that now, we see a verification that not only is it growing on the size of our input characters, but also on the size of our dictionary.
[00:18:14] Now, 370,000 word dictionary is way bigger than we need. My estimates from my Google searching are that almost all common English words could fit in about 80 to 100,000. So that's almost four times the size that it needs to be, but that's still crazy too long for us.
[00:18:34] It was too long at 2,400 words, which was way too short of a dictionary. So we can imagine if we had a dictionary at 80 or 100,000 words, we'd have similar problems. Okay, so we can see here that our initial kind of naive, recursive permutation approach is not going to work for us.
[00:18:59] The code that you have now represents option one. And so if your code happened to not work, you missed parentheses or a comma somewhere, again, stash whatever you've done so you can check it out later. And then just check out the option one branch so that's your starting point for the next optimizations that will do it.