Transcript from the "Wordy Unscrambler Algorithm" Lesson

[00:00:00]

>> So let's start to think about an algorithm that if our algorithm had been trying to simply go through the letters and see if all the letters in our input. If we were going through a dictionary and we were saying, what are all the letters that I have in this word?

[00:00:18] And then just checking to see if those letters are in our input letters, that would have been the most naive algorithm. But that algorithm would not have taken account, or it would have allowed replication of the letters, and so it wouldn't solve our particular purpose. So let's think about a different kind of an algorithm, and there's actually a variety of different ways that you might do this.

[00:00:38] You might count the letters and see if the count, and the letters is sufficient or whatever. There's a lot of little nuances here, but let's take a step back and try to come up with a different algorithmic way that might or might be great or really terrible for solving this problem.

[00:00:53] So what we're going to think about is what if we were to simply think about or consider all of the possible permutations of the input letters? Permutation, that's a fancy mathematic word. It simply means all of the possible reorderings, okay? So it turns out that for a five letter input, there are five factorial, there are 120 permutations of those letters.

[00:01:19] And what's interesting about the permutation space, first of all, I was able to fit it all on the screen, and yes, this slide did take me 30 minutes to put all of those in. And it was kind of a nightmare. Anyway, nobody appreciates how much work it takes to create slides these days.

[00:01:34] Anyway, so all 120 of those permutations, and what's interesting about that is, for the words that we're interested in looking at, it is guaranteed that our word will appear in one of these permutations if it's possible to do the reordering of it. That's, by nature of permutating the input letters, we are getting all possible reorderings of the letters.

[00:02:00] So the word that we care about has got to appear somewhere in this list if it's possible to make that word with these letters. And that takes care, by the way, of our repetition. It just automatically says, you got to have enough of those letters for it to show up.

[00:02:13] Not only do we know that it must show up, we know that it must show up in the first few characters. So the word CAN never appears anywhere. But the word CAP does appear as the first three letters of at least one of these permutations, in fact, two of them, okay?

[00:02:31] And the word CAPE does show up in exactly one word in the very first position. It also shows up in the last position, but it shows up in exactly one word where it's in the first position. Same thing with CAR, same thing with CARE. The word CLAP is never gonna show up in these permutations.

[00:02:49] The word EACH is never gonna show up. The word EAR is gonna show up in one of these permutations, in fact, two of them at the very beginning. So if we were to think about this as an algorithm, we could say, well, I could take my inputs, whatever those are, produce the entire space of permutations of those.

[00:03:09] And then simply search through those to see if the word that I'm looking out in the dictionary appears in the first position of any permutation, and then I know whether it's a match. So that is a workable algorithm. Does the algorithm make sense? We'll talk about whether it's a good algorithm, but it is an algorithm that we can implement.

[00:03:28] Does it make sense? Let's think about the problem space and a little bit about its theoretical growth. One of the things about factorial is that it grows pretty quickly. So five inputs is 120, not a big deal. Six inputs, 720, also not a big deal. Seven inputs, 5,040, also not terribly bad.

[00:03:53] But these numbers really start to grow very quickly. About 10 factorial is when that number of possible permutations starts to really get out of hand, and then, 11, 12, 13. By 15 factorial, we're at 1.3 trillion permutations. Now, if you think about it, you could write your app so that people were constrained in how many possible characters they could type in.

[00:04:20] You could limit that to eight or nine, but how many words are there in the English dictionary that are longer than that? It turns out that the average length of words is somewhere between 11 and 12. So if you limited it to 12 for most common words that people are gonna type, then you're at, what is that?

[00:04:40] 479 million permutations of those words. That's pretty large and it's gonna take up a pretty sizable chunk of memory, not impossible. 1.3 trillion is probably approaching, you're gonna run out of space. At least, on some users devices, of course, this is context dependent, yes.

>> Kyle, and as you're speaking of these numbers, this is one user, one instance, correct?

[00:05:05] You would have to multiply this by however many users are actually accessing this system.

>> It's gonna be done client-side, so I don't think it actually matters. We wouldn't be doing on the server. You'd be doing this only in the browser. But yes, if you were doing this server-side, then you would have to have the system resources to handle this for all users.

[00:05:22] Hopefully, you'd be able to share some of that work between users, but probably not a lot of it. But yeah, we're talking about a purely client-side solution, that is a good point.

>> Thank you.

>> All right, so, and by the way, just some of my analysis on dictionaries.

[00:05:36] There are words that are 20, 25, 30 characters long. Those are pretty unusual words, but there are quite a few words that we use in everyday English that are at least 12 to 15 characters. So it is not completely unreasonable that somebody might wanna use an app like this and type in 12 to 15 characters.

[00:05:56] These aren't crazy numbers, and we're already getting into very, very huge numbers that are quite easily beyond the bounds of practicality, okay? So 1.3 trillion is a pretty big and scary number. And so we can already know in a similar way to when we theoretically analyzed our exponential solution in nights dialer.

[00:06:17] We already know that even though this is a workable solution in the small, it gets really bad, really fast. A couple of other things to point out is that it's actually worse than that. [LAUGH] The worst part is that we actually can't do that permutation once for those large numbers.

[00:06:33] Because there's no possible way that we could put 1.3 trillion of those permutations in memory and keep all of them in memory while we go through each of the dictionary words, potentially 300,000 dictionary words. So if we're gonna do this, we're actually going to have to redo the permutation as we go every single word in the dictionary.

[00:06:54] We're not permuting the dictionary words, but we are permuting the input. And because we can't keep all of that in memory for those big numbers, we're gonna have to redo the permutation for each single word. That means we'll have to redo that permutation potentially 300,000 times. So it's actually worse than O of k.

[00:07:17] It's O of n times k factorial, which k being the length of input. The good news is it's not actually that bad. Still pretty bad, but it's not that bad because the permutation of the input, regardless of how many characters it is, it can stop once we've reached the number of characters for that particular dictionary word.

[00:07:42] So if we're looking at a dictionary word that is three letters long, we don't need to do all 15 factorial. We only need to go 15 times 14 times 13, and it will guarantee that that three-letter word shows up in there if it's a match. So it's much better than 15 factorial.

[00:07:59] But 15 times 14 times 13 in the best case for a three-letter word and in the worst case, it's still 15 factorial. We can also abandon any of the partial permutations as we're going opposively, where the next character that we permute doesn't match. We can just bail out early.

[00:08:17] So there's a lot of opportunities to bail out early. And I don't even know with those complexities involved exactly what the growth of this algorithm is, in a theoretical sense. I don't know what the O of n is. It's not quite n factorial, but it's a lot worse than polynomial or exponential, so it's somewhere in between those.

[00:08:39] And we'll see that play out when we implemented our timings. It'll be another great illustration of, wow, I added one more character and it went 11 times as long. It's a fun illustration. Okay, so that's our algorithm to start with. Again, as the baseline, it's good to have a working algorithm to compare against, even though we know that this is probably not the one we are gonna ship to.

[00:09:03] It's definitely not the one we are gonna ship to production.