Transcript from the "Finding Candidates" Lesson
>> So that's our first improvement to this is that we improved our lookup by storing and using this index. I now want to switch attention to evaluating the check function, our recursive implementation of the check function. We are absolutely still going to use recursion here. But the fundamental question I want us to ask ourselves is, do we really have to keep recursing over the entire elements list?
[00:00:34] Is there something that we could recurse over that would be much smaller? Effectively, what this technique is going to work out to do, we are going to pre-process our data set or our problem set. That's another one of these techniques that you end up seeing is like, if I did an initial pass through some stuff and figured some things out, and then I had a new set of data to work on from that first path.
[00:01:10] My second pass through might be a lot more efficient.
>> The smallest set you have in any individual execution of that function is your input work. And the pairs that make up your input word, you have at most a 2-arity pair, because that is the maximum length of a tuple in your periodic table data set.
[00:01:30] So you are looking at pairwise combinations, ordered pairwise combinations So if we are looking at input word, it would be I-N-N-P-P-U-U-T. Those are your, that is your max input set, is the i, j pairs that you iterate through.
>> True, except there's more complication to it, which is we also have the single letters, there's not just pairs.
>> Yes, you're right.
>> You are right.
>> I am handling that after we have failures of 2-arities, that's my, I didn't say that, but yes, you are right.
>> You are correct. So to put this in terms that are going to map to the code that I'm about to write for you, what we effectively can do is look at any given word like the word yucky.
[00:02:14] And say that there are only a fixed number of possible elements that could ever match for us. We don't know whether and in what order and whether they will actually match up, but we can actually take that list of 118. And drastically reduce it to a list of candidate symbols that could only possibly match.
[00:02:40] Because there's a whole bunch of symbols that could never possibly match. And in our current algorithm, we are reconsulting all of those every single time, which is to say, just like I said earlier. We are learning some information which is to say we learned that there was a non match and we forgot that information.
[00:02:58] And then we relearned it over, and over, and over, and over again let's not do that. Let's find the list of possible candidates, the attendee here was using much more formal language than I'm gonna use, but let's find all of the candidates. That are two letter or one letter candidates that could possibly be in our word, and when we recurse, we're gonna only consider those candidates and not the whole element list.
[00:03:28] Does everybody see why that's gonna drastically reduce the amount of work? And not only the amount of work, but really to an order of magnitude in big O terms, it's going to be less work. It'll actually move the needle in terms of our work complexity okay, so I'm going to write us a function called find candidates And it will probably be a very similar translation of what the attendee was just describing here, which is we're going to create two lists that we're going to then concatenate together.
[00:04:13] The list of all the single letter symbols that could be an R word, and a list of all the two letter symbols that could possibly be an R word. As he described the, I-N, the N-P, the P-U, and the U-T, those are the two letter ones that could possibly be an R word.
[00:04:30] We're going to find all those different candidates and the order in which we can catenate them together, is going to create that preference that we have expressed. Which is that we want the shortest outcome we want to spell the word with the fewest number of symbols. If we change the order of candidates, we would end up with a different response.
[00:04:50] Everybody follow me, generally? Sounds more complicated than it is, so let's just write some code. I'm gonna set myself up with a couple of arrays. Now on the side note, I often get asked by people somebody may ask why you use VARs? Because the VAR keyword is still superior for some tasks, despite what you've been told that you should only use const.
[00:05:46] Okay, so what I'm doing here is I'm going to go through The input word one letter at a time, and I'm going to use the really old school for loop way of doing this instead of a for of loop because I also want the indexes. I could have done a for of loop with the entries and I've done that in other places.
[00:06:11] But here I'm gonna go ahead and do a for of loop so that I have that i index and it's gonna allow me to step through at different levels. Okay, so the first thing is that we want to collect this is the most obvious is that we wanna collect all the one letter candidates.
[00:06:29] How are we going to do that? We're going to say if our input word is in symbols, so this is saying does that single character appear as a property in the symbols object? We could have done symbols.jason property if you like, but I like the IN operator here in terms of its semantics.
[00:06:55] We're saying, is that there And, is it not the case that one letter symbols? Already has it. I'm gonna split this onto two lines, because it's a little easier to read here. So our condition is, is the letter that we're consulting in our input word does it exist as a single letter symbol yes or no?
[00:07:35] If it does exist, have we not already found it? Because I don't want to unnecessarily stick it into that array. Now want to pause for a moment does anybody spot a potential sub optimal or less efficiency with what I just described? I made a deliberate choice here but there are some consequences to the deliberate choice does anybody spot where they might be?
>> We are using, well, and this might not be the one you're thinking of, but we're using NOT on line 129, which if you have a very large set you're checking against. Is going to be incredibly expensive because it's going to have to exhaust the set every time.
[00:08:29] To run that operation, you would probably want to optimize it by matching explicitly there instead of using the NOT operator, but that's getting really pedantic and I think I'm missing here.
>> I think you're in the right direction, but not quite on it, so let me say it this way, I'm effectively checking to see whether or not it's unique in that array, right?
[00:08:55] Did I have a data structure that we talked about earlier today in our workshop whose job it is to keep track of things that are unique sets? So if I just tried to stick it into a set, and then check whether it was in the set, the set data structure would make sure that I didn't reinsert it a second time.
[00:09:20] So that could avoid my condition here, so you might ask, well, why don't you just use a set? I want to see if anybody can spot why I don't use a set here. It's subtle, but it's important, yes?
>> Because it doesn't have indexes.
>> That's not quite it, no, yes Mark.
>> This would cost a lot of memory work because we don't have a unique reference to the symbol.
>> Well, in the example yucky we needed the Y twice, right?
>> We did need the Y twice, but that's okay, because we're just constructing candidates here. We're not constructing the actual answer, good thinking you are right, yes?
>> Several people said you need order.
>> Yes, thank you I'm glad, sets are un ordered, and the order here matters, right?
[00:10:25] Remember I said that the order that we put these things in, is going to matter in terms of what outcome we get, whether we get the longer than not. Within each one, the order doesn't super matter, but it is kind of important, but moreover, the set is the wrong data structure when order matters.
[00:10:46] If we're going to discover these in order, and we're going to concatenate them in order, we should not confuse the reader of this code by using an unordered data structure. Even if it worked, it would create the wrong communication in this code. Since order matters here, I'm going to stick with the array and do the array includes.
[00:11:47] But there is a specification defined ordering if you enumerate a set. I could render that out to an array, or I could construct an iterator over the set. Either one of those is going to create more memory pressure versus just keeping these two arrays around and using them.
[00:12:06] So there is a trade off that I'm making here, which is to say, the includes method does need to do a lookup that's not quite as efficient as a set Has would be. But we're only gonna have a maximum of like a dozen maybe two dozen candidates. If you think about it there's only 118 symbols so we're never gonna have any more than a few dozen candidates in any word So I make the determination that for both code readability, communication, and for performance, memory performance reasons.
[00:12:43] The array was the right choice, but I just wanted you to see that there's a choice being made here versus the set choice. And in other cases, the set might have been the right data structure. Okay, so now that we have checked to see if we can stick it in, then we just simply.
[00:13:05] Push it in. That's collected the one word candidates we now need to also collect the two word candidates. Sorry, two letter candidates. Doing it in the same loop as we go through, we can collect those two letter candidates. So we're gonna say if i is less than or equal to, this is just a little bit of math to let me know that I need to stop.
[00:13:38] One before the end rather than at the end. Because I can't get a two letter candidate, starting at the last letter of the word, I can only get a two letter candidate, at the next to last letter of the word. I'll go ahead and get that so I don't have to just keep re-slicing it over and over, so we'll say slice starting at i and going to i + 2 and the slices Non-inclusive on the end.
[00:14:14] So, it goes from i, to i + 1, and then when it gets to i + 2, it does not include it so, this is only getting those two characters. I'm going to say if two in symbols same kind of if statement is above. If two that is those two characters in the Word, or as the smarter gentleman said the pairwise here.
[00:14:45] If the two N symbols and this is not already something that we put into the two letter symbols array Includes Two letter symbols.push two. So one loop through our input word which is that most what again is gonna be like 5,10 15 characters that somebody is going to type at most 5 or 10 or 15 characters that they're going to loop through.
[00:15:25] And we're going to collect the one letter symbol candidates and the two letter symbol candidates as we go. We've got them in two separate arrays and we want to return one final array with all of them. And this is where the order matters because we want to order the two letter symbols first, and then the one letter symbols.
[00:15:48] That is to say that as our algorithm progresses and recurses over this list of candidates, we want it to prefer a two letter option first over a one letter option.