Practical Problem Solving with Algorithms Performance Characteristics
Transcript from the "Performance Characteristics" Lesson
>> Now that we see that it works let's go back to the question that I started to ask you earlier which was. What's the performance implication the difference that we've now made what's the implication of the difference in performance? It is still the case that we are recursing against the input word, and so the length of the input word, the longer the input word, the more levels of call stack recursion, we're going to end up doing.
[00:00:31] So in both solutions option one and now option two we're gonna end up recursing over the same number of input characters. But for each one of those input characters, notice what we're now doing differently instead of effectively searching the entire elements list every time. We are searching the small subset of candidates that we already know to be possible to be in it, and we're ignoring all the other ones that could not possibly have been in it.
[00:01:15] Remember in the previous solution we had a four of loop over elements. So, if we took N to be the size of the input characters that you type in, which is fixed somewhere between six and 12, or whatever somebody wants to type. If we take that as N, and we take the number of things that we need to look at worst case for each one of those characters, and we call that M.
[00:01:45] The big O of this solution is O of N times M and the big O of the previous solution is O of N times M. So, in a theoretical sense, I think we could say that these two have the same theoretical performance. But in a practical performance sense, we know that the candidates list is significantly shorter than the Elements list.
[00:02:18] Perhaps even an order of magnitude shorter. So there's another takeaway that I want you to get from this workshop which is, it is important to be able to evaluate your solutions. For the theoretical characteristics, it is also important to be pragmatic about things. For example, you could say theoretically, that big O of one and big O of 2000, in theoretical terms are equal one unit of work and 2000 units of work.
[00:02:53] Are equal in the theoretical sense because those never grow no matter how big your data set or input is, you either always do one operation or you always do 2000. And so in the theoretical sense, we consider both of those to be constant time because they don't grow with the inputs.
[00:03:13] But I think you and I both know that doing 2000 operations versus doing one operation is actually going to be measurably different in performance. Even if it's not theoretically different in performance, it's measurably different in performance. So algorithmists definitely play in this space of theory and they validate and design algorithms based upon the theoretical things.
[00:03:39] We cannot forget the practical implementations, if I run an algorithm that can do O of N, and I run another algorithm that can do O of 100N, theoretically those are the same algorithm. I'm gonna choose the O of N, because it's 100 times more efficient than the O of N.
[00:04:07] All right, you follow the difference between we're talking about? You do want to first consider the theoretical space but don't ignore the practical consideration. In this particular case, in theory, these are the same big O as far as I can tell but in practice, there would be a difference, would it be a big measurable seconds delay?
[00:04:28] Absolutely not, we are talking about maybe a dozen miliseconds, maybe probably not even that much right, so not we're not talking about a giant difference. But like we said, paper cuts add up so we don't want to do the extra work if it's unnecessary. Any questions about this first exercise hopefully it got your feet wet with working through thinking about the algorithms and then translating them to code yes?
>> Several people asking for a quick recap.
>> Fundamentally, the algorithmic approaches were very similar. Both algorithmic approaches, option one and option two, recursed over the input strings. The big difference between option one and option two, is that in option one, for each candidate will say each chunk of the input string, we matched it against the entire Elements list.
[00:05:26] Which was the most naive way of doing it It's saying if I look at this one character, go see if I can find it there and we didn't even have an index in place by the way. Don't forget the whole index thing to make it an O of one lookup, so we didn't even have an index.
[00:05:39] We just looked it up in the most naive, dumb way possible. In the second approach, we said, we don't need to look at all the elements for each one of these characters. We can actually do a pre-pass through the inputs and collect any of the possible candidates that ever could match.
[00:05:58] And only if we found those, those are the ones to look at each time we recurse down. So we have the same theoretical big O, but in practice, we're doing significantly less work for each of the input characters. And the penalty that we paid was one loop through.
[00:06:20] You'll notice that the find candidates function has has an O of N complexity to it. So what we end up with and this is a subtle but important point is we end up with a big O (n+) rather than N times. We had to pay the N penalty N being the size of the number of elements in the periodic table, which you could argue that that's not even an N.
[00:06:46] They're not coming up with new periodic table elements, it's pretty fixed. So you could argue that this is O of 118 rather than O of N, but we did have to pay 118 iterations to construct our indexes effectively. We're constructing effectively a one-letter and two-letter symbol indexes into this array, but only the ones that we care about.
[00:07:09] So it's a partial index, that's effectively what candidates is doing. So, we had to pay an O of 118, to do that, but think about potentially the longer the word that you write, the more times there's a call stack recursion happening. All of that saved work of having to go back over the Elements list over and over and over again.
[00:07:31] Many 100s or 1000s of operations saved and doing that, there was another question?
>> How did they compare in terms of space complexity.
>> So in memory, remember there's always a trade off between the performance, the raw time performance versus the memory usage performance. We still have the main data structure elements the same between the two options.
[00:08:03] That would be the big usage of memory that we've talked about and that's equal between the two. Option two did construct a little bit more memory usage, in that we created this symbols object which had properties that were references to those other objects. So we had an object with 118 properties on it, and then we created two arrays, one letter symbols and two letter symbols.
[00:08:34] Total of which probably has on average I'm just gonna guess I haven't actually calculated it but I'm gonna guess on average for these words there's probably 10 candidates max. So we're creating an array with 10 elements and an object with 180 team properties that just have references on them, that's a tiny bit more memory, but it is more memory.
[00:08:58] Notably, from the algorithmist perspective, that's still O of 1 memory. None of our memory grew N proportionally or more based upon the input size. Slightly more candidate growth, but it's definitely not linear, I don't even know how to what it would be, but it would be sublinear, it would be logarithmic approaching fixed.
[00:09:27] Final check in here give me some thumbs up thumbs down, how do you feel about this exercise? How do you feel about having got your feet wet using some recursion and some algorithmic thinking and translating into the code? We feel like the approach made rough sense or do we feel like there's still some difficult parts for us?
[00:09:46] Okay I see a few good thumbs and a few wavering thumbs so that's all right I think somebody asked me right before we started the workshop. And I'll say for the benefit of those watching the recording, you're being presented a highly compressed zip file. This information is something that many people spend weeks, months years learning.
[00:10:12] And you're watching it over the course of eight or 10 hours or whatever it ends up being tightly compressed. So if you have that feeling of I kind of get it but I really don't fully get it yet. You're in exactly the right spot, you're in exactly the right spot so don't feel bad it will take a while to re-review I'll take this moment to remind folks of what I always say in my workshops, take more time.
[00:10:40] Go back over this, try it again from scratch, in fact, I'll give you this suggestion. Right now, wherever you are and whatever time it is that you're watching this, open up your calendar and pick a date a week or two in advance. And mark yourself off a couple of hours of review time for this, and then do another one a couple of weeks after that.
[00:10:59] If you wait until those days come, I promise you your calendar will fill up, but if you reserve it now and reserve time out in the future now, so that you have an opportunity to come back. And try this stuff at a later time, you will thank yourself because it does take time for our brains to unpack and review and it is important for us to go back.
[00:11:21] We didn't have many slides for this segment but go back over what we've gone over so far and retry the code from scratch from yourself. Your future self will thank your current self