Transcript from the "Big O Complexity" Lesson
>> We're not doing any of these parts yet, cuz we haven't written that code. That code's a bit more complicated, but we'll tackle that in a second. But we just know that we got 6 hops. And I want you to notice something that I didn't tell you about before.
[00:00:12] But I inserted into this exercise a little bit of rudimentary performance benchmarking. It's very, very crude. But notice that that operation took us 0.3 milliseconds, it's pretty quick. You can probably imagine that as we increase the number of hops, starts to take longer, right? So I went from 10 to 11.
[00:00:41] I want you to see what happens from 2.7 to, that was probably not a good example. I think the other one was an outlier. Let's go from 12 to 30. Actually, I know that this doesn't start to really show up until we're at a slightly higher number, so I'm gonna start at 20.
[00:01:02] That was 0.19 seconds, that's 190 milliseconds. What happens when I go to 21? So we went from 0.19 to 0.46, that's a little bit more than double, isn't it? What happens when I go from 21 to 22, any predictions? 0.94, a little bit more than double. 23? A little bit more than double.
[00:01:43] You spotting a pattern here? It's doubling a little bit more than doubling each time. We have a word for this in computer science, we call this exponential growth. It's problematic, isn't it? We're only 23 hops, and we're already taking 2.31 physical seconds. What do you predict is gonna happen when I do 24?
[00:02:09] What's it gonna be?
>> Approximately 5, maybe 5.2, 5.3 seconds, we'll see. It's painful to wait. Slightly less than that, all right. My math wasn't graded estimating. Where are we gonna be here, a little bit over ten seconds maybe, maybe slightly over ten seconds? We'll wait the ten seconds, it's worth it just to prove it.
[00:02:35] I'm not gonna go any further than the ten seconds, but I think you get the point. It's steadily growing slightly more than exponential was up to 12, all right? So we have a working solution that's completely impractical for any arbitrarily large datasets. This is where the theoretical math starts to kick in.
[00:02:58] It's where the theorist theoretical gist of the algorithmic job kicks in. Because we could have looked at the code before we ever ran any tests and known that this was gonna be true. And I wanna help you see that, I wanna help you understand that. So I'm gonna go back to the slides.
[00:03:21] In fact, it is an o(n) of not 2 to the n, which is exponential, but 2.222 repeating forever to the n, which is slightly more than doubling each time. And how do I know that it's 2.222? Well, I can actually do the math, because I read this. But there's another way that later we'll come back to, and we'll see that 2.22 coming back to us.
[00:03:49] There's another way of verifying that, but I'll just give you a quick little preview. If you think about all of the places that I can go from any digit. Some of the digits have two places that I can go. Some of them have three places that I can go, right?
[00:04:08] If you average out the number of places on average that I can go, guess what the average is?
>> 2.22, and there bears out for us. It's rare, actually, that we get something that is so crystal clear the empirical evidence that we have matching with the theoretical theory that we have.
[00:04:32] But we could have looked at this problem and said even before I ran it the first time, I could have said I know for a fact this code is going to run at slightly above exponential growth. You follow me, cuz I can just look at what it's doing.
[00:04:47] I know that it's doing for everything, it's doing an average of 2.22 more things, so it's definitely going to grow slightly more than exponential. We call this exponential. Now, I've used a couple of different terms. I've used constant, I've used linear, I've used exponential. What are these? These are ways of articulating the severity of the equation, the theoretical worst case growth.
[00:05:15] This is a chart that I stole directly off of bigocheatsheet.com. You should totally go to that website. It's got a lot of great information about this. So it's good foundational material and back reading. But I want you to look at, I've listed up there on the top, we've got starting from the top, we have kind of the worst of the Big-O's.
[00:05:35] At least the worst of the most commonly named Big-O's, all the way down to the bottom, constant being the best. So we have constant time is awesome. Again, technically the best is zero, cuz we never did it. But if you have to do some work, the best growth is the growth that never grows because it just flatlines.
[00:05:56] That's that black line there at the bottom. Or, the next step up is what we call logarithmic. That's that darker green. It's almost horizontal, isn't it? No matter how far out we go, it's almost horizontal. There's barely any growth at all. And then we have the o(n) is up there at that line, you see that it kind of steadily grows up.
[00:06:18] It's not growing super fast, but it's growing in a line. Then you look up at o(n) log(n), that's what we call linear logarithmic. So it's ever so slightly starting to bend up. It kinda looks mostly like a line, but it's starting to bend up, and the further up you go, the higher it's gonna get quicker and quicker.
[00:06:37] It's going a little bit faster than o(n), actually, fairly faster. We're into that orange, that's kinda that warning sign. If you're in this area, you should be a little bit concerned that maybe it might not be practical in production. Once you get into the red, you're almost never gonna see one of those deployed in production unless you have a pretty fixed constraint on your problem set where you know it doesn't matter.
[00:07:02] It could grow of n squared because n never gets above 5, big deal, right? But if n can ever get out of our control, then we have to be really concerned about it. So we have o(n) squared, which is our exponential. I'm sorry, that's our poly. I just did it.
[00:07:19] I was about to tell you, I always mix this up. That's polygnomic n squared. And then we have 2 to the n. That's the very close one that we just looked at. That's exponential. Polynomic and exponential, n squared, 2 to the n, I always mix these two up, and I did it just there.
[00:07:35] I was even trying not to mix it up, and then I mixed it up. N squared is polynomic, same thing with n cubed, or n to the 4th, or whatever. 2 to the n, or 3 to the n, or 4 to the n, or 2.222 to the n, that's called exponential.
[00:07:52] And then the worst of all is factorials. And there are technically worse than factorial, but they don't really go by any commonly known names. So these are the commonly known names that you need to be familiar with. If you're gonna speak, computer scientist, you're gonna speak algorithm, you need to be able to say.
[00:08:10] Yeah, I think my algorithm grows that linear logarithmic, or I think it's exponential, or I think it's polynomial, or I'm pretty sure it's constant. You need to be able to say those kinds of things. Remember that these are an estimate of the worst case. A lot of times, we like to code based on, well, I'm pretty sure that in the average case, it's gonna do great.
[00:08:34] And the algorithmist is gonna look back at you and say, I don't really care what the average case is. What I care about is is this gonna crash because in the worst case it is exponential, and it takes you a thousand minutes to run or something. Here's a true story on this.
[00:08:52] Way back earlier in my career, almost 20 years, certainly more than 15 years ago, very early on in my career, one of the jobs that I took was as, my job title, it sounds better than it was, but my job title was user experience architect. And I did not work in the engineering department, I actually reported directly to the product manager for a software company.
[00:09:14] And my job was to design and prototype out and do user testing of interfaces and ways to interface in our enterprise software. And then to develop that hand those designs and those prototypes and the tests off to the engineering department who then implemented it. And then I had to work with QA to make sure that they did it right, that they implemented what we had designed and tested it.
[00:09:37] All right, so that was my job. So I didn't report to the engineering director, but I worked closely with them to make sure that they implemented it. That's how in theory it worked, here's how in practice it worked. Nine times out of ten, they took my code and just dropped it into production.
[00:09:52] So I knew that I did need to be careful because they were more likely to just, we're all lazy as engineers, right, lazy as engineers. If it looks good enough, just drop it in. Why re-implement it, right? So I knew that there was a bit of an additional responsibility that I had to make sure I didn't just hand off really terrible code.
[00:10:14] Well, we had this feature that they wanted me to design. And it was effectively a multi-column sorting kind of a thing on this particular data table. And so I wrote up some quick code for it. That was n squared, it was polynomial. I don't think it was exponential, it was polynomial in its theoretical growth to do this multi-column sorting thing.
[00:10:36] It's very quick and dirty code, and I put it was maybe, I feel like it was maybe about 10 or 15 lines of my code, and I probably wrote double that in code comments, in all capital characters saying this code should absolutely positively never go to production. This is the prototype, and it's being tested on a data set that is a max of 15 elements.
[00:11:03] But we don't know what our customers are gonna do with this, and our customers could have thousands or tens of thousands, right? I put all this out in the code comments. I told my boss, I was like, I don't have time to go write the correct version of this.
[00:11:15] We need to ship this feature. But this is our prototyping that we've user experience tested, and it works well in the small. But if they just took my code and shipped it, that would be bad. You probably can tell where this story is going, because there was a lot of pressure to ship this out.
[00:11:34] Whatever engineer it was, I don't even know who it was, but whatever engineer it was, just took my code, took out all this code comments and dropped it right into production. And I told the QA department, use a larger dataset. Make sure that even with a larger dataset than 15 that I had available to me, fake some data, do something, but test it with a larger data set to make sure that the engineers have accounted for the performance of this.
[00:12:00] If somebody clicks on this thing, that it's not just gonna hang or die, or whatever. It was kind of an important screen that we were adding. So guess how many test elements the QA department used? The exact same number as I did, they used the 15 dataset. I told them use hundreds or thousands or something, just so that you'll spot it if there's a performance problem use something bigger.
[00:12:23] Everybody along the line after I had handed it off thought, I can't imagine a customer of having more than 15 elements in this table, we will never have more than 15. So they just used 15 because they thought it was fixed at 15. High profile feature, we did a press release.
[00:12:39] We launched it with our biggest client. It crashes while the CEO is doing a demo to his wife and other shareholders, totally crashes. And guess who got blamed? Even though I said over and over and over again, this code is not ready to go to production. I'm not the engineer.
[00:12:59] They still shipped that code because it turns out that that customer had 10,000 records in that table. And he was like, look how cool this is when I go to sort this. It's just gonna sort real quick. And it probably would have taken 30 minutes to sort it, if I'm doing the math correctly, okay?
[00:13:19] So this stuff does matter, it absolutely matters. I got the blame. Even though I had tried as much CYA as I could possibly muster, I got the blame, because I didn't make sure that QA had done a large enough test dataset. So anyway, this stuff matters.