Complete Intro to Computer Science Spatial Complexity
Transcript from the "Spatial Complexity" Lesson
>> No, it's always talking about time for the most part, right? Or computational complexity. I think, for the most part I call it, in this class, computational complexity. But I think, it's actually probably referred to more, as time complexity, but you can kind of more or less tie those things together.
[00:00:17] They're not totally the same, but for the sake of this course, they are totally same. And it's the idea of, if I give this more inputs, how many more CPU cycles is this gonna take, how much longer is this gonna take, right? And when I talk about Big O, and when people talk about Big O, without referring to what they're talking about, they're almost always talking about time complexity.
[00:00:43] There is another type of complexity that's called space complexity or spatial complexity. And that's, basically, how much RAM are you gonna take, RAM or hard disk space, how much stuff do you need to store, for an algorithm? And sometimes this ends up being really important, this used to be extremely important to me in my job at Netflix.
[00:01:06] Because I was working on things like Roku sticks, and PS3s, and things like that, that had very, very precious little bit of memory.
>> Will complexity be the same for network transfers, as well? Like the size of the transfer you're making, would that be evaluated as spatial complexity, as well?
>> So the question is would you have considered network transfer to be the same sort of thing as space complexity? I don't think so, I think I would consider that in a different vein. You totally could consider that in a Big O way, I just haven't. But no, this was much more like, how much stuff am I storing, locally?
[00:01:52] And then you could probably consider network traffic, and in a yet different thing. That's a good question, I'll have to think more about that, cuz I haven't really thought about it. But, anyway, for the sake of this class, we're just talking about how much RAM you're taking, right?
[00:02:08] How much stuff you're creating. So yeah, when I was working at Netflix and working on the Roku stick, the PS3, all that kind of stuff, those devices actually have plenty of CPU, right? So they're not hampered in that particular sense, but they just have no memory. They don't give any of the programs that are running on them no memory.
[00:02:31] And so from when we were signing people up on Netflix, that's what I used to work on. We had to be extremely careful of how much RAM we were taking because, that could really hamper user's viewing experience, or sign up experience, in this particular case. So that's why, you don't always have to consider spatial complexity, but when it's important it tends to be very important.
[00:02:52] So let's talk about that. So if we're gonna do something that's linear, right? If you remember linear is Big O of N, if it has linear spatial complexity that would be say, like if we had an array of ten items, our algorithm would probably create ten arrays, right?
[00:03:13] So you can see here, when we take an array, and we make more arrays out of that array, that just tends to churn up RAM space. And for something like a PS3, which is very, very memory limited, that can be a bit of a problem. Because if you're churning through a lot of memory, that ends up being a really bad user experience, right?
[00:03:34] So we were always talking about the spatial complexity, and we never wanted to do anything that had linear spatial complexity. But typically, the thing that you're doing, is you're creating additional arrays, that's what causes spatial complexity, it's not always true, right? You can be creating other temporary variables but normally, at least in the terms of sorting algorithms, it's gonna be arrays that are gonna cause that spatial complexity.
[00:04:01] Logarithmic, as you might imagine, you create less overtime, right? So if I have 10 arrays, maybe create 7 arrays, by 100 it'll create 12 arrays, you can see this kind of like this diminishing return. That would be considered logarithmic. Constant would, basically, what you'd say, either if I have an array, I'll create five arrays.
[00:04:21] But if it's of length 1 million or length 10, I always create five arrays, right? That would be considered constant, right? Because it's the same amount no matter what. And let's talk about quadratic, this is actually kind of an interesting one, cuz this is something that I actually did at one of my former companies.
[00:04:40] Let's say we had to evaluate the distance between two zip codes. And a zip code is a, they more or less exists all over the world on the sense of, it's a number that refers to a parcel of land in the United States, right? So for example, I live in 9103, and let's say, which is a zip code in Seattle.
[00:05:02] Let's say I wanna calculate the distance between that and 10001, which is I think is in New York. You can see that I have these two zip codes, and I wanna calculate the distance between those. Now, let's say that computation is really expensive, so we wanna pre-compute, all of those various different distances between each other.
[00:05:26] Well, that says for every item that I add to that database, I have to now create an additional item in the database, or sorry, an additional item for every other item in the database, right? So if I have 10,000 zip codes in my database and they add a new zip code to the database, that means I need to create 10,000 new items and store them.
[00:05:50] This would be quadratic, right? Because every item that you add to the database, that means they have to add even more the next time, right? And it just grows kind of out of control at that point. Now, we actually did this that one of the companies that I worked at, we only did it for zip codes in the state of Utah.
[00:06:07] But the reason we did that is because, it was so expensive for us to call an API to get that information of the distance between two zip codes. That it was cheaper for us to just buy a really big database and put it all in there, right? But that to me was kinda funny, cuz it was an example of a quadratic spatial complexity.
[00:06:26] Typically, you never wanna have quadratic spatial complexities, that's really the only time I've ever seen it be appropriate. So there's a great question on here, how does this apply to functional programming, because functional programming, churns through a lot of stuff, right? There's a lot of arrays being created and destroyed.
[00:06:44] The first thing I wanna say to you, is that, normally it doesn't matter, right? Normally, it doesn't matter that if you're looping through 100 things or 10 things, or probably even 1,000 things, you are creating and destroying a lot of stuff, but modern computers are pretty fast, right?
[00:07:04] Even an old Android phone should be able to handle that fairly well. So normally, it doesn't matter. But the second thing I wanna say, is that, when it does matter, functional programming is usually the first thing you kinda have to refactor out, because it does unnecessarily frequently churn through a lot of memory.
[00:07:24] For example, we had to basically ban functional programming in that Netflix TV code repo, because of that reason, we just could not afford to churn through all of those arrays. So yeah, rule number one, if it doesn't matter, it doesn't matter, and ignore it, right? Rule number two is, yeah, frequently you have to refactor out functional programming style, code when space is a concern.
[00:07:51] Does that answer your question?
>> So basically, we can combine different [INAUDIBLE], yeah, so?
>> Use other style of programming, yeah, okay, thank you.
>> Yep, for sure. And we'll get into, I think, my next section here is actually on trade offs, it is. But typically, you just wanna wait for it to become a problem, before you solve it, right?
[00:08:16] People like to solve problems before they have them, and that's a really bad idea, right? So wait for it to become a problem, and then when it's the problem then go refactor it out.
>> Premature optimization.
>> Yeah, premature optimization, it's a mouthful. [LAUGH] Yeah, so the question is what's driving these constraints, right?
[00:08:37] And I mean, most frequently your boss isn't gonna know where those constraints are gonna be. So that's why it's on you to ask a lot of these questions like, hey, am I running on a low-end Android device in rural America? Or am I running on a super computing cluster that has basically infinite space and computational power?
[00:09:01] Cuz those are gonna be very wildly different answers, right? Or in this particular case, I think the PS3 is a great example of, you know that you're extremely memory constrained, but that PS3 processors was very, very powerful, right? So that means that anything that we could shove towards this time complexity side, I mean we could do really inefficient things with the CPU because those cycles were very cheap to us.
[00:09:24] Whereas we had to be super conservative with our memory usage, right? Whereas, normally, like when I'm coding for the cloud, right? I mean, I work on Azure at the moment. I can just throw more RAM at it, RAM's cheap, right? So I can use really inefficient spatial complexity things.
[00:09:40] The question you should be asking yourself is, where is this running? What kind of devices is this going to be running on? And how important is it to the user? And how big of inputs am I expecting, right? Cuz if the answer is you're only gonna be evaluating two or three items at a time, it just doesn't matter, right?
[00:10:00] In which case, write the simplest code possible. So yeah, just to wrap up here on spatial complexity. Again, this is just another ruler that I'm giving you to measure your approaches to problems. So that you can decide, based on whatever the the task is at hand, to choose what kind of algorithm you're gonna use.
[00:10:19] And it's never gonna be like, well, we're gonna use Quicksort, and we're gonna do that. Normally all this stuff that I'm about to teach you in this course is actually built into the language, and you don't actually really need to know how to do it. It's gonna be much more like, hey, we're creating type ahead for our search bar.
[00:10:40] And we can either pre-compute that with trees, or we can do it on the fly and just rely on the CPU to do that on the fly, which is the correct thing to do. And again, I hope your answer to that question is, it depends, right? Where's this gonna run?
[00:10:55] What kind of devices,? What are my users doing? How big is that dataset, right? Those are the kind of questions that I'm hoping that I'm just gonna ram into your head, so that you know to ask that question continually is like, give me more information.