The Last Algorithms Course You'll Need Data Structures Q&A
Transcript from the "Data Structures Q&A" Lesson
>> It's just something that looks interesting, you put on the wall.
>> It just looks interesting? Yeah, it is interesting, I accept that, interesting is part one. Frustrating? Could be. No guesses?
>> Is it an ArrayList?
>> ArrayList, yeah.
>> ArrayList, great guess. I'll tell you if you're right or wrong cuz we're gonna find out empirically. So I did write this little, I didn't know where to put it so I just put it right in this thing, so you can even follow along if you want to.
[00:00:43] All we're gonna do here is we're gonna simply create an array, right? It's just a number array. I'm gonna be able to measure a function, we're measuring in milliseconds, we don't really care, it's not super precise. We just wanna get a gauge of how fast or how slow things are, right?
[00:01:12] We have push, exact same thing, pop, exact same thing. So hopefully we all see exactly how that works, very, very simple functions, just push, pop, shift, unshift a number amount of times. Lastly, we're gonna have get based on index. So just in case it is a linked list underneath the hood, if we were to get progressively larger indices, we should see a slowdown that keeps on, we should see a linear slowdown, right?
[00:01:34] So then I create this thing, Just some higher order function nonsense. I could push it, I can create something that will just have the same function that does, calls push with the same count every time, right? So I can call it over and over again, I don't even know why I did it this way, but it just feels like it was a great way to do things at the time.
[00:01:51] I can't remember what code comes down below. Lastly, we're gonna test with 10 elements, 100 elements, 1000, 10,000, 100,000, 1 million. Whenever you're doing performance testing, a little tip is you should always do a stepladder type approach. That way you can see how it grows. You can really actually run, say, a linear regression and say, hey, this is kind of more what I'm seeing right here.
[00:02:10] It's not accurate, it doesn't always work exactly, right? So there you go. So you can see for every test I'm gonna do, I'm gonna push in the value and then I'm going to do a get at the last element. So if indexing is linear, we should see a slowdown as the array gets bigger, correct?
[00:02:29] Correct, right? It should be 10x slower effectively every single, 10x times some constant. Pushing, exact same thing. We're gonna push 1,000 items after growing it to a certain length. So first, we're gonna grow it to 10, we're gonna grow to 100, we're gonna grow to 1,000, 10,000, 100,000, 1 million, 10 million, and then at each one of those points, push in 1,000 items.
[00:02:49] If push is based on the amount of items within the array, we should theoretically see a slowdown that happens, right? It should be linearly getting slower in every single step. Same thing with pop, exact same item. Same thing with unshift, and the exact same thing with shift. If any of these are linear-based as we increase the size of our array by a factor of 10, we should see a great slowdown in any of these.
[00:03:12] And so that way we can kinda test all the operations, and at that point we should be able to empirically know what type of list are we using. All right, let's do this source. It's, I did it in TypeScript, npx ts-node. [SOUND] So what do we see here?
[00:03:35] Well, notice that get, it really didn't matter the size of the array, right? It was always super-duper fast. Push didn't really matter, pop didn't really matter. Unshift, you'll notice it just got really, really slow, we're not even done yet. It just got horribly slow at some point. And so what are we seeing here?
[00:03:54] Well, we're seeing all these effectively constant time operations, and there we go, awesome. And we're seeing shift and unshift grow. It actually looks like it's effectively a 30x growth or something like that, but we'd call that a linear growth. I would call that a linear growth at that point, 10x the array, something like 10x to the time it took to run.
[00:04:14] And so let's just write that out. So if get is O(1), if push/pop is O(1), and un/shift appears to be something like O(n), we were correct in saying it's probably an ArrayList, right? That is effectively what we're seeing here. We're seeing instant access, we're seeing instant pushing and popping at the end, we're seeing linear shifting and unshifting, or enqueue, dequeue.
[00:04:51] And a reason of course, we can now reason in our head the exact reason why. Underneath the hood, what is happening? 0 to N, you're shifting over N elements, however that many is. I had 10 million at one point, that should take a long time. If I had to do that 1,000 times, you can see it took multiple seconds on my computer just to execute.
[00:05:10] That's a real problem, right? Right, so hopefully this was actually a cool moment. I was really hoping that this would kinda culminate in a cool moment where we really understand all the different types of lists there are. And then we can empirically look at something in which we don't even know what the execution is, but we're able to suss out likely what it actually is underneath the hood.
[00:05:29] And so, was I correct in saying that what if I had told you const a is not an array? Yes, Morpheus, the answer is it is definitely not, and it is an ArrayList, awesome. I hope everyone's awesome. I always was very curious why they didn't use a double-ended ring buffer, I'm sure there's a great reason why.
[00:05:47] I just don't know why, but I'm sure there's a great reason why that doesn't happen. All right, is there any questions at this point? Cuz I think the next slide is actually, yeah, anything unclear? We're kind of ending the list section.
>> One question, the shift, unshift was exponential, right?
>> I wouldn't put it as exponential, I'd put it as linear.
>> Linear, yeah, sorry, linear, so-
>> Sorry, let me rephrase the question. Shift and unshift was exponential, no, it was linear, sorry, keep going with the question just so.
>> It's a linear, right? The shift and unshift, okay, so what's the conclusion?
[00:06:26] The question was, was that an array or not, right, that question?
>> Yeah, was it an array or not? It's not an array cuz it's curable. You can push and pop, it actually has a notion of an ending and a beginning. Those are all very, very different from what you would get with an array.
[00:06:41] An array does not have a capacity versus a length, an array is contiguous spot of memory, or a static array is what people often refer to them. They refer to a length or ArrayList as dynamic arrays, things in which you have these extra operations on top of it to be able to kind of accordion it according to what you need.
>> It's not an array, but what can it be?
>> It's an ArrayList.
>> Yes, and the kicker obviously is this right here. So for those that don't know, he asked, if it's not an array, what is it? It's an ArrayList because we do get all these sweet operations, but we don't get this.
[00:07:13] But it does provide growing, it does provide the ability to push and pop at the end, it's keeping track of a lot of extra things underneath the hood. Therefore, you can just call this is a dynamic array or an ArrayList.
>> What about slice?
>> Slice, well, slice is just a copy, right?
[00:07:30] You might be trying to say splice, that might be the confusion here. So slice is, effectively when you do a slice on an array, and you're doing it from, say i to j, you're just saying I wanna copy from here to here, right? Starting at i up into, but not including j.
[00:07:48] So that's effectively just like a mem copy, right? So that's a linear operation, all it's gonna do is just take that amount of memory, copy it to a new location, hand it to you, there you go. And so if you're using an ArrayList underneath the hood, you could effectively copy that, you could create a copy method, right?
[00:08:04] Hey, create a new array, copy all my elements in, and then take all my state, and there you go. We have two lists, two separate copies, and now you can adjust them, right? So that would be a slice.
>> Is there a way it's not a copy, where it's a slice and it's-
>> Great question. Actually this, I made a YouTube video just recently, by the way, subscribe to my YouTube, I'm almost at 100,000 subs, my God! No, anyways, sorry, hit that notification bell, I've never, okay, I've never said that. But there is actually this one thing that I really hate, I've made a video on it before, is that there is actually these things that are, it depends on what you mean by a copy.
[00:08:37] And so let's look at this. So if I go const a = new Uint8Array, and let's create a 10 big, right? And then go a
 equal, oops, equals 5, a[1 ]= 5, a = 69 for the memes, there we go, we've appropriately done all the memeing, there we go.
[00:08:56] So now, if we do this and we go const b = a.slice, say (0, 5), and I go b = 5, if I look back at a, notice that 3 is not 5. We have done a deep copy, if you will, of the data, does that make sense?
[00:09:14] Now, here's where things get great. Const, let's call it a buf, equals Buffer.alloc, say (5), we're gonna create a buffer, this is a node item. Here's the best part about it, though. Instanceof Uint8Array, it is a Uint8Array. So by even TypeScript definition, it's a Uint8Array, correct? Now we can go like this.
[00:09:38] Buf., I think it's set, let's see, what is it? My goodness, what are you? Is it write? I always forget these things off top my head, what are you? There we go, writeUint8, there we go. Let's writeUint8 at position (0, 5), and then it's gonna say this, a the value offset range, blah, blah, blah, blah, blah.
[00:09:59] I had it backwards, of course, write value 5, offset 0. Node has always managed to be backwards on all the APIs, I swear. If we write out a buffer right now, you'll see it has 5 in that first position, awesome. So let's do the exact same experiment, const buf2 =buf.slice(0, 5), and go buf2.write.
[00:10:25] Let's go like this, we can go, there we go, and then just put a buf.2, so I'm gonna go buf.2, let's write it at offset 3, and if I look at buf.1, what just happened? Buf.1 was, so this is actually a shallow slice. So buffers in Node are shallow sliced, meaning that they just simply have offsets within this contiguous memory space in which they're actually just saying, hey, I exist between these two places.
[00:10:50] And so when you say, hey, give me a slice, I just simply adjust the indices, and then still point to the same piece of memory underneath the hood. Now, obviously, the danger of course, is where I showed you the whole, where are you? Where did I do that?
[00:11:06] The instanceof, right? Wherever that was, I don't even know where that was, but it, for your APIs, looks like a Uint8Array, but it's a buffer. So if you call it slice, you're getting a shallow slice, which means you could mutate data you think you're not mutating. So very, very dangerous, very, very Node-like.
[00:11:21] All right, there we go, so hopefully that makes sense. Slice, it can be linear if it's copy. It obviously can be constant if it's literally just pointing to new spots. So buffer would be like, you could effectively say buffer is a like implementation of an ArrayList, right, underneath the hood at some contiguous buffer.
[00:11:43] I'm sure even if we print out the array buffer that's backing the buffer underneath the hood, we'd actually probably see a much larger one. We might even see source code and other stuff like that inside the array buffer. Yeah?
>> Just a general comment, I've seen quite a few people asking, where would I apply this in my code, or what are some typical examples that you've run into?
[00:12:06] For instance, you said you use x data structure quite often, so?
>> Where would I actually use these data structures? I've said that I've used them, so where would you use them? I did give the example of say, if you're doing any sort of log or flushing. So flushing in general, things in which you batch operations and then wanna release a bunch of operations, you're using some sort of data structure underneath the hood, and you're gonna wanna be efficient with it, right?
[00:12:29] You don't want something that has a linear add or remove time if you're not flushing the entire thing. So you could imagine that could be really bad for a service. You tend to run into these things a lot more on the back end than you do the front end.
[00:12:41] You tend to run into them on the front end when you're writing a library. So if you look at I'm sure the react code, if you look at any code, I am positive you'll see data structures just around. Because they're trying to keep track of a bunch of state, and they're trying to be as efficient as possible.
[00:13:06] You just have to be able to do all these things because that's what you just had to be able to do those, because at that point we had to be as efficient as possible. So everything needs to be O(1), if possible, we cannot interrupt the UI. And so that was just a part of it, and it's a very, very interesting kind of concept.
[00:13:21] So that's where I tend to run into it is more back end or library dev. Never really, if you're just writing a UI component often, you're not actually doing logic, you're just kind of displaying logic or you have very limited interaction with logic. And so therefore, it tends to be a little bit more shallow on those.
>> When you're dealing with big data, what is your go-to?
>> So I don't deal much with big data, I'm not the best person to answer that one. I use Hive, Hive does a bunch of really sweet things. And my Hive experience is select star from. [LAUGH]
>> Okay, okay, so just-
>> And then I eventually write some joins, it can't compile, then I go to my data scientist friend, go, help me, please. I'm poor in Hive, help! And then they help me and then I get answers. And so that's what I usually do at Netflix at least, so.
[00:14:09] But big data or often applications of these principles over, say, a map and reduce, you have consistent hashes, you have to map things to specific computers that we can run medians, fasts, blah, blah. They're all applications of these algorithms but ran distributedly, and that obviously is much harder.
[00:14:24] Kafka is just a queue, but it's a much more complicated queue. And so now that you fundamentally understand what a queue does, you can imagine what Kafka does, except where you have to somehow solve this whole in order, async multiple computer problem, right? So it's just a distributed queue.
[00:14:41] And so it does help you visualize what's actually happening on the other side. And I think it's good for you, I think it's really good to understand these things.