Complete Intro to Computer Science Heap Sort
Transcript from the "Heap Sort" Lesson
>> Alright, let's talk about heap sort. Heap sort is kind of another really interesting one. You kind of are gonna emulate a tree inside of an array and then by doing that we're able to kind of identify what the largest number in the array is. And then we can use that data structure to then sort a list by knowing always what the largest number in the array is.
[00:00:26] So, let's kind of unpack that a little bit more. So again, a heap is an array. A heap is inherently array, but it represents a tree like data structure. You're gonna find that these heap data structures are kind kind of similar to binary search trees, but not totally.
[00:00:49] Heaps are used a lot in computing, they're typically used to represent things like priority queues. And so, a priority queue is a queue, but every item in the queue has a priority associated with it. A good example is like network stacks, right. Some Internet traffic that you have is more important than other pieces of Internet traffic you have.
[00:01:14] A good example that would be like, if you're doing Netflix and Dropbox at the same time, you want that Netflix traffic to be the most highest priority because you don't want your video to stutter, but your Dropbox can sync whenever right. If your Dropbox drops some packets or it gets slowed down or something like that, that's totally fine.
[00:01:30] You're gonna take 90 minutes instead of an hour to sync all your changes. So, that's all represented by a heap typically, not always but frequently. So, anytime that you're talking about priority queues, your brains just automatically go heaps, right, cuz that's just like those two just kinda go together.
[00:01:56] Okay, so let's talk about the difference between a binary heap which is an array and a BST, which is a tree like data structure. So BST is typically made up of node objects. Typically, I imagine you can model it other ways, but the way that we just did it together, it was made up of nodes, whereas a binary heap is always an array.
[00:02:18] The biggest difference here in a binary search tree, there is a strict rule that if you have a node, everything on the left is smaller and everything on the right is bigger, right. That's a strict rule for a binary search tree. If it doesn't adhere to those rules, it's not a binary a search tree.
[00:02:36] That's not necessarily true in a binary heap. In a binary heap, the only rule is in fact, you can see it here, the numbers that are higher so like the 7 is always bigger than the numbers that are lower, right. So in this particular case, well, I have another one down here, yeah, let's look at this one.
[00:03:02] So this 100, it's bigger than 19 and it's bigger than 36. So the only guarantee that I can make about a binary heap is that, all the numbers underneath it are smaller, right. So if I have 19 here, all the numbers underneath 19 the only thing I know about them, is that they're smaller.
[00:03:18] There's no semantics of what goes left and what goes right, the only thing is that I know is that anything that's lower than it in the tree is smaller, right. So, you can see here I have 3 over here, right, so this wouldn't be a binary search tree because 3 is on the right.
[00:03:36] But it is a valid binary heap because 3 is smaller than 19. Does that make sense so far? Cool, so that's the next rule. If you do an in order traversal of a BST, you get an order array, right. That's one of the things that you and I looked at previously.
[00:03:59] That's not true with a binary heap, right, we could do an in order traversal here, right, and you would not get a sorted list out of that. And then probably one of the last really big differences is that a binary heap is always a complete tree. Let's see what I mean when I say complete.
[00:04:17] It always fills out as maximally as possible, right. So notice that 2 and 7 are here underneath the 17 and 3 has no children. If I moved 7 to be underneath 3, that wouldn't be or let's say 7 underneath 25, even though that would be adhering to the rule that 7 is smaller than 25 smaller than 36 which is smaller than 100.
[00:04:36] It wouldn't be a maximal tree or a complete tree because we have to fill out this part first, right. So if we were gonna add another number to this, let's say we're gonna add 1, right, the 1 would go underneath this 3 here, right. It has to, we couldn't put it anywhere else Cool, so that's what I mean when I say it is a complete binary tree.
[00:05:06] Binary heaps come in two flavors. This is a max heap. Max heaps are useful for sorting numbers, right. So, we'll just see here in just a second how to turn this into a sorted list. But there's also a min heap, right, which you could use to do a reverse sorted list.
[00:05:25] And the only difference between max heap and a min heap, this is a max because the top node, the root node is the largest number in the heap. The min heap would be the opposite, right, so 1 would be the root node here cuz it's the smallest number in the heap.
[00:05:44] Don't have to worry about the date today, we're just doing max heaps. Okay, so how do we represent this tree here as an array? Well for any array in the index, or sorry, in any index in the array, so let's say like 19 here, which would be index one, the way you find this left child is this formula here, which is 2n+1, and then you find this right child has 2n+2.
[00:06:15] So this is 0, so 2 times 0 plus 1, so that's 1. The left child of 100 is at index 1, and the right child is at index 2. If we go down to this one here, the index 1, its left child will be 2n + 1. So that's 2 plus 1, so its left child is at index 3, and its right child as at 2n + 2, so 2 times 1, 2 plus 2 is 4.
[00:06:47] So you can see that here is the array representation of this binary heap. 0 is the route 100 it's left child is at 19 it's right child this here, 36 at index 2, okay, and then it's left child, 17, right, is here and 3, then 36 is left child, 25 is there and then right child is 1, right, 2, 7, right.
[00:07:16] So you kinda just go down the list right that, so it's like a totally flat structure like that. And again, this is the formula 2n +1 finds the left child for any index in the array. So if I had index 10, and I wanted to know where its left child was, I'd say 2 times n, 20, right, plus 1, 21.
[00:07:36] So, the left child of index 10 is at index 21, Okay, that's how we get everything down into a squished array. That's why it's also it's important that it's a complete tree, right, cuz we're just adding numbers on to the end of the array. That's why we don't have any holes in our tree.
[00:08:04] Okay, good so far? Okay, So once you construct a heap, right? So once we make this data structure, and I'll show you how to make this data structure here in just a second. The one thing that you always know is that the index 0 is the largest item in the array.
[00:08:29] 100% of the time, right? Because the rule that this one has to be larger than the things underneath it, lends itself to the fact that the root has to be the largest because it's larger than this. It's larger than this. And these are both larger than everything underneath them.
[00:08:44] Therefore, 100 must be the largest thing. So all we do with a heap sort, is we turn an array into a heap, right? So we turn an array so it looks something like this, right? Then we just take the top root off, so we just remove index 0.
[00:09:07] And we remake the heap. So in this particular case, 36 would be the largest item here. So what would happen, we would remove 100. We'd put that into our new sorted array. We'd move 36 up, right? So now 36 would be the root. And then we'd moved 25 up to be here, right?
[00:09:26] So then, yeah, so we keep moving things up. That process is called heapifier. And then we just kind of continue to do that. We remove items from the array and then we heapify the array, again to make it a max heap again. So, The first step of making a max heap.
[00:09:59] If I have an array that's not a max heap already. So let's go through it step by step here. Let's say this is our initial array right here, and we're gonna do heap sort on this, 5, 3, 2, 10, 1, 9, 8, 6, 4, 7. The first thing that you're gonna do, is you're gonna start at index 4, which is this one.
[00:10:17] Then we're going to work backwards here. And we're gonna call heapify, which is a function that I'm gonna show you how to write here in just a second. On every item in the array, all heapify does is it makes sure that this particular item is bigger than all of its children in the heap.
[00:10:36] Okay, so I'm gonna start this one. The left child is index 9, which is this one value 7. And it's right index would be, sorry, right child would be index 10, which is out of bounds, right? So we have 1 here. We have 7, which is going to this left child.
[00:10:56] And it's out of bounds, so it doesn't have a right child. 7 is larger than 1, so we're gonna swap the left child and the parent. So after that, you get 7 and 1, right? That's all that happened there, is these just swapped places, Okay? Then we're gonna run that again on this one, index 3.
[00:11:20] Its left child is index 7, which has value 6 here, right? And its right child is index 8, value 4. So these two right here are the child of this one. Neither is larger than 10, so we don't do anything. And we just go to the next iteration, okay?
[00:11:42] We're now going to index 2, value 2, which is this one, right? Index 2, value 2, left child is index 5, value 9. And the right child is 6, index 8, right? So these two are the child of this one here. So obviously 9 is the largest here, and it's larger than 2.
[00:12:08] So we're just gonna swap 2 and 9. So now we have 9 here and 2 has gone there, Okay? Then we're gonna go to index 1 value 3. So we're going here. The left child is index 3 value 10, so this one. And the right child is index 4 value 7, this one.
[00:12:38] So these two are the child of this one, Okay? We call heapify on that, obviously 10 is gonna be the largest number there. So you're just gonna swap those two. So we swap 3 and 10. But now we have to call heapify on the one that we swapped with, right?
[00:13:06] Because now we don't necessarily have that guarantee that we had before. So we moved 3 to here, and now we have to call heapify on this one, and make sure that it's bigger than it's children. Its left child is 6, and it's right child is 4. 6 is the biggest one of those.
[00:13:24] So we're gonna swap 3 and 6 here. So we end up with 3 being all the way down there now, and 6 being over here. Okay, last one we have the root. So its left child is 10, and its right child is 9. 10 is the biggest. So we're gonna move 10 and 5.
[00:13:48] Then we're gonna call heapify again on 5, to make sure that it is in the correct place. Its left child is 6, and it's right child is 7. 7 is the biggest there. So we'll move 7 and 5. So now 7 is here, 5 is down here. Okay, we currently call heapify on that.
[00:14:11] Its left child is 1, and it doesn't have a right child cuz that's out of bounds. So that's fine. That's okay. So now that we've done this initial heapifizition, I guess is how we would say it, right? Now this is considered a max heap, right? This is now officially a max heap, we've built one.
[00:14:38] So all we're gonna do now, and this is actually kind of fun in my opinion. Maybe I have a warped sense of what's fun. That's probably true. I probably admit that too much in this course as well. Anyway, you're stuck listening to me, anyway. What we're gonna do is, now we're gonna actually go through and do the sorting process.
[00:14:59] So the one thing that we know, for sure, with this max heap that we have here, is 10 is definitely the biggest number. Anything else we can't super guarantee, but we can guarantee 10's the biggest number. So, all we're gonna do is we're gonna swap 10 and 1.
[00:15:15] So 10 and the last number in your heap, okay? So now we've moved 10 here to the last number in the heap. And you and I both know now, that's the correct place for it, right? Because it's the last index in our array. And it is the biggest number, okay?
[00:15:33] Well, now 1's out of place, right? That's not correct. So what are we gonna do? We're just gonna call heapify on it, right? Cuz we know everything underneath here was a correct heap until we swapped this. But as long as we call heapify on this, right? So that's actually moved 1.
[00:15:51] So you had 7 and 9 here when you moved the 9 up here, right? And then we have to call the heapify again on that one. And that's going to move it with the 8 here. Okay, and then we would call heapify on this again, but it has no children, right?
[00:16:12] So now, again this 9 for sure is the largest value in our heap, right? Definitely, so all we're gonna do again is swap 9 and 4. So cuz now we know for sure 9 and 10 are definitely the two biggest numbers in our array. Okay, now we have 4 up here and we just call heapify again, on this one.
[00:16:35] So on and so forth, just swapping it to the end and calling heapify to get it back to being a max heap. Until eventually, you end up with a sorted list. Okay, [LAUGH] that is how you create a max heap. And how to create a max heap and also how to do heap sort.
[00:17:08] Question then ends up being, what is the big O of heap sort? It has constant space complexity, right? Cuz we're not creating anything additionally. I wanna say it's n squared. I'm gonna show you how many cheated this. There is a website called Big-O cheat sheet, and it's exactly that.
[00:17:34] So heapsort is n log(n). Yeah, I guess that makes sense because not every number has been compared to every other number. And you will get an economy of scale as he started going larger and larger, yep. So heapsort here, n log(n) with spatial time complexity of, sorry, space complexity of constant.
>> So what would be like a good use case for heapsort besides priority queue? Only priority queue, right?
>> Yeah, I mean the question is, when is a good use case for heapsort? It's got a really good average case, so like merge sort. No matter if it's big or small, like whenever you're creating a max heap, whether it's a sorted list or a non sorted list or a random list, it's very predictable in the way it acts.
[00:18:26] Couple that with the fact that it doesn't have any additional memory that it has to incur. And I could see it being useful in a situation where memory is a concern but you still need an n log(n) sort. And you need to be able to handle sorted lists, reverse sorted lists, randomly sorted lists.
[00:18:46] And you didn't necessarily have any guarantees about how your data is gonna look when it came in. I would say in that very specific case, you could have a good use case for perhaps using heapsort, with the only caveat there is perhaps quicksort would still be better. [LAUGH] Yeah, so very likely merge sort and quicksort more or less are almost always the sorts that you wanna use cuz they just end up being better sorts in general.
[00:19:19] But I could see heapsort being useful in this case. I would say just in general, the most useful thing you're gonna take away from this is what is a heap and, how do I build a heap?
>> If the arrays are resorted, it would just return the array, correct?
>> So the question is, what happens if the array is already sorted? It wouldn't just return the array. You would build a max heap, like it would be like merge sort where it would just go through all the steps, right? The heap sort is not gonna care that it's already sorted, right?
[00:19:47] It's just gonna go ahead and build this max heap. And then, do its heap sorting. Yeah, in those particular cases where you suspect the array might already be sorted. Insertion sort is typically the one that you wanna choose.