This course has been updated! We now recommend you take the Complete Intro to Computer Science course.
Transcript from the "Heap Sort" Lesson
>> Brian Holt: We're gonna do quickly, two kinds of sorts. These are really fun, I think these are really cool. They seem a little magical. Just that this is never how I would ever think about a problem but now that I understand how they work, I think they're really cool.
[00:00:21] So part one of this course covers a ton of sorting algorithms, I think you should definitely check it out. I just added two more kinda the two biggest ones I didn't talk about.. So let's go ahead and talk about these, let's talk about Heaps first. So Heaps are an interesting thing.
[00:00:39] So I just wanna disambiguate there, they are Heaps and then you use Heaps to do Heap Sort, okay? So those are kind of two separate concepts but they are tied together. So Heap is a data structure, okay? And specifically, it's actually a kind of a tree, but what's interesting about it is it's a tree that you can represent as an array.
[00:01:02] So basically we're doing it in such a way that we're gonna be able to represent a whole tree just as a flat array. I'll explain to you in just a second how that works. Often you'll hear Heaps referred to as priority queues and you'll hear priority queues referred to as Heaps.
[00:01:21] They use those terms interchangeably, they're not exactly the same thing. However, normally priority queues are implemented as Heaps. And the reason being is that they just lend themselves very well to how priority queues are supposed to work. And again, I will get to that in just a second.
[00:01:42] So, a binary Heap is an array that you kind of squish a tree into. This is similar to a binary search tree, everything I'm about to tell you is similar but they are still quite different. So I just wanted to call out why they're dissimilar and why they're the same.
[00:02:05] So binary Heap is always an array. A BST is made up of node objects typically that doesn't necessarily have to be the truth, but usually. With BSTs, binary search trees, there's a very strict order that everything in the left subtree is smaller than the current node. And everything in the right subtree is always bigger.
[00:02:22] The only guarantee a Heap gives you is the current node is bigger than both of its children. It's the only guarantee. So that is to say that there could be things in the, just because something is the left child of something doesn't mean everything on the right tree is bigger than that.
[00:02:40] The only guarantee is that you have the root node would be the largest number in the array, right? Because the root node is bigger than everything in the left tree, and bigger than everything in the right tree, and that's always true. So that transitive property means that everything in those children would be smaller than also then the root node.
[00:03:00] That's the only guarantee thata you have.
>> Brian Holt: If you're doing in order traversal of the BST, you'll get a sort of list. If you do an in order traversal of a binary Heap, you will get nonsense, so that doesn't help. And a binary Heap is what we call a complete binary tree.
[00:03:19] That means that everything is filled out and shifted as far to the left as possible. So do I have I do, so if we're looking here at this right notice everything has shifted as far left as possible. That's not always true of a BST right cuz it can be sparse I could have a child here I could have a child here.
[00:03:38] That doesn't work with binary Heaps it has to be as compact as possible. And the reason being that, because we're gonna represent it as an array it has to be as compact as possible. They come in two falvors, flavors, I can spell. We're just concerned with max heaps today, but as you may imagine a min heap would just be basically the inverse of everything that we're talking about.
[00:04:05] But today we're gonna be only worrying about max Heaps. So this is why priority queue is typically a binary Heap, is that with a priority queue, the only thing that you actually really care about is the next thing to come out of the queue, right? And a priority queue is basically a queue where it's first in, first out.
[00:04:25] But it's actually a little bit more complicated than that because you're also worried about some priority, right? So if something is really high priority, you want it to go to the front of the queue, right? That's why a Heap works really well for this because it's always going to give you the most important thing.
[00:04:40] And then once you remove it then you're gonna pull out the next thing that's the most important and you're just gonna keep doing that. And that's actually how a HeapSort works, right? Because if we're able to always find the largest number in some array, can't we just keep dequeuing things and then adding them to the end until eventually the whole thing is sorted?
[00:05:01] So that's basically what a Heap Sort is, is you construct this priority queue and then you keep dequeuing things until eventually you have a sorted array. Make sense? Probably not, but that's okay, we're getting there. So let's talk about the magical property that we're representing a tree in an array because that's probably kinda strange, right?
[00:05:22] So if I have this as my Heap, this is a min Heap. But it works the same way, it doesn't matter that it's a min or maxi. And in this case the one is the smallest number in this particular array, right? So you can see here that it's represented at the element, right?
[00:05:41] And then we just kind of insert them in order as if it was like a breadth-first traversal. So then five and three then seven then nine then eight, right? So just go one, two, three, four, five, six, right? And just keep doing that for the entire thing. If you do it like that, then that means if I'm on index one then at 2n+1 excuse me, 2 plus 1 is 3.
[00:06:06] That's going to be the left child, right? So if I'm at 5, then at 3 is going to be the left child, right? And if I'm at 2, right? So I'm gonna be 5, then the, yeah.
>> Brian Holt: If I'm at index 2, which is 3, then if I go to 2n + 1 which is index five it's gonna be 8, right?
[00:06:36] So that math formula is how you find the left and right children, that always holds true, right? If I'm on index 100 then it's gonna be 2n plus 1. So it's gonna be 201 is going to be the left child and 202 is going to be the right child.
[00:06:50] Do we see how that kinda relationship works in terms of how you represent it as a in memory? That's why it's important that it's compact and complete binary tree is so you can hold that formula true, okay?
>> Brian Holt: Okay so,
>> Brian Holt: What you need to do is you need to construct a Heap cuz I'm gonna give you a list of totally unsorted numbers.
[00:07:19] And the guarantees that I want to have about this that the first item in the array is going to be the largest number in the array is not yet true because that array could be anything. So what you need to do is you need to Heapify this array, did not make up that term, that's what it's called, Heapify.
[00:07:36] Heapify. So that's what we're gonna do. We're gonna make a max heap out of the array that you gave me.
>> Brian Holt: So,
>> Brian Holt: Let's talk about Heapify first.
>> Brian Holt: So I'm gonna start at index 5.
>> Brian Holt: Which is going to be value 9, right? So what you're gonna do is you're gonna start at the midpoint of the array.
[00:08:10] And you're gonna work backwards one by one until you get to the front of the array, which will make sense here momentarily, just by the properties of how heapify works. So I started at index 5 and it's value 9. So then you're gonna try and basically move it so that the Heapify the Heap Sort properties of it are hold true.
[00:08:35] So it's left child of index 5 is 11. And it's right child is index 12. Both of those are out of balance for this particular array, right? So we don't do anything. So nothing to do, next iteration. So i-- then we go to index 4, which is 1, right?
[00:08:58] Then the left child Is index 9 value 7, and the right child is index 10, out of bounds. Because the largest index in here is index 9. So 7 is larger than 1, so we swap the left child and the parent. So, 7 is larger than 1, so we're going to swap those.
[00:09:19] You can see here, we have 7, 1. Then we call heapify on index 9 and it does nothing. That's kind of the pattern we're going to do. We're going to keep calling this Heapify function that's going to check to see. I have the parent, and I have the left child, and I have the right child.
[00:09:37] And the greatest of those will be swapped with the parent. So if I have a left child that's 5, and a right child that's 6 and a parent that's 3. I'm gonna grab 6 and I'm gonna swap that with the 3. On the next generation if I have index 10, or sorry, value 10 and that's the biggest one, then I just don't swap anything, right?
[00:09:58] So you just want to make sure that the largest number is always the parent. I go through exhaustively here and do that but the basic gist is just what I said here. Let's just go here. So, say this is our index at this point. So, I'm gonna call heapify and value 3, which is 0, 1, 2, 3 which will be 10.
[00:10:25] So the right child is 4 right? And the left child is, where am I getting mixed up here? Heapify index does nothing, value 3. Value, so that's gonna be 1, 2, 3.
>> Brian Holt: I must have right, okay, value 3 index one, right? Index one is 3, right? So you can see here it's gonna swap 3 and 10 because 3 is smaller than 10.
[00:11:03] And 10 is larger than 7. So that is why you see here 10 swapped with 3 and then because we did that we don't have any guarantees about what's below that in the Heap. So we have to call Heapify again on 3. That make sense? Eventually you're going to end up with an array that looks like this.
[00:11:25] So this is what a Heap looks like. So if I go to 10, 10 is larger than both 7 and 9. And let's just take a look at 7. If I go 7 two to the right gonna be at 6 and 5. It's definitely larger than both of those.
[00:11:38] And then if I go to 6 that's index 3. So I have to look at 7 and 8, right? So 4, 5, so 3 and 4, right, it's larger than 3 and 4 as well, right? So that's what a Heap looks like. So the only guarantee that I have in here is that 10 is definitely the largest number in here, right?
[00:12:00] Now what's cool, when you de-queue something from a priority queue, all you have to do is swap in this number. So what I'm gonna do is I'm gonna swap in the one to the 10 like I do here. 10 is definitely the largest number on the array, right?
[00:12:14] Because that's what the Heap told us. And now I just call heapify on this, and It's going to say, swap with 9, and then it's just going to move down until 1 ends up down here. And now this is what my priority queue looks like 9 is the largest number.
[00:12:30] We have that guarantee, so we're going to swap that. And you're gonna end up with 9 here, and you're swapping that with 4, just call Heapify on 4, so on and so forth. Until, actually let me show you the completed one just so you see what it looks like.
>> Brian Holt: Where is my output for this? There it is, okay? So I called it with this huge array. And you can see here over time all of this right here is just constructing the heap, right? Until eventually you get 50 being the largest here. And then I just de-queuing things until eventually you can see this this just starts growing until eventually the whole list is sorted
>> Brian Holt: So the first half of heap sort is creating a max heap, and the second part is just de-queuing things until eventually you have a sorted list. So that is the general premise of what Heap Sort is.
>> Brian Holt: Any questions about in general what the process of what Heap Sort is?
>> Speaker 2: When comparing AVL trees with binary Heap a sort of apples to oranges comparison.
>> Brian Holt: Yeah, that's a good question. Is the comparison between AVL trees and Heaps apples to oranges? They're used for very different purposes. They share a commonality that they're both representation of trees, in some capacity.
>> Brian Holt: You construct AVL trees to be able to search something very quickly. So I have database index, and I wanna be able to find something in the database very, very quickly so that I would construct a AVL tree for that, or some sort of binary source tree variant.
[00:14:28] And I need to construct a priority queue, so therefore I construct a Heap. They are trees, but they're different flavors of trees.