This course has been updated! We now recommend you take the Complete Intro to Computer Science course.
Transcript from the "Hash Table" Lesson
>> Brian Holt: Okay, so Hash Tables, are actually really, really powerful and use extensively in programming languages, and just about everywhere. There are key value stores, right? So we implement maps or, and or, sets, but the interesting thing about hash tables is, we use the key itself, as to be the index of where to find it in memory.
[00:00:28] And why that's really cool, is that, as long as we know the key, we know exactly where it is in memory. Yeah.
>> Speaker 2: Just to follow up on that last question from Drew, would it be faster to recreate the trees?
>> Brian Holt: My guess is yes, but that's an ignorant guess, because I don't know how else to do it personally.
[00:00:48] I don't know. Yeah I don't really know. Okay. So, the way that we, would hash tables that we find where to put it in memory, is we run it through a hashing algorithm, right? So basically we say, we run whatever our key is, usually a string or a number of some sort.
[00:01:06] We run it through a hashing algorithm, and the hashing algorithm spits out somewhere in memory to look, right. So the key is where it is, right? And such very powerful because, if we know what the key is, we know where it is in memory, and that makes it very fast for lookups.
[00:01:41] Or delete, you look up in memory and delete that memory space and it's deleted. So it's super fast. That's why people really like them. But again, remember hash tables have no concept of order, right? Because you're just telling it where it is a memory you're not actually saying it.
[00:01:55] What order there is, so you can't do order with hash tables. So, there's a couple things. You have to have a good hashing algorithm.
>> Brian Holt: We'll talk about. You need. First of all, you need a large memory footprint, right? Because if you have a hashing algorithm, you need a sufficiently large amount of spaces that you can have.
[00:02:18] A unique place to put every single object, and they need to not collide, right? So, if I run object a, through my hashing algorithm, and object b, and there are different objects, and they should have different spaces in memory and the hashing algorithm makes them the same spot memory.
[00:02:34] You're gonna accidentally overwrite it, and that's a problem, right? So collisions are actually really, really important, that your hashing algorithm is good enough to spread them out. So let's look at a hashing algorithm really quick. Like this one right here, basically if we did, like, what's called a Caesarian cipher, right?
[00:02:55] Where we just say a is 1, 2 is b, 3 is c, right? All that stuff. And then we add them together, we take that as the index of wherever it's gonna go. So 'az' would be, 27, but 'by' would also be, 27, and 'za' would be 27, right?
[00:03:11] So, as you can see that would be a problem, because a lot of, potentially similar, like, these similar strings would end up with similar indexes, right? So you need to have a sufficiently complicated algorithm, to be able to spread them out well. And so like, I wrote a slightly above crappy hashing algorithm, and it's not a good algorithm, but it's better than the one I just showed you right there.
[00:03:38] Hash, right here. So basically what I did here is, I did the Caesar cipher, so I did the input charCodeAt, right, which is gonna give you back a number for whatever character we're looking at. And then I typed it by it's index, so if it's, a at value one is different than a at index two, is different than a at index three, right?
[00:04:00] So they have these multiplication right? And then, what we're doing to make sure that it always lands within the space of our hash table, is we're using the modulus modulo operator. Which, if you're familiar with long division, so if I do 10 divided by 3, right? It's gonna be, 3 remainder 1, right?
[00:04:19] Remember that from elementary school? What modulus basically says, I don't actually care what the first number is, I don't care about the 3, I just care about the remainder 1. Which is really interesting, because if I have a table of length 3 and I do modulus, you can guarantee that they can be 0, 1 or 2, right?
[00:04:34] Because as soon as it comes 3, then divides whole, it goes back to 0, so that makes it circular. Right, and that's exactly what we want. We wanna make sure that our index always fall in the range of the hash table. That make sense? Okay.
>> Brian Holt: So, hashing algorithms have to have a bunch of things.
[00:04:53] They need to be, what we call idempotent or pure. There's a bunch of, so basically, you just, if I call double here, 10 billion times, with two, on the 10 billion for the first time what I'm gonna get? Still gonna be four, if I call it with two.
[00:05:14] That's because this function is idempotent, it doesn't modify any state around it. It just does one thing and it's done. And modify it, yeah, doesn't modify any, has no side effects is what we would call that. However, this one, if I call this one ten billion times with two, what I'm gonna get on the ten billionth and first time?
[00:05:32] Some insanely large number, right? Because this multiplier keeps changing with every single time I call it. That's called the side effect. And we really are hashing algorithm cannot have that, because if I give you object a, now and then I give it to you, you know much later, I need to get the same answer out of you because I need to look at the same space in memory.
[00:05:53] So that's important. And use to spread them out while, we already talked about that a little bit. Yeah, question.
>> Speaker 2: Yes, Dave is asking if there's a way to maintain constant lookup time with a hash table and maintain order?
>> Brian Holt: I don't know to be totally honest. I'm sure there's abstractions, you can put on top of hash tables that do that.
[00:06:21] But a hash table in and of itself, like a pure implemented hash table does not have order.
>> Brian Holt: Good question though. Okay. you just have a good distribution of value, we talked about that. And the other thing is hash tables are built to be really, really performance. So if your hashing algorithm is cryptographically secure, which is means you're using extremely large prime numbers to combinate to make these crazy things, you're totally defeating the purpose, right?
[00:06:54] Because at that point, your hashing algorithm is no longer performant. And thus, your lookups are gonna be really gross, because you have to go through this crazy algorithm to get the correct index. So making sure you have something really fast is really important. In other words, you're not gonna use something like, two fifty six or something like that, that's not a good one to use for this, but you might use MD5, right?
[00:07:18] MD5 is actually pretty fast. So, okay, so looking here at our hash table, we just create this new giant array, right? And to add something we just say, to our table hash whatever our input is, and put that wherever it needs to go, right? And then check, we just need to check is like, hey do you have something in memory there?
[00:07:47] So in this case we're implementing a set using a hash table. You just ask, hey do you have something there? Okay. You do? Then yes, it's there. Otherwise, no, I don't. That's what the double exclamation point means. It's just converting this to be true or false. That's all.
>> Brian Holt: And then hash, right? I just did that hashing algorithm. Like I said, it's slightly above crappy. Definitely below mediocre. And yeah, just some test to make sure that, that's the case. So, good stuff to know. Any questions about hash tables? Like if you're doing something that needs, like, sets or maps, hash tables are definitely a good way to go.
[00:08:32] Providing you have a good hashing algorithm. That's kind of the key to a hash table is your hashing has to be really great.