Transcript from the "Handling Collisions" Lesson

[00:00:00]

>> Bianca Gandolfo: So this is all fine and dandy until we have a collision. A collision is what happens when we run our hash function and we pass hello into it, and then we also get a 3, right? There's no perfect way of implementing a hashing function for this to never, ever, ever happen within realistic constraints.

[00:00:24] So we can expect that this will happen at least once if we have a certain number of inputs, right? It's never gonna happen if you just do one input, right? But with a certain number of inputs, you're always gonna have a collision. So we need to account for that.

[00:00:46]

>> Bianca Gandolfo: And when we were talking about it earlier, we were like, you know what? Maybe,

>> Bianca Gandolfo: We'll just put two things into our storage, right, like that.

>> off screen: Mm-hm.

>> Bianca Gandolfo: Seems reasonable, right?

>> off screen: How do you get out the right one?

>> Bianca Gandolfo: Yeah, and that's the next question.

[00:01:10] How do we get out the right one? What do you think?

>> Bianca Gandolfo: Does anyone have an idea?

>> off screen: Do you have to store the key as well?

>> Bianca Gandolfo: Yeah, so to get around that, we'd have to create a tuple.

>> off screen: That makes sense.

>> Bianca Gandolfo: So.

>> off screen: It's still fast, cuz you can.

[00:01:39]

>> Bianca Gandolfo: Thingamajig, comma, true. Let's put this on a new line so we can see it. And then, hola.

>> Bianca Gandolfo: Hello comma hola. And this works, and it will make it a little bit slower, right? Because then every time just say, in the worst case every time we land here, we have a terrible hashing function and every time it gives us 3.

[00:02:20] And so for n, right, we have it's all in the index 3 and then we have to just loop through either our linked list or whatever it is that we have in here, to find the matching one. So, in the worst case, it's linear. However, with modern hashing functions and things like that, we don't have to worry about this happening in real life.

[00:02:53] Does that make sense? So realistically, if you're using the right hashing function, this shouldn't happen, or it's so unlikely that it's not even worth accounting for. And then, the other thing that we do is we resize our storage once we reach a certain capacity. And you'll do that in the exercises.

[00:03:15] Once we reach 25%, 50%, 75%, you'll resize the array and redistribute it. And also, that sounds like an expensive operation. But if you design it well enough that it happens so infrequently, that it kinda averages out so it doesn't affect our constant time. Does that make sense? So there's some mathy stuff behind there, and we can just kind of trust that if we're doing it the right way, that it will average itself out and not be an issue.

[00:03:47]

>> off screen: So when you resize your array, then the hashing function will return a different value.

>> Bianca Gandolfo: Yeah, yeah, yeah.

>> off screen: And that's why it works.

>> Bianca Gandolfo: Mm-hm, exactly. So that's why I was saying our hashing function-

>> off screen: Interesting.

>> Bianca Gandolfo: Was gonna take some extra data, which is, what is the range?

[00:04:04] So for this one it's gonna be 0-4, but maybe we wanna double it. Usually, when you resize your hashing table, you double it every time, and then you redistribute it. And so you'll rehash everything in your hash table to a new value, maybe between 0 and 8 now, or something like that.

[00:04:28]

>> Bianca Gandolfo: That make sense? Cool, that's the only difficult part about a hash table. Everything else is straightforward because it's just an object.

>> Bianca Gandolfo: Cool?

>> Bianca Gandolfo: All right, so when we go into coding this out we need to think about collisions, right? We need to account for them in our code.