Transcript from the "Maps" Lesson
>> Fantastic, where does this move on, all right? I'm ready, are you ready? All right, so, there's a common saying in interviews, just use a HashMap, right? I think Joma Tech even did it. Again, this the second time I represent him. He always does these shorts, and one of them was HashMap, right?
[00:00:16] Prepares for an interview for a month, right, when the person asks a question his answer is HashMaps, right? It's just always the answer. So what are maps? It's too easy for us to think of either squirrely braces or new map and just assume we know what they are, right?
[00:00:33] That's what we do. The reality is, we probably know how to use them, we probably don't know what they actually are. And more so, this is not even a map, right? That's not a map, it's an object that has different properties associated with it. So, just things to think about, everything's a little bit different.
[00:00:51] So we are gonna go through, and we are actually going to explain what a map actually is. So here's some terms first. There's the load factor. That's the amount of data points versus the amount of storage that we have, yeah? Yeah, so if we had seven items in our map, and we had ten available units of storage, we'd have a load factor of 0.7.
[00:01:13] A key, a key is the thing that is hashable, right? We use the key to create something in which we can access our value. A value of course is the thing that's associated with the key, and a collision is when two keys map to the same cell. Okay, there we go, we've made some progress.
[00:01:33] All right, so let's whiteboard all these things. This is where things get kind of fun cuz we're kind of finally done with this whole graph. Everything's a LinkedList, everything's a graph kind of section. We're kind of just into something new. Actually, joke's on you, it's an ArrayList. All right, so again, we're back down to an ArrayList.
[00:01:51] Somehow these foundational structures that we discussed just keep coming back up. They're inside of a heap. We use a heap inside of Dijkstra's shortest path. We are now on to Maps, guess what, ArrayList, right? It always keeps on being the same thing, always down. So effectively how a Map works is that, we have a key and it needs to map to value.
[00:02:10] And what the key is mapping to a value has to be, is a consistent hash, consistent hash. All right, so what this means is that when I give case of 0, right, some key, that is uniquely defined, it will always gimme the same answer back out. If it doesn't gimme the same answer back out, that's an inconsistent hash.
[00:02:51] We can have a number and use the number as a way to kind of create a hash for the string. But what we don't have is an object. If you just hand me an object, I can't create some sort of unique identifier for that object in the sense that if you create an object that's like foo:5, right, my goodness, great five.
[00:03:41] I'm sure there's someone that could probably prove me wrong, but I'm quite positive you just can't uniquely identify objects in this sort of sense. You can only uniquely identify them based on properties and their associated values. So if you have two with the identical, good luck. Which means that our key lookup really breaks down in this case, right?
[00:04:00] Because you could actually have two separate objects that are meant to be two separate objects, but mapped to the same value. So for our case if we do implement one, we'll simply just use strings or numbers as our entrance into it, which I don't think we're gonna actually implement one.
[00:04:15] So, let's just talk about how this works. So, we can do something like this, is that we need a hashing function, right, that takes in the key and needs to produce out some sort of unique number, right? And this is the important part, is that we need a number.
[00:04:30] And the reason why is, let's just say k comes in and it returns out some great number, and it can be a really big number. It can be a really small number if there's not anything specific to it. Then what we can do is, we can take it, and we can modulo it based on say, the length of our data storage, right, so say 10.
[00:04:50] And what that means is that if we have a 10-item data, okay, say we have this many items, I realized I can't count as fast as I can, right? That means our key, whatever we give it, can uniquely map to one of the bins inside of our array structure.
[00:05:05] And we can just say, hey, we'll store the key plus the value inside of that structure right there. Then we map another one which say, goes over here, and then we put the key and the value. But every now and then, you're gonna map two very different keys.
[00:05:30] And since we say only have ten slots, the chance of them colliding is uniformal. If they uniformly go to any number, it would be a chance of one over ten, right? It wouldn't be that great. So if you had a perfectly uniform hash generator, you get a 10% chance of collision.
[00:05:45] Which means that what happens when we collide here? Well, we need a way to be able to store both. Now when I went to college, I know, I can't believe I'm saying this whippersnappers. What we used to do, at least how I learned it is that you'd actually do something called linear or exponential back off.
[00:06:01] Meaning we would literally go down to the next slot and put it in there. And we always store the key with the value, and we'll see why on retrieval. Which means that if you did this, you will fill up more and more slots. So as things become less and less efficient, or more and more full, you're gonna need a larger amount of storage with a smaller amount of load factor to prevent collisions, right?
[00:06:25] Cuz the moment you have five items in here, the chance of you hitting it and then having to linearly go down gets greater, and greater, and greater. So in more modern ways or what I've seen is that, instead you use a list underneath the hood to be able to add in items.
[00:06:41] So you have an ArrayList, that has an ArrayList, that can add an items and then we just walk the ArrayList. Sounds good. We could technically also use a LinkedList underneath the hood because you do linearly walk it every single time. And so long as the keys sufficiently map to a bunch of them, what we're gonna see is little sub amounts with only just one item in it.
[00:07:01] Maybe sometimes two, right? Long as you have a good hashing function, you're gonna have very few. So how does retrieval work? Well, we go through our magical hashing function, right? We pass in a key, we go through a hashing function and produces an index, we then modulo that with the size of our storage area, and we now have an item into it.
[00:07:19] Well, the problem is, is there could be multiple items inside of any one storage unit, correct? So that's why we store the key plus the value. We can use our key and say hey key, are you equal to that key? You're the same key? Well guess what, that's your associated value then, get on out, right?
[00:07:38] We can actually do that and get it on out, and we can use that specific value. It's a little bit tricky, but then as you think about it becomes really simple. Maps tend to be pretty simple creations, and that's really all you're doing. Deletion, exact same thing, you go to that spot within your ArrayList, you find the associated key with the value, remove it from the entry, and there you go.
[00:08:02] You now have it removed, of course you have to decrement length. But there is always one problem. So, I believe the ideal load factor is 0.7 at this point. That's kind of what I believe is considered the ideal one long as you do this ArrayLists or this LinkedList style storing.
[00:08:18] If you do that, once you exceed that load factor, we need a way to be able to redo this, right? So something we can do is that, we can take our current data storage, what it is, and we can iterate through all of the keys that we can find.
[00:08:35] And then we can rehash them and restore them into a larger storage arena, thus cutting down our load factor, say by half if we double the storage, and then we have a bunch more in which we can store. And this is the effective way in which maps operate on systems.
[00:08:53] There we go, they're not actually all that complicated, they're actually surprisingly uncomplicated. It's actually surprisingly boring, kind of, it doesn't feel that cool. But now that you know what ArrayLists are and LinkedLists, this is very, very easy. If you didn't know those two terms, this might be a bit harder to kind of understand how this works.
[00:09:09] So an ArrayList allows you to kind of just simply increase the length. But usually you just use raw arrays underneath the hood, and then double the size when you need to or increase the size by 50%, whatever you think is good. Always a trade off between how much do you want to store versus how efficient you want to be with memory.
[00:09:26] So it's always a game. That's why often you'll see maps come with some sort of capacity. Because they want you to give them the hint of how big we should be so that way the storage and retrieval is as efficient as possible. Cuz the more amount of collisions, the less o of one it is.
[00:09:42] And so why is it o of one? Well, we assume that, again, the key has some sort of length that is not infinity, right? You're not storing large text files as the key. Instead, you're storing some small string, right? You could hash a text file, and then boom, you have this key that you use all the time to get that value.
[00:10:00] That's kind of how the internet works. You do those kind of things, it has some sort of hash associated with it. You don't store gigantic pieces of data, you still have small things that point to other things. And then usually the thing you point to is actually the big thing, so we assume hashing takes over one.
[00:10:14] I'm sure there's a great proof out there that says, this is why it is. And so therefore, we have this. And we only have to do that plus accessing the data, and then going through a relatively small amount. So long as our collisions don't gather up into one single point, you will always have effectively one item in there, two items in there, also again considered constant.
[00:10:34] And so that's why it has a constant lookup time. There was a pretty funny bug that somebody found that some high performance piece of code, no one could figure out how to make it fast in this one condition. And someone went through and looked at the buckets of the map.
[00:10:48] And effectively all the keys somehow were just mapping to this one bucket cuz they had a bad hashing algorithm. And that caused this one thing to just operate at linear time when you assumed it was operating at constant time. And it was just super slow when it should have been super fast.
[00:11:17] The people who made the V8 are clearly so talented they created a engine that can run a slow language swiftly. So they probably did it right, I assume they did it right. And yeah, that's my argument for why this happens. So yeah, that's really all a map is.