Transcript from the "Indexes in MongoDB" Lesson
>> Let's talk about Indexes. So again, I talk a lot about queries being expensive. Another one is you want queries to be fast right? So let's say you're doing like a pet adoption website and you people ask for, I wanna see all the dogs right? If you have, a million pets in your pet adoption database, that can be quite slow, because it's gonna go read all of the objects are all the documents in your database, and it's going to filter for all the ones that have type dog on it.
[00:00:32] It'd be better, if we had that indexed, so that the database can basically take a shortcut instead of having to look at every single document in the database can say. Just build me like a tree that allows me to go search for all of just the dogs in my database.
[00:00:52] So this is called an Index. And this actually exists in all of the databases we talked about today, except rattus. So we'll be talking a lot about indexes today. So, let's say say db.pets.find(( name: "Fido" okay? So I'm gonna just search for all the pets named Fido in my database.
[00:01:22] It's fine, it works ok, I only have ten thousand items in my database, which from I going to b it's not that many used to handling millions and perhaps even billions of records. But let's do a dot, explain on this. And we're gonna ask for the execution stats.
[00:01:41] So this is actually gonna pass back to from Mongo and say, here's how I'm going to find that. So, it does what's called a execution or a query planner, you'll hear those terms thrown around quite a bit. So, the first thing it does is say, all right, here's the query that I'm looking for.
[00:02:00] I'm looking for name, where it's equal to Fido. And then it has a winning plan. Sometimes the database will try multiple different strategies to try and figure out okay, I can search this way, or I can search this way, or I can consider this way. In this case, there's only one way to find it.
[00:02:16] It's gonna do what's called a call scan. And this is your worst nightmare right here call scan is like, worst case scenario. I'm going to look at literally every item in this or every document in this collection to find all of your pet's name to find out, this is not what you wanna see.
[00:02:33] If this is a query that you run a lot, you need to index this, that you don't see Colescott if you run this like once a day, that's fine, right? Once a day, you can take a performance hit. It doesn't really matter, but if you're running this 10 times a second, you, need to index this yesterday.
[00:02:50] Okay, so that this is the part that I want to be looking at here is the winning plan. And you can see here, the call scan is exactly what you're trying to avoid. So here are the execution stats. This is actually gonna go through and tell you what it did, how long it took.
[00:03:06] And you can see here, this is, again, we have 9,644 items in the collection and it looked at all 9,644 items in the collection. This is not what you wanna see. So, how can we fix that? Well, we're gonna do that by creating an index. So the first thing we're gonna do is, we're gonna say db.pets.create index name colon 1.
[00:03:38] And what we're doing is we're creating a simple index on name. And that's it. Now, something else I probably need to throw out there. Don't just log on to your server this is actually a fairly heavy procedure to do. So don't just do this in production. Like this can cause downtime.
[00:04:03] This can be very expensive. Again, we're doing it for 9,600 and some odd items in a collection. Not that big a deal but 4 million, this could take down your entire app. All right, so now, if we run that same query again find Fido with execution stats. Notice up here, we're gonna see Stage : Fetch.
[00:04:25] This is what we wanna see, this is much faster, this is much better. This is basically said like I found an index that uses or that I can use to find whatever you're looking for. It goes way faster. So you can see here, it gives you a lot more information of how this index works.
[00:04:44] But keys examined trying to find the right one to point you out. So Doc's examined 964. That's the one that you don't wanna see. So total Doc's examined here 1070, right? That's significantly better, but I also think if I remember correctly dot count. Sorry, not dot count. If I say count here, that's actually the correct answer.
[00:05:17] That's how many dogs are actually, or sorry. That's how many pets are actually named Fido. So 10 70 is the bare minimum. So, yeah, it went significantly faster looked at significantly less documents. And if you've taken my data science or not my data science but my algorithms class, this actually built a tree in memory, so that whenever you're doing these queries, it's actually going in memory.
[00:05:44] So it's not writing to disk. And it's looking it's going through a tree rather than looking at every single item in the in the database. So this should give us some performance stats as well. Again, I'm on a pretty beefy MacBook Pro. How long did this take? Just took two milliseconds.
[00:06:08] LSC up here. How long did this one take? Four milliseconds so not that huge a difference again, considering not that many items and in this collection and also it's on my MacBook which really isn't doing really. Anything else right now, but even still 50% query increase not bad right?
[00:06:29] We're not upset by that. So at anytime is all you can say db.pets.get indexes and you can see here we have one on name you will always have one an ID is always indexed just by default. And yeah, that's how you do indexes. So you can do compound indexes, so let's say you're frequently querying by both maybe age and pet at the same time.
[00:07:01] You can actually make a compound index that uses both of those things. You kinda just have to understand the shape of your data that it needs to be in. And how you're gonna query it. Normally I, kind of wait a little bit to figure out what queries I wanna do, or sorry, what indexes I wanna do and kind of see how I'm using the database and how the code is written.
[00:07:22] And then I'll go make indexes to kind of index those kind of hot code paths. Because if you have an index that's not doing anything, then it's just taking up space and memory, yeah, don't have useless indexes as well. That's another good piece of advice, because it's just waste space and performance.
[00:07:41] Let's talk about something else. Let's talk about unique indexes. So right now, in our database, you can have in fact, here's a good example. Cause I think I did this earlier. Do these index find, index 10,000? No, okay. So I actually only have one item of index one 10,000, but this index should be unique, right?
[00:08:14] No one should have the same index in this particular collection. So we want that to be unique. Or maybe you can imagine, like maybe you're doing a users collection and you want the username to be unique, right? So you put a unique constraint on that particular field. You can do that with unique indexes.
[00:08:34] So what we're gonna do is we're gonna say db.pets.create index. And I'm going to index, that's funny index referring to this particular field in our collection, right? It's not nothing special, but the word index. And then all you have to do is you add another thing here and say, make sure it's unique.
[00:09:02] Okay, so I went and created a new index, it's on index. So now, if I say db.let's say 1, that query will go much faster, cuz it's not gonna scan all of them. It actually goes directly to that particular thing, but the other interesting thing is I say, db.pets.insert(1) I try to insert a index 10,000 like that.
[00:09:26] It's gonna give me an error it's gonna say right there duplicate error key, right? So it's actually not going to let me insert duplicate keys here. That's what that unique constraint allows me to do. Whereas before it would have just happily accepted that right another useful thing about this is now index is quick is indexed right the the index is indexed.
[00:09:52] That's just kind of a necessary byproduct of doing a unique constraint is it does have to index that.