Introduction to MongoDB Compound Indexes
Transcript from the "Compound Indexes" Lesson
>> Scott Moss: We're gonna talk about creating different types of indexes, and more specific, like I said, compound indexes. So a compound index is an index on more than one field. So we've been adding indexes on one field, like unique true on first name, unique true on email. Now we're gonna add indexes on a combination of fields, which is very, very, very useful.
[00:00:19] So for instance, if I have a school here as part of a district. Schools are part of districts, and let's say a district is a type of id type. So mongoose.Schema.Types.ObjectId, and ref, we don't have anything called district, but imagine we had a model called district. It's not that important for this example, but if we have one, it would reference that district, okay, but, thank you.
[00:00:51] Yeah, I thought that looked weird, I was like, there's something wrong. It's kinda weird to have two schools at the same district with the same name, it's kinda weird. But it's totally fine to have two schools in different districts with the same name, cuz they might be in different states, so that seems feasible.
[00:01:05] So we want our names to be unique, so if I just came over here and said, okay, type string for name,
>> Scott Moss: And then I put unique, that would not be good. Because now, you can never have two schools with the same name, even though they're in different districts.
[00:01:24] That's not what you want, you might have a school in New York who has the same name as a school in Atlanta. And they should basically have the same name. But if they're in the same district or county, they shouldn't, so how do you scope that? And this is where compound indexes come in.
[00:01:38] So without compound indexes, again, you'd have to do this yourself inside of middleware. So what we're gonna do is, we're gonna use a compound index for this. So what we wanna do is, we wanna make name unique, but scoped to a district. So the way we would do that is, we would use our schema object for school, so we'll say school.index.
[00:01:56] And here, index is gonna take an object because we're doing a compound index. And basically, it's just gonna take a property name that you wanna index, and then it's gonna take a value. It's gonna be a number, and I'm gonna tell you what this number does in a minute.
[00:02:09] But basically, the first thing we wanna index is the district. So district: 1, and again, I'm gonna tell you what that 1 means in a minute. And the next thing we wanna do is, we wanna index the name, and we'll also put 1. And as a second argument, what we're gonna do is we're gonna say, unique: true.
[00:02:28] Okay, so now I'm gonna talk about all this. What's happening here is, obviously, these field names correspond to the field names on the school schema, so I want to index those. The 1 value, it's basically telling Mongo what order to, it's a compound index. So every time you set an index, you're gonna be reducing the size of the query that has to happen.
[00:03:19] But once they figured that out in a V8 node environment, they're able to determine that you declare district first, and then name second. Cuz it's the way that I wrote them, if I switched it, it'll be opposite. So what they're doing is, they're gonna create a compound index where this is the first half of the index, this is the second half of the index, it's very complicated.
[00:03:39] But the reason why this is important is because I need the name to exist underneath the district. If I put the name first, that's really not gonna help me. Now I still have unique names, and then now the districts are there for some reason, that really doesn't help me.
[00:03:54] But I wanna say, names have to be unique for a district, so they're not unique by themselves globally, they're only unique underneath a district.
>> Scott Moss: So basically, the short answer is, always put 1, you can put -1, just don't do that, just put 1. I think they're working on this, so just put 1.
[00:04:12] And I don't wanna talk about cardinality and how to figure out which ones go first. But in reality, it's such a complicated topic, you really just want to index by the one that has the highest potential of change. Right, which one of these fields might have the highest difference of values?
[00:04:36] Is district gonna have more chance of different values, or is name gonna have a different chance of different values? We're seeing how name might be conflicting, I would say districts might have a chance of having more different values. There's an option there where districts might have way more cardinality than names.
[00:04:54] Okay, it's very complicated, basically, yeah, you just wanna scope the big ones first, and then the little ones underneath it. And that's how you do a compound index, this is the same thing. So for instance, if we just wanted to index just the name, remember, we put index, we would put unique: true right here.
[00:05:14] You don't have to do that, you can just do this, it's the same thing. In fact, I always write my indexes like this, I don't put them up here, cuz I like to just see them all in one place. So I'll just do schema.index and write my indexes down here.
>> Scott Moss: So this is a compound index, so this is an index with two fields. So now if I try to save a school in the same district with the same name, it'll fail, no, those gotta be unique.