Complete Intro to Databases Aggregation
Transcript from the "Aggregation" Lesson
>> Let's get into aggregation. It's actually one of my favorite parts about MongoDB. I have a lot of favorite things when it comes to databases. This is a favorite thing, as well as the other things that I was favoriting before also favorite. All right, we're gonna be talking about going back here aggregation.
[00:00:20] This is one of the first things that actually got me like really excited about MongoDB. It felt like I could do, I don't know, it felt like I was able to get interesting, unique insights out of my database that I felt like I couldn't have done previously. So here's my example of when I used this before.
[00:00:39] I used to work at a like classifieds Type website. So you can picture like Craigslist. I didn't work at Craigslist. I worked at a different company that did similar types of thing, and we had a rule that people could only post five ads. Otherwise they had to pay money to be able to post more ads, right?
[00:00:56] That's kind of the monetization strategy for this company. But we'd have people that would just go in and create multiple accounts, businesses, right? People would dishonestly circumvent the rules and go and post multiple classifieds. And so I actually used this technique to go in and find ads have the same address in them.
[00:01:17] And then I was actually able to automatically ban people that were posting multiple ads across different accounts, right? Again, I was a brand new developer I was writing PHP for at the time, right? And this felt like a superpower to me that I was able to write this like automatic banning mechanism for people that were circumventing the rules and it's through this power called aggregation.
[00:01:38] I'm going to show you the new way of doing this in MongoDB called the aggregation pipeline. This is the preferred way, the better way to do it. But there's an old way of doing it called MapReduce. And if you're a functional programmer, and you like those kind of things that sounds very appealing to it's still available, you can still do MapReduce, it's just lower bit harder to understand.
[00:01:59] So we're going to focus on the aggregation pipeline, but also know that there's the MapReduce framework out there as well for doing MapReduce against MongoDB. So, what the aggregation pipeline allows you to do basically allows you to take your data and slice it up into various different buckets.
[00:02:19] And then you can kind of do averages. You can do additions, basically allows you to derive insight from your data, allows you to combine them and unwind them in interesting ways to kind of figure out, I don't know anything that you want to know about your data. So well then I'm going to show you right now is I want to find out how many puppies are in my database.
[00:02:38] How many adult dogs are in my database and how many senior and super senior animals are in my database. This is possible through the aggregation framework. So let's give her a shot here. We're gonna say DB.pets.aggregate. And then it does this in stages, right? So you can give it multiple different stages of like, do this first, then do this, then do this.
[00:03:05] So we're going to give it one stage first of bucketing. So the bucket allows you to break your we're going to break our pet collection down into buckets of by age. So with the bucket here, we're going to say. GroupBy, Dollar sign age so the dollar sign is just saying that's the name of the field that I want you to look for.
[00:03:39] So dollar sign age is going to be grouped by their age. Okay then the boundaries. This is going to be defining the kind of boundaries of where we want to put our dogs. So the first one is going to be down to zero, right? So, anything between zero and two, I don't know if I'm correct, but I'm asserting anything between zero and two.
[00:04:05] I'm going to consider a puppy, everything from three to eight, I'm going to be considering an adult and everything above that. I'm going to be considering a senior animal, so zero. So it includes the bottom and excludes the top. So zero to three sets us everything from zero to two.
[00:04:21] We will be in the first bucket. Everything from three to eight will be in the adult bucket. And everything from 9 to 15 will be in the senior bucket. Then we have to give it some sort of default because we do have some animals that are even older than that.
[00:04:36] We're just gonna put those into like very senior, or you could also call that like 16 plus, whatever you wanna call that last bucket. Basically, you're saying anything that doesn't fit my boundaries stick in this other bucket. And then you have to define what kinda output that you wanna put in there.
[00:04:54] I just wanna count but if you want it to like I don't know you can put the animals into the buckets and then do more aggregation on them later. There's a bunch of stuff you can do with the output here. It's kind of like the projection that you want out of your animal here.
[00:05:09] So the output all I want is a count. And I just want you to keep a running total, so I'm gonna say sum: 1, so that means for every animal that you find in this bucket, increase the count by one. I got to keep my I can never keep these damn curly braces straight.
[00:05:26] So that's for bucket. That's for the one that contains bucket. And then I want to close the array and close the function call, yes, got it. First time, like a champion, all right, so ID this would correspond to the boundary, right? So from zero to two, I have 1071 pets, from three to eight I have 3218, from nine to 14 yeah, so I guess 16 wasn't plus wasn't right, it's actually technically 15 plus.
[00:06:08] Anyways, in my super senior category, I have 2141. In fact, let's just fix that so I can feel good about myself. So in this 15 plus one, so that's 15 or above, we have 2141. Okay, so that's interesting. That's actually not technically what the question was. The question was is, how do we figure out how many senior dogs, right?
[00:06:33] So actually, this is all the senior animals so I bucketed all animals in the pet. Collection, not just dogs, so I actually need to add another stage to this. And the way I'm gonna do that, well, I'm just gonna copy and paste it from here and explain it.
[00:06:51] All right, so this bucket, this is the same one except it's called very senior here, right? We add another stage we're gonna put that stage before the one that where we go and bucket them and we're gonna just say hey, match everything that's dog, drop everything that's else.
[00:07:08] And so that's the first stage will be that match stage. The second stage will be the bucketing stage. So you can see here now these numbers are much smaller because we're just looking at dogs. We're not looking at the rest of the pets in the collection. So let's add one more thing.
[00:07:22] Let's say that for whatever reason, we wanted to sort these by which one had the most, right? I mean, obviously, we can do that visually. But that's less interesting. Let's do it programmatically. Add one more stage. So the pet, the match stage is the same, the bucket stage stays the same, but adding one more stage at the end.
[00:07:46] Now we're just saying sort by count descending, which basically says we're going to have the one with the most first second most second, so on and so forth. So you can see here we have the most adult animals, second most senior animals, third most very senior animals and the least amount of puppies.
[00:08:08] This is just one very, very simple use case of the aggregation pipeline. This is a infinitely flexible, you can do lots of really amazing things with the aggregation pipeline. I just wanted to kinda like get your head around it a little bit. So you can think like, hey, I need to derive this insight out of my data.
[00:08:25] I need to figure out, how many reptiles were born on Tuesdays, right? I don't know stuff like that that stuff that the aggregation pipeline could totally figure out for you. You're kind of getting into almost like datasets. Actually not kind of you're literally getting into like the data science part of what you can do with databases, which is kind of some of the cool stuff like I'm not a data scientist, but anytime that you let me do data sciency stuff, I feel like a wizard.
[00:08:51] So I like doing that kind of stuff.