Check out a free preview of the full Complete Intro to Databases course

The "Group By" Lesson is part of the full, Complete Intro to Databases course featured in this preview video. Here's what you'd learn in this lesson:

Brian explains that GROUP BY is a command that allows engineers to determine how to group query results by chosen values, and is used in the example of the board messaging database to query the top 10 most commented on message boards. How to use GROUP BY with the SELECT, INNER JOIN and COUNT commands is also discussed in this segment.


Transcript from the "Group By" Lesson

>> Let's talk about subqueries. So subqueries allow you to do a query within a query. It's query ception. Let's say you wanted to query all of the comments left by someone named Maynard. I don't know, maybe that seems useful to you. We don't actually have that available to us, and there's no really useful way to join on that.

So what we can do is we can do a query to find Maynard's user ID and then use that user ID to query for all the comments. So we can say SELECT comment ID. Let's just make this on multiple lines. Comment_id, user_id and our comment, LEFT(comment, 20) FROM comments, WHERE, and then here you're gonna say, user_id = and then I'm gonna put parentheses.

This is way of me signaling to SQL I'm gonna do a query within a query. And I'm gonna start writing another query in here. SELECT user_id FROM users WHERE full_name = Maynard Simonich. So then I'm going to close that sub query. And then I'm gonna put a semicolon to signal the end of the query altogether.

So now you can see here, this will go and this query gets run first. It's gonna go out to users. It's going to find this person's user id, it's going to return that. And then it's gonna be used in this query to find all of their comments. So that's all you need to know here.

If you need to do a sub query to go get something first, just put parentheses in here. You can nest them as much as you want. So you can have queries inside of queries inside of queries. That's what how you do subqueries in SQL. Let's get into group by.

So now we're getting into what we were using the aggregation pipeline for in MongoDB. SQL kind of just has a built into the language, so there's no differentiation between the two, but let's talk about group by. So I'm gonna say, SELECT boards.board_name. Yeah, so let's talk about first, what we're gonna do with group by.

We wanna find the top ten most posted to boards. So we wanna go query the board's table, find all the boards, and then we want go look in the comments tables and figure out which one has the most amount of comments. So we're gonna use group by to figure out that kind of aggregate number of comments per board and then we're gonna do a group by with that.

So we're gonna want to see our friend COUNT(*) again to just count how many of each one that we have, and then we'll just call that AS comment_count. SELECT boards.board_name. COUNT(*) as comment_count FROM comments. Here we could say INNER JOIN on or I'm just gonna say NATURAL INNER JOIN, cuz I know this is gonna work.

Boards, and they're gonna say GROUP BY so we're gonna tell it how to aggregate things together. Boards.board_name, you could do this probably on id as well, it should work. Then we're gonna say ORDER BY, this is how you do sorting, right, ORDER BY. Comment_count, DESCENDING, cuz we want the top 10 and then we're gonna say LIMIT 10, cuz we only want 10 results back.

And then you can see, again these board names are auto generated so I don't really know what is which one. The cloned and the budgetary management boards have 18 comments. These ones all have 16 through 13, right? And if you wanted to see ascending, you can see here this one has four.

But we have a problem here, so think about this for a second. This query that we ran right here, what happens if a board has zero comments? And actually, let me tell you, there is a board in here that has zero comments, cuz I went and made sure of it, cuz I wanted you to discover this problem.

Because we're doing INNER JOIN, what happens when there's no match? It gets left out of the data set, right? So what do we do? What do we want to join? Yeah, go ahead. Hold on, I think I have you muted One more time.
>> You want to left join?

>> Close, we wanna right join on this particular problem, which is super weird. And I know I said you almost never use right joins. And then I just turned around and slap you with a right join, but that's what I did. But I like where your head's at.

That's exactly the kind of thinking that you want to go through. So awesome. So we're gonna do a natural right join, cuz we wanna include the boards, right, so left would be on comments, right would be on board. And that's where we want to get the thing that could be left out.

So now we have fire, and it says it has one comment but we know that's not true, right? It doesn't have one comment, it has zero comments. Why is that? Well, that's a problem because it's including something, it's including that board with nothing on it. And it's counting that in our Count(*).

So this Count(*), it's called an aggregation function, aggregation pipeline. Hopefully, you're seeing some parallels here. It's just counting all of the lines. And there's a line being included from that one, which is not the right thing. So what do we actually want to count? We don't want to count any row.

We want to count rows that actually are really comments, or we can just count comment ids, right, cuz if it has a comment id, it's definitely a real thing in comments. So I'm gonna say comment_id. Lo and behold, this fireboard must be about the fire festival. Doesn't have any comments, right?

But I wanted to show you how to kind of work through some mental gymnastics of SQL bugs, cuz this is the kind of stuff that you kind of have to think about when you're dealing with SQL. You have to be like, what am I including? What am I not including?

Where's the data going and where and why, right? Cuz it would have been super easy, one, for us to just exclude this board altogether. And then it would have been super easy for us to say, there's one comment in this, despite that there's actually zero comments in this.

Any questions before we move on? The question is when is limit evaluated, right? So if I do a join, does a limit get calculated first or is it join? Am I gonna be cutting things off before I'm using them? There is a particular order of operations with SQL that I do not remember off the top my head and I have to look every single time.

This one I think, I'm pretty sure, limit is the last thing, so it does everything first and then it just returns to you the first ten rows of that. I'm about 95% on that one. Do check my math. But if you just search for SQL order of operations, it'll give you a pretty complete breakdown of, here's the complete order of how things are evaluated.

Learn Straight from the Experts Who Shape the Modern Web

  • In-depth Courses
  • Industry Leading Experts
  • Learning Paths
  • Live Interactive Workshops
Get Unlimited Access Now