Complete Intro to SQL & PostgreSQL Filtering Aggregates with HAVING
Transcript from the "Filtering Aggregates with HAVING" Lesson
>> So another thing that a lot of other SQL courses will talk about way before now is what order things happen. So, let's say I put, let's just put this query in here. I'm gonna put Ctrl L. This one here. I think it's the same one, no we'll type it out.
[00:00:26] SELECT, type. COUNT(type). FROM ingredients WHERE COUNT. (type) is greater than or equal to 10. GROUP BY, Type. This query doesn't work either. So I haven't talked about the order of execution of things in SQL statements because when I was learning SQL, all it did was confuse me. Right?
[00:01:10] Do I need to care, does a SELECT happen first, does the FROM happen first, does the WHERE happen first? To this point, it hasn't mattered at all to any of the queries that we've been doing so far of what order things happen, which is why I elected to not tell you about it until now when it actually is going to affect you.
[00:01:29] Here it's gonna be like, this is the problematic part. I think it's reasonable for us to ask. Let's say we're trying to aggregate by type. But let's say we have 600 types of things, and 550 of them have just one. And we wanna eliminate all the ones that have just one, and we wanna see larger macro trends in our data rather than just groups of one, right?
[00:01:52] So it'd be useful if we could say, hey, give us back all the counts that are greater than one, or greater than ten, or something like that. But we can't do WHERE because the WHERE happens before the counting. So the WHERE happens before the aggregation, so you can't say WHERE COUNT type is greater than or is like, filtered here, right?
[00:02:17] Because the counting here actually hasn't happened yet. So if we're asking for WHERE here it doesn't make any sense because it can't give you back numbers because it doesn't have the number yet. And if you think about this, this is actually kind of a useful thing for you because you can filter before it gets counted.
[00:02:35] Make sense, right? Let's say we wanted to count all the types but we wanted to eliminate things that didn't have images. That's where you'd use WHERE. Let's see, ingredients. Right, so I have image here. Let's say for whatever reason I was trying to say, where image IS NOT NULL.
[00:03:05] This is a sensical thing to ask, right? I mean, I don't know why you'd wanna ask it but you could. So this is going to be filtering on the rows that are going to go into the result set. So we need another thing, another concept here to filter on after aggregation.
[00:03:26] So again I think this is maybe the same cuz I think literally everything has an image but you kinda get my point here. Actually, is that true? It is, but, So how would we do that? Well, you can add an additional thing here called having. So I can just copy and paste, I guess HAVING COUNT(type) < 10 Or we can go the other way to and say > 10.
[00:04:24] So you can think of HAVING as basically the post aggregation WHERE? Does that make sense? So the WHERE is going to filter the things that go into the result set, it's gonna get aggregated, and then you can ask questions about, and then you can filter further on the result set using HAVING.
[00:04:52] So, yeah, you can use both of them together. Or this is another one I had in here, instead of saying WHERE image is NOT NULL because this ends up being the same thing, but we could say WHERE id is > 30. So, you can see that actually does modify the aggregation of it.
[00:05:10] If we're only looking at things where the id is greater than 30. So this is my, Maybe pragmatic advice to you. I don't live and breathe in SQL. I use it as like the toolset that I use to write applications. There are people out there like data scientists and DBAs that they live and breathe SQL.
[00:05:36] I don't really bother too much of learning the exact order said things happen in SQL cuz it's a really complicated decision tree of what happens when, right? If you really dig down into it. I only look at it when it affects what I am trying to do. So in this case, it made a big difference because we need to know the WHERE happens first, and then aggregation, and then HAVING.
[00:06:01] But do you really need to know the exact minutiae of it? Usually not, usually doesn't affect you, which is why we're not gonna talk about it, I don't think for the rest of the course. But if you're curious, you can go look it up. It's like explained out in depth on the docs and Wikipedia, and stuff like that.
[00:06:18] The other thing I'll just tell you upfront, I have everything here in the correct order. Right, you have to have HAVING at the end, if you're try to put like HAVING in front of WHERE, it's gonna say like, you're doing this wrong. I never remember the order of things coming.
[00:06:31] I'd have to test and retest until things work. And I have to look it up frequently because it's not a part of my brain that I just. I don't commit it to memory cuz it's not important to me. So I'm throwing out there that I think that's fine.
[00:06:46] I don't think you need to memorize the order of which SQL things happen. I'm probably making some DBA really mad right now, but I have so many other like Pokemon facts that I just have to keep in my brain that I just can't fit it. God, I wish that wasn't true but it's actually a little true.