Transcript from the "Window Functions" Lesson
>> That's it for transactions. I promised you we would talk about window functions, I'm finally going to explain window functions for you. We talked about that hours ago. You probably forgot about, I probably could have just not talked about and you wouldn't have cared. Well, surprise, we're back.
[00:00:18] A window function is basically, like I have a bunch of these rows, but I wanna do something collectively with all of these rows. So let's say, for example, I have my movies, and all of these movies have ratings with them, or not all of them, but many of them have movies with them.
[00:00:35] And yeah, sorry, I reconnected to the omdb here, I did \c omdb, you should do that. Now, we should be back in Movie territory, I'm gonna clear this. So I have movies, SELECT name and vote average, From movies Limit, I don't know, 20. So you can see here, people really didn't vote very high on A New Hope.
[00:01:09] I liked that movie. They liked Pirates of the Caribbean: The Curse of the Black Pearl. Who are these people voting for these things better? Anyway, How dare they? So, I can get these vote averages out of my database. What if I wanted to compare that to the average, right?
[00:01:31] We're trying to do something like, is it like a standard deviation better or a standard deviation worse, or something like that? We can use a Window function to do that. So I can, SELECT name, kind, kind is like, is it a movie? Is it a TV show? The kind of thing.
[00:01:56] Vote average, and then I can do an average function here of vote average. And then you have to tell it over what. Well, I wanna do it for the entire result set, so I'm gonna say OVER (), this is gonna be the entire results set. But you could limit it here if you wanted it to.
[00:02:20] And we're gonna do that AS all_average. FROM movies, and we'll say LIMIT, I don't know, 15 or 25 or something like that. We'll say 15. So you can see here, let's just run this again with, there we go. You can see, this is the average 6.25, so you can see all of these ones that are showing so far besides Jarhead, which I admit I saw and I admit I didn't know that, no.
[00:02:58] Kill Bill, no. Yes for Jarhead, didn't really care for this movie. So that tracks with what I think. We can see the average of all films versus the ones that we're looking at. That's cool so far if you do over with empty modifier there, empty kind of partition is what it would be, you get it across the entire result set.
[00:03:23] So if I put like a WHERE here, WHERE kind equals movie, it would not use those in the average. Maybe let's just even do that to prove my point here. FROM movies WHERE kind = 'movie'. So you can see, it didn't change a lot, but you can see these two numbers are not the same.
[00:03:52] So the filter is going to affect that, right? It's over the result set, not over all movies, everything in the movies table. It's going to be just over the result set. Differentiation makes sense to you? Okay, let's take this a step further. What if I'm interested in the average by kind, right?
[00:04:10] So I wanna compare, maybe I have this hypothesis that people like movies better than they like TV shows, or vice versa. So we can compare to the average of its kind as opposed to the average of the entire result set. Well, the way you're gonna that is we're going to say SELECT name, kind, vote_average, AVG(vote_average).
[00:04:41] And then this is where you're gonna use the OVER, we're gonna say OVER, and you're gonna use this (PARTITION BY), I have to make sure I spell partition right. It's one of those words it's just sometimes messes you up. PARTITION BY kind, right, because we're trying to slice and dice by kind.
[00:05:02] And we'll say that as kind _ average, okay? From movies, Limit 20, something like that. So this isn't super useful so far because all we're seeing is movies, but just maybe we get a larger set here by 100. Movie, movie, movie, movie, movie, movie, movie, movie. Okay, you are not helpful.
[00:05:40] Well, let's just add a WHERE so you can see that it also works for things that are not movies, WHERE kind is not equal to movie. So you can see here, movie, the average is 6.2, but if you change it to series, its average goes up to 6.7.
[00:06:02] So people in general seem to like series better. AND kind is not equal to series, Seasons, it seems like people don't care about seasons. Okay, and we should say AND, vote.average IS NOT NULL. Okay, I'm trying to get some more episodes and stuff like that, that's different. Anyway, hopefully you're catching my drift here where these numbers are now different, and they're different by kind, right?
[00:06:53] And they're partitioned by that kind. What if we're just not even interested in comparison individual titles, we just wanna see the average of each one of these, the average movie, the average series, the average of that? Well, you can do that with tools that we already know. SELECT DISTINCT kind Average, AVG, vote_average OVER, PARTITION BY kind AS kind_vote_average FROM movies.
[00:07:41] Average series 6.7, looks like there's no voting whatsoever in seasons. Episodes 6.2, movie series is the devil's number, and movie is 6.23. I like this, I like things where I can ask a reasonable question about my data, and I can use a query to get insights back, because then my only limit is the imagination of questions.
[00:08:06] I have now my ability to retrieve the data. I know, it's a cycle that I really enjoy, and that's 95% of what window functions are and what they buy for you. If you're trying to derive insight about groups of rows inside of a single row, that's when you're gonna use a window function.
[00:08:28] But it allows you to use things like aggregation functions on an individual row, you just have to tell it what you're aggregating.