Complete Intro to SQL & PostgreSQL

Complete Intro to SQL & PostgreSQL The Movie Database Solution: Most Revenue

Learning Paths:
Check out a free preview of the full Complete Intro to SQL & PostgreSQL course:
The "The Movie Database Solution: Most Revenue" Lesson is part of the full, Complete Intro to SQL & PostgreSQL course featured in this preview video. Here's what you'd learn in this lesson:

Brian demonstrates the solution to "Which 5 people were in the movies that had the most revenue?"

Get Unlimited Access Now

Transcript from the "The Movie Database Solution: Most Revenue" Lesson

[00:00:00]
>> Which five people were in the movies that had the most revenue? And for me, when I wrote this query, it was just like a slap in the head, like of course, that's true, right? Keep in mind that it's not like starring, right? It's just like the person that appears in the most movies that makes the most money.

[00:00:19] I don't know, I thought that was fascinating. All right, so we're gonna say SELECT, p.name, person dot name. COALESCE, this again, might not be necessarily again. Sum of m.revenue. 0 AS total FROM movies m, Again, we're gonna look INNER JOIN, casts c ON
>> Would it make more sense to have your COALESCE inside your SUM to rule out anything that is null, and call it a 0?

[00:01:12]
>> Yeah, I mean, that's probably why I was just copying, pasting from other queries. Which is why this doesn't make any sense, which I think just means that it's not necessary. Cuz once you start trying to do math with null, it's just automatically gets made into a 0, yeah.

[00:01:30]
>> Yeah, I did it without the COALESCE, and it worked fine.
>> Yeah. I was probably still hungry from all those recipes that I was writing, so I was just not thinking straight, typical Brian stuff.
>> Thinking about Pokemon?
>> Just thinking about Pokemon. Okay, so INNER JOIN cast c ON movie ID = cast.movie_id.

[00:01:55] Then we're gonna INNER JOIN people p ON c.person_id = p.id. We're gonna have to throw a good old-fashioned GROUP BY in there. We're gonna group by person_id and person.name. Reason being is, people can have the same name, right? So we wanna make sure that the person itself is named, because this p.name does not guarantee unique.

[00:02:34] I'm sure that there are actors and actresses in here that have the same name. That would be a great query for you to write at some point as well. Find me all of the people that have duplicate names in the database. You don't have that right here, but you could totally write that.

[00:02:49]
>> Why doesn't grouping by p.id alone rule out that possibility?
>> It could, but you need to put a name in here, because you wanna use it up here. In this case, it's we were talking about those nonsensical GROUP BY statements, if I try and select for things that aren't gonna be grouped.

[00:03:13] In this case, we know that ID and name are relatively unique, right? So by adding that p.name, you're not actually changing how it's grouping by. But you still have to let Postgres knows, I'm gonna use this up in the SELECT as well.
>> I only grouped by p.id, and it worked.

[00:03:31]
>> It does?
>> Yeah.
>> That might just be habit of mine, then.
>> Weird.
>> Right, so we'll try it without GROUP BY p.id, p.name, ORDER BY the total, which is this one up here, right? And then I'm just limiting it to 5, but you could put like 15 here, or something like that, whatever you want.

[00:03:53] Let's try that. L, Yeah, and then we have to do total, and then we have to also do NULLS LAST. You know why? So this is why I had this COALESCE here, it's so that it didn't have to do the NULLS LAST. Because that would coalesce them to zero, which means I could sort by this without any additional NULLS LAST logic here.

[00:04:23] That's why that COALESCE is in that query. Which one is right? This one, I would say, is slightly more right, but they both end up with the correct answer, so. All right, so let's try that again. So Stan Lee, which you think about it, cuz he's in every Marvel movie, makes sense his movies made the most, right?

[00:04:47] Because those Marvel movies make stupid amounts of money. Peter Jackson, the directories on the King Kong and all those kinds of movies, that obviously makes sense as well. George Lucas, Star Wars, Samuel L Jackson, cuz he's just in everything. Which I'm not complaining about, he could be in more movies, and I'd be fine with it.

[00:05:07] Kevin Feige, he's the Marvel guy. Hans Zimmer, makes sense? Pretty cool, right? So those people were associated with the movies that had the highest box office. Like I said, obviously, Stan Lee is not starring in any of these Marvel movies, but he appears in all of these. I'm also gonna say that this database, I think, is pretty incomplete, so don't be surprised by that.

[00:05:32]
>> It would be nice if we could add a WHERE clause for just acting in a movie, or something like that.
>> That's actually a good point. And how would you model that data? Where would you say acting? Where would you put that piece of information in this database?

[00:05:53]
>> So you've got the people, and then you have the link between them, what was that, casts? Probably in the cast somewhere I would have like role ID, or something in a role table.
>> Yeah, so let's take a look at that, cuz actually it might even be there?

[00:06:11]
>> Job_id might be it?
>> Actually did there.
>> Does it say they have a role in there, but that's just the name of their character?
>> Yeah, but job_id-
>> job_id is probably that, yeah, yeah.
>> LIMIT 10, okay, so what's the position?
>> Jobs, maybe, job names?

[00:06:33]
>> It's just jobs.
>> Okay, you really could figure out the acting. So this is the point that I actually wanted to get to here. One of the things you could call these are edges, which is more like a graph database term, that it's describing a relationship between the tables.

[00:07:05] So, for example, in recipes and ingredients, an ingredient is just in a recipe, we didn't actually describe that relationship at all. But the relationships between movies and people is actually a bit more nuanced, right? It's not that just a person belongs to a movie, you have people like Stan Lee that just shows up, right?

[00:07:24] He's not really in that movie. You don't think of Stan Lee as the biggest actor in the world, despite the fact that he turned up at the top of our list, right? So what you would have to do is you'd have to go and describe it in the edge, basically like, here's the relationship between this person and this person.

[00:07:39] So here it actually does that, didn't realize that it did, but that would actually be a great. Another step that you could take here is I want you to go query for the person that acted in the most highest revenue generating movies. That would be another inner join that you could take, actually another two inner joins, right?

[00:07:56] Actually, no, it's just one, cuz it's described here, anyway. You'd have to join to position to jobs to figure out where it was actors, right? Or you can go see who was in the camera department of the most best movies. More interesting questions that you can ask here.

[00:08:20]
>> Most lucrative cameraman.
>> Most lucrative camera person ever, yes, I would like to see that. Someone let me know, before this class ends, I want someone to send me a message saying who is the most lucrative camera person of all time. Okay, any questions about this query here?

[00:08:45] So just to circle up why I had COALESCE here. SUM 0, like this, and then I didn't have NULLS LAST here. Same thing as last time. You have to tell what to do with null, in this case I didn't wanna tell when I was writing this query what to do with null, so I just made all the nulls 0.

[00:09:10] And then we end up getting the same thing. Also notice that this query is a little slow, right? I have a brand new, this is an M1 Max. Notice it's taking a hot half second to run a query, which for my Mac is quite slow, right? It's a big query, right?

[00:09:28] You have two JOINS, and then you're doing an aggregation step, and then an ORDER BY step after that, that's quite a few steps. Again, we're not quite to this section yet, I think it's next. But if we put the analyze on here, the 166,000, and we'll talk about costs here in a second.

[00:09:50] But that's a big query, [LAUGH] and I'll show you in the next couple steps here how to optimize that. We can see it took over a half second. And it will usually tell you how many rows it looked at. So this looked at, at least in this part, 261,000 rows.

[00:10:12] This one looked at 444,000 rows. It's pretty wild stuff, right? I mean how many people, or how many rows, are in casts? SELECT COUNT(*) FROM casts. So there's a million rows in the casting table, it's wild, right?