Complete Intro to SQL & PostgreSQL Indexing a Derivative Value
Transcript from the "Indexing a Derivative Value" Lesson
>> So fathom for a moment, you're at your job and you work at this OMDB company that's showing people movie statistics. It's some IMDB type company. And you show your boss this really cool query that you can actually show people, how much profit movies were making and people were really interested in this query.
[00:00:24] So you're like hey, using this query, I can show this database and I can show you profit of movies by movie, by category, by actor. And people like this are amazing, people are gonna wanna see this and you show them this query, which is what this does. Let's just take a look so you can see what the results are.
[00:00:47] You can see here this is revenue, this is budget, this is the profit. I'm going to just caution you for a moment that this is all Hollywood accounting it's all reported numbers which is to say none of it makes any sense. There's I love to do the article here of what a Hollywood accounting is and basically it's just flat lies.
[00:01:08] That let's say your coworkers are very into this, your boss says like this is amazing great job go make a dashboard that we can put on our front page. And literally all 10 million of our users are gonna see it every day, several times a day. You go back to your desk you'd run, explain, analyze, and you see, I cannot run a 7000 cost a 53 millisecond query 10 million times a day.
[00:01:36] That's a bad thing, right? So now you're like freaking out. What am I going to do about this? All I've ever learned how it is to index individual rows, I haven't figured out how to index any sort of derivative information. You could do something like put it in Redis or something like that.
[00:01:54] You could query it and then populate a cache or something like that. But who wants to do that, especially if you don't have that stood up already. So there is actually a way you can actually index on derivative values here. You see this derivative value I have here which is revenue minus budget, you can index on that.
[00:02:12] So I'm going to say create index idx_movies profit on movies and then you just give it that part that you're gonna index which is coalesce revenue-budget,0, right? Cuz we all want we want movies that had no budget and no profit to be zero which is fine and then I need another one right because and I also have to have one in front which I did.
[00:02:54] What do I have two at the end here that's my question. Right cuz you need one around this as well parentheses they'll get you. Okay, now I have this index around movies profit and that now I can actually query that and if I do an explain here. Again doing the same query I went from doing was I say about 6000 7000, something like that 7300 up here to now I did it again and I'm getting 1.33 because it's actually able to index this.
[00:03:29] And now anytime that I insert a new movie, it'll actually calculate this in advance and it won't have to run these calculations on the fly. So I went from, 7000 to 1.33. I went from 53 milliseconds down to 0.132 or 6, something like that.
>> So does it make like a hidden column?
[00:03:53] Essentially, like it's storing that whenever you update either those columns, right?
>> Yeah. It is storing that I'm hesitant to call it hidden column because it's not, it's actually just totally separate. It's in a totally separate data structure, it doesn't get stored necessarily as the movie, it just gets stored as a index, right?
[00:04:16] It's stored in a B-tree, it's not stored in the table, I guess that's my point. But if you wanna think of it as, there is a index profit, which is a derivative value from the table, that's what it is, right. So I've kind of disclaimed this throughout my entire part of query performance.
[00:04:35] I am not a DBA. I have shown you a couple of very simple ways of doing indexing anything more complicated than this, you should really work with your team to figure out what's the best thing to do. Because really big difficult queries could make that really difficult. So, but for personal stuff and for simple tables and simple querying purposes, this would be enough.
[00:05:03] I'm more concerned that you know what's possible and you can identify problems. That actually being able to articulate this is precisely the index and how I would use it more you can say I know this is messed up and I know it can be fixed. And if you're using things like PG admin or you can go look at DynamoDBs, or Aurora, or Cosmos DB, or any of these cloud databases.
[00:05:30] It's gonna ask you to track slow queries for you, and then later you can go write Indexes for. Some of these cloud companies are even starting to suggest, I see you run this query a lot. I see that it causes you a lot of trouble in your app.
[00:05:45] We suggest that you try this index. So they're actually getting smarter about it as they go.