Transcript from the "Identifying what to Index" Lesson
>> So, I just wanted you to see a much more complicated query. Try to remember what this one is. This might be one of the answers to one of the previous ones. Yeah, this is the movies with the most categories I think, it is. So, let's do an explain analyze on this, because this is obviously going to do a lot more.
[00:00:23] So you can see here it's cost is 53,000. Keep in mind that the one that we did in the previous one was what? About 5000, right? So this is 10 times slower, as opposed to work like our index lookup, which was, well, like 0.75 or something like that?
[00:00:43] No, it's 8.44, so this is magnitudes slower. But here you can see that the sort cost this, most of the sort's cost was in its set up, it had to sort 183,000 rows, that's a lot. It sorted by the count, it did a heapsort, I don't know anything about sorts so I don't know how it chooses that.
[00:01:10] Grouping, it does the grouping by m.id. Did a merge join, did a merge condition on this, you can see here we did hit an Index Scan here using the movies_pkey, that's cool. Index Only Scan, another good thing to see. You had some, anyway, Planning Time, Execution status, that's always something interesting to look at as well.
[00:01:38] So let's talk just a moment about the postgres planner. One of the reasons I chose this database for us to use in the course of this class is there's a lot of rows on it. Which means that we're actually going to hit some occasions where, indexes are gonna be really useful.
[00:01:53] The postgres planner is really smart and usually smarter than you. There are times is like you're explain analyzing some query and you have some index that you put in there and created for it. And postgres's plan is like no, I'm not gonna use that. I'm still gonna use a sequential scan.
[00:02:11] And it is usually right, because it'll look at how many rows it has to filter. It'll look at how many rows are on the table. It'll look at all sorts of other variables that you and I are not looking at. And usually, it will select the correct thing that will be the fastest.
[00:02:26] So that's the point of the postgres planner, is to look at all the tools that it has to work with, and then choose the correct one to do your query. So, in other words, if you're working on a really little table with. Like if we were trying to do indexes on the recipes table, it will almost never use our indexes because it's such a small table.
[00:02:47] There's like 300 total things in that table. Postgres can just burn through that really quickly with sequential scan, so it would just use sequential scan and everything. Cuz again remember, the setup for indexes is not free. Which means, frequently the cost of setting up to use an index, will exceed the cost of doing a sequential scan.
[00:03:12] So, it'll elect to not use your index, does that make sense? So don't get frustrated if it's not using an index, that's my communication to you is frequently the planner is a lot smarter than you are when it comes to query performance. So, the reason why I show you this big complicated thing, let's say that you were running this a lot.
[00:03:34] For whatever reason, this was a really important query to your application. You could run explain analyze on this and say, well, this is kind of expensive, I don't like how expensive this is. You can go in piece by piece and figure out, where the slow parts are. And then you can try and optimize those parts of the query for indexes.
[00:03:54] Cuz I can't, this is too complicated of a query, just say index this query, that's doesn't make any sense. You have to index parts of the query. So, if I was looking at this, and this was a really important thing to me, like this part, right. This has no cost, so you would totally ignore this part of the query, you would not try to optimize this.
[00:04:15] This one right here, you would totally ignore this, because that's really fast. You start getting to these ones, okay, these are things where I can look at them and say, there are games to be made here, possible. So you look for these parts that have big numbers and you're, okay, I'm gonna try and see if I can figure out how to write indexes for this.
[00:04:38] But just like general programming practice, you don't wanna go and try and optimize everything from the get go. Generally you kind of wanna have a problem before you solve a problem. Cuz, if you're trying to solve problems before you have them, you almost always solve the wrong problem, right?
[00:04:56] So that's my advice to you with indexes. They're not free, because now the postgres planner has to look at every time it's, should I use this index or not, that's not free. And some of these indexes get huge, right? Cuz you have to think about it. If you have 175,000 rows, there's 175,000 entries in a B tree.
[00:05:16] That's not free, right? And it has to keep that up to date, every time that you insert a new row and has to go insert that then rebalance the tree. It's additional overhead. So, if you're creating indexes, make sure you're using them, if you're not using them, then drop them.