The "Using Heatmap" Lesson is part of the full, Building Custom Data Visualizations course featured in this preview video. Here's what you'd learn in this lesson:
Shirley demonstrates how to generate heatmaps with Vega-Lite displaying the release date and genre of blockbuster movies.
Transcript from the "Using Heatmap" Lesson
[00:00:00]
>> Shirley Wu: So that's a little bit that I wanted to mention. And the one last thing that I want to do, this idea of a heat map. And so, what we have here is being able to do a heat map by the genre on the y-axis and the months on the x-axis, and then colored by how many movies are in there.
[00:00:34]
So the intensity of the colors would be how many movies fall into that cell.
>> Speaker 2: My idea with this was actually popularity.
>> Shirley Wu: Okay, good.
>> Speaker 2: Based on the meta votes, the number of votes.
>> Shirley Wu: Let's keep it a little bit simpler. I think when we're doing exploration, let's not overload it too much.
[00:00:55]
I think the simpler the better. And then when we're designing, we can pick and choose what attributes are good and then design around that. But for this one, let's keep it simple. So,
>> Shirley Wu: Yes,
>> Shirley Wu: For heat maps, I haven't quite had a chance to use them before, so hopefully it's as straightforward as before. But let's look at this example together.
[00:01:30]
So I found an example on Vega-Lite. And what it's telling me is that it needs the data source. This one has a title and some configurations about, I guess some of the axis or the scales, but I think hopefully we don't need to bother with that. The important thing, it seems, is that for a heat map, the mark is the rectangle, which makes sense.
[00:01:58]
And then the x-axis, they're using a date too. They're using dates, but they're making the dates categorical instead of temporal so that, I think, they can bin by the date. So for us, let me start writing this down, actually. This one is a heat map where the x-axis is the month.
[00:02:29]
And then the y-axis for us is the genres.
>> Shirley Wu: [COUGH] And let's do it this way. So in terms of the x-axis with the months, what we want to do is have a, we still want to use the field,
>> Shirley Wu: That's the release date. But in this case we can say time unit, and that's another one of those things in Vega-Lite that you can specify.
[00:03:03]
But we'll use time unit as month. And then that goes together with type ordinal.
>> Shirley Wu: So that it knows to bucket by months. Each of the cells are bucketed by months. And if we want it to be fancy, and this is a final visualization, I think that's why they have the axes and they're labeling it a specific way and formatting a specific way.
[00:03:35]
But we're just trying to do exploration, so we can probably ignore that part. And the y-axis for them is date, but for us should be genres. So I think the type is probably nominal, cuz it's not ordinal because genres don't have a specified order. Whereas in this one, on the x-axis, the months are ordinal.
[00:04:00]
And then the field should be genre. And then the color of each of the cells is, they're doing it by, actually, Mark, let's do it both ways. One, that's just the count, and another that is the meta score. Let's see if we can try both of them. So let's try for the first one, having it being type quantitative and having it be the field, so having it be aggregated by count.
[00:04:56]
And then the field being everything, so this asterisk. So basically you're just aggregating each record of a movie. And so we'll try that first, and then I think after that I think we should be able to change the color to use the field. Was the look up? So maybe we can use the, instead of aggregating by max or by count maybe we can aggregate by the median meta score.
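As a rough illustration of what the count aggregation computes, the sketch below tallies records per (month, genre) cell in plain JavaScript. The sample records and the `countByMonthAndGenre` helper are made up for illustration and are not from the lesson; conceptually, Vega-Lite performs a similar grouping when given a month time unit and a count aggregate.

```javascript
// Hypothetical sample records; the lesson's real dataset is not reproduced here.
const sampleMovies = [
  { date: new Date(2015, 4, 1), genre: "Action" },
  { date: new Date(2012, 4, 4), genre: "Action" },
  { date: new Date(2014, 6, 18), genre: "Comedy" },
];

// Count how many records fall into each (month, genre) cell of the heat map.
function countByMonthAndGenre(records) {
  const counts = new Map();
  for (const { date, genre } of records) {
    // getMonth() is zero-based, so 4 means May and 6 means July.
    const key = `${date.getMonth()}|${genre}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

const cellCounts = countByMonthAndGenre(sampleMovies);
```

Each entry in `cellCounts` corresponds to one rectangle, and its value is what the color intensity would encode.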
[00:05:29]
>> Speaker 2: I was thinking the actual IMDB votes is a proxy for how popular something is.
>> Shirley Wu: Okay, so IMDB votes?
>> Speaker 2: Yeah, IMDB votes would be how many votes, seems like it would give you an indicator of how popular a video is.
>> Shirley Wu: That's a great point, so let's aggregate median, let's still use median votes, aggregate by the median, but the field is the IMDB votes.
[00:06:06]
>> Speaker 2: And we can try both or different things as far as ratings, but I was looking at popularity over a month.
>> Shirley Wu: I think you're right in that votes are going to be much more-
>> Speaker 2: Interesting.
>> Shirley Wu: Yeah, or much more reflective of popularity. That's probably a really safe guess.
[00:06:21]
So let's do all of these together. So let's get the data ready, and in this case we're saying,
>> Shirley Wu: This is by month, genre, popularity.
>> Shirley Wu: And let's go through the data again.
>> Shirley Wu: And so for each of them I want to return a date, that is the release, released.
[00:07:01]
And then the y-axis is genre, so again is d.genre.split to turn that string into an array. And then I want to just take the first one, and then for when we do the median votes, let's just get that in there. And that one would be just turning the IMDB votes into an integer.
[00:07:38]
Does anyone see what I am doing wrong?
>> Shirley Wu: Let me see. Okay, so that is what I am doing wrong. So you can see that the votes actually have a comma in between, that's why I can't just turn it into an integer. Okay, so let me replace that comma.
[00:08:02]
>> Shirley Wu: How do I do that globally, if I'm not using a regex?
>> Speaker 2: Your first argument, and I think you can put regex as your first argument to replace.
>> Shirley Wu: Should I just do that? Okay, let me just do that then, let's do a regex so that I can, ooh.
[00:08:19]
I'm missing a parens, okay.
>> Shirley Wu: So just a simple regex to find the commas globally and to replace them with an empty string. And now we have the votes, awesome. Okay, so let us,
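The data preparation described above can be sketched roughly as follows. The field names (`Released`, `Genre`, `imdbVotes`), the sample record, and the `prepareMovie` helper are assumptions for illustration; the lesson's actual notebook code is not shown in the transcript.

```javascript
// Hypothetical raw record; field names and values are assumptions.
const rawMovies = [
  { Released: "2012-05-15", Genre: "Action, Adventure, Sci-Fi", imdbVotes: "1,234,567" },
];

// Turn one raw movie into the { date, genre, votes } shape used for the heat map.
function prepareMovie(d) {
  return {
    date: new Date(d.Released),
    // Split the genre string into an array and keep only the first genre.
    genre: d.Genre.split(", ")[0],
    // parseInt stops at the first comma, so strip every comma with a global regex first.
    votes: parseInt(d.imdbVotes.replace(/,/g, ""), 10),
  };
}

const prepared = rawMovies.map(prepareMovie);
```

Without the `replace` call, `parseInt("1,234,567", 10)` would stop at the first comma and return 1, which is the problem spotted in the lesson.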
>> Shirley Wu: Do the Vega-Lite. Let me pin this and we'll do the Vega-Lite. Okay, so we said for Vega-Lite, again, pass in the data, values, a month, genre, popularity.
[00:08:59]
And then the encoding, sorry, the mark is rectangles, and encoding we said we had x, y, and color. X we want it to be the field is date, the type is ordinal. And we said the time unit, we want the ordinal to be aggregated into months, so let's give it month.
[00:09:26]
And then y is, the field is genre, and the type is nominal. And then the color, we want it to be, the first one is basically how many movies are in there. So let's do an,
>> Shirley Wu: Let's see, type is quantitative because it's the number of records, and aggregate is count.
[00:10:01]
And let's aggregate it by all of the fields so we don't have to.
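Put together, the specification described above might look like the sketch below. It is a plain JavaScript object following the pieces named in the lesson (rect mark, month time unit on an ordinal x, nominal y, count for color); the `heatmapSpec` helper name and the sample values are hypothetical, and the rendering call is left out because it depends on the notebook setup.

```javascript
// Build a Vega-Lite heat map spec from prepared { date, genre, votes } records.
function heatmapSpec(values) {
  return {
    data: { values },
    mark: "rect", // each cell of the heat map is a rectangle
    encoding: {
      // Bucket the dates by month and treat the months as ordered categories.
      x: { field: "date", timeUnit: "month", type: "ordinal" },
      // Genres have no inherent order, so they are nominal.
      y: { field: "genre", type: "nominal" },
      // Color by the number of records in each cell, as described in the lesson.
      color: { aggregate: "count", field: "*", type: "quantitative" },
    },
  };
}

// Hypothetical sample values, for illustration only.
const spec = heatmapSpec([
  { date: new Date(2012, 4, 4), genre: "Action", votes: 1234567 },
]);
```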
>> Shirley Wu: So actually, look at that action. This is really interesting. So there's action movies pretty much in all the months except for October it seems. And then action movies, apparently there are 31 action movies in May.
[00:10:45]
That's across the last two decades of blockbusters, so apparently for action movies, I guess there's a lot of them in May. And there are some others like comedy, this is great.
>> Shirley Wu: So that really does make me really curious about the popularity. So let me just copy and paste this one over really quickly.
[00:11:14]
I'm going to move this down a little bit. I'll pin this code and then this should be as easy, hopefully, as changing the color aggregation to the median and the field to votes. And hopefully that works. I don't really know if I agree with this color scheme. [LAUGH] Yeah, look at that.
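A minimal sketch of that change: copy the count-based spec and swap only the color encoding. The `withColorAggregate` helper and the empty `values` array are hypothetical; the `votes` field name follows the data described earlier in the lesson.

```javascript
// The count-based heat map spec, repeated here so the example is self-contained.
const countSpec = {
  data: { values: [] }, // the prepared records would go here
  mark: "rect",
  encoding: {
    x: { field: "date", timeUnit: "month", type: "ordinal" },
    y: { field: "genre", type: "nominal" },
    color: { aggregate: "count", field: "*", type: "quantitative" },
  },
};

// Return a copy of the spec with a different color aggregation and field.
function withColorAggregate(spec, aggregate, field) {
  return {
    ...spec,
    encoding: {
      ...spec.encoding,
      color: { aggregate, field, type: "quantitative" },
    },
  };
}

const medianVotesSpec = withColorAggregate(countSpec, "median", "votes");
```

The same helper could be called with another Vega-Lite aggregate operation, such as `"max"`, to compare views without editing the original spec.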
[00:11:41]
>> Speaker 2: That is a bit weird.
>> Shirley Wu: It is a bit weird. So we have to train ourselves a little bit.
>> Speaker 2: Blue and black are not popular.
>> Shirley Wu: Less, yes.
>> Speaker 2: Radiating through turquoise. [LAUGH] Yellow is really popular in October, December, and January.
>> Shirley Wu: October, December, and January.
[00:12:01]
So the thing that I want to be careful of when we look at this is these are medians. So but I think I-
>> Speaker 2: Can we try max?
>> Shirley Wu: Yes, we can-
>> Speaker 2: Just to see the difference.
>> Shirley Wu: We can see max. And so the max is 2 million, and it's July and action.
[00:12:24]
But I do want to be careful of the fact that, so this is great because this is a number of movies and then this is the maximum number of votes. So that means maybe, in July, the most popular movie was released.
[00:12:44]
But then there aren't as many action movies that were made in July. I think it's the combination of the two would probably be quite meaningful.
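To see why the median and the max can tell different stories for the same cell, here is a small sketch with made-up vote counts (these numbers are not from the dataset). The `median` helper is written out by hand for illustration.

```javascript
// Median of a list of numbers: the middle value after sorting,
// or the average of the two middle values for an even-length list.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical vote counts for one cell: two modest movies and one huge hit.
const cellVotes = [120000, 150000, 2000000];

const cellMedian = median(cellVotes); // 150000
const cellMax = Math.max(...cellVotes); // 2000000
```

A single very popular movie drives the max, while the median stays close to the typical movie in the cell, which is why reading either view alongside the count is useful.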
>> Speaker 2: Technical question, as we're changing these aggregation functions, are these the ones that are defined by D3, do you know which? Cuz I know D3 has their count and has their statistics library.
[00:13:10]
>> Shirley Wu: That's a really great question. Let's look at the documentation. So in the aggregate,
>> Speaker 2: I also noticed that the genre is listed alphabetically.
>> Shirley Wu: Yes.
>> Speaker 2: So it's gonna lean towards a is action.
>> Shirley Wu: Yes, yes.
>> Speaker 2: Versus h, horror, obviously there's more movies, but because we're only grabbing the first category it's gonna lean towards action.
[00:13:41]
>> Shirley Wu: Yes, and that's going to be a design decision later. I mean, it's part of the data exploration too. I think if we have more time we should definitely be looking at all of them. And once we look at all of them, it's a design decision of if we were to build something around genres, do we only build it with just the first genre, which like you said is biased towards alphabetical, unfortunately.
[00:14:13]
Or do we take all of them and somehow try and reconcile the fact that there's three genres per movie. So that's also a design decision. Yeah, so I think,
>> Shirley Wu: This is quite interesting I think. So again, it seems like action is just watched a lot and then action is also-
[00:14:39]
>> Speaker 2: Can we just each over the genres and then create more data?
>> Shirley Wu: Yeah, so-
>> Speaker 2: To get all the genres in there.
>> Shirley Wu: So you wanna do, basically let's do this really quickly together.
>> Speaker 2: The genre array.
>> Shirley Wu: So I would do it as-
>> Speaker 2: Let's create lots of different records for-
>> Shirley Wu: Yeah, so for the sake of data exploration, let's do that.
[00:15:08]
And so what we want to do is within that we want to for each of the genres.
>> Shirley Wu: So basically for this part,
>> Shirley Wu: We want to iterate over that. Whoops, map. Return an array.
>> Shirley Wu: For each movie return an array of its genres with date and genre and votes information in all of them.
[00:15:43]
And then it's going to return an array of arrays, so flatten that. And then hopefully we'll be able to see, let's see, so now there's 570 records. [LAUGH] And then that should give us an updated version that's a lot longer, a little bit longer. And it's true that action and adventure is still most popular, but we can see a little bit more of, that's really interesting.
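The expansion described above, one record per genre per movie, can be sketched as follows. This version uses the built-in `Array.prototype.flatMap`, which combines the map and flatten steps that the lesson performs separately. The field names (`Released`, `Genre`, `imdbVotes`), the sample records, and the `expandGenres` helper are assumptions for illustration.

```javascript
// Hypothetical raw records; field names and values are assumptions.
const rawMovies = [
  { Released: "2012-05-15", Genre: "Action, Adventure, Sci-Fi", imdbVotes: "1,234,567" },
  { Released: "2008-07-18", Genre: "Crime, Drama", imdbVotes: "2,000,000" },
];

// For each movie, return one { date, genre, votes } record per genre,
// then flatten the resulting array of arrays into a single array.
function expandGenres(movies) {
  return movies.flatMap((d) =>
    d.Genre.split(", ").map((genre) => ({
      date: new Date(d.Released),
      genre,
      votes: parseInt(d.imdbVotes.replace(/,/g, ""), 10),
    }))
  );
}

const expanded = expandGenres(rawMovies);
```

With this shape, a movie listed under three genres contributes to three cells, so the record count grows and the view is no longer biased toward whichever genre is listed first.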
[00:16:15]
These crime and dramas that were released in July have almost 2 million votes also. Even though if you look at crime and drama in July there weren't necessarily that many movies that were released, but they turned out to be extremely popular. What I usually do after all of this is I kind of write down what I learned, and then what I want to create my design, what kind of themes I want to center my design around.