Building Custom Data Visualizations

# Code Solution, Part 1

Check out a free preview of the full Building Custom Data Visualizations course:
The "Code Solution, Part 1" Lesson is part of the full, Building Custom Data Visualizations course featured in this preview video. Here's what you'd learn in this lesson:

## After explaining the distribution of the data, and introducing D3 Areas as the main method to form the visualization, Shirley live codes the chart solution.

Get Unlimited Access Now

Transcript from the "Code Solution, Part 1" Lesson

[00:00:00]
>> Shirley Wu: All right, let's do this together. Let's draw a curve for each of the movies. And to do this I want to go to one of my favorite functions, D3arc. So,
>> Shirley Wu: d3, it's under d3 shape. And so we can look at the documentation, and it's D3 Area.

[00:00:33] So, we want to draw one of these area charts, and what it needs is an array of objects. And then we need to tell it where the x-position is. We want to tell it what the upper y-position is and the lower y-position, and so let's talk about what that means for us.

[00:01:05]
>> Shirley Wu: So what we need is, for each of the movies, we want to draw this sort of a curve. So this is a movie, and so we want this to be the date. So the x-position is the date that the movie was released, and then the y-position is the box office figure.

[00:01:35] But actually, it is the box office figure subtracted by the mean box office, and that is my y-position. And so but that is only this point, and to be able to draw a curve with the area chart, we need this point. And we need this point, and we need this point.

[00:02:09]
>> Shirley Wu: So the way that I am going to do that is basically I want to create quote unquote fake data to make this curve. So we have this dataset for each movie that gives me the (x,y) position of this top point. But for this one, let's make that the date plus, let's say two months.

[00:02:46] So that's the x for that particular point, and the y is the mean box office. And for this one, let's make that the date of that movie minus two months. And the y is also the mean box office.
>> Shirley Wu: And if we have that, then we have enough data to pass into the area generator.

[00:03:22] And so instead of having a array of all these data points, what we're going to have is actually, we're going to have an array of the movies. And for each movie, we're going to use the area generator. And so for each movie, we're actually going to be passing in an array of objects, where the first object is this date minus two months.

[00:03:55] The second object is this kind of the date and the box office number itself. And then the last one, the last object in the array is this date plus two months and the mean box office again. So, let's talk about that, and so to do that, we need to do some data preparation again to get it into the state.

[00:04:30] So let's go to our code. Okay, so here is,
>> Shirley Wu: A co-sandbox that I've set up for us. No, spoiler alert, that's what it looks like in the end.
>> Shirley Wu: Okay, so,
>> Shirley Wu: Yes, hopefully if you open it up, it should take you to starter.js.
>> Shirley Wu: And I have some things that I've defined up here.

[00:05:13] I've just imported d3 and to low dash. I have US inflation to help me kind of adjust for inflation for the box office numbers since that's going to be really important for us. And I have some other things loaded in for later. I also I decided to only look at the last ten years instead of the last 20 years just because we're doing something pretty simple.

[00:05:42] All we want to do is put a date on the x-axis, so then 20 years is a little to long. It's a little to much data, so, for this particular exercise, I decided to do just the 10 years. I also have for us a few of the colors we might use for the genres and a few of the colors for the holidays.

[00:06:04] I've loaded in the data here. I've set up our SVG with a width of 1200 and a height of 300 and given myself some margin to pad the visualization with. And then I'm creating my SVG element, and I did a little bit of work in terms of,
>> Shirley Wu: Processing the data.

[00:06:32] In particular, I kind of just took only the data attributes I need, and so in this case I just have the title. I have the date that's been used. I've wrapped the release date into a JavaScript date object. I have the box office adjusted for inflation, and for simplicity and I don't like to do this.

[00:06:55] And if it was a full visualization that I'm trying to do, I would try to do something with all of the genres. But to make this exercise relatively simple, we're going to take only the first genre that we see. But if this was a full on data visualization, I might go and find a different source that doesn't sort the genres by alphabetical and then that sorts it by maybe the most relevant genre, etc.

[00:07:33]
>> Shirley Wu: Yeah, and then I filtered it by all of the box office with the year after the 2008 start year I have.
>> Shirley Wu: Okay, so that's our data. So to get our data ready for this, what I want to do is figure out the mean box office figure, as well as the top three genres.

[00:08:01]
>> Shirley Wu: And then after that, I think I want to make a scale for the x-axis, the y-axis, and colors.
>> Shirley Wu: So let's do that together. So const, and to get the mean box office, I believe all I have to do is do d3.mean, and then pass in the data I have.

[00:08:34] So I call my data movies, and I just do, I tell it that I want D3 to take a look at the box office figures and give me back the mean for that. Let me console.log to make sure I did the right thing, and then,
>> Shirley Wu: So right here, a little bit hard to see, but this is the mean box office, which, I don't know, does it blow your mind that it's \$330 million is the mean?

[00:09:24] I mean, it's skewed because we're only looking at the top ten blockbusters, but that still blows my mind that it's 330 million. I don't think I think in that realm of money. [LAUGH] When I was doing this, I was like, holy. So now we have the mean box office, and then I just want to get the top three genres.

[00:09:48] So I tend to use Lodash for this because I like being able to chain my,
>> Shirley Wu: My functions together, but I haven't quite figured out the best way and a better alternative to figure out the top three of something. But stick with me as I do this. I'm going to chain and pass in the movie's dataset.

[00:10:16] The very first thing I want to do is I want to just grab the genres out of there. So this is going to give me back a flat array of all of the genres. And then what we're going to do is sort those by, actually, I'm sorry, I don't need to do it this way.

[00:10:40] I just want to be able to countBy the genres in each movie, and let's console.log what that looks like too, so let's console.log in genres. In this, so in Lodash, if I use countBy, it would give me this object, with key being what I asked for it to be counted by, in this case, it's genres, and the value being the number of movies with that genre.

[00:11:10] And then what I want to do from there is I want to grab only the top three of these, so what I'm going to do is sort by the values. So to do that, I'm first going to use this thing called .toPairs. That changes it into an array, basically an array of arrays, where the first one is a genre and the second one is a number of movies under that genre.

[00:11:42] I'm then going to sort by,
>> Shirley Wu: That, I'm going to sort by that second. So I'm going to sort by the second value, that's the numbers. And then finally, I'm going to take the first three of that result.
>> Shirley Wu: And now we have just the top three genres.

[00:12:09] And then you'll notice that it's still an array, so I'm just going to get back the,
>> Shirley Wu: The first element of each array, and get back an array of the top three genres. And please tell me if there is a better way to do this. [LAUGH] I just end up doing this this way every time and it's really ugly.

[00:12:37] I'm pretty sure there's probably better way to do this out there, so please let me know if there's a better way to do this. And I would love to know. But at least we have genres now, and that way we can go and make the scales together. So in this case, the very first one is an x-scale, and that should be timescale.

[00:13:01] And so for x-scale lets create scale time, and we're going to use, my bad, before we can do scales, we need domains. So let's calculate the domains first, and so for that, I want the minimum and maximum date. I'm going to put that in. So I'm going to ask D3 to figure out the min date and max date of my dataset.

[00:13:39] And I'm going to tell it that to do that it needs to look at the date attribute of each of the movies. And now I should be able to give a domain. But here's the thing, I kind of want my values to be nice, so maybe here the minimum date is going to be, I don't know, January 25th or something.

[00:14:02] And maybe the maximum date is like December 20th, 2017, or something like that. And I want that to be nicely rounded to January 1st, all the way to December 31st. And D3 has this really nice thing called d3-time that will help me, this is one of my favorite modules in D3, this d3-time.

[00:14:19] And I think it does similar things as moments, I think, I haven't double checked this, but I think it's probably smaller than moments. But in particular, what I'm going to do is I'm going to choose a time interval and floor by that. In this case, I'm going to say, give me,

[00:14:48]
>> Shirley Wu: The floor by the year. So let me show you that. So what I'm gonna do is say, d3-timeYear.floor, and pass in the minDate. So whatever that min date was, it's going to be, let's say that minDate was January 25th, 2008, it's going to floor it down to January 1st, 2008.

[00:15:16] And then let's do the max date, but this time instead of flooring let's make it go to the ceiling, so that would be .ceiling and then maxDate.
>> Shirley Wu: And then the range I want It to map to is I want the minimum date to go to the most left var count from margin, so margin.left.

[00:15:48] And then I want the maximum date to go all the way to the right, so width- margin.right. And let's make sure that we did this correctly. So I'm going to take a look at, this is how you can check if you have any bugs. This is how you can check that what you pass into the domain and range of the scale is computing correctly, by saying xScale.domain, and not passing anything in, or .range and not passing anything in.

[00:16:22] And you can see that it seems like we did it right. Where, as expected, we have January 1, 2008. And then my data set ends, I think, in December of 2017. So this goes to January 1, 2018, to kind of give me that nice start and beginning. And as we expected, there is the range looks good too.

[00:16:53] So let's go and figure out the yScale. So this is d3.scale. The y is going to be the box office, so what it's going to be is d3.scaleLinear. But we first have to figure out the box office extent. So min and max of the box office, d3.extent, movies, get the box office, sorry.

[00:17:28] I'm sorry, it's the difference between the box office and our mean, so I'm gonna actually go back. Let me think about this. Actually, let's do it this way. Let's just do d dot, our boxOffice number minus the meanBox, the mean bugs, that we calculated before. [LAUGH] And then pass that into the domain(boxExtent), and then pass that.

[00:18:00] And then let's give it a range from, I want the minimum box office to be at the bottom. So that would be, height- margin.bottom. And I want the highest to go all the way to the top, so I'll make that margin.top. Let me double check that that's done correctly.

[00:18:25] I wonder if I can disable this, cuz this is probably really distracting for when you're writing. But console.log, yeah, let's go ahead. Let's keep going ahead with it. Let's take a look to make sure that we calculated the domain correctly. And indeed we have. So the minimum is, there is a movie that did 150 million less than the mean.

[00:18:55] And there is also a movie that did 600 million above the mean, which should be about \$950 million in box office. I actually didn't confirm if box office was for opening weekend or the whole time, but 900 million. So yeah, so we have the scales. And then the last one we wanna do is the colors.

[00:19:23] So, const colorScale = d3.scaleOrdinal. And the domain is the genres that we calculated earlier, the array of three genres. And the range is the genreColors that I defined up top. And that's an array of three different colors. Okay, we're done with the scales. Okay, now let's go back.

[00:19:53] Now what we need to do, now we need to create the area generator. So let's do that together, const areaGen = d3.area. I don't think there's any other kind of settings that we need to pass in for it, so I think this is all we need to do.

[00:20:20] And now, let's draw some curves. So what we need to do here is some basic D3 enter append. And so what we want to do is, I'm just gonna call these curves is equal to, let's grab that SVG. And for the SVG, let's select all of the path elements on the screen that don't yet exist.

[00:20:51] And bind the data that we calculated, the movies data that we calculated, and then enter append a new path for each one them. And actually, I'm going to selectAll(".curve"). I'm going to select all of the paths that have the class curve, just to make sure that D3 doesn't go and select all of the paths in the SVG.

[00:21:19] Which is going to be problematic once we are creating axes and stuff that also create paths. So let's make sure to specify that I only want the paths with the class curve. And of course if I do that, I need to make sure that I give that class, .classed("curve") is true.

[00:21:45] And if we’ve done this correctly, then we should hopefully see maybe 100 or so path elements on the screen. So let me open this up, and there we go. So we've done that correctly. And let's make sure also that the data is bound. So if I choose each one of them, let's make sure that we can look up properties.

[00:22:13] We can open up the property for each of the path. Scroll all the way down, __data, and this is where D3 does the data binding. And you can see this is our data. So this is our boxOffice figure. This is the date that movie was released, the genre, the title, the year.

[00:22:35] So now we can go about drawing each of the curves. And I just realized that there are some settings that we need to be passing in. And in particular, we want to be telling the area generator how to access the x-coordinate of each of the movies, as well as the top y-coordinate of each of the curves, and the bottom y-coordinate of each of the curves.

[00:23:08] So let's do that right now. For each of the movies, I want the x point to be xScale, and pass in the date to translate from date to the x-axis. I want the, sorry, top of the area to be yScale, d., the difference between the boxOffice minus the meanBox office.

[00:23:42] And then I want the bottom part of the curve to be yScale and 0, as in there is no difference, this is the meanBox. So just that flat I want the bottom of the area to be this flat line that represents no difference from the main box office.

[00:24:13] So that's why I Y scale, and I'm pass in the zero. And so, for each of the curves, let's then pass in the d attribute. And for the d attribute, what I want it to do, I want it to calculate a curve, and so I am going to use areaGen.

[00:24:35] And I am going to pass in an array, and this is where the array of the three points comes in. So what I'm gonna do is do an object for the first one, and this one is I said I wanted to be the date.
>> Shirley Wu: Actually, you know what, here, I've defined this incorrectly.

[00:25:02] So I'm going to create data right now for each of these points, and I'm just gonna call it one is the date and one is the value. For the very first one, the date is the date of the movie and subtract two months. So again we're going to use D3 time, and now we're going to be using d3.timeMonth.

[00:25:32] And we're going to use this thing called offset, which is this other really cool function that helps you. If you give it a certain date and if you give it a plus one, in this case, if we give it the date and plus one, it would give you back the date plus one month.

[00:25:51] If you say minus one, it would give you back that date minus a month. A really cool function, so we're gonna do that d.date and then pass a -2 to tell it that I want back the date, that's the date I have subtract two months. Then, I want the value to just be 0 because I want it to be that flat part right here.

[00:26:21]
>> Shirley Wu: Next one, date is our actual date right here in the middle, and the value is the box office minus the mean box office. And finally, for that last point, I want the date to be,
>> Shirley Wu: The date of our movie, but plus two months, value 0. And if we did this correctly, it should draw a bunch of black curves.

[00:26:56] Did it draw a bunch of black curves? Yes.
>> Speaker 2: [INAUDIBLE]
>> Shirley Wu: Yes. [LAUGH]. Yes! Look at that.
>> Shirley Wu: Yeah, so we're there. We're getting there. And you know what? If spikes are your thing, we don't need to adjust it more.