Building Custom Data Visualizations Chart Brainstorming Solution, Part 2
Transcript from the "Chart Brainstorming Solution, Part 2" Lesson
>> Speaker 1: Let's kind of figure out how to translate this into basically Vega-Lite language. So each of the marks, so these are bars. So we would tell Vega-Lite that the mark is bars and then in terms of encoding, because we want it to be on the x axis, we want it to be.
[00:00:31] Do you want it to be IMDB rating. Do you have any preference of mediscore, metacrique, rotten tomato. Should we do IMDB rating, well, let's do metascore, cuz it's out of 100. And maybe it would give us a little bit more granularity. But I think when we're actually doing this, we would kind of think through about and do research about like what each of these scores taken into consideration.
[00:01:00] And then, IMDB rating is probably all like general audience. I think metascore, I'm sorry, I think rotten tomatoes takes into account critics. I'm not quite sure what the difference is. But when we're actually doing the synthesis actually a project for us, we would take that into consideration to make a decision about which one to use.
[00:01:20] But right now, let's just go with metascore. Cuz it's probably the easiest to clean. So on one end is the middle score and so for this one, we wanted to be, in the Y axis we wanted to be the total number of wins. And so let's think about that.
[00:01:44] That means that this is a, it has to be bin, right? And,
>> Speaker 1: Then the y, so the x-axis is a bin. And let's, I'm going to look up on how to do a histogram, with Vega-Light. Let's do that. So there are some, look at that IMDB rating they've used this dataset too [LAUGH].
[00:02:27] Okay, so what we have to do when we want a bin is apparently all we have to do is say bin is true. We have to say the type and metascore type is quantitative which is what we figured out before.
>> Speaker 1: And we figured out that the field we want is metascore.
[00:03:02] And then in terms of the y, we're aggregating, so we're aggregating the movies, but I think we're going to be aggregating by a field and that field will be the number of wins. And we have to calculate the number of wins. So I think what we need to do is aggregate by count.
[00:03:35] And we need to,
>> Speaker 1: By the type, again, the number of wins we agreed was quantitative. And we need to be able to aggregate by count for only the field of wins. So let's give this a try. Actually, is this going to work out?
>> Speaker 2: You need to pull wins from for drive
>> Speaker 1: Well, that, but also when you have something that's bind, it's going to be the number of movies. It's not going to be the number of wins. So it might have to be that we, I think this is one of the things you suggested, right? Coloring by the number of wins or something.
[00:04:29] And so, it might have to be like a quantitative, like one of those sequential scales that go from lighter to darker depending on how many wins. But now I'm actually wondering should this be a scatter plot?
>> Speaker 2: It seems like we could, this is going to represent the number of wins just fine, but if we want to represent the wins by category We'll have to add the array.
[00:05:02] It seems to me like this will work for our original task of producing a histogram, because winds will be our y axis, so we'll get the total.
>> Speaker 1: Well, so the wings, remember, with a histogram. The range is going to be the number of records that are in that bin.
[00:05:21] So if we want to have the y axis be the number of wins, that's not going to be the same. That won't be a histogram anymore. So we can only do what we want to do probably with a scatter plot because both of them are quantitative.
>> Speaker 1: Let's give this a try anyways, and let's see if we can map the number of wins to colors.
[00:06:03] Let's try and do that.
>> Speaker 1: Yes, so let's try and do bin still and then aggregate is count. But let's try and see if we can do a color. So we can do color. If you look at documentation, I think you can do, one of the things that you can do is
>> Speaker 1: set color as
>> Speaker 1: So for stack bar charts, you can take a look and they'll use color. And you can say, the field,
>> Speaker 1: Would be the number of wins. And the type would be quantitative, and let's see how they draw that and let's see if that would be a good chart to explore our question.
[00:07:22] I think if not, then what we would want to do in a real situation is try out a few other charts that might reflect this data better, but I think if not let's move onto a few of the other charts. So let's do this together
>> Speaker 1: so if you go into observables the way that we would.
[00:07:48] So the first thing we wanna do, this question was exploring if there is any relation. The question is any relation between number of wins and popularity.
>> Speaker 1: Between number of wins and popularity. So the very thing we're going to have to do is get the data and for each of the data we want to just pull out the number of wins.
[00:08:33] And then, we want the metascore. So let's do that together.
>> Speaker 1: And I'm going to call it data, let's say winsMetascores. You can call it anything else you want. And for this exercise, I'm going to use LoDash, just because I'm the most comfortable processing data with LoDash. But you can use anything you want, Vanilla JobScript any other utility library.
[00:09:10] And for this one I think all we have to do is map the data and then get the number of wins. So that should be some regex, I think, so we just want to match wins is equal to and I'm going to need your help with regex, d.Awards, is it awards?
[00:09:42] Yeah, okay, so d.Awards, and then I'm going to copy over one of these as an example. So the string looks like this, d.Awards.match and the regex should be.
>> Speaker 1: Let's see, I want to grab. And so that should be in the group. So that should be in parens.
[00:10:13] And I want to grab whatever digits, right? And then wins, after.
>> Speaker 1: That should, do you think that would work? That should work maybe, right? And then let's see, and I don't need that to be case sensitive and I don't need that to match globally because there's only one place that I want it to match.
[00:10:42] So actually let me make it digits that I want to grab and then win. So let's see if that works at all. And I am going to inspect element, so that I can see what is console logged.
>> Speaker 1: Let's see if this works. Ooh, cool, that did work, look at that!
[00:11:12] So then I want, I'm going to say let wins of equal to either, well equal to, so the one that I actually want is the index one. So that should then get me. Wow, look at some of these that 173, 93? So actually, can I make a suggestion?
[00:11:43] A change to this chart. So I think as soon as you say something that's relation, that probably means a scatterplot. So let's try and turn this into a scatterplot instead. And so, let's say, let's also make sure that the wins is a integer. Actually, hold on. So let's say wins, make sure that wins is a number.
[00:12:15] And then the score, the metascore should be pretty easy. Hopefully, it should just be d. Turn that into an integer also and then. Did I spell that correctly, d.Metascore with a capital M. And let's see what that gives. Cannot read property one of node, so that's good. Okay, so sometimes there's no matches it seems.
[00:12:44] So what I'm gonna do is say wins is equal to, either they match something, and if they do, let me get the number and if not, if there's no matches, then let's just make that 0.
>> Speaker 1: And there we go. Wow, one of them has 111! Just because I'm curious let's also, we're not gonna use title, but let's just also grab it.
[00:13:18] 111, well Titanic, apparently. So [LAUGH] and this is fun and I hope is fun for you too and look at that 111 wins and 75 metascore. That's really interesting, okay, so how is this so far, this part is this makes sense, right, the data part? Okay, so now let's figure out the scatterplot together, and actually, I think a scatterplot should be quite easy.
[00:13:55] There we go.
>> Speaker 1: Let me pin this, so that you can see that code. And then what I'm gonna do is, so to use Vega-Light and observables, they have pretty good integration. All you have to do is say vegalite and then you pass in, I believe an object.
[00:14:19] The very first thing you need to do is tell it a data source. So you say, in the object, you say data and you say, in this case, we're passing in the JSON values themselves, so say values. And then we called that data wins metascores. And then, if we were to do a scatterplot, it's going to be.
[00:14:51] Oops, sorry. It's going to be a bunch of points, right?
>> Speaker 1: And then, in terms of the encoding
>> Speaker 1: The x axis, we said earlier, was going to be the metascore. So let's say x axis, and remember we just need type, and that is quantitative. And we nee the field and we decided that was the score.
[00:15:29] And then the y axis we said the type is also quantitative. And the field is the number of wins.
>> Speaker 1: Wow, that's cool.
>> Speaker 1: There's something that has 200 plus wins. So one of the things that I haven't quite figured out how to do with Vega-Light yet is have a hover interaction where you can see that individual node.
[00:16:06] So like I'm really curious about who has 200+ wins. And then almost 100% score. But I haven't quite figured out how to do. I'm sure it's somewhere in the documentation. I haven't quite figured it out how to look into it yet. But like this, what you think about this?
[00:16:25] Is this interesting? Yeah, I think there seems to be generally an upward trend and it makes sense, because the ones that don't like look some of them are blockbusters that have no wins and only 20 something percent score. And a lot of them are down here. A lot of the ones, I think, that are 80 or less are, in the, it looks like, kind of like 20.
[00:17:00] And that's still a lot. That still feels like a lot. But as soon as it hits 80s, look at. There is a positive correlation, so that's good to know. And look at how many wins they're scoring over here. I wonder if our view would be a little bit different this way.
[00:17:24] Yeah, if you want to look at it a little bit differently, look at how easy it was and Vega-Lite to like make that switch. Whereas like, maybe if you were building this out. Well, so to actually give context about why we're doing it in Vega-Lite actually before these few months when I started researching for this workshop and then I started thinking about how I want to teach data exploration.
[00:17:51] The way that I actually do data exploration used to be build something from d3 from scratch. I'll do exactly what you just did, where you draw some, I would sketch some ideas. I'll think about a question and I'll sketch some ideas, but I'll build something from debrief from scratch.
[00:18:08] And you can imagine that was really really inefficient, and it would take me like half a day. And then after half a day I might have something interesting, but oftentimes I would have something that's not quite interesting. And then, I would have this iterative process where I learn about the data more and more as I kind of iterate through it.
[00:18:25] But it still took a week or two for me to kind of find something that I want to grab. But as I was researching for this workshop and I started teaching myself Vega-Lite, so that I can teach you Vega-Lite. I'm really loving how quickly I can just sketch an idea and put it on to the screen, get the data in there, see the data, see the shape of the data, be able to switch these encodings really quickly, switch between charts really quickly.
[00:18:55] And it's actually really brought down my data exploration down from a week or two to a day or two. And so I'm really happy I did this. And I hope, even if Vega-Lite isn't where you go, and I suggest Vega-Lite only because it works so well. I think it really makes sense for web developers because it's pretty much in the JSON format.
[00:19:21] But even if Vega-Lite is not for you, there's R, there's Python, there's a lot of different options where you can do these kinds of data explorations.