Transcript from the "Introduction to Different Data Types" Lesson

[00:00:00]

>> Shirley Wu: I want to go through with you kind of basic chart types and when to use them. So that if at work you're ever handed a design with a chart visualization, you'll know if [LAUGH] it's the right chart for the job or not. And once we have that, I want to break down how to draw the chart with SVG in the browser.

[00:00:24] And then I want to talk about how to go from raw data to those SVG shapes that we're drawing in the browser. And then, [LAUGH] and then, fourth one on the list, [LAUGH] is where we'll be pulling React back in and using React to render both SVG and Canvas in the DOM.

[00:00:43] And then, finally, we'll talk a little bit about exceptions to the number four, and finishing touches to make a chart. So we're gonna get started with the content right now. To start, I think it's really, really important to, when we're talking about data visualization, to talk about the data types, your raw data types.

[00:01:05] And here are some of the types that I come across quite often. The first being categorical. And so one of the easiest examples of a categorical data type is, let's say for movies, movie genres could be one. Even movie actors could be a categorical data type.

>> Shirley Wu: The second is ordinal.

[00:01:29] I think of ordinal like categorical, but there's an order to it. So kind of the most traditional, I guess, example for that, it's t-shirt sizes. So small, medium, large.

>> Shirley Wu: Then there's quantitative. And quantitative, it could temperatures, degrees from 0 to 100. It could be for the movie example, again.

[00:01:56] It could be the meta scores of the movies from 0 to 100. And then temporal is to do with dates. Years, months, any sort of data that are dates. And spatials, so cities, countries, regions. And these data types are really quite important when we start talking about charts and what charts could be good for what.

[00:02:27] So I want to go through some of the basic chart types that I see often and what to use them for. So bar charts are probably what we learned in middle school and high school. And so bar charts are for categorical comparisons. And so I think of it as, so I'm calling it domain and range.

[00:02:49] And I just kinda mean an input, one axis that's the input, that's independent. And the other axis that's the output that's kind of dependent on that input axis. So domain for bar charts are categorical. And range is quantitative. So for this particular example, it's a bar chart of movie genres on the x-axis.

[00:03:16] And then on the y-axis is basically the number of movies that have that genre.

>> Shirley Wu: And then histograms, I have to admit, I never really thought about the difference between bar charts and historgrams until I was preparing for this course and the next course. [LAUGH] I was like, they're all just bar charts.

[00:03:42] But clearly, they're not. [LAUGH] I mean, I hear about bar charts and histograms, but I never thought about the difference. So histograms are really good for categorical distributions. And what I mean by that is that the domain typically are quantitative bins, so it's some sort of data attribute that's quantitative.

[00:04:05] And it gets lumped into these bins. And the range is the frequency of elements that go into those quantitative bins. So, for example, that could be on the x-axis. It could be the movie metascores. And then that could be bucketed by let's say every ten points or something like that.

[00:04:29] And then the range, the kind of height, is the number of movies that fall into that bin of metascore. So that's a histogram. And then scatterplots are for correlation. I find this one extremely useful for data exploration, although I don't think I see it as much as a final chart type.

[00:04:52] And so scatterplots are basically two data attributes and the relationship between their quantitative values. So this particular one is also a movie data set. And it's trying to look at the correlation between if the number of votes for that movie on IMDB has a correlation with the number of votes, sorry, a correlation with the rating of that movie on IMDB.

[00:05:16] I don't know if that's a very meaningful correlation, but [LAUGH] that's the example.

>> Shirley Wu: And line charts are quite useful, I think we see it all the time for stocks. But they're for temporal trends. And so one, the domain is temporal and the range is quantitative.

>> Shirley Wu: I think we see trees quite often, too.

[00:05:47] They're for hierarchy. And in particular, they are for hierarchy that have parent-child relationships across multiple tiers

>> Shirley Wu: And this is my personal favourite, node-link diagrams. And so these are to show connection. If you want to see the relationship between multiple entities, this is a really good one. And the difference between this and the tree is that these are good for cyclical, sorry, cyclical links.

[00:06:17] And then the other one is for one directional. The trees are for one directional relationships. And I think the reason why I really like this one is because for node-link diagrams, if you want to show in a network who's more central and who's more on the outlier, this is really easy to show that.

[00:06:42]

>> Shirley Wu: And a little bit about maps. So the type of map size you see quite often is these chloropleths for spatial trends. And the domain for this is spatial regions and the range is quantitative. And I think I want to talk a little bit about what they're best for or not good for, cuz I think it's one of those could be misused sort of charts.

[00:07:12] And the place I'm getting this from is Datawrapper Academy, which if you've never heard of them, please check them out. They're really amazing. Well, their product is for journalism and helping build interactive visualizations really easy, making that really easy for journalists. But they have this really good section called Datawrapper Academy.

[00:07:44] And one of the things I really enjoy about them is they'll write things like, what to consider when creating choropleth maps. And they'll talk about when is the best time to use them, how to make them better, what are good examples, etc. And they'll have those for a few different chart types that are easily misused.

[00:08:00] So I'm only going to be talking about two of the ones that they've talked about today, but if you have the chance, please check them out. So for choropleths, they are really best for showing regional patterns and showing only one variable in terms of the color. And they're best for showing relative data, so normalized across the population, or else you're just going to be showing population.

[00:08:28] I think this particular example is unemployment rate. It's colored by unemployment rate by counties, I think. And they're really not good for subtle differences in the data, just because the color. People can tell the difference between different hues, but if it's really subtle changes in hue, people can't tell them as easily.

[00:08:49] So not good for subtle differences in data.