Transcript from the "Marks & Channels" Lesson
>> Shirley Wu: So one of the first things that we were talking about was how there's so many more questions that we could be asking. The more that we look at this dataset, the more questions we have for it. One of the really good examples that was brought up was, for example, these box office figures.
[00:00:22] I mean, this one's not. But for these box office figures, over time, over the years, over the last two decades, movie tickets have gone up in price. So how does that affect it? We have to take that into account for the box office. For my examples, I've adjusted for inflation for the box office figures.
[00:00:49] But that's only like a partial stopgap for taking into account the rise of movie tickets, and we can talk a lot about how to account for that. I think it's also a potential design decision if we decide to use box office numbers, but that then led into a really interesting discussion about, like, are there films that come out more in recessions or are they more popular?
[00:01:22] Do they make more at the box office? Or even like what kind of genres during recessions? And all of these are really great questions. And from that, what I want to emphasize is like this is a really, really simple dataset. It only has 200 movies. And in those 200 movies like there's only a handful of attributes, it's not even like one of those quote, unquote, big data datasets that have thousands or millions of rows.
[00:01:52] And even for such a small dataset, we start having so many different questions. So I think there's just so many different avenues for exploration. So I want to reemphasize the fact that that's why listing all the questions is so helpful.
[00:02:12] And then from those questions, make sure to focus on only one at a time to not get distracted. And also once we finish with the exploration, to really just grab onto and focus on one or two themes that we want to build our visualizations around, or if we want to build a series of visualizations, then a series of related themes that we can kind of tell a story around.
[00:02:39] Don't go for every single possible thing you can do with the data, or at least don't build a visualization of every single possible thing. Build them around themes that make sense. So that's one of the first things, and the second thing that I wanted to make sure to cover was about data sources.
[00:03:01] So a really great question that came up was about where to get data, because there are really good data sources out there, like with the US Government Census. There are different news organizations, I think ProPublica is one of those, that over the years have kept a GitHub repo of all of their data sources, and there's all of these different-
[00:03:30] Yeah, so I think I'm not sure if they're up to date or not. But there's all of these really great different resources. But the thing that I really want to emphasize is that if you are trying to build a visualization, most of the times how I go about it is instead of having a dataset and trying to build a visualization around it.
[00:03:56] I recommend finding something that you're interested in. Find a topic that you're passionately curious about and then go out and find the corresponding dataset for it. And I say that because a lot of times, if you do have that interest, if you do have that curiosity, first of all, you know that dataset a lot better, like you'll probably understand it a lot more and you'll have better questions for it.
[00:04:27] But also that excitement will carry you through the whole data visualization process. It's quite easy to get distracted, cuz it's a very long process where you have to do data, design, code, and you might get discouraged at some point in it. But if you have that excitement, because you're trying to build this visualization to answer one of your curiosities, one of your questions.
[00:04:50] If you think of yourself as the primary consumer of your visualization, that really helps motivate you along the whole process of building the visualization. So those are the two things I wanted to make sure to talk about. And now that we've talked about it, let's talk about the design.
[00:05:16] So I want to talk a little bit about how to translate from the data, the data exploration we did, to the design. And the way that I like to think about it is when we start thinking about the design, concentrate, always, always concentrate and keep your takeaways or themes that you want to communicate across.
[00:05:36] Always keep those in mind.
>> Shirley Wu: And then when you're thinking about that, that kind of like sentence or two, start thinking about what does that mean in terms of the data? Does that mean showing individual or aggregated elements? Which attributes does that mean? If, for example, we wanted to show that if the message we wanted to kind of communicate across was the seasonality that would be kind of the release dates or so.
[00:06:10] Start thinking about just those attributes and elements that we want to show, what does that mean. And then the number one thing about data visualization is mapping those relevant data and data attributes to visual elements. And to do that, I want to kind of go through a little bit of I guess guidelines to help with mapping data to visual elements.
[00:06:44] So one of my favorite ways to think about that process is this thing called Marks & Channels and this is my own definition. The way that I think about Marks & Channels is, marks are where you map an individual or aggregated data element to a geometric element. So map individual or aggregated data elements to marks and then map data attributes to channels.
[00:07:22] And so what marks are, are kind of these simple geometric shapes. So they could be like points. They could be like lines, areas. And so I like to think of them as, usually I'll map one of those, let's say, one of those movies to one of these points, or if we aggregate a bunch of movies together I might map them to an area or something along those lines.
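As a rough sketch of that idea (not part of the lesson itself), the mapping can be written out in Python. The `Mark` class, the helper functions, and the three movie records are all hypothetical placeholders, not the course dataset: one movie becomes one point mark, and a group of movies from the same year becomes one area mark.

```python
from dataclasses import dataclass, field


@dataclass
class Mark:
    """A simple geometric shape that stands for one data element."""
    shape: str                                     # "point", "line", or "area"
    channels: dict = field(default_factory=dict)   # channel name -> data value


# Hypothetical records: titles and numbers are placeholders, not real data.
movies = [
    {"title": "Movie A", "year": 2001, "metascore": 70},
    {"title": "Movie B", "year": 2001, "metascore": 50},
    {"title": "Movie C", "year": 2002, "metascore": 90},
]


def movie_to_point(movie):
    """Individual data element -> one point mark; attributes -> channels."""
    return Mark("point", {"x": movie["year"], "y": movie["metascore"]})


def year_to_area(year, group):
    """Aggregated data element (all movies in a year) -> one area mark."""
    mean_score = sum(m["metascore"] for m in group) / len(group)
    return Mark("area", {"x": year, "y": mean_score, "size": len(group)})


points = [movie_to_point(m) for m in movies]

# Group the movies by year, then build one area mark per group.
by_year = {}
for m in movies:
    by_year.setdefault(m["year"], []).append(m)
areas = [year_to_area(year, group) for year, group in sorted(by_year.items())]
```

Here `points` holds one mark per movie, while `areas` holds one mark per year, which is the individual versus aggregated distinction.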
[00:07:52] So that's how I think about marks and these examples are from a really, really great book from Tamara Munzner, who's a professor over at the University of British Columbia. And she has this textbook called Visualization Analysis and Design. And there's a whole chapter on marks and channels there. And that was one of the most illuminating things I feel like I've read about visualization design.
[00:08:18] So check out at least that part of her book or any of her presentations about marks and channels. And channels are anything from, the examples that she gives are positions, color, shape, tilt, size. And so these are kind of, I think of them as how I map my attributes, so dates or metascores, etc.
[00:08:46] And I map them to one of these visual channels. And the way that I kind of organize them in my head is that for quantitative data types, I tend to map them to positions. So any of these positions along the horizontal axes, positions along vertical axes or maybe even both x, y positions.
[00:09:10] I also like to map quantitative attributes, quantitative data types, to size. So maybe to length or area. I think this is ordered in terms of effectiveness. I think length is pretty effective; area and volume are a little bit hard to compare against each other. And we'll talk about the effectiveness of these channels in the next slide.
[00:09:37] So position and size. And color, color if they're sequential, so if they go from lighter to darker, or diverging, so if they go from one color, meet somewhere in the middle and then another color. You'll see that done a lot in heat maps or in choropleths in maps when you want to color the intensity of something.
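To make the sequential versus diverging distinction concrete, here is a minimal sketch, not from the lesson. The function names and default colors are assumptions chosen for illustration, and plain RGB interpolation is a simplification: real visualization palettes are usually designed in perceptual color spaces.

```python
def lerp_color(start, end, t):
    """Linearly interpolate between two RGB tuples; t is expected in [0, 1]."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def normalize(value, lo, hi):
    """Map value from [lo, hi] to [0, 1], clamping anything outside the range."""
    t = (value - lo) / (hi - lo)
    return min(1.0, max(0.0, t))


def sequential(value, lo, hi, light=(255, 255, 255), dark=(0, 0, 128)):
    """Sequential scale: one ramp that goes from lighter to darker."""
    return lerp_color(light, dark, normalize(value, lo, hi))


def diverging(value, lo, mid, hi,
              low_color=(0, 0, 255),
              mid_color=(255, 255, 255),
              high_color=(255, 0, 0)):
    """Diverging scale: two ramps that meet at a neutral midpoint."""
    if value <= mid:
        return lerp_color(low_color, mid_color, normalize(value, lo, mid))
    return lerp_color(mid_color, high_color, normalize(value, mid, hi))
```

A sequential scale suits values that run from low to high, while a diverging scale suits values with a meaningful middle, such as above or below an average.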
[00:10:04] I do not recommend rainbows. [LAUGH] And there's a really great, so yesterday I mentioned this blog, the Datawrapper blog. And I mentioned yesterday, they have a great series that are like, what to consider when doing something, something. And I want to mention this one, what to consider when choosing colors for data visualization.
[00:10:33] And they'll talk about all of those considerations, if it's something quantitative or something categorical. How to improve them, etc. So this is a really great read. And I also want to mention, I actually read through quite a bit of what they talk about, like what kind of questions to ask, what questions to ask when creating charts, etc.
[00:10:58] There's some really great resources in the Datawrapper blog. So make sure to check that out also. I'm not sponsored by them, I just really like them. [LAUGH] I realize I've been plugging them a lot. I love the work they do. It's a lot based on academic research. So I like reading through them.
[00:11:21] And I think they make me a better designer, because I'm not a designer. So I'm trying to teach myself. So for categorical data types, I think shapes. So shapes are really great, color, discrete colors. I think I saw sketches that used both shapes and colors earlier. I put one in there myself.
[00:11:49] Textures, sometimes I'll use textures. There's a really great library called Textures.js that kind of gives you some basic cross-hatches or little dots. I wouldn't personally recommend any more than two or three. If you have two or three categories or fewer, I recommend this. Any more and I think it gets way too overwhelming.
[00:12:16] But sometimes I'll find myself using textures. And then temporal animations, I put that in there. Even though it's not in here, sometimes I'll use animations to denote the passage of time. So if I have something that's, for example, for the movies maybe I'll animate movies in depending on their release year or something.
[00:12:46] And so, that's an option that's really unique to the web. And that's something that we can do in the web. And I think definitely don't do animation for the sake of animation. But I think one of the places that really makes sense is when showing the passage of time.
>> Speaker 2: By Sarah Drasner.
>> Shirley Wu: By Sarah Drasner, did she do one for you or-
>> Speaker 2: Yeah, she's got an SVG animation course, which covers GreenSock, but she's coming to do a version two soon, so she'll be-
>> Shirley Wu: I'm gonna plug Sarah right now. Sarah Drasner is where I learned so much. Sarah Drasner and Val Head, I did one of their web animation workshops.
[00:13:47] And I feel like that was one of the things that changed my visualization game last year. I think it had a huge impact on how I think about visualizations. So GreenSock, SVG animations, Sarah Drasner. I'm a huge fan of Sarah. I really love Sarah. So a plug there.
[00:14:08] So yeah, we just talked about marks and channels. Let's kinda go together a little bit and talk about just these kind of really simple ones and what marks and channels are being used here. You might have noticed that this is very similar to how Vega-Lite structured their kind of, I guess, settings.
[00:14:29] And when we were doing it in Vega-Lite, we were telling it marks, and we were telling it encodings. So you can kind of think of that as the marks and channels here. And so if we're looking at the bar chart, the marks would be,
>> Shirley Wu: You can, you can.
>> Speaker 3: Rectangles.
>> Shirley Wu: Yeah, rectangles, bars. The channels here, x positions, y positions, or either x or height. Or the way that Tamara describes it is the marks are bars, and the channel on the x-axis is categorical and on the y-axis is quantitative.
>> Shirley Wu: For this one, the marks are points.
[00:15:17] And again, there's an x and y direction, in this case both of them are quantitative. This is kind of similar to what we were talking about before with different chart types. This one, she added another channel and so she added color to denote category of a potential data element.
[00:15:40] And then, finally, she added size as another channel to denote something quantitative.
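For reference, the bar chart and the final scatterplot described here could be written as Vega-Lite specifications along these lines, shown as Python dictionaries. This is a sketch only: the rows and the field names (`genre`, `metascore`, `box_office`, `votes`) are invented placeholders, not the course dataset, and `channels_used` is a hypothetical helper.

```python
import json

# Placeholder rows: field names and values are invented for illustration.
rows = [
    {"genre": "Action", "metascore": 60, "box_office": 300, "votes": 1000},
    {"genre": "Drama", "metascore": 80, "box_office": 120, "votes": 400},
    {"genre": "Action", "metascore": 45, "box_office": 500, "votes": 2500},
]

# Bar chart: marks are bars, x is categorical, y is quantitative.
bar_chart = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": rows},
    "mark": "bar",
    "encoding": {
        "x": {"field": "genre", "type": "nominal"},
        "y": {"aggregate": "count", "type": "quantitative"},
    },
}

# Scatterplot: marks are points, x and y are quantitative,
# color adds a category and size adds another quantitative attribute.
scatterplot = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": rows},
    "mark": "point",
    "encoding": {
        "x": {"field": "metascore", "type": "quantitative"},
        "y": {"field": "box_office", "type": "quantitative"},
        "color": {"field": "genre", "type": "nominal"},
        "size": {"field": "votes", "type": "quantitative"},
    },
}


def channels_used(spec):
    """Return the channel names a spec maps data attributes to."""
    return sorted(spec["encoding"])


# Serialize to JSON, which is the form a Vega-Lite renderer consumes.
scatterplot_json = json.dumps(scatterplot, indent=2)
```

In both specs, `mark` names the geometric shape and each key under `encoding` is one channel.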
>> Shirley Wu: So when we're talking about these channels, we just talked about a lot of different kinds of channels. But obviously there's differences in the effectiveness of each of these channels, how we process these, because at the end of the day with visualizations what we're trying to do is comparisons.
[00:16:15] Comparing between different points in the data. And so to do these comparisons, some channels are going to be more effective for doing comparisons than others. And so on this left-hand side, we have the channels, she calls them magnitude channels. I think of them as like the quantitative data types and so she says the ones that are positions on a common scale are the most effective.
[00:16:47] The ones on unaligned scales, I don't quite know what sort of situations these are used in, but positions on unaligned scales are less effective. She goes all the way down to, for example, for when there's quantitative, like using color isn't quite as effective as, say, using position, but if you do want to use color, use luminance from lighter to darker or saturation.
[00:17:17] Those are effective. And then at the very bottom, the least effective is using curvature and using volume. It's harder for us to be able to compare areas or volumes and see kind of exactly how different they are relative to each other. And so this is kind of that scale and, on the other hand, with the categorical attributes, you can see color is actually quite high on there. Unlike when it was used for the quantitative, with categorical, color is really, really effective.
[00:18:02] And this one was really interesting. So motion, I was like, what does the motion mean? And I learned that we, as humans, perceive motion really well. Our eyes go to anything that's moving in kind of a crowd of things that's not moving, our attention is drawn there right away.
[00:18:25] But one of my beta testers told me that even though we detect motion really well, we aren't really great at detecting subtle differences in the motion. So motion is not great if you want to use it for one of these quantitative attributes, like if you want to be showing a little bit of motion versus a little bit more motion, that's not great. But if it's just either there is motion or there isn't motion, then it's really great.
[00:18:54] So I learned something new there that I really liked, and then she says shape is a little bit less effective than the rest of them.
>> Shirley Wu: So I think these are really great guidelines, and especially helpful because this is how you can map the data attributes that are the most important to the channels that are the most effective.
[00:19:24] So like, for example, in our earlier exploration. Let's say I'm gonna come back to the seasonality example. But like, let's say seasonality to us is the most important thing that we want to be communicating across. Well then in that case, let's put the release dates, let's map that onto a position on a common scale.
[00:19:54] So let's map that to the x-axis. So something like that. But for the sake of argument, maybe the metascore is something that we want people to know about, but it's not the most important thing for us to communicate across. Maybe that's something we map to the color saturation or color luminance down here, because even though we want them to know about it, we don't want them to concentrate on it as much as, say, the seasonality part.
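One way to sanity-check that kind of decision is to encode the effectiveness ranking and confirm that more important attributes land on higher-ranked channels. The sketch below is an illustration, not part of the lesson: the lists follow the magnitude and identity channel rankings in Munzner's book as recalled here (the lesson only names some of the entries), and `respects_priority` is a hypothetical helper.

```python
# Most effective first. The lesson names only some of these entries;
# the full ordering is an assumption based on Munzner's channel ranking.
MAGNITUDE_CHANNELS = [
    "position on common scale",
    "position on unaligned scale",
    "length",
    "tilt/angle",
    "area",
    "depth",
    "color luminance",
    "color saturation",
    "curvature",
    "volume",
]

# Channels for categorical attributes, most effective first.
IDENTITY_CHANNELS = ["spatial region", "color hue", "motion", "shape"]


def respects_priority(choices, ranking=MAGNITUDE_CHANNELS):
    """Check that more important attributes get more effective channels.

    choices is a list of (attribute, channel) pairs ordered from the most
    important attribute to the least important one.
    """
    ranks = [ranking.index(channel) for _, channel in choices]
    return all(earlier < later for earlier, later in zip(ranks, ranks[1:]))


# The seasonality example: release date matters most, metascore less so.
plan = [
    ("release date", "position on common scale"),
    ("metascore", "color saturation"),
]
```

With this plan, `respects_priority(plan)` holds, while swapping the two channels would put the less important attribute on the more effective channel.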
[00:20:28] So this is kind of the scale of effectiveness and what you can keep in mind as you're figuring out which parts of the data and how to map that data to channels. And it's just a really great guideline. So when I think about marks and channels I think of it as a one-to-one mapping of data to channel.
[00:20:52] So the dates to an x-axis or the genre to a categorical color, et cetera, a one-to-one mapping. But you can have multiple channels mapped to a mark. So for example, maybe you do have those points and you can map, in the latest example, x position, y position, size, color, et cetera, to just one of those points. But do not ever map multiple data attributes to the same channel.
[00:21:27] Like that could be potentially really, really confusing, right? Like if you decide that genres are gonna be categorical colors, but also actors are gonna be colors also. That's just gonna be, not an oversaturation, but it's going to be confounding to be mapping those two to the same channel.
[00:21:52] So that's how I think about it. One-to-one of data to channel. Many-to-one of channel to mark.
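That rule is easy to check mechanically. The following is a minimal sketch, assuming an encoding is written as a list of (data attribute, channel) pairs for a single mark; `channel_conflicts` and the attribute names are hypothetical.

```python
from collections import defaultdict


def channel_conflicts(mappings):
    """Return the channels that have more than one data attribute mapped to them.

    mappings is a list of (data_attribute, channel) pairs for one mark.
    An empty result means every channel carries at most one attribute.
    """
    by_channel = defaultdict(list)
    for attribute, channel in mappings:
        by_channel[channel].append(attribute)
    return {ch: attrs for ch, attrs in by_channel.items() if len(attrs) > 1}


# Many channels on one mark is fine: each channel carries one attribute.
good = [
    ("release date", "x"),
    ("box office", "y"),
    ("genre", "color"),
    ("metascore", "size"),
]

# Genre and actor both mapped to color: the confusing case described above.
bad = [
    ("genre", "color"),
    ("actor", "color"),
    ("release date", "x"),
]
```

Here `channel_conflicts(good)` is empty, and `channel_conflicts(bad)` reports that `color` is carrying both `genre` and `actor`.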