Data Visualization First Steps Grammar of Graphics
Transcript from the "Grammar of Graphics" Lesson
>> The tool that we're gonna be using today is a library called Observable Plot. This is, again, by the creators of Observable and D3. So it is meant to kind of go as a complement to libraries like D3, again, sort of on a higher level of abstraction. You can read all about it at observablehq.com/@observablehq/plot.
[00:00:22] It's also open source. You can see all the source code on GitHub and dig into it and even make contributions or raise issues if you see any problems. But essentially, Plot is based on something called the grammar of graphics. So this is a system for understanding the abstractions of data visualization.
[00:00:45] That folks who have spent their career developing notions of data vis have come up with as a way of kind of breaking the task of visualization down into some core concepts. And a few of the core concepts that we're gonna explore today are these four that are up here.
[00:01:06] So on the one hand, we have marks. In a grammar of graphics, you have the notion of different types of marks that you can draw on the screen. So this might be, for example, ticks like in that first test jobs visualization we saw before. Or it might be bars or rectangles, like in a bar chart.
[00:01:26] It might be dots or other symbols, x's, what have you. So these different types of marks, you can then compose and put one on top of the other or use different marks for different features of the dataset to convey meaning in different ways. The other notion that we already talked a little bit about are scales.
[00:01:47] So these are, again, those mappings of essentially how we get from abstract data space to concrete graphical space. There's also a notion of transforms, where these are sort of similar to what we mentioned in the data wrangling side of things. These are transforms that might do things like tell us the average over a moving window, let's say, if we have some time series data.
[00:02:10] Or the sum of all the values that we care about in this dataset, etc. So we're gonna take a look at a couple of those, especially for grouping data, aggregating it together. And also for kind of separating data out when we have a continuous space and we wanna chunk it up into different bins, let's say.
[00:02:30] Then there's also a notion of facet. So speaking of separating data out, we'll look at some of the use cases where it can be really useful to do what's called faceting. Where facets are essentially small little versions of the visualization that we're trying to make that are focused on a particular subset of the data.
[00:02:53] This is also called small multiples. So marks, scales, transforms, and facets, these are a few of the kind of building blocks, the conceptual building blocks that the grammar of graphics uses. And that plot employs or implements so that we can compose custom visualizations out of these core concepts.
[00:03:11] So we're gonna dig into these through our hands-on exercises, yeah.
>> Is tidy.js basically R's subset of tidyverse packages?
>> Is tidy.js R-
>> R's set of tidyverse packages.
>> R's set of tidyverse packages. I am not familiar enough with tidy.js to want to answer that without saying anything incorrect.
[00:03:39] So the tidy.js package, I believe, is also open source. So I would definitely think that if that isn't covered in the documentation there, it is certainly a question that can be asked of the maintainers there. And there are also quite a lot of Observable notebooks around tidy.js, so it's probably something we can discover from the notebook ecosystem as well.
[00:04:22] So we're gonna look later at some examples of how to use Plot with Observable, which is what we're gonna be building with today. But also how to use Plot just in your vanilla JS projects or in projects using React, for example, and other frameworks follow essentially the same principle.
[00:04:41] Okay, shall we jump in? Cuz I've been yakking for a long time and we should get our hands on some data. So today, we are going to have a little intro. It's done, yay, I can stop talking now. We're gonna work through a few different projects to wrap our heads around some of those core concepts, as we said.
[00:05:01] The first one is going to be looking at some of those test jobs in CI. And so by asking ourselves, are tests passing, and how quickly, and how often, we're going to be learning a bit about features in the dataset. And how we can map those to what are called encoding channels or different properties of the visualization.
[00:05:20] Then in our second project, we're going to examine some usage data, or mock usage data, to understand what kind of devices, what sizes folks are browsing with. And that is going to teach us a little bit more about aggregating data and splitting plots up into facets so that we can answer the right questions that we care about.
[00:05:44] Finally, our last project is going to be looking at some mock API response data, some API logs data, essentially. And by doing that, we're gonna explore how to add interactivity to our visualizations so that folks can explore the dataset themselves and get the answers that they particularly care about.
[00:06:05] So, essentially, how we can expose that flexibility to our users. And then finally, we will take a look at some examples for using Plot in different ways around the web and wrap up with some takeaways and some links for next steps. So all of the projects that we're gonna be doing are linked in these materials.
[00:06:26] If folks have been having any difficulty accessing the materials, please let me know. But each of them is going to be in an Observable notebook, and so that is what we are going to jump into.