Data Visualization First Steps Setup Project Data in Plot Exercise
Transcript from the "Setup Project Data in Plot Exercise" Lesson
>> Okay, so at this point, we are going to jump into our first project, which is called "Are tests passing?" And if you were able to click that Data Visualization First Steps link at the top of the original notebook, you should see this sidebar here. And so right now, we are in the notebook Project: Are tests passing?
[00:00:23] So if anybody had any difficulties accessing that, please let me know, but hopefully, we're all literally on the same page. All right, so this is going to be our first exposure, as it were, to working in an interactive Observable environment. So what we're looking at is an Observable notebook, which means it is a combination of code right up here, in this case, text and some headings and things like that.
[00:00:54] And visualizations, which is what we're gonna be building. This is the starting point, not the ending point. So what we're gonna be doing is, filling out some to-dos essentially in this notebook. And I'll walk you through kind of what that is going to feel like. So first of all, a little context around this project.
[00:01:17] We're going to be visualizing some data from the GitHub Actions API. If folks haven't worked with GitHub Actions, these are essentially kind of automation tasks or processes or jobs, as they're called, that you can set up to run on your GitHub projects. For example, to run tests in continuous integration, among many, many other things.
[00:01:44] But what we're gonna be looking at are, again, as we saw briefly in the intro, some data around tests running in CI as GitHub Actions. And so what we've got here in Observable, we can attach data files to our notebooks, as I mentioned briefly before. What we've got is essentially a JSON file already attached to the notebook.
[00:02:09] That represents the data that you might get if you had downloaded some data from the GitHub API about jobs running on the open source public SvelteJS repo, just as an example of something publicly available. And in this case, we've actually already done a little bit of wrangling on this data to exclude or filter out some of the data that we don't need for this exercise.
[00:02:36] And to do a little bit of processing to make it easier to visualize. In the next exercise, we're gonna do a tiny bit of data wrangling ourselves. But for this one, we've assumed we've already got that taken care of, and so we can jump right into the visualization.
[00:02:50] So what we're doing here in this first cell is we're parsing that file attachment using a built-in function from Observable. And we're just telling Observable to read it as JSON. So then we get essentially this testJobs object. It's a JSON array of objects that have information about different runs of different test jobs.
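As a rough illustration of what that parsing step produces, here is a minimal sketch in plain JavaScript. The sample rows are hypothetical, and `run_number` is an assumed field name for the run number; only `name` and `conclusion` are named in the lesson. In the notebook itself, the equivalent step would go through Observable's `FileAttachment(...).json()`, with the attachment's file name, which the transcript does not give.

```javascript
// Hypothetical sample standing in for the attached JSON file.
// `run_number` is an assumed field name; `name` and `conclusion` come from the lesson.
const rawJson = `[
  {"run_number": 101, "name": "Tests (node 18)", "conclusion": "success"},
  {"run_number": 101, "name": "Lint", "conclusion": "failure"},
  {"run_number": 102, "name": "Tests (node 18)", "conclusion": "cancelled"}
]`;

// Reading the text as JSON yields an array of plain objects, one per job run.
const testJobs = JSON.parse(rawJson);

// Inspect the 0th element to see which fields each row carries.
const firstJob = testJobs[0];
console.log(firstJob);
```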
[00:03:53] So for example, I can add a comment or I can say, I don't know. Let's say just get the 0th indexed element in this testJobs data. And once I've edited something in one of these cells, I'll see a little blue play button show up on the right. So I can click that or press Shift+Enter as a handy keyboard shortcut that then you're gonna be doing all day.
[00:04:22] So we'll get used to it. But I can either click this play button or press Shift+Enter to update the, of course, this is an interval now [LAUGH] to break the code, and then press Shift+Enter again to unbreak it. So this is what we're gonna be doing today. We're gonna be editing some of these cells to put in our own code, and then reevaluating or rerunning these cells to see the value update.
[00:04:45] So we'll take a look at this in our first actual exercise. So here we've got our first Plot visualization. This is a very simple and not yet very useful visualization of this test job data that uses a mark. So remember we said marks are these different elements that we can draw on the screen.
[00:05:09] In this case, we're using a mark called a cell, which are these little kind of rectangles here, similar to cells in a table. And so what this plot is doing is essentially drawing a little cell or a rectangle for each row in the dataset. And each run number kind of gets its own position, in this case a column, of all of the different jobs, the different tests that ran in that particular numbered run.
[00:05:39] And here, the run numbers are sequential. We could, for example, be visualizing something else like timestamps here, but here, we're just visualizing them run by run. So let's take a look at the code that makes this visualization tick and understand what's happening. So this is how we create a new plot.
[00:05:56] We call Plot.plot. And then inside of the call to the method lowercase plot, we're going to give it an object that specifies the options or the configuration, as it were, for this chart, this plot. So the first thing we're gonna wanna do is pass in an array of marks to tell Plot which marks we want it to draw.
[00:06:20] In this case, we're using a cell mark. And we're gonna explore some different types of marks in our future projects, but for now, we'll just stick with cells. Each mark basically takes in the data that we want to draw the marks with. In this case, it's called testJobs.
[00:07:03] And it's going to say for each of those output channels, what is the input value that I should be mapping or sort of scaling to this output? So in this case, the run number value is gonna go on the x-axis, so that's these numbers running horizontally here. And then the name of the test job, which is called name in our dataset, is going to be on the y-axis.
[00:07:27] So that's why we have all these different names of jobs vertically here. So that's it for our marks array. That's the only mark we wanna draw in this visualization, but in others, you could have multiple different types of marks going on. And then we're going to have some just sort of general properties of the way Plot is drawing its chart on the screen.
[00:07:49] So the width here, in Observable, we get a little automatic variable called width, which essentially scales to the size, the width of the cell, this sort of area here on the page. And then we have just some margin setup so that we make a little extra room for these text labels.
[00:08:11] We also have some information passed to the x scale to tell it to rotate the labels 45 degrees. So that's why we get these angled labels on the x-axis. So this is one example of a simple plot. Essentially, we've got the information for each individual mark. In this case, we just have one, the cell.
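Putting the pieces described so far together, the call might look roughly like the sketch below. This is not the notebook's exact code: `run_number` is an assumed column name, the margin values are placeholders, and `Plot` is passed in as a parameter so the sketch does not depend on how the library is loaded (in Observable, `Plot` and `width` are already available).

```javascript
// A sketch of the chart configuration described in the lesson.
// Assumptions: the run number column is called `run_number`,
// and the margin values are illustrative placeholders.
function testJobsPlot(Plot, testJobs, width) {
  return Plot.plot({
    // One cell mark: a rectangle per row, positioned by run (x) and job name (y).
    marks: [
      Plot.cell(testJobs, {x: "run_number", y: "name"})
    ],
    // Overall chart width; in Observable this would be the reactive `width` variable.
    width,
    // Extra room for the job-name labels and the rotated run-number labels.
    marginLeft: 150,
    marginBottom: 50,
    // Rotate the x-axis tick labels by 45 degrees.
    x: {tickRotate: 45}
  });
}
```

In a notebook cell, this would be used as `testJobsPlot(Plot, testJobs, width)`.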
[00:08:34] And then we have some information about the plot as a whole and the scales that configure the different output channels that we are drawing information to. So let's talk about channels a little bit more. When we're working with Plot, and again, this is inspired by the grammar of graphics.
[00:08:53] So this is a more general data viz concept than just the Plot library. You'll see this also implemented in other data viz libraries. When we're working with marks in Plot, again, we are going to be expressing values in the data as visual properties of different types in our visualization.
[00:09:12] And those different types are called channels. So we saw, for example, that we have the x channel, which is essentially a horizontal position. We have the y channel for vertical position. And then if we dig into the Plot documentation on GitHub, which is linked here, we can see some of the other channels that we have available to us, and there are a few.
[00:09:35] But a few that we're gonna care about here might be fill, which is essentially the color inside of the rectangles or other shapes we might be doing like a dot mark that might also have a filled interior. Or the fillOpacity, so the opacity of that inner shading. Similarly, we might have the stroke or the outline of each of those cells or other marks, dots, what have you.
[00:10:01] Or, for example, if we're using a line mark, then that would be a stroke. Stroke would be the relevant property for the color of that line. And similarly, strokeOpacity, so the opacity of that outline. We also have a channel called title, which essentially configures tool tips on each of the little marks, so in this case, each of the cells.
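To make a few of those channels concrete, here is a small sketch of the options object that could be passed to a cell mark. The specific values are illustrative, not taken from the notebook, and `run_number` is again an assumed column name. The fill channel is left out on purpose, since that is the subject of the exercise.

```javascript
// Illustrative options for a cell mark, using some of the channels mentioned above.
const cellChannels = {
  x: "run_number",          // horizontal position (assumed column name)
  y: "name",                // vertical position
  stroke: "white",          // a CSS color string is treated as a constant outline color
  strokeOpacity: 0.5,       // constant opacity of the outline
  fillOpacity: 0.8,         // constant opacity of the interior
  // The title channel becomes a tooltip; a function computes it per row.
  title: (d) => `${d.name} (run ${d.run_number})`
};

// In the notebook this object would be used as: Plot.cell(testJobs, cellChannels)
// Here we only check what the tooltip text looks like for one hypothetical row.
const exampleRow = {run_number: 101, name: "Tests (node 18)", conclusion: "success"};
console.log(cellChannels.title(exampleRow));
```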
[00:10:23] So what we're gonna do in our first exercise is to use some of these other channels to make this chart more informative. Because right now, I am not getting a whole lot of information from these squares. So if we scroll down a little bit, our first to-do here is to use the fill channel, so the fill color of each of these rectangles, to visualize the conclusion status of each of the tests.
[00:10:49] So did they succeed, fail, or were they canceled? So what we're gonna need to do is take the column in the data that corresponds to that. Which if we look up here, we can see that column with the success or cancellation or what have you is called conclusion.
[00:11:06] And so that's what we're gonna need to map to the fill channel to get this going. So as our first exercise, take some time to fill out this part of the code here. Put your own code in here. Again, you don't need an Observable account to be editing this.
[00:11:23] But if you already have an Observable account, or if you're interested in saving your work in your own copy or your own fork of this notebook so that you can come back to it later, feel free to scroll up to the top and you should see a little sign in or sign up button at the top right.
[00:12:05] So if you're a little confused or stuck, take a look at the Hint button. And then if you wanna see one potential solution, there's also a button for that. So let's take a couple of minutes and try to map the conclusion column to the fill channel.
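For reference, one possible way to express that mapping is sketched below. It is not necessarily the notebook's own solution, and `run_number` remains an assumed column name. Because `"conclusion"` is not a CSS color name, Plot reads it as a column whose values are mapped to colors.

```javascript
// One potential approach: pass the `conclusion` column to the fill channel.
// `Plot` is a parameter here so the sketch does not depend on how the library is loaded.
function conclusionCell(Plot, testJobs) {
  return Plot.cell(testJobs, {
    x: "run_number",     // assumed column name for the run number
    y: "name",           // job name, as in the original chart
    fill: "conclusion"   // color each cell by success, failure, cancellation, and so on
  });
}
```

The returned mark would take the place of the existing cell in the `marks` array of the `Plot.plot` call.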