Data Visualization First Steps Scatterplot Exercise
Transcript from the "Scatterplot Exercise" Lesson
>> So that is aggregation and sometimes very useful to answer these big picture questions, like do we have more users on mobile or on desktop, for example. But when we look at this aggregated chart, we're hiding a lot of information. By nature of aggregation, we're obscuring a lot of the details and the finer points in our dataset to just bring out the big picture of, okay, total number of users, which is bigger.
[00:00:29] However, earlier, we talked about things like, for example, for future development or for performance reasons. I might need to dig into the nitty gritty and ask myself, okay, based on my usage data for my site, what are the most common resolutions that we see? Doesn't matter if they're on tablet or desktop, right?
[00:00:50] Probably we're gonna have not so much overlap between mobile and desktop, let's say. What are the main widths, let's say, that folks are browsing our site in so that I can make sure I have really strong designs for those browser widths, for example. Or what are the aspect ratios that are most commonly used?
[00:01:11] Maybe the width itself doesn't really matter, but I wanna make sure that my site looks really great at that particular aspect ratio that we see people using over and over again. Or vice versa like, for example, which resolutions do we see folks using on multiple device types, which I would miss if I was only aggregating the data and breaking it down by device types.
[00:01:31] I would miss that pattern of, actually, folks are browsing really narrow on desktop. I don't know, maybe they keep the window open as a sidebar on their side of their desktop or something like that. So by digging into the details, we can answer different questions than the big picture questions that we can answer with those aggregations.
[00:01:54] So we wanna be able to do both. So we've just looked at how to aggregate data together using those group transforms. And there is an analogous transform in plot, analogous to the group y. The group transform works for categorical variables, like we said. There is an analogous transform called bin that works on continuous variables.
[00:02:16] So if you have, let's say, a numeric range, a numeric series of values that you wanna be aggregating by, you might wanna bin it into, I don't know, hundreds or thousands of intervals. So there is an analogous grouping for that. But now we're gonna move on from aggregation to digging into the details.
[00:02:36] So let's try to answer some of these questions like which are the most common aspect ratios, which are the most common widths and heights, whether across one device type or multiple. And we're gonna use a dot plot or a scatter plot is another word for that to try to answer this.
[00:02:53] So what we've got in our next exercise is kind of the basic Canvas, as it were, for a scatter plot. So this has a sort of preconfigured some general settings for this plot, like for example, we have a grid on both the x and y axes. And we have a certain value range for each of these axes that might not be the entire dataset.
[00:03:23] For example, there might be some resolutions out there that are more than 4,000, I don't know, 200 pixels wide. But in our case, we're just gonna look at the ones that fit in this span of the data. So this is all configured here, again, in the options passed into our call to dot plot.
[00:03:46] So again, we have that default width setup to make sure that this plot extends the full width of the cell. We are scaling the height based on the width so that it's kind of two to one width to height ratio. And then we're setting up the x-axis and the y-axis with grids and with a slightly constrained domain.
[00:04:07] Meaning, the set of values that we're gonna be visualizing from the original dataset. That domain of input values from the data space is gonna be slightly constrained from what we actually see represented in the data. And again, plot's default is to look at the values in your data and make sure it can fit all of them.
[00:04:25] So it will set up your axes accordingly. Here, we just wanna tell it to kinda cut off after a certain point. And there's one other thing in here which we'll talk about later around the opacity scale, which we're gonna be using for the next exercise. So your task here is to take this starting point and add some code into the call to Plot.dot that creates a scatter plot.
[00:04:54] So here we're using the dot mark, which as you might gather is essentially little circles. And we're going to be trying to set up the x and y axes or the x and y channels to convey the width on the x-axis horizontally and the height on the y-axis.
[00:05:14] And then see if you can play around with some of the other channels to visualize the device category. And last time, we looked at some users. Why don't we look at the number of sessions this time? So that we can see how many individual browsing sessions we saw for each width and height in our dataset.
[00:05:33] And then, again, if you want to get really ahead of the game, try to add some tooltips to add a little bit more detail about each of the dots that we're gonna be drawing here. So this is now we're gonna stretch the plot muscles we've been working out so far and we're gonna try to map width against height on the x and y axes using this kind of layout as a starting point.
[00:05:58] But feel free to tweak it and see what changes.