Data Visualization First Steps Scatterplot Solution
Transcript from the "Scatterplot Solution" Lesson
>> We've now gone the opposite direction from making a grouped aggregated chart to lots of teeny little points on our chart hopefully. Let's see what we got. And again, there are a lots of different ways to do this. We're getting more and more freedom as we're getting more comfortable with these different channels, and mappings to the property in our data.
[00:00:23] So, let's start out with, our first task was to get the width on x-axis and the height on the y-axis. Yeah, so by this point we're probably used to, and we're gonna put this right in our call to plot.dot in the options here. We're probably used to this syntax now, we're gonna put in the name of the channel that we want and then the name of the property, so x we're gonna want the width, right, of each of these data points which now we have.
[00:00:53] Thanks to our wrangling from before. Otherwise we would have to be splitting it up as we go here, and then height on, with quotes height capitalized on the y. if I can spell it right, height, is that right? It looks weird, okay, there we go. All right, so now we've got some dots on our x and y-axis, and we're not done yet, but already we can see some patterns emerging.
[00:01:25] Does anybody notice anything jumping out? How about these diagonal lines? Since this is yeah, so these diagonal lines jumping out any guesses what this is telling us?
>> So, there's it's showing us that there's a lot of like similarities. There's some kind of pattern here, right?
[00:01:51] There's some kind of pattern in the data set. What this is showing us is that essentially you've got kind of a constant ratio between width and height. So as width goes up in certain patterns, the height kind of scales accordingly, so this probably corresponds to certain aspect ratios that we're noticing in the data set.
[00:02:12] Ratio of width to height. So there may be that they're not all the same with but or height, but they're kind of emerging as the same ratio which is showing us that kind of constant scaling factor. In any case as we continue to add more detail to this plot, we'll be able to confirm that or disprove it if I'm completely wrong which is definitely possible.
[00:02:34] So let's see what else can we do here? We can use some of the other channels to encode different properties, for example, the device category. So what did folks use for the device category mapping here? Before we use the fill, so we could try doing that again, for example, not the only option, but let's try fill device, okay?
[00:03:03] So now we get some more information here. We can see as expected that the smaller values seem to be our mobile category. Believe yeah, that was that orange one there. And the bigger ones are the blue for desktop and we got some reds in the middle for tablets.
[00:03:23] Cool, cool, cool. But a lot of these dots are overlapping, so it's a little hard to see. So, we might wanna also be using fill-opacity to make these transparent. So one thing we could do, for example, we could set fill-opacity to a constant value like half, like 0.5 so that I can see where there is overlap.
[00:03:47] So that's 1 option that I have. Or, I could use the opacity to actually represent some meaningful number in the data. Like for example we said it would be interesting to use that sessions value that we saw before. So what we'll notice now is that essentially the darker a dot is, the more session We observed with that resolution in that device category.
[00:04:15] So we can see now the way that they're clustering, basically the darker clusters are similar resolutions and the darker individual dots are where we're seeing a lot of sessions from the same resolution and the same device category. Okay, what else can we do here? Well, right now all of the dots are the same size.
[00:04:34] But there is a channel called R that stands for the radius of these dots, which we could also use to represent, let's say the session. So maybe instead of the opacity, I want to scale up the size for the different number of sessions. Let's try this. Shall we?
[00:04:51] If anybody who tried this already you'll know what's about to happen. Whoa, [LAUGH] All right we've got some huge dots and some little tiny ones that are really hard to see. So in this case what we're seeing is that there are some really big discrepancies in our data set.
[00:05:11] We have some very small numbers of sessions and some very big ones and what I might want to do in this situation is scale down the kind of differences between those huge numbers and the small numbers. So sometimes when we find that like, information is getting obscured like this, like right now the big dots are so big that they're completely dwarfing most of the other dots.
[00:05:39] What I can do is also play with the way that I am scaling the mathematical values that I'm actually visualizing. And so, for example, in the reference answer, you'll see that what we could do as one option to get around this, is to instead of plotting the sessions value itself, we could actually transform it into the log of the sessions.
[00:06:09] So this way, if I do this now I get a little bit closer sizes, between my different dots, so that I don't completely lose information about those smaller values. And if we want to duplicate that again on the fill-opacity, and have fill-opacity also encode sessions, now we can see with both the opacity of the dot and its size, how many sessions we're seeing.
[00:06:39] So now we can sort of see where the focus points emerge. Now what you might notice, is that in addition to our log transformation here that we're doing within a function passed into the our channel, you might have noticed this little line down at the bottom here, which says that we want our opacity scale to also be a logarithmic scale.
[00:07:04] So these are two different ways of doing this in this case we're passing in this log type to our opacity scale and basically telling plot to do that logarithmic sizing for us. And if you'll notice if we take that out so if we comment that line out, what happened is once again we lose a lot of the detail in the lower range of those values because the high values are so much bigger.
[00:07:35] So essentially now we're losing a lot of information on our chart without that log scaling. So you can play around with these type of options as well to create a visualization that highlights again, whatever the salient information is that you're interested in. And make sure that you don't obscure away those interesting details by having the wrong kind of automatic or default scale values that plot is just going to choose based on the the input domain of your data and the output range from in this case 0 to 1 on capacity that you have available to you.
[00:08:14] So just a couple of notes on making sure that you are visualizing as much data as you intend to be visualizing and not unintentionally obscuring some of it behind those kinds of, off balance datasets. Okeydoke, let's see what else did we say we wanted to do here. Tooltips right?
[00:08:38] So we could do that with for example just the number of sessions or you might have wanted to also display the resolution, so we could have contaminated those strings together whatever you would like their questions. Yes, exactly. Somebody guessed at the aspect ratios of these diagonal lines. Exactly and then there's some vertical lines emerging too.
[00:09:07] Where we're seeing that folks have their browser is set to a particular width that they use at lots of different heights, and he guesses what's going on there. So we might need to dig in to be sure or maybe even do some more intense user research to collect more data.
[00:09:28] But I would guess that maybe folks have it like split to half their screen or they have it kind of minimized to the default, whatever window width it is, and then they're scrolling it up and down and dragging it longer and shorter, perhaps. And then we can notice that, for example, in our desktop, blue dots, we see a lot of aspect ratios where the screen is wider than it is tall.
[00:09:57] Whereas on mobile, we see kind of the opposite pattern. These yellow dots are kind of clustered at aspect ratios where the height is more than the width, which kind of makes sense, right? A lot of the time we're looking in portrait on our phones, but rarely on our desktops right we're not really spinning them around so much.
[00:10:16] Cool so we're already finding some insights and some patterns in this data which is the whole point.