Check out a free preview of the full Data Visualization First Steps course:
The "Visualizing Analytics Data" Lesson is part of the full, Data Visualization First Steps course featured in this preview video. Here's what you'd learn in this lesson:

## Anjana walks through the solution to the wrangle the data exercise and how to use Plot's Group transform functions to aggregate data based on a categorical property to compute values in each category.

Get Unlimited Access Now

Transcript from the "Visualizing Analytics Data" Lesson

[00:00:00]
>> Okie-dokie, how did that feel? Hopefully, we all came up with a solution. There are several different ways that we could map this array, if you will. So a few suggestions here in the hints. So, yeah, we're gonna be passing in a function to array.map and we're probably gonna wanna use the split method on our resolution strings to get out those two different values for width and height.

[00:00:29] And then there are several different ways that we can convert strings to numeric values, but one quick and handy way is to precede the string with a plus operator for quick integer numbers like this. So putting that all together in the solution field, you can find one possible way to do this.

[00:00:51] There are several other ways of course that we could find. So let's just paste this in here. So, what we've got is we are first of all taking in each element in our dataset, so each row in this data as a datum. You could also for example destructure the value here to pull out the values that we care about, the different properties that we have they're totally fine.

[00:01:20] In any case, we're probably gonna wanna split out that resolution field on the X delimiter to pull it into, and here we're using a little destructuring to grab the two values of the array that's gonna result as width and height. And then what we're gonna return for each element that comes in this D element, we're gonna return a similar looking object that has the same device and resolution.

[00:01:47] It also has those width and height values. And again, we're using the plus operator here to convert those two numbers instead of the original strings. And then we're mapping the users and sessions over from the original again, using that plus operator to convert them. So, again this is one way of taking care of this yours might look a little different, feel free to share your solutions but in any case what we want to end on now.

[00:02:18] If we check out one of our elements and we can do that by clicking these little carrots to expand In the arrays and objects here we see we have the same strings for device on resolution that's fine. But then we have numbers for width height as separate columns or separate properties on each of these data objects and then users and sessions again are no longer strings.

[00:02:38] So that's the important thing that we're gonna be looking for in order to move on successfully to the next exercises. So if you got different data formats, if you have strings for any of these values, just try copying over that solution so that we have the data in the right format to press on.

[00:02:55] And draw some things on the screen which is our goal here. All right, any questions so far? Again, this is just some little lightweight data wrangling, it gets a lot more complex as we deal with more intricate datasets. But just a taste of the type of quick little transformations that we're gonna find ourselves having to do a lot as we're working with data in the real world.

[00:03:22] Okie-dokie, so I don't think there's too many questions. So let's move on to visualizing the data which is our whole point of being here. Okay, so the first thing we might wanna ask ourselves if we're looking at a data set like this is, all right, I have these different device categories; desktop, tablet, mobile.

[00:03:44] How many users are there that represent each of those device categories? And so what we wanna do is break down our data which is split up into all of the different resolutions observed for each of the different devices types. What we wanna do is aggregate that data together so that we get the total for each of the different device types.

[00:04:09] And as we mentioned a while ago, we have the concept in the grammar of graphics which is implemented by plot of transforms which are ways of transforming the data and often aggregating the data. In this case, we want to aggregate data by group based on a particular value in the data set in this case, the device property or column.

[00:04:37] And so the group transform that plot exposes and you can read all about in the link documentation there, does this type of aggregation for us? It essentially lets us specify a categorical property where a categorical property is one where the values are in distinct categories. As opposed to a continuous property where we have for example the number of users is on a continuous spectrum of numbers, so that would be a continuous property.

[00:05:09] What we're gonna do with the group transform is tell plot, okay, this is a categorical variable or a categorical property of the data that I want you to compute a grouping aggregation based on. And so that can let us do things like compute the sum, or the count, or the proportion of values in each category.

[00:05:31] So plot exposes a few different aggregation functions that we can use in a transform like this. So let's take a look at an example of an aggregated visualization. Not the most exciting one but we're gonna make it more exciting. So this is a bar chart that essentially shows the total frequency in our data set.

[00:05:51] So the number of observations in our data set of each of the categories. So we see a bar for our desktop that shows that that is the most frequently observed category, mobile being the next, and then tablet being much less frequent. So let's take a look at the code that we use to build that visualization.

[00:06:11] You will notice a slight difference from our last plot examples, so last time we saw we were calling plot.plot and then passing in a marks array which in our last example I had plot.sell. But we might have some other types of marks like bar for example. In this example we're gonna see a slightly different syntax where here we're calling the mark itself.

[00:06:39] So, in this case it's the bar Y, meaning vertical bars mark, in our previous example would've been plot.sell. So we're calling that and then after passing in all of the necessary values to that, which we'll look at in a second, then we're calling .plot chained onto that call to the mark.

[00:07:02] So these are two different ways that you can equally like valid ways of using plot. Essentially when I only have one type of mark instead of passing in that whole marks array an arguably quicker way to do it is to just call that mark directly and then call .plot chain after that.

[00:07:21] Both of them are totally valid, feel free to rearrange the code if you find one or the other easier to think about. Okay, so what's happening inside of our call to the bar Y mark? So first of all, we're passing in the data again. In this case, it's called devices because that is what we named the output of our mapped wrangled array here.

[00:07:44] So the devices arrays are properly wrangled data. The device analytics arrays are pre-wrangled, not quite ready yet data. So we're gonna be using our devices data and then we're calling as the input, as the options essentially for this bar Y mark, we're calling plots group transform function. And in this case since we have vertical bars, bar Y we're calling group X meaning across the X axis horizontally.

[00:08:20] That's where we wanna be defining those different groups based on our categorical variable. So if you wanted to, let's say have horizontal bars with the groupings vertically, you could swap those letters and call bar X with group Y, if that makes sense. So essentially we wanna be grouping on the perpendicular axis to the one that we want the height of our bars.

[00:08:48] As we go through if folks have questions about swapping those out, we can totally tackle that. Okay, so the way these group transforms work is they expect an input and an output. So these transform functions, they take an input from the data. In this case, we're saying the input that we want for this calculation.

[00:09:11] So the the categorical variable that we're interested in grouping by that is device. And because we're doing the groupings along the X axis, we're gonna pass that property just as we did before with our mappings to channels, we're gonna pass that into the X channel. And so we're gonna be grouping on that same channel, that X channel that is receiving these different device values.

[00:09:38] So, the second argument to this group X call is going to be our input to the transform function. And in this case, it's saying we want to group by device and we're grouping along the X axis. And then the output is going to be essentially how high each of those vertical bars should be for each of the different device categories.

[00:10:03] So that means in this case we're doing an aggregation of the count of observations. Later we'll look at some different aggregations that we could be using here these are kind of built in functions that plot comes with. So in this case the count function is going to say, okay, for each of these different devices for each of the categories that, that variable includes, make the value that I care about as the output of this group transform.

[00:10:33] Make that the count of observations that I saw for that device category. And because again we're drawing bars on the Y axis we want to map that output to the Y channel to say I want the height of each of these vertical bars, to correspond to that aggregated count.

[00:10:55] For each of the device categories that I'm drawing along the X axis with that group X function. So if this takes a little while to wrap your head around do not worry you are in good company. But essentially the important thing to remember here is that we have the inputs as the second argument, and then we have the outputs as the first argument.

[00:11:18] And the combination of the two is gonna say, okay, I want you to take in this categorical variable and I want you to output this aggregation. And then in our call to .plot we also have those generic properties of the overall chart. So for example, the margins.