Transcript from the "Analytics Project Data Setup" Lesson
>> So we are going to jump back in with another project. This one is going to be the project what size devices our users browsing with. You should be able to navigate to it right from that sidebar there, the data visualization first steps collection you should see there.
[00:00:20] And we can drop the link in the chat as well. So that is what we're gonna to be working through next. And so in this project we're going to be looking at a different dataset. This time, we have another file attached to our notebook. This one is called device analytics.
[00:00:40] And we'll notice it as a CSV comma separated values file. Another really common type of file that you might get data in, again, if it's a small data set. And this is some mock data about usage to a hypothetical site that is the kind of data one might be able to export from an analytics platform like for example, Google Analytics.
[00:01:53] So we have a column called device, where we have a kind of size category desktop, mobile or tablet. We might have a few tablets in there if we scroll down. Then we have the resolution column which shows us the resolution that folks on this device, this resolution we're browsing.
[00:02:14] And then we have a column for the number of users that we saw with that combination of device and resolution and the number of sessions that those users browsed in on that device. So, what we've got here is some data that has the values that we want, but doesn't necessarily have the exact shape that we want.
[00:02:34] So we're gonna do a little tiny bit of lightweight data wrangling, which as we talked about before is a crucial first step to setting ourselves up for a nice visualization. So let's take a look at one of these objects here. So if I'm just looking at the first thing in this dataset, I see I have these different properties corresponding to those column names and then I have string value for each of these observations here.
[00:03:04] So our first task is to get this into better shape so that we can more easily visualize it. So we have a couple of problems her. First of all, we have this resolution value hear which is a string that combines both the width and the height of the user's screen or resolution settings.
[00:03:26] So what would be more helpful than that, is to also have additional columns that separate out the width and the height of the resolution. So instead of one string where we have this kind of 1920 by 1080, we also want a column that says, width that's 1920, and then height that's 1080.
[00:03:47] And it would be great if those were numeric values instead of strings, because that way we can actually treat them as a continuous data space. Similarly, we notice we have strings for these users insertions counts, and again, these are actually numbers, so it'd be useful to have them as numeric values.
[00:04:08] So our task, our mission should we choose to accept it is to do a little bit of lightweight data wrangling using our good old friend array dot map. So we have this device analytics array with our original data, we're gonna map over it with a function that we're gonna write.
[00:04:28] So we're gonna replace this little identity function to make sure that we first of all add width and height columns with numeric values corresponding to the values broken down from the resolution column. And convert the values in the users and sessions columns from strings to numbers. So there are some hints about ways that we can do those conversions in the hints button.
[00:04:54] But this is just basically going to be a little thought exercise for us to get ourselves into the habit of wrangling our data to make our lives easier later.