Data Visualization First Steps Data Visualization Use-Cases
Transcript from the "Data Visualization Use-Cases" Lesson
>> So, okay, data visualization is what we are here to talk about, but why? What are we even talking about? What is data viz and why should we as software developers even care about it in the first place? Well, we've heard a couple of reasons already so far.
[00:00:15] But, let's go through and take a look at a few ways we can understand data viz and how it relates to our day to day in the world of software development. So, one way we could characterize data visualization is as the process of translating data to graphical representation.
[00:00:38] Not only is that kind of a yawn inducing definition, but it also doesn't really tell us a lot about what the interesting value of data viz is. So okay, yes, we have some data and we want a graphical representation of it but why does that matter? Why isn't it easier to just look at the data itself instead of looking at a graphical representation of it?
[00:01:02] Well, one reason is because our bosses ask us to build data visualizations so that we can put together things like dashboards or insight charts in the web apps that we're being tasked with building in our day to day work. So one thing we could say is that, well as front end developers, we often have to put visualizations into the sites that we're building.
[00:01:25] Whether it be for internal tooling to kind of understand our systems better, or whether it be because the app that we're building wants to expose some kind of information to users about the data that they care about. And so that means that it is also, as we said before a very relevant skill especially in the front end world but I would say for software developers in general, it is a really nice addition to have on your resume because it pops up in, sometimes it is an entire job, right?
[00:02:00] Of being a data visualization developer. But sometimes, even if you're doing front end or full stack development, and most of the time you are building those apps. And take a look at a few ways we can understand data viz and how it relates to our day to day.
[00:02:16] Got it. And that's because the reason that we care about making these graphical representations of data, is that by visualizing data, we, human creatures with are very visual perception are more quickly and accurately able to discover meaning and patterns in that data. Than if we were just looking at rows and rows of numbers.
[00:02:41] And so this means that data visualization essentially lets us more easily and more quickly discover meanings and patterns And then give us the insights. So the answers to the questions that we have that will help us in our day to day, whether that's as developers and we're gonna to look at some of the questions that we can answer with data and data visualization being a very effective way to find those answers.
[00:03:08] But also, again for our users, right? If we're building apps and we want our users to be able to get insight into their own data or the things that they care about, using graphical representations of data is a really great way to quickly convey that meaning, that insight to the audience's that we're trying to help out.
[00:03:30] Okay, so let's take a look at an example here. We have data, we're swimming in it these days, right? It is just coming out our ears. We have so much of it, but finding meaning within it, that's the challenging part. So if we're looking at this data here, what do folks notice about this data?
[00:03:48] Got some rows, got some stuff in here. Shout it out or in the chat, feel free to drop it in, what do you notice here? What does this data say to you? What do you discover from looking at these rows? Anything? So I'm seeing some URLs here and, they have GitHub in them.
[00:04:13] So it looks like this has something to do with GitHub maybe. I also see spelt Js. So it looks like maybe something to do with the spelt GitHub repo.
>> Someone said requesting data from the GitHub API.
>> Yep, exactly, requesting data from the GitHub API. I don't know, there's some Shah's going on there some more URL stuff.
[00:04:35] There are some status, there's a status column here, everything is completed, success. Okay, now I'm seeing the word test, so maybe this has something to do with tests running. But I don't know about you all, but I'm not immediately getting a lot of value from this data. I'm not immediately seeing any kind of patterns or knowing what I should be understanding.
[00:04:59] I've so far gathered, maybe this has to do with tests running on the spelt JS repo on GitHub, maybe, but I don't really know too much more than that. Now, what we look at it graphically? So this is the same data, but visualized. So what can we see here?
[00:05:18] Anything else jumping out to folks that wasn't clear from the rows of data themselves?
>> It's like showing the frequency of a, maybe a request to these different nodes.
>> Right, okay, so we have these ticks happening for all of these different nodes or these different rows. So it looks like something is happening with some frequency.
[00:05:44] Exactly, something is being repeated. Anything else jump out at folks?
>> Just looks like tests running and depending on when they finish they could finish between two and six minutes or whatever.
>> Right, exactly. So now we know something about the number of minutes that each of these tests took, right?
[00:06:02] The duration of them. Could we get that from just looking at the rows of data? With difficulty, maybe I certainly didn't notice it immediately. Anything else?
>> The ratio of successes to failures and canceled tests.
>> Yes, exactly. So now we have this element of color coming in.
[00:06:21] So we have success displayed in green, failures displayed in red. And of course those colors can be tweaked for better accessibility and things like that. But already from that, we can see that most of the ticks look like they're green. So it looks like we have a largely successful runs, we have some gray ones that are canceled and we can see where a few red failures jumped out at us, which maybe is something we'd want to dig into and take a closer look at.
[00:06:48] Okay, so this is just a tiny example of how, when we're looking at data, this is in one sense a visualization, right? These are these are values set up in rows, one row for each observation and one column for each kind of characteristic or different value of that observation.
[00:07:10] But in terms of finding those insights, like, here are the tests that are failing or this is how long most of my tests are taking to run. That is, I hope you all agree, much easier to grasp quickly from a graphical visualization. So that's the TLDR, we can all go home now.
[00:07:30] No, just kidding. All right, [LAUGH] so this is what we're gonna be pouring into today. We're gonna be discovering how we can build customized visualizations to give us the insights that we particularly care about. And that's gonna be different for every particular question that we have, we're gonna want to set up our visualization in different ways to give us the answers to the questions that we care about from the data that we're working with.
[00:07:57] So, now, for us particularly as developers, why is data viz super relevant for our day to day work? Well, there are a lot of things that visualizing data and getting those insights from the visualizations can help us do as app developers, as front-end developers, as full stack developers.
[00:08:18] So, for example, when we're developing features for our user, if we look at the data, of how people are actually using our sites, and our products, we can find out some really great information to help us develop the right things for them. So for example, asking what should we build?
[00:08:38] Like what issues are folks struggling with? What areas of our app are folks spending the most time on? What are all of our issues that folks file on our open source repo? What patterns are we seeing in there? Prioritizing use cases. Later on we're gonna take a look at visualizing data about the different types of devices that folks are using when they visit our site or our app.
[00:09:04] And understanding, okay, most of our users are on this type of device can help us prioritize things like the designs that we build, the features that we bake in to our platforms. Understanding whether or not our features are successful, okay, I just spent the last six months of my life developing this really shiny glittery button, are people clicking it?
[00:09:28] How do I know if I don't actually look at the usage data? And by visualizing that I can see, okay, which features that we've built in the last quarter or year, whatever, which ones are landing? Which ones are folks actually getting a lot of traction from? And that can help us plan out our roadmap in the future to know what types of things folks actually wanna see on the apps that we built.
[00:09:53] And by the same token, as we are continuously developing these sites most of the time, right? It's never a done deal. We can understand better as we're iterating on our feature set and on the functionality and the user experience of our sites, we can understand from the data that we look at how we should progress and what things we should keep and what things we should maybe figure out alternatives for.
[00:10:18] So in terms of feature development itself, data visualization, data itself very valuable and visualizing that data to see those patterns, super valuable. Okay, what about performance? This is an area where data viz can be super handy, in terms of understanding how fast is our system running. We just saw in the test visualization that we can get a sense of how long are these test jobs taking to run.
[00:10:44] If we see things taking ten minutes and so on and so forth we can dig into those cases and try to optimize performance. Similarly, when we're looking at the performance of different parts of our site, we're gonna look later at some data around API responses. And see, okay, are there certain endpoints on our site that are taking a really long time to return responses to the user?
[00:11:08] Or are there certain features again, that we could improve the performance of by digging into those details in a visual depiction of that data, we can more quickly understand where we should focus our attention where those bottlenecks are. And we can also understand how well our improvements are actually improving things by the same token reliability, right?
[00:11:32] Are we getting a lot of errors or do we have, for example test jobs that are really flaky and tend to timeout all the time or what have you. So understanding not just speed but also reliability and maybe we have different trade offs and in terms of, maybe it's okay to be slow, but we want to make sure that 100% of the time the user is getting what they need.
[00:11:54] Other times it might be better to make the other trade off and say, okay, we want things to be really fast even if once in a while something gets dropped. So being able to look at that and say, how well is what we're building matching In our needs or our users needs, in terms of performance, super useful.
[00:12:12] Yeah, and by the same token, understanding where those bottlenecks are where we should improve. And similarly to with feature development, if we have made some performance enhancements, being able to prove to ourselves and to our bosses, of course, that they actually made a positive impact. And helped improve the performance of the site can be really successfully accomplished by showing a visualization that says, hey, look how long this bar was before and look how short it is now.
[00:12:41] So this can also be really useful for not just finding those insights but also conveying those insights to others. The final area that I think it bears mentioning in terms of how visualization can help us as developers is understanding our code and our development practices themselves. So not just the site's than the features that we build but actually the process of development that we engage in.
[00:13:06] So if we look at data about our code base or about the contributions to our code base, for example if it's a GitHub repo we can quite easily discover a lot of interesting things from the data that GitHub exposes. We can understand things like how is the codebase organized, right?
[00:13:23] What is the relative size of different directories in the codebase or which are the parts of the codebase that are quickly moving and changing a lot and which are the ones that are more stable that are sitting more kind of in stasis as it were? When we're talking about our development workflow, again, we just looked at some test jobs from continuous integration.
[00:13:48] If those jobs are failing all the time or taking a really long time, that means our development workflow is gonna be a little bit laborious, it's gonna feel sticky and difficult to move through. So how can we make that smoother? Or if we're looking at things like, how long does it take for our site to deploy once I make a change?
[00:14:07] How can I improve that process? How can I make it a smoother faster process from merging a change in to the codebase to getting that up in front of my user, for example? Likewise, we can visualize different notions of productivity and I'm using quotes around productivity here because I think, we as teams need to define what our own notions of productivity are.
[00:14:31] It's not always going to be sort of these archaic metrics, like lines of code written in number of hours. But maybe it'll be things like, for example how quickly are we able to respond to open source contributions if folks open an issue on our open source repo? How long do they have to wait before we get back to them?
[00:14:52] Or how long do our developers have to wait for their PRs, their pull requests to get reviewed by other teammates? Or how does collaboration across different projects in our organization work? So understanding notions of productivity and visualizing that data so that the team can, again, have smooth our collaboration and a more pleasant day to day as we work together, as we collaborate is a really great application in data viz as well.
[00:15:24] And yes, similarly if we do have open source projects weather as perhaps individual labor of love, open source maintenance or maybe we work on open source for a living day to day. Understanding how the contributor community interfaces with that project can be a really useful way to take a look at some contribution data for example, again from GitHub.
[00:15:47] And understand okay, how many new contributors do we have every month for example? Or how many folks come back time after time and continuously file bug reports to help us out etcetera. So, this is another kind of area, this notion of the the meta level of how well our development workflows and practices functioning that we can get some really solid answers to by visualizing data about our code bases and our contribution and collaboration.