Transcript from the "Stats & Graphs" Lesson
>> That's meeting, that's all you ever wanted to know about meetings, it's more than I ever wanted to talk about meetings. So, let's talk a little bit about metrics because I think metrics are really important to product management. I am not a statistician, I am not going to spend a lot of time talking about stats because I don't know a lot about stats.
[00:00:21] I know they're important and I get a lot of people to do stats on my behalf. And or I look a lot of dashboards. So I had never taken a stats course until like seriously three months ago four months ago when I took my first stats course as a part of my MBA.
[00:00:40] Until then I had just consumed stats and not really actually, been on the other side of producing them. So the first thing I to say is stats is actually one of the most useful forms of mathematics to someone that works in an office environment. Just because a lot of the stuff that we do has to do with statistics.
[00:01:01] So I pulled out some things that I think it's just generally useful for a product manager to be aware of or to think about. First of all, my master's program borrowed heavily from this book here called OpenStats. I put in here there's a couple of versions of this.
[00:01:20] You can buy a paperback edition of it, but it's also just free as a PDF. One of the cooler open-source projects, in my opinion, a bunch of statisticians came together and made a free stats book, and they update it constantly. You can see that they're on the fourth edition here of the OpenIntro to statistics.
[00:02:03] So just be aware anyway. Yeah, so you can see this is all in R. In other words, unless you are really bent on learning R maybe just stick to OpenIntro statistics. So I'm gonna say any moment that you spend learning stats, it's gonna be useful to you, it's not a moment-wasted If it's not required by any means, but it is useful.
[00:02:29] So I'm gonna just leave a couple of things in here, I think sampling is something that I found counterintuitive. But I found very interesting if you take a truly random sample of a large group and the sample is large enough that you can say it's representative of the whole.
[00:02:42] And why this is important something like AB testing or talking to users and some of those things that if you can get a large enough sample and you are not biasing your selection. So that for example, like talking to your users, it is biased in the fact that you're you're biased in your finding users that will talk to you and that you have access to no matter what that's still a biased sample.
[00:03:04] It's still better than nothing, probably question mark? But generally, if you can get a truly random sample of your users, you can say that that's representative. And therefore you can make inferences of like, hey, so many of our users are using this browser and they're, these kind of users and those kind of things.
[00:03:27] So, getting more information about who's using my product sampling is critical. Now, again, this is stats one on one, I'm sure all of your data scientists are like, yeah, obviously dah, right? Just useful to me. P-values, I'm not gonna spend a ton of time about this because I don't have a ton of depth here, and this is the most advanced stats concept that I want to talk about.
[00:03:53] But a p-value basically is saying, lower p-values means these two things are more closely correlated that you can think of it like that I think I spent a bit of time crafting this sentence here. A more accurate way to state what the p-value is is how likely that these statistics would happen if the things we're measuring are totally uncorrelated to each other.
[00:04:15] Which is hard for me to conceptualize, which is why I came up with this second sentence that lower p-value means the two things look like they're really correlated. I don't wanna spend much more time on it than that just because it's not actually the point of this course.
[00:04:31] But a lot of people have arbitrarily chosen the p-value of 0.05 which is otherwise 5%, it's like being the threshold to call something statistically significant, right? This comes into play, imagine you have an AB test where, we talked about that with Netflix, on Netflix you have a button that says, sign me up now.
[00:04:49] And a different button that says, just click here for free Netflix, and you have a, you're in an AB test of showing 50% of users this one, and 50% of users this one. And you end up with a result that treatment be that you click here for free Netflix is better for that has a higher signup rate doesn't say the p-value was like 0.2, right?
[00:05:15] You would call that statistically insignificant in other words, this was random noise, and we actually can't make a decision based on this experiment. Or if it had a p-value of 0.01, you could say, actually, I've basically proven that this is the better treatment, and we should definitely use this one.
[00:05:33] That's what I wanted you to get out of p-values, so that you can read experiment results and say, okay, this one's better or okay I actually can't take any information back out of this. So like the p-value is 0.2 it's not saying that it's the opposite it's not saying that like treatment A was better what it is saying is that this experiment was inconclusive So it either means you need to retry your experiment under different constraints so that you can tighten that p-value up, or it means that there actually just isn't correlation between those two things.
[00:06:06] One of those buttons isn't gonna do any better than the other one. And you can't prove it in this particular experiment that you were doing. That make sense, is that enough nonsense of Brian trying to explain statistics? Did I mostly get that right?
>> Yeah, that's pretty good.
>> The American Statistical Association has published a single consensus paper in its history, and it is on the p-value. And it is worth looking up.
>> Cuz it explains some of the confusion around how it's been misused, well, I think you've done a great job
>> Okay, I appreciate that. I spent a lot of time crafting that section because I don't wanna get this wrong because otherwise the chat is gonna tear me apart and then Twitter is gonna tell me apart. It's happened before my computer science class I there was Reddit threads about what Brian got wrong on the very first time that I taught it that I fixed for the current one.
[00:07:02] So go watch the current one to think I think it's pretty good. Okay a little bit about graphing, the first rule of fight club is never use pie charts. [LAUGH]. The first rule is never use pie charts ever, just hard pass. There is no good reason to ever use a pie chart, okay?
[00:07:24] If you walk away with nothing else from this course just never use a pie chart. You are already better pm than some that I know. I'll tell you about why in just a second, cuz everyone wants to use a pie chart. Why do we use graphs, why do we put graphs in anything?
[00:07:40] A lot of people put in graphs as we talked about earlier, which is like to show their work is like, look at these numbers. I don't care about your numbers I don't even care about like, contextually showing me numbers does nothing for me, so don't just show me a graph just to show me a graph.
[00:07:58] Graphs are for storytelling, they are tools of storytelling. Numbers themselves are not useful to show off, we use them to describe changes and to identify needs, right? So what if you show me a graph you better have a sentence underneath it or you're gonna you're bound to tell m is like.
[00:08:16] Here is our signup graph of people signing up or tenants on a line graph over-time. And there's a dip here, let's talk about the dip or let's talk about why this shut up., right? And you're using that as a as a tool to tell me something about it, okay?
[00:08:34] So given that you wanna have a graph that shows me your story, this is why pie charts are terrible, is because they're awful at telling stories, in other words, they basically don't tell stories. So I put a little graphic here, this came from the extreme presentation method, there's lots of things out there is like, what do you wanna talk about?
[00:08:57] And so like, do you wanna show relationships, do you wanna show distributions, do you wanna show composition? Do you wanna show comparisons and then like there's a bit of a selection chart here. So comparison, you wanna show it over time and you wanna show a few periods, then a column chart would be good for a single, a few categories, or a line chart would be good for many categories, right?
[00:09:19] I actually went through this and was like this is actually a pretty good selection methodology here. Generally, when I have numbers that I wanna show in a graph, either I'll ask my data scientists, because to them it's obvious which one you choose. Or I'll just Google it like I have this kind of data, what kind of charts should I use?
[00:09:38] That'll help. And actually Tableau, if you've ever used the Tableau software, and I know that that's somewhat a competitor to what the thing that I work on now. It's actually fairly good at generally suggesting you have data that looks like this, you might consider these kind of charts, that also helps as well.
[00:09:55] So tell a story with a graph, and then choose the right graph to highlight whatever story you're telling. Like a good one that I had to show recently was cost, like I was showing the COGS, and I can't remember what COGS stands for, cost of goods?
>> Sold, thank you. Which is basically like, how much does it cost to run my service? And we had one part of our spinner that was ballooning, right?. So I showed it in a stacked bar chart. So basically, you could see over time that like the spending was staying the same.
[00:10:30] And if you were just looking at the total cost of our total COGS for my entire product, it looks like we were flat and like good job cuz we were growing users but not growing cost. We actually saw that our costs were going down because we had done a lot of good infrastructure work.
[00:10:42] But we had one part of our service that was ballooning at such a rate that it looked like it was flat, right? And so I was trying to say it looks like we're doing a good job, we're actually doing an awful job. At some parts, we're doing a good job of other things, but a stacked bar chart told that story really, really well because you could see everything else shrinking and that one part growing, right?
[00:11:04] And stacked bar chart, you can see that that's this one down here, this is a stacked bar chart right here the stacked column chart. So I've used lots of different charts to tell stories, so just select one that helps you make that a bit bigger. Select one that helps you tell the story that you wanna tell.