Check out a free preview of the full Web Performance Fundamentals, v2 course
The "Statistics" Lesson is part of the full, Web Performance Fundamentals, v2 course featured in this preview video. Here's what you'd learn in this lesson:
Todd discusses the drawbacks of using performance averages and why they can be misleading when analyzing web performance data. He also discusses the benefits of using percentiles to measure performance and explains how they can provide a more accurate representation of user experiences.
Transcript from the "Statistics" Lesson
[00:00:00]
>> Todd Gardner: And this is where we need to talk about statistics a little bit. Again, don't be worried. This isn't going to be like a crazy math class. We're just going to talk a little bit about a few concepts. First, average. Average is the the go-to tool that most people will use when they want to look at a lot of data.
[00:00:20]
So I have this huge set of data and I'll just take the average of it and I'll be good. But there's huge problems with thinking about performance in terms of averages and that's what we're gonna talk about. So let's say we have an average performance score of 80.
[00:00:34]
Now that could mean a couple different things. It could mean this averages out to 80. We have three people that get a 99, three people to get a 90, three people that get a 70 and three people that get a 60. What does 80 mean here? Is that we have half the people have a really great score, and half the people have a really crap score.
[00:00:59]
Nobody got an 80. Like there's no 80, there's nobody's having an okay time, everybody either loves it or they hate it. Now it could mean other things too, it could mean almost everybody had a great score we have a whole bunch of '90s, a whole bunch of 85, so whole bunch of like really acceptable scores.
[00:01:23]
But 10% of the people are so bad that like they would complain online about how bad and slow your site is. This is also 80, so 80 averages hide a bunch of interesting data. They hide like what the overall picture looks like. So generally, when we look at web performance data, we don't talk about averages.
[00:01:49]
We talk about something else. We talk about percentiles. What the hell is a percentile? Well, let's break that down. A percentile is if we take all of the data, let's say we have 11 scores here, ranging from 0 to 100 every 10. Just for simple math, the 50th percentile is if we arrange them in order, what score is the halfway point at?
[00:02:18]
>> Todd Gardner: So we have halfway through the 50th percentile is 50. Now another way of saying this would be p50 that's percentile 50. You could also call this the median, is the same word for the 50th percentile. The 50th percentile, that's interesting, now we're using that to illustrate the concept.
[00:02:37]
But generally in performance, we don't care about the 50th percentile. We care about some other scores. We care about the 75th percentile and the 95th percentile and the 99th percentile. We care about these because these represent the 75th is what do the majority of users experience? What's the worst score that the majority of users experience?
[00:03:01]
The p75 is in fact what Google's using in their Core Web Vital reporting to decide what's your one score. It looks at the 75th percentile. So the 75th percent worst score is that one score roll up. Now for your own perspectives, it's also interesting to know what your 95th or 99th are, because it's a representation of the worst scores, the worst performance.
[00:03:31]
Now, some of you might be asking, well, why don't you just take the 100th percentile? Why don't you take the literal worst score? And that's generally because of garbage data. The web is a weird place if you collect everybody's like load time over a couple of 1,000 data points, the worst score will be like this one took three years to load, which isn't real.
[00:03:52]
It didn't really happen, but you'll get garbage data, and so you generally wanna kind of throw a little bit away. That's the same reason why I have both the 95th and the 99th here is depending on how much data you have, you might want to throw away a slightly larger percentage.
[00:04:08]
If you only get 10,000 hits a month, or 50,000 hits a month, you might wanna look at the 95th because the 99th might still be a little weird in terms of like the amount of data that you capture. So let's look at how these percentiles change. So this is like an even distribution of data we had, one person with a 0, one person with 10, one person with 20, etc.
[00:04:33]
Let's look at like a different set of scores. So in this case, the p50 was 50, the p75 was 75, the p95 is 95, it's really easy to think about. The average for comparison is also 50 in this example with this very normal distribution. So, let's look at a not normal distribution.
[00:04:57]
So, here's a distribution of scores that's much closer to a set of data that we might get from web performance where it ranges in a much narrower zone. The worst people are getting 79, 80s, stuff like that. [LAUGH] The best users are getting like 79, 80s. The worst users got 256 and there's a range there.
[00:05:23]
Now notice between these distributions, the percentiles didn't change where they were. It always represents the median user experience, the majority worst user experience, and the very worst user experience but the average shifted a lot. The average in this case was thrown off by that garbage data at the end, we got this weird one off person that had a really bad score, and it threw off our average.
[00:05:53]
So we have to think about these percentiles and what they mean when we interpret larger data sets, we're looking at real data that has hundreds, thousands, tens of thousands millions of data points. And we interpret them by looking at the 75th or 95th percentile for representing the majority of user and the worst user.
[00:06:16]
So let's translate this back to lab data and field data. When we're thinking about lab data. Lab data is easier to capture. We open up our browser, we visit the site, we run a report. We got a data point, it's easy. But field data is more accurate. There's a lot of data points, but it represents the real experiences from the people who visit your site or if we were to translate that down.
[00:06:44]
Lab data is a diagnostic. Field data is experience. We generate a piece of lab data as a diagnostic to understand how did my site perform right now? And are there opportunities I could use to improve it? But field data is the real experience from people. So it's It's important on when to use each.
[00:07:05]
If you're trying to understand is my site fast, you need field data to tell you that. If you're trying to improve the score of your site, you can use lab data to help you understand what needs to be improved. The score you get in lab data doesn't mean your site is fast.
[00:07:24]
You can get a 100 perfect lab score and still have a slow, frustrating, painful site to use that's getting SEO penalized. You can also have a lab score that says your site is slow and crappy and have perfect Core Web Vitals scores, as far as Google is concerned.
[00:07:42]
They're not correlated It together, because a lab data is one data point in this spectrum. Let's talk about how we interpret this. So when we're talking about lab data and how to do it. What's really important is how you gather that lab data, and cuz you need to simulate something that looks a little bit closer to reality than what your workstation is.
[00:08:08]
You need to think about, are you going to simulate as mobile or desktop. Who are you trying to get performance information from? What kind of network conditions does a typical user have when they access your site and what's the processing power of their device? Chances are they don't have the same laptop that you do.
[00:08:26]
They don't have the processing power that you do, and they're probably not located right next to the host like you are.
Learn Straight from the Experts Who Shape the Modern Web
- In-depth Courses
- Industry Leading Experts
- Learning Paths
- Live Interactive Workshops