Check out a free preview of the full Web Performance Fundamentals course

The "Interpreting Field Data" Lesson is part of the full, Web Performance Fundamentals course featured in this preview video. Here's what you'd learn in this lesson:

Todd describes the biases around browsers or types of users that should be considered when interpreting field data. Performance averages can also skew results since an average score may not accurately represent the majority of users. Percentiles like p75 or p95 can be the most helpful metrics.


Transcript from the "Interpreting Field Data" Lesson

>> But we have to think about what that data means. So let's take a look at some quick statistics. Cuz field data is different than lab data, when we get lab data, we get a score. So our website was ranked 95 in a lighthouse score. So we think, yeah, that's great.

But if we take corresponding field data to that same score, we get lots of different scores. We had some people who had that 95. We have some people who had 100, and had a way better experience. But we also had some people who had a 64, and had a really crappy experience.

And so we have to figure how do we interpret this data. What do we think that this means? Because 95 isn't the whole story. There's a wide range of experiences and understanding what that means can change what you do. Now when you look at that field data, we have to understand that we're taking a sample.

This is not all browsers will report this, especially web vitals. Because the web vitals interfaces, the APIs in the browser that report this data is only in blink-based browsers today. Firefox hasn't implemented it. Safari hasn't implemented it. This is just in Chrome and Edge. These are the only people sending this data.

Now fortunately, there's a whole lot of people who use that. And so we still have a pretty significant sample size. But there's bias in that data that we need to be aware of. So for example, the red section of this, this is browser usage that I captured off of internet stats website as of recently, 2020 aggregated.

It might be a little different now. But for the purposes of our conversation, it's fine. So all of the red is Chrome. And the green, which is Edge is now based on blink, now that they moved over to it. And so we get data from them as well.

Unfortunately, 14% of that Chrome section says it's Chrome. But it's Chrome on your iPhone, which isn't really Chrome. It's a Chrome wrapper around the Safari engine. And so it does not report web vitals. Because it doesn't have the API to do it. So the data here is going to be skewed.

It's gonna be biased. The data might have a positive bias on desktops because Chrome is gonna represent a percentage of the users. But those users who use Chrome are probably more modern machines. They probably keep them up-to-date a little bit better. And it's going to ignore all of the old versions of Internet Explorer.

It's gonna ignore all the Firefox users. It's gonna ignore the privacy of people who are privacy aware and use Brave and Opera and those sorts of things. We can also infer that it might have a negative bias on phones. Because the data that we'll get from from mobile devices is all gonna be from Android.

It's not gonna be from iPhones at all. Android devices tend to be quite a little bit slower on web applications than iOS phones. And so mobile data is gonna skew worse than the actual average experiences. And for Mac OS users for MacBook Pros, that sort of thing, which have a pretty high adoption of Safari relative to Chrome.

There's still a lot of Chrome, still majority Chrome, but there's a lot of Safari. We're not gonna report that. We're gonna underreport that percentage of users. So as you think about this data and interpret it to your own user base, you just have to think about these biases and pull them into your data.

Another thing we should think about is how do we roll up these numbers. So we have a whole bunch of different scores that come in from field data. And let's say we're gonna be super simple about it, and we're just gonna take the average. Well, the averages kind of sucks a lot of the time.

Because here's an average of 80 based on these set of scores 3, 99s, 3, 90s, 3, 70s and 3, 60s. Well, mean 80 is right. But it's really half of my users have a totally unacceptable score. Here's another example also 80. This is the majority, my users, have a pretty great score.

But I have two people who are just so bad that they drag the average down. And they get seemed like I'm not doing as well as I am. And so averages are very misleading. And so most of the time, field data is interpreted in percentiles. And what I mean by percentiles is median.

So a median half of your users will have a better score than the median. And half of your users will have a worse score. So the median in this case, or the 50th percentile is 95. That's a very important number. And it's interesting. But for Google's purposes, they usually care about the p75 number which is 75% of your users have a better score than this.

Google cares about that cuz they're saying, you know what? We're gonna eliminate a big chunk of people who have a really crappy experience on the Internet. But most people will have this score better. And so we're gonna rank you based on that. From your own personal perspective, I think it's really interesting to care about p95.

p95 basically gives you the idea of what your worst user sees. It's not actually your worst. But a lot of the time, your worst score from field data is gonna be so hilariously bad. You can't even imagine how it is possible. It's going to be from a boat in outer Mongolia that runs on a 486 processor and has no idea how to load a modern web page at all.

It is going to be so hilariously bad that you won't think it's real data. And so usually, you have to lop off the worst 5% of users as erroneous data. And p95 is usually a common consideration for what your worst users would experience. So how would we interpret that?

Let's look at some real data. This is a bunch of made up data. Let's look at some real data. Here's some charts from real field data for real websites. And let's see what does this tell us about it. So here's the chart, and it shows us how many users experienced the site in each of these buckets.

So almost 2,000 people loaded a page in less than a second. Little, fewer and the median loaded it in less than two seconds. Little, fewer left in less than three seconds, and that is our 75th percentile. So at this point, we've covered 75% of our users. 3 seconds are faster, feels like a pretty fast site.

But then the long tail comes out. And it slows down very gradually and the 95th stretches all the way out to 7 seconds. And so we have a wide distribution for even sites that we feel are super fast. Here's another example from a real site. Now this long tail stretches out really far.

And it's not a perfect normal distribution. So the site itself is obviously slower. The median is at four or five seconds. The 75th is all the way up at eight seconds. But somewhere up here, between 10 and 15 seconds, we seem to have more users have this worse experience than the ones around it.

There's something about our site that makes this particular behavior, this particular performance more likely than it should be. And so that would be worth exploring more. The 95th is obviously just terrible, the worst users are seeing 20 seconds or more. Here's a hilariously bad site that almost has an inverse distribution where it is really unlikely for people to have a good experience.

Some people obviously do. These are probably people who work in the same building that your web server is in. But most people, the most frequent performance is 30 seconds or more, which is just a hilarious thing that you would never imagine happened. But this is real data captured from real websites through request metrics.

So this data can take all kinds of different forms. So when you start measuring your own field data, you have to look at it thinking, I have a distribution of users. That distribution is biased based on where my data is coming from. And then what does that data tell me about my site?

And how it performs for different classes of users? Now I know those are a bunch of abstract questions because they are, every site is different. I can't tell you what your performance charts mean to you. Because every application stack is different. What your code does is different, who your users are serving are different.

And so I can't give you a, here's how to make your site fast answer, because that's going to be relative to you. But looking at your data can help you understand what your real user experience is.

Learn Straight from the Experts Who Shape the Modern Web

  • In-depth Courses
  • Industry Leading Experts
  • Learning Paths
  • Live Interactive Workshops
Get Unlimited Access Now