Lesson Description
The "Visualizing Flow of Data Exercise" lesson is part of the full Backend System Design course featured in this preview video. Here's what you'd learn in this lesson:
Jem guides students through the visualizing flow of data exercise to map information flow between clients, servers, caches, and databases. He explores variations like mobile clients and specialized caches, emphasizing understanding components and boundaries for effective system design.
Transcript from the "Visualizing Flow of Data Exercise" Lesson
[00:00:00]
>> Jem Young: We can go a little deeper than this. So let's do another exercise. Let's visualize the flow of information, because this is really important. We've learned all about clients and databases and servers. Create a diagram showing the flow of information, and above each component, briefly describe its role in the flow. Now we're doing real diagramming with real parts, not just building cool pizza shops.
[00:00:39]
How did the exercise go for everyone? Makes sense? Any questions? Did it challenge you a little bit on where this stuff goes? Even though we just talked about it, did it make you think, am I certain?
>> Student: Well, I liked the detail of going from server to cache, but really, I mean, it's going from server to cache, but then if there's a cache miss, maybe you grab from the database. You're not really going directly from the cache to the database as in that diagram there.
[00:01:13]
So it's just figuring out, okay, looking at it, getting the details down, and where everything is actually going.
>> Jem Young: Yeah, it's a lot to think about, because then you have to think, I want a cache somewhere in there. Where's my cache? Where should it go? What's important here?
>> Student: I had a similar thought for load balancing, because I've inherited systems that chose not to use a load-balanced database endpoint, and that's like the first thing we look at.
[00:01:48]
>> Jem Young: Yeah, let's talk about the variants. First, let me finish writing my—so we have the client. This is my flow. The client talks to the load balancer. The load balancer routes traffic to the server, and we'll call this a web server. We'll say the server's job is to fetch the data, and I could say something like serves the UI if I wanted to, depending on the type of server. And we'll say the cache.
[00:02:20]
What do caches do? They hold common data, or recently used data. And then we'll say the database, and that's what databases do: stores data. Nothing very fancy here. Anything where you're like, I don't know about that? It's okay, tell me. Does yours look like this?
>> Student: Well, um, one thing I did add to my diagram was a client-specific database, or not database, but cache, just to make sure that you're not making a bunch of API requests.
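The flow on the whiteboard can also be sketched in code. Here's a minimal, hypothetical Python sketch (the component names and data are illustrative, not from the course): a client request goes through a load balancer to a web server, which checks the cache and falls back to the database on a miss.

```python
import itertools

# Hypothetical in-memory stand-ins for the real components.
DATABASE = {"homepage": "rows for the homepage"}  # database: stores data
CACHE = {}                                        # cache: holds common/recent data
SERVERS = ["server-1", "server-2"]

# Load balancer: routes traffic across servers (simple round robin here).
_next_server = itertools.cycle(SERVERS)

def load_balancer(request):
    server = next(_next_server)
    return web_server(server, request)

# Web server: fetches the data, checking the cache before the database.
def web_server(server, request):
    if request in CACHE:
        return server, CACHE[request], "cache hit"
    data = DATABASE[request]      # cache miss: go to the database
    CACHE[request] = data         # populate the cache for next time
    return server, data, "cache miss"

# Client: just makes the request.
server, data, outcome = load_balancer("homepage")
print(server, outcome)   # first request misses the cache
server, data, outcome = load_balancer("homepage")
print(server, outcome)   # second request is served from the cache
```

The arrows in the diagram map to the function calls here: each component only talks to its neighbor in the flow.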
[00:03:07]
>> Jem Young: I did the same thing, mostly because people use Next.js so much. I like that, yeah. So what you're saying is we could have, instead of this one cache, a separate cache, so two caches. For example, specifically for a React application you might use React Query. Yeah, you could totally do that, or call it a client cache. We could even say this could be a CDN. Also valid. I mean, we can get rid of this cache too if we want to do this instead.
[00:03:54]
What else? What other variants did you all come up with?
>> Student: I just had the server—so I'm going through the cache, we said this, but going to a separate cache, and if there's nothing cached for that request, then going to the database.
>> Jem Young: Yeah, you could have a cache at all of these layers if you really wanted to. I didn't put a lot of caches in there, because then you'd be thinking too much about it. But other than the load balancer, where you could cache things like certificates, you're not really going to do that as much.
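The idea of a cache at every layer can be sketched as a lookup chain. This is a hypothetical Python sketch (the tier names and contents are illustrative): each tier is checked in order, and a full miss falls through to the database.

```python
# Hypothetical tiers, checked in order: client cache, CDN, server cache, then the database.
client_cache = {}
cdn = {"logo.png": "<image bytes>"}
server_cache = {"homepage": "<rendered homepage>"}
database = {"homepage": "<rendered homepage>", "profile/42": "<profile data>"}

TIERS = [("client cache", client_cache), ("CDN", cdn), ("server cache", server_cache)]

def fetch(key):
    """Walk the cache tiers in order; fall through to the database on a full miss."""
    for name, tier in TIERS:
        if key in tier:
            return tier[key], name
    return database[key], "database"

print(fetch("logo.png"))     # served by the CDN
print(fetch("profile/42"))   # misses every cache, hits the database
```

Each tier trades freshness for distance: the closer the hit, the fewer round trips.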
[00:04:29]
Yeah, you can have a client cache, you can have a CDN, you can have a cache here, and the database has its own cache. We're not going to draw all of that, but yeah. Let me try something a little different for you all, and bear with me while I play with the drawing a little bit. I'm going to copy all this so we don't lose it. Let's try a different variant. What if I said this was a mobile client? How does that change things?
[00:05:17]
>> Student: I forget the name of it, but something similar to local storage. I'm blanking on the name of what they use these days.
>> Jem Young: Yeah, what's the difference between a mobile client and, say, a web client?
>> Student: You're getting the app from the app store, so it's installed separately, and not from, maybe, your server.
>> Jem Young: Yes. So what does that mean? That means you wouldn't necessarily use that CDN. You'd have the mobile app making API requests typically, not requests to get the UI.
[00:05:55]
Yeah, we don't need a web server anymore, because that's already on the device. The UI is already baked in, and we can just assume it's there. So we can make our flow look very different. We could say now the job is to populate that cache. Bear with me. Same components, very different system. So now the role of the system is just to keep the cache up to date. Say we had a mobile app with, I don't know, a customized homepage, Instagram or something like that.
[00:07:13]
We can pre-compute that entire homepage and the data needed for it, and throw it in the cache. It has to be something commonly shared, and maybe Instagram's too personalized. I'm trying to think of a good example here. Something quasi-personalized, like the New York Times or the BBC. The BBC is going to have some sort of regional homepage, so you have a US one, a Europe one, etc. You can say the entire job of the system is just to keep this cache up to date, and the mobile app is just fetching that instead of making all these long API requests, because it already has all the information and the logic needed to render.
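The "keep the cache up to date" variant can be sketched as a small warming job. This is a hypothetical Python sketch (the region names and data shapes are illustrative): a background job pre-computes each regional homepage into the cache, and the mobile client only ever reads the cache, never the database directly.

```python
cache = {}
REGIONS = ["us", "europe", "asia"]

def build_homepage(region):
    # Stand-in for the expensive work: queries, ranking, assembling data.
    return {"region": region, "stories": [f"top story for {region}"]}

def warm_cache():
    """Background job: pre-compute every regional homepage into the cache."""
    for region in REGIONS:
        cache[f"homepage:{region}"] = build_homepage(region)

def mobile_client_fetch(region):
    # The app already has the UI and rendering logic baked in; it only
    # needs the data, and that data is always served straight from the cache.
    return cache[f"homepage:{region}"]

warm_cache()
print(mobile_client_fetch("europe")["region"])
```

The request path never touches the database at read time; freshness is the warming job's responsibility.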
[00:07:48]
All it needs is the data now.
>> Student: What do you mean when you say that a phone can have a server as well?
>> Jem Young: Did I say a phone can have a server? Maybe I did. Lisa's saying yes, Michael's saying no. We're kind of saying the phone is the server.
>> Student: It's the concept of edge computing versus a remote server.
>> Jem Young: Yeah, maybe that was it. With the phone, we don't need the web server anymore, because the UI is already baked into the client.
[00:08:24]
That's more what I was saying.
>> Student: Is this the correct flow? We go from server to cache and then to the database?
>> Jem Young: It can be, yeah. I could probably draw more arrows, but there's more of a back and forth in all of these. I thought of a better example. The BBC homepage, yeah, it's semi-personalized, something you want to cache. But this is actually a good example of what a streaming video service would look like, because it might make some API requests and we could have a small flow here, but that's not going to be the bigger part.
[00:09:01]
The bigger part is all the video processing. And really, if it's a video service, you're always pulling from a cache somewhere, generally a CDN. So a good example is, this could be closer to Open Connect, Netflix's CDN around the world. Our job here is just to populate the cache instead of making all these round-trip requests. So does yours look different?
>> Student: A little bit, yes.
[00:09:39]
Is that okay?
>> Jem Young: Yes, yeah, it's okay, because again, we didn't specify what we're trying to build here. I just said use these parts in some way, and we came up with different systems. That's why, when we get to the next part, where we get into scoping the problem, it's really important. And that was one of the purposes of exercise one: getting used to all the components and thinking critically about what each one is doing.
[00:10:05]
And when you can lay it out like that, then you know: oh wait, what's my web server actually doing up here? My server is fetching the data, it's serving the UI, it's doing the authentication, it's doing all these other things. Maybe this is a choke point in my system. Maybe I should have two servers. Maybe I should have different types of servers here. And that's where we're going: you can't just take a bunch of components and lay them out.
[00:10:32]
That's an easy way to fail at system design, just making assumptions. We don't make assumptions, we ask questions, and then we get into: are we building a standard application flow? Are we building a streaming video application for mobile apps?
>> Student: So when the diagram flows from the server to the cache and then the cache flows to the database, would that be a case where the cache is making the request directly? Or would it basically be that the server makes a request to the cache, sees that there's a cache miss, then makes the request directly to the database and retroactively puts the data into the cache?
[00:11:12]
>> Jem Young: Yeah, you can have different ways of caching. You can have a pass-through cache, which is: hey, let me check the cache; the cache says, nope, not here, let me forward this request on. Or we could make it more round trip, draw a pointer here where the server checks the cache, the cache says nope, it bounces back to the server, and the server goes through to the database. But yeah, those are all valid flows. Also, I didn't diagram the flows going backwards, which, you know, sometimes I do. Unless it's critical, just assume the data flows back the same way.
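The two flows just described are commonly called read-through (or pass-through, where the cache layer itself forwards the miss) and cache-aside (where the miss bounces back and the server queries the database, then retroactively fills the cache). Here's a minimal Python sketch of both, with hypothetical names and data:

```python
database = {"news_feed": ["post 1", "post 2"]}

# Read-through / pass-through: the cache layer forwards misses to the database itself.
class ReadThroughCache:
    def __init__(self, db):
        self.db, self.store = db, {}

    def get(self, key):
        if key not in self.store:      # miss: the cache forwards the request on
            self.store[key] = self.db[key]
        return self.store[key]

# Cache-aside: the server checks the cache, and on a miss it queries the
# database directly, then retroactively puts the data into the cache.
def cache_aside_get(cache, db, key):
    if key in cache:
        return cache[key]
    data = db[key]        # bounced back to the server, which goes to the database
    cache[key] = data     # retroactively populate the cache
    return data

rt = ReadThroughCache(database)
print(rt.get("news_feed"))           # miss on first call, then served from the cache

aside_cache = {}
print(cache_aside_get(aside_cache, database, "news_feed"))
```

The difference is who owns the miss logic: in read-through it lives in the cache layer, in cache-aside it lives in the application server.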
[00:11:45]
Good question, though.
>> Student: Yeah, like I said, I just haven't dealt with databases and caches as much, so I wasn't sure how common it was for the cache to handle those concerns.
>> Jem Young: Yeah, I put it in front of the database here because you can call it a query cache. Usually those are built into the database, but if you want to expand the capacity and power of your database, you put a cache in front of it, which is just saying: hey, for every common query, like fetch the homepage or fetch the news feed.
[00:12:17]
Instead of making that round trip every time, it's storing that query result and pulling from that, instead of doing a full lookup every time. Yeah. Good question, though. So to wrap up the previous chapter: we learned everything is a system. You're part of a system. Everything we do is part of a system. It's amazing. This is what humans are good at: building systems, building complexity.
[00:12:39]
Every system has inputs, outputs, and boundaries, and the boundaries are what define the system, the scope of it. All distributed systems have very common building blocks, and at the end of the day there aren't that many types of them. It's the nuances of those building blocks that make the difference. Remember the commuter car versus the race car: same parts, very different details. Understanding systems is all about understanding how it all works together, knowing when you need to know what's in the details, knowing what's important.
[00:13:14]
We talked a little bit about load balancers and reverse proxies. When is that important in system design, and when is it just: it's a load balancer, cool, let's move on to the next thing? When you can do this effectively, you're just an effective engineer all around, or engineering manager, or anything in tech. If you can describe systems accurately and diagram them, the steps and the flows, you are already a level up.
So if you're stopping here, cool. Congratulations, hopefully you're a little bit better than you were before.