Backend System Design

What is a Distributed System

Jem Young
Netflix

Lesson Description

The "What is a Distributed System" Lesson is part of the full, Backend System Design course featured in this preview video. Here's what you'd learn in this lesson:

Jem explains that distributed systems involve parts located in different places to handle failure and scalability. He also walks through the core components of a system design, including clients, databases, servers, load balancers, and caches, emphasizing their roles and importance in building products or services.


Transcript from the "What is a Distributed System" Lesson

[00:00:00]
>> Jem Young: So we talked about a system, that was a pizza system, a pizza baking and almost delivery system. But when we talk about system design, we're not talking about isolated systems generally, we're talking about distributed systems. So what is a distributed system? Well, it's the same as a regular system except the parts aren't necessarily next to each other. So if we had a distributed pizza shop, what would that look like?

[00:00:28]
Not a trick question. You might have the kitchen in one location and then the receptionists or order-taking in another location. Yeah, would that work? Not super well. So that's one way of thinking of a distributed system. Another way is, exactly, a franchise. We have all these independent systems that are all delivering pizzas, but they're all specialized in a certain region to get those pizzas faster.

[00:01:01]
That's a different way of thinking about the distributed system. But mainly the idea is, they're different, they're located in different places, generally on different machines, and they're all trying to solve a problem. And you can be as distributed as you like. Distributed could be, I've got a server in one corner of the room, I have a server in another corner of the room, that's a distributed system.

[00:01:23]
Or you can say I have a server in Sao Paulo, I've got another server in London, and my database is in San Francisco. That's a distributed system too. But this idea of it's not just one machine is really the crux of a distributed system. And why we distribute is because we want to handle failure and we want to be able to scale. It's really hard to scale just one machine, and we'll talk about that when it comes to vertical scaling.

[00:01:56]
You can do it. It's really simple, but what's the problem with just one machine? It's one of these two here. You still have the single point of failure. Exactly. And load. There's a finite amount you can scale a box. That's what we do. That's why we create distributed systems. Anybody who's taken Full Stack for Front-End will maybe recognize this artful diagram. But it goes back to one of my favorite interview questions, which is, how does the internet work, roughly?

[00:02:33]
And I'll keep beating this hammer because it's something I believe very strongly, but the internet's one of our greatest inventions ever. For whatever you think of social media and everything else, the internet's still just this amazing series of components and systems working together in cooperation. And it's extremely scalable, and it's extremely failure tolerant. And when you think, wow, we have the world's compendium of all our information.

[00:03:03]
And just a service we could access through a set of standards. That's incredible. Like what an incredible accomplishment. There is no parallel in humanity. We've never done anything else like this. Hopefully we'll do more with space or something like that, but I don't know. Things are bleak sometimes. But the internet still stands, an amazing distributed system, absolutely incredible. There is no system like it.

[00:03:29]
So we talked about how to design a pizza shop, but let's talk about the common system components. Every system design is going to have the same baseline of components we have to know about. Again, like the race car example versus the commuter car, it's the details that matter here. But as long as you have the basic parts, then you'll already have a good start. You won't be thrown when it's like, hey, what is this thing doing?

[00:03:56]
What does this mean? What do I need to think about when I think about a distributed system? So we have about 5 main components. Again, this can scale or shrink depending on, well, this won't shrink, but this will definitely scale up as our systems get more complex. But if we want to start, if we want to say, I want to go from 0 to 100,000 as an arbitrary example, these are the core components that we need to know.

[00:04:23]
One, there has to be a client. We're talking about building products here, but you can also talk about building a service, say, a third-party API for developers. That's going to be the client in many cases. But in this case, we'll talk about products. So we say the client, its job is, one, it actually sends a request, it kicks the whole thing off. It displays the information to the user when it all comes back, but it's also the source of how we get any sort of input into the system.
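
To make the client's job concrete, here's a minimal TypeScript sketch: it fires off a request to kick things off, then renders whatever comes back. The endpoint, payload, and element ID are hypothetical, just to illustrate the flow.

```ts
// The client initiates: send a request, then display the result to the user.
type Order = { id: string; status: string };

async function placeOrder(pizza: string): Promise<void> {
  // 1. Kick the whole thing off by sending a request into the system.
  const res = await fetch("https://api.example.com/orders", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ pizza }),
  });

  // 2. When the response comes back, show it to the user.
  const order: Order = await res.json();
  document.querySelector("#order-status")!.textContent =
    `Order ${order.id}: ${order.status}`;
}
```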

[00:04:52]
What does a user do? You know, how do they put in their data? There has to be a client to do that. Otherwise, you just have this kind of closed loop system that doesn't really do anything. So think of this as an initiator, but all of our systems we design are going to have a client of some sort. The database. Everything we do is reading and writing from a database for the most part. So the database, or data storage, is going to store, update and retrieve data, kind of in the name.

[00:05:22]
I use database, but really I should be using data storage because they're different things. We often use database, but really we'll say data store. After that, you have to have a server. Right? What's the client talking to? Nothing. Itself, localhost? Is it 127.0.0.1? You have to have a server. The server has to process the requests. It handles the business logic. And when we say server, it's any kind of server, API server, file server, anything that's processing your request and sending back a response.
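
To make the server side concrete, here's a minimal sketch using Node's built-in http module: it accepts a request, runs a bit of business logic, and sends back a response. The route and payload are made up for illustration.

```ts
// A server's job: process the request, handle the business logic, send a response.
import { createServer } from "node:http";

const server = createServer((req, res) => {
  if (req.method === "GET" && req.url === "/menu") {
    // "Business logic" goes here; in a real system this might call a
    // database, another service, or a cache.
    const menu = [{ name: "Margherita", price: 12 }];
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify(menu));
  } else {
    res.writeHead(404);
    res.end();
  }
});

server.listen(3000); // clients can now send requests to this process
```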

[00:05:56]
When we talk about distributed systems, we need a load balancer. If you have more than one computer, how do you decide where the traffic is going at any one time? You have to balance that load out. So we have a load balancer. It's kind of in the name, one of the easier ones. It distributes traffic to keep things running smoothly. And then we have a cache. This I would say is, you don't always need to cache, but you should always be caching.

[00:06:23]
ABC, always be caching. If you're designing any sort of, yeah, I made that up. If you're designing any sort of system, there should be a cache somewhere in there. And there's probably caches inherently built in that you probably aren't aware of, but let's talk about being more explicit there. So examples of a client, browser, mobile app, TV as a client. What other types of clients? CLIs, CLI, yeah, it's a client.

[00:07:10]
What else? Your fridge. Mm. Tell me more about that one. All the wacky IoT stuff we got these days. The cars. Hm, cars. Oh yeah, yeah, cars definitely count. Come on, this is what we do. This is Frontend Masters. All we do is work in clients. That's what the front-end is, right? Someone give me a wacky example, not a refrigerator. What's Apple's virtual headset that they released, I forgot the name of it.

[00:07:43]
The Vision Pro. Vision Pro, yeah. That's a good one. Point of sale system. Oh, that's a great one. Yeah, I thought we'd have a lot of examples coming off, but yeah, these are all clients. So the input here is going to be some sort of terminal, someone said terminal, nice. Other servers, robots. Watches. Trains. Satellites. That one's, I don't know. That one's a stretch a little bit, maybe. It could be requesting data.

[00:08:28]
And sending data, that's true, hm, all your household devices, not just the robot vacuums, the washers, all of it. The ones that are spying on you, yeah, yeah, the Roomba. Hot tub, sauna. Yeah, anything that takes some sort of input, needs something to change, and is going to make a response, it's going to send a request to the back-end. That's a client. And the output is requests, UI updates, you know, changing the interface, changing the behavior of the client.

[00:08:59]
Servers. Server is a very overloaded term, but it kind of does what it says it does. It serves requests, that's why it's called a server. Hopefully you're just like, whoa, that makes sense now. I never thought of that. Different types of servers: there's web servers, which is mainly what a lot of front-end engineers are familiar with. There's API servers that are doing the heavy lifting.

[00:09:33]
A video processor is a type of server that's taking an input and putting out some output. What other types of servers? These you should know. Messaging servers. Yeah. Anything doing some sort of processing, doing something with the data, making requests. Would you put the database in this category or is that separate? We'll put it separately. Yeah, because the server is designed to do something with the requests first.

[00:10:04]
The database is designed to take in a query and then look something up and then return that, but it's not actually doing anything with the request per se. But I know, potato, potato, but we'll call databases different. The response of any server is going to be some sort of manipulation of that request: HTML, JSON, UI. A server can make other requests. It could output queries, modify data, etc. Load balancers, we talked about that.

[00:10:32]
It's in the name, it balances the load across systems. There's software load balancers, there's hardware load balancers. We will not get into hardware load balancers, probably not that deep on software ones either, because they're fairly straightforward. Until they're not. But a load balancer is going to take a request and it's going to output a routed request to the correct server. Any example of a load balancer y'all could think of, bang off, if you've taken Full Stack for Front-End.

[00:11:02]
Nginx. Yes, Kayla, thank you. Nginx, my favorite load balancer slash reverse proxy. If you're in Kubernetes, yeah, an ingress controller. Oh my God, there's so many. There's a lot of load balancers. We won't get into the nuances of all of them, because mainly in my experience, it hasn't come up in interviews. No one's really looking for that level of detail. Just be able to name off a few load balancers and know how they work. Nginx is going to be my favorite, but that's a reverse proxy slash load balancer.

[00:11:40]
It's all good. Is Nginx typically used as an isolated load balancer or is it used with other load balancers? I mean, I guess it depends on the use case, but I just haven't used Nginx as a load balancer before, mostly just as a proxy, I guess. Yeah, Nginx is a reverse proxy that is often used for load balancing, but also other things like authorization, SSL termination. It's kind of the everything piece of software, but Apache you can use as a load balancer too.
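
A toy round-robin load balancer sketched in TypeScript gives the basic idea: requests come in, and each one is routed to the next server in the pool. The backend addresses are hypothetical, and real load balancers (Nginx, HAProxy, AWS load balancers) add health checks, retries, SSL termination, and much more.

```ts
// Toy round-robin load balancer: route each incoming request to the next
// backend in rotation, then relay the response to the client.
// (Uses Node 18+'s global fetch; GET-style requests only, to keep it short.)
import { createServer } from "node:http";

const backends = ["http://10.0.0.1:3000", "http://10.0.0.2:3000"];
let next = 0;

createServer(async (req, res) => {
  // Pick the next backend in rotation (round robin).
  const target = backends[next];
  next = (next + 1) % backends.length;

  // Forward the request and pass the response back to the client.
  const upstream = await fetch(`${target}${req.url}`);
  res.writeHead(upstream.status, {
    "Content-Type": upstream.headers.get("content-type") ?? "text/plain",
  });
  res.end(await upstream.text());
}).listen(8080);
```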

[00:12:21]
F5 BIG-IP, which was recently hacked, yes, very much a hardware load balancer, different levels. We won't get into Layer 4 load balancers, which is that more hardware level. Yeah. AWS load balancers. Yeah, we'll talk more about load balancers in a little bit. Databases, data stores. Basic ones: relational, non-relational. Within those two words hides a boatload of complexity. Mainly in this early section we'll talk about relational databases, because non-relational databases are a whole new thing, and by new in computers, I mean like 20 years or so, which is new.

[00:12:59]
And then there's specialized types of databases: graph databases, object-oriented, vector databases. There's columnar, time series. There are hundreds of different types of databases. We won't go into most of them. But the input to a database is going to be some sort of query. And the output's going to be data or a status response, like couldn't find the data or something like that. It could also be an object though.
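
As a sketch of that query-in, data-out shape, here's roughly what it looks like from a server using the node-postgres client; the connection string, table, and columns are hypothetical.

```ts
// Input to the database: a query. Output: rows of data (or an error/status).
import { Client } from "pg";

async function findOrders(customerId: number) {
  const db = new Client({ connectionString: "postgres://localhost/pizzeria" });
  await db.connect();
  try {
    // Parameterized query: the database looks the data up and returns it.
    const result = await db.query(
      "SELECT id, status FROM orders WHERE customer_id = $1",
      [customerId],
    );
    return result.rows; // the data coming back out
  } finally {
    await db.end();
  }
}
```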

[00:13:30]
Databases can store objects as well. Then we have a cache. Examples of caches: you could have a client cache, a server cache, a CDN type of cache, in-memory. But the input to a cache is going to be some sort of key. It's got to be fast. That's what makes the cache. And who's going to say, Kayla, are you going to say, hey, isn't a cache a type of server? Yes, but it's a very specific type of server. I guess when you think about it, everything's a server, but you know, let's not be too pedantic, or maybe we will.

[00:14:11]
I was thinking of distributed caching. Yeah. That's definitely a thing. There's local caches, centralized caches, distributed caches. Lots of layers of caching. But a cache, what do you think it does? Retrieves or stores frequently accessed data. Yeah. Yeah, exactly. Temporary data. That's the difference between a cache and a database, because you're saying, hey, they both store data, what's the difference?

[00:14:41]
With the cache, it's going to be temporary, whatever you want to use, least recently used, static, large objects, those are caches. But is a cache a type of database? Yeah, kind of, a key value store, which is a type of database. And I know it's getting confusing. That's why you have different names for things. Everything's a database, everything's a server. And what's a CDN? Content delivery network.

[00:15:15]
It's, I don't know if that one's hard to describe, but they've been around forever. Yeah, what are CDNs for? Serve assets, cache assets I guess. Globally, usually closer to the client. Might handle specific types of data, it gets to be a lot. The idea is you take a file and you copy it all around the world and then you serve it from the location that's closest to your request. Yeah, yeah, nailed it.

[00:15:40]
Kayla, Kayla, you said it first, so you get the credit. Mark, you've taken caching before. I'm just kidding. The important thing about the cache is it's close to the user, and that's what makes it a cache. If it's a cache that's super, super far away, well, at least for a CDN, it's close to the user. But generally when it comes to caching, the cache is always close to the data. Otherwise, you know, we have a database essentially, and that's kind of a key differentiator.
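
To pin down the cache's key-in, value-out, temporary nature, here's a minimal in-memory cache sketch with a TTL; real systems would typically reach for something like Redis, Memcached, or a CDN layer instead.

```ts
// Minimal in-memory cache: keyed lookups, fast, and temporary (TTL-based).
type Entry<V> = { value: V; expiresAt: number };

class SimpleCache<V> {
  private store = new Map<string, Entry<V>>();

  constructor(private ttlMs: number) {}

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.store.delete(key); // expired entries count as a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

// Usage: check the cache first, and only hit the slower data store on a miss.
const menuCache = new SimpleCache<string>(60_000); // entries live for 60 seconds
```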

[00:16:12]
But a cache, it makes things faster. Again, we always want to be caching where we can. And when we draw these out, a connection, an arrow, just indicates there's a relationship between the two entities. Earlier we talked about what's a system: a system is about the inputs, the outputs, and the relationships between all of these. So a basic system, and this is probably what most front-end developers know, is you have a client, you have a server, a database.

[00:00:00]
This describes pretty much any system, well, I won't say any system, most systems in the world. And this is kind of where our knowledge stops, generally. This is it. This is what you know, they have some sort of client that talks to the back-end API and there's a database.
