Backend System Design

Scaling a System Exercise

Jem Young
Netflix

Lesson Description

The "Scaling a System Exercise" Lesson is part of the full, Backend System Design course featured in this preview video. Here's what you'd learn in this lesson:

Jem explains how to scale a system for higher traffic by reviewing core technologies, adding caching, and applying vertical scaling and partitioning. He also covers using read replicas to handle heavy load efficiently.


Transcript from the "Scaling a System Exercise" Lesson

[00:00:00]
>> Jem Young: --- Let's put it all together. So we did a lot on data storage, we did some on estimation, and we understood, hey, our one database is probably going to hold up for our Studio app even with 100,000 users. All right, here's our to-do app. Pretty good, works for 100,000 users. We've done the math on that. Was the math shaky? Sure. Was it good enough, or as we like to say, directionally correct? Yes. Let's talk about the technology we're using here real quick, and then we're going to talk about how we would scale this up.

[00:00:34]
So what we're using is, we switched that load balancer to a reverse proxy. Remember, we said a reverse proxy can do what a load balancer does, but it's going to be more capable. So for that, I'm going to say I'm using Nginx. That's my go-to. Works pretty well, very few things I've run into that couldn't be solved with Nginx. There are more specialized options you might want to use for a reverse proxy if you need to be really, really performant.
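
To make the reverse-proxy idea concrete, here is a minimal sketch using Go's standard library rather than Nginx; the ports and backend address are assumptions for illustration, and a real deployment would use Nginx (or similar) configuration instead.

```go
// Minimal reverse-proxy sketch using only Go's standard library. In the
// diagram this role is played by Nginx; the addresses here are assumptions.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Hypothetical upstream web server (the Go app from the diagram).
	backend, err := url.Parse("http://localhost:8080")
	if err != nil {
		log.Fatal(err)
	}

	// The standard library ships a single-host reverse proxy.
	proxy := httputil.NewSingleHostReverseProxy(backend)

	// In production this would sit on port 80/443; :8000 keeps the sketch
	// runnable without root privileges.
	log.Fatal(http.ListenAndServe(":8000", proxy))
}
```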

[00:01:02]
You're talking probably hardware load balancers or reverse proxies. But if you're at that level, you're probably not in a system design course. You probably know what you're doing. But Nginx is a good standby. Web server, this could be anything. You all pick. What's the web server? Go? Bad choice. No, no, you can do Go. Is there a specific, like, what's the server for Go? Is it just Go? I use Gin router, or Gunicorn.

[00:01:37]
I don't know how to say it, the Python one. I'm not familiar with Go, so you have to, like, tell me what it is. Yeah, well, I think for the most part, a lot of people use the standard library. Just Go. Yeah, let's just Go. All right, our web server is made in Go. Yeah, web server. You know, especially in front-end development, we probably argue about it a lot, but it doesn't really matter. Python, Express, basic Node, Java, they're all going to do the trick.
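
A bare-bones version of that Go web server, using only the standard library, could look like the following; the /todos route and Todo shape are illustrative assumptions, not part of the course material.

```go
// Bare-bones to-do web server on the Go standard library.
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// Todo is a hypothetical record shape for the app.
type Todo struct {
	ID   int    `json:"id"`
	Text string `json:"text"`
}

func main() {
	http.HandleFunc("/todos", func(w http.ResponseWriter, r *http.Request) {
		// In the real app this data would come from the database layer.
		todos := []Todo{{ID: 1, Text: "plan the party"}}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(todos)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```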

[00:02:06]
I know, oh no, web servers, they're really complicated. Yeah, every one of these is complicated if that's your job, but at a high level, these all do the job. Our database, we said, you know, we always choose the simplest thing for the job and then scale it from there. We chose to do a relational database because that's fairly straightforward. I'm going to say this is Postgres. And there you go. This is good enough.
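
As a sketch of how the Go web server might talk to that Postgres database through the standard database/sql package, assuming the lib/pq driver, a placeholder connection string, and a hypothetical todos table:

```go
// Sketch of the web server's read path against Postgres via database/sql.
// The lib/pq driver, connection string, and todos table are assumptions.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // registers the "postgres" driver
)

func main() {
	db, err := sql.Open("postgres", "postgres://app:secret@localhost/todos?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Simple read path: fetch one user's to-dos.
	rows, err := db.Query("SELECT id, text FROM todos WHERE user_id = $1", 42)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var (
			id   int
			text string
		)
		if err := rows.Scan(&id, &text); err != nil {
			log.Fatal(err)
		}
		fmt.Println(id, text)
	}
}
```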

[00:02:38]
Is it highly specialized? No. Is it good enough to hold 100,000? Yeah, probably 500,000 even. But let's work through all the things we learned about data storage and say, how would we scale this up? All right, so we're hitting the limits. Something happened. Our app is super popular, especially in the party planning and/or critical medical community. Thank you, people who remember the joke from earlier.

[00:03:18]
What's our strategy here? What's the easiest thing we could probably do to scale up? Is it the requests that are causing the problem, or is it that the database can't handle it? Yes, Kayla, that is systems thinking. What is going to be the bottleneck here in our system? We can always throw more servers, more technology at it, but what is going to be the slow thing in our to-do app? Reads, yeah, it's going to be reads.

[00:03:49]
It's probably going to be something in the database, because our to-do app is pretty simple. We're not saying we're feeding all your to-dos into some LLM and it's going to output, here's what we think your next to-do should be. We're not doing anything complicated like that, so the APIs are pretty straightforward. We're not doing any computationally expensive processing, so our web server is still holding up pretty well.

[00:04:10]
If it got more advanced, we'd probably add an API server, but our web servers can do all that. So the slowest part here is going to be anything involving the database, with the reads. A cache, yes. So a cache. Make sure the style fits, I like to keep a certain style going, and let's pop the cache in there. Cache. And our cache could be anything. It really doesn't matter too much, but I was going to say Redis because that's lazy mode.

[00:04:56]
When in doubt, Redis will do it. Okay, so now we've added a lot more, this actually adds a lot more capacity, because we said our application is read-heavy and not write-heavy. So this allows us to scale up quite a bit. How much? It depends, but say we're getting up to a million. Yeah, at a million users, we're probably thinking our database is starting to grow a lot more. We could do the math here, but for the sake of this thought exercise, let's just say, hm, our cache is working, but now our database is filling up and the writes are getting slower and slower.
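
The read-heavy win comes from the cache-aside pattern: check the cache first, fall back to the database on a miss, then populate the cache for the next reader. Here is a sketch of that flow; the in-memory map stands in for Redis so the example is self-contained, and the Cache interface and loadTodos helper are illustrative names, not course code.

```go
// Cache-aside sketch for the read-heavy path.
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrCacheMiss signals that the key was not cached.
var ErrCacheMiss = errors.New("cache miss")

// Cache is the minimal surface the read path needs; Redis fits this shape.
type Cache interface {
	Get(key string) (string, error)
	Set(key, value string) error
}

// memoryCache is a toy stand-in for Redis so the sketch runs on its own.
type memoryCache struct {
	mu   sync.Mutex
	data map[string]string
}

func (c *memoryCache) Get(key string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.data[key]
	if !ok {
		return "", ErrCacheMiss
	}
	return v, nil
}

func (c *memoryCache) Set(key, value string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[key] = value
	return nil
}

// loadTodos is the cache-aside read path.
func loadTodos(cache Cache, userID string, queryDB func(string) string) string {
	if v, err := cache.Get(userID); err == nil {
		return v // cache hit, no database work
	}
	v := queryDB(userID)     // cache miss: hit Postgres
	_ = cache.Set(userID, v) // populate for the next reader
	return v
}

func main() {
	cache := &memoryCache{data: map[string]string{}}
	db := func(userID string) string { return "todos for " + userID }

	fmt.Println(loadTodos(cache, "42", db)) // miss, reads the database
	fmt.Println(loadTodos(cache, "42", db)) // hit, served from the cache
}
```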

[00:05:39]
What do we do next? Move to Mongo? Not yet. Any database migration is going to be the last thing you want to reach for, because it's going to be expensive. It's going to involve downtime unless you do it very, very carefully. You need people who know what they're doing, specialized people called database administrators, you know, those people who love databases. Yeah, sorry, I shouldn't hate on databases, I just, like, couldn't do it.

[00:06:04]
It's not my thing. I have nothing but love for them. So no, we don't want to do that yet. We want to do the easiest, cheapest thing first. So what do we say? Scale up, maybe? Vertical scaling, vertical scaling, yeah, scale up, yeah. Can't really represent that in a diagram, but let's just make this bigger. There we go. Now we're at scale. This is big time scale, a million users. We've got our cache, and our relational database is now scaled up.

[00:06:47]
We are cruising along now. But good news, the Studio app is blowing up. It is the hottest Studio app on TikTok. I don't know how social media works, but it could be a thing. We've already scaled up, we're at the max. There is no bigger instance we can use for our relational database. What should we do next? Partition, partition, yeah. There's probably a way of representing that.

[00:07:17]
I don't know how to represent it, but we'll just write it down. We'll say we'll do two partitions. Now we're at, I don't know, 2 million users. We've partitioned. We've got a huge database that's still going well. We're getting even bigger. What should we do next? Shard, yeah. Yeah, the next thing we do to scale a relational database is shard it. That one we can mess around with in the diagram. So we'll do two shards just for the sake of simplicity.
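
For the partitioning step specifically, here is one sketch of what that could look like at the database level: a single Postgres instance whose todos table is split into hash partitions by user_id, which is distinct from sharding across separate machines. The DDL, table layout, and connection string are assumptions for illustration, not taken from the lesson.

```go
// Sketch: hash-partitioning the todos table inside one Postgres instance.
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // registers the "postgres" driver
)

func main() {
	db, err := sql.Open("postgres", "postgres://app:secret@localhost/todos?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Two hash partitions of the same logical table, mirroring the diagram.
	stmts := []string{
		`CREATE TABLE todos (
			id      bigserial,
			user_id bigint NOT NULL,
			text    text   NOT NULL,
			PRIMARY KEY (id, user_id)
		) PARTITION BY HASH (user_id)`,
		`CREATE TABLE todos_p0 PARTITION OF todos FOR VALUES WITH (MODULUS 2, REMAINDER 0)`,
		`CREATE TABLE todos_p1 PARTITION OF todos FOR VALUES WITH (MODULUS 2, REMAINDER 1)`,
	}
	for _, stmt := range stmts {
		if _, err := db.Exec(stmt); err != nil {
			log.Fatal(err)
		}
	}
}
```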

[00:08:05]
I'm actually going to add just another cache, one for each database. Do they need to talk to each other? Do we want to add an additional load balancer, or? We do need a load balancer, yes, for the shards. So now we have shards. So we partitioned, now we shard. We add a load balancer, and now we have a cache in front of each database. Pretty good. So you can see how quickly the complexity has grown as we scale.
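
A sketch of how that routing layer might pick the right shard: hash the user ID and take it modulo the number of shards, so the same user's to-dos always land on the same database. Two shards here to match the diagram; the connection strings are placeholders.

```go
// Sketch of shard selection by hashing the user ID.
package main

import (
	"fmt"
	"hash/fnv"
)

var shards = []string{
	"postgres://shard0.internal/todos", // placeholder connection strings
	"postgres://shard1.internal/todos",
}

// shardFor maps a user ID onto one of the shards deterministically.
func shardFor(userID string) string {
	h := fnv.New32a()
	h.Write([]byte(userID))
	return shards[h.Sum32()%uint32(len(shards))]
}

func main() {
	for _, id := range []string{"alice", "bob", "carol"} {
		fmt.Printf("user %s -> %s\n", id, shardFor(id))
	}
}
```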

[00:08:31]
We need a load balancer now. Now we need two caches. These are all things that can break, things we have to manage, things you have to think about when you're thinking about what's happening in your system. Which is why we always start with the simplest thing first, and then, when we need to, we scale up. And because we've designed this really well, when we do scale, we don't have to change that much.

[00:08:56]
So on the back end, we're not changing too much anymore, and we don't have to change the queries or anything like that. That's already being handled automatically. The load balancer can route requests to the correct shard if we need it to. So this gets us pretty far, yeah. I was just going to ask, like, yesterday you talked about, you know, being able to scale up, but there are also times when you'd scale down, when we're not anticipating as much traffic.

[00:09:30]
When you're designing a system like this, do you have thoughts of, I mean, obviously we think about scaling up, but, like, are you considering how we would scale down if we needed to? Yeah, that's a good callout. Yeah, we should consider it. I'd say in the context of a system design interview, you don't have to worry about scaling down. That doesn't usually come up, but in real life, yeah, we should be thinking about how to scale this down.

[00:09:58]
Could we scale this down easily? Which part would we scale down? And my first thought was the vertical scaling. I guess you can't really undo that, or, I don't know. No, Joe, you're right. We kind of left the vertical scaling behind with the sharding. I don't know how difficult it is to unshard a database, though. It is difficult. But it can be done in relational databases. It's just, again, anytime you hear the word migration with a database, think that this is going to be a pain in the butt.

[00:10:29]
But Joe, like you said, vertical scaling. That's the nice thing about vertical scaling, you can go up, but you can go down too, and we don't have to change any of our code to do that. And when we sharded our database, we implicitly scaled back down. So if we're hitting performance limits, we can scale both of these databases up again, or we can scale them down. In this case, hey, traffic dropped off, our to-do app is very seasonal, people use it during the holidays, but after that they stop using it.

[00:10:59]
What should we do to scale down? We can shrink our databases, for one. We could probably take out the caches if we want to scale down. We're saying, hey, let's make this simple, we actually don't need the caches because the databases are performing just fine, if we want to do that. Yeah. Okay, so now we're at the limits of scale again. Our app is blowing up, we're at 5 million users. What do we do now?

[00:11:31]
So our reads are getting slow, or, what is it, yeah, reads are getting slow; since we're doing more reads, that's the thing slowing us down. You said partition. We're kind of back in that loop again. We might partition again after the sharding, that's the strategy we should be thinking of. So I like the instinct, it's just that we can always partition more and we can shard more. We can partition more and shard more.

[00:11:55]
That lets us scale up quite a ways. We'd have to hit a very, very high load before we reach the limits of this particular architecture. And that's, you know, what we should be internalizing: we don't necessarily need to switch to a NoSQL database just yet. That's the last thing we would do. And that way, we can simplify this quite a bit and say, hey, there's a load balancer, maybe we have two databases, there's a cache.

[00:12:29]
We don't have to partition and shard anymore because that's automatically done at that stage. But partitioning and sharding take us pretty far. Maybe another thing we can do is add read replicas. We can institute that, right? Let's see how that would work. So we've got our cache, looking pretty good. We'll say this one's read, say this one's write, and let's go ahead and add another.

[00:13:39]
And here, nothing fancy. And it's also read. Yeah, so we introduce read replicas now, because most of our database traffic is reads, and we have caches in front of those reads, so this gets us very far. This handles quite a lot of load, and we have a replication strategy as well for backups. When you replicate after you've sharded, you need to replicate each sharded database? Yes, yeah. Everything's a trade-off.
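
A sketch of what read/write splitting could look like in the Go web server once a replica exists: writes go to the primary, reads go to a read replica. The splitDB type, connection strings, and queries are illustrative assumptions; real setups would usually pool several replicas and account for replication lag.

```go
// Sketch of read/write splitting across a primary and a read replica.
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // registers the "postgres" driver
)

type splitDB struct {
	primary *sql.DB // takes all writes
	replica *sql.DB // serves the read-heavy path
}

func open(dsn string) *sql.DB {
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		log.Fatal(err)
	}
	return db
}

// AddTodo writes to the primary so the replica can copy it.
func (s *splitDB) AddTodo(userID int, text string) error {
	_, err := s.primary.Exec("INSERT INTO todos (user_id, text) VALUES ($1, $2)", userID, text)
	return err
}

// ListTodos reads from the replica, keeping load off the primary.
func (s *splitDB) ListTodos(userID int) (*sql.Rows, error) {
	return s.replica.Query("SELECT id, text FROM todos WHERE user_id = $1", userID)
}

func main() {
	db := &splitDB{
		primary: open("postgres://primary.internal/todos"),
		replica: open("postgres://replica0.internal/todos"),
	}
	_ = db // wiring only; the web server's handlers would call these methods
}
```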

[00:14:05]
Like, oh, sharding, partitioning, cool. But when you replicate, you also have to replicate those as well, the partitions, those shards. Nothing's free. I should have saved the earlier copy, but we went from something that looked very simple to something more complex, and yet we actually didn't do a whole lot here. We're just duplicating technology we're already using and changing the way we're using it.

[00:00:00]
And this gets us pretty high on the scale. So databases are often the slowest part of your system, but we have these strategies we can apply in pretty much any scenario that allow us to scale up and scale down pretty easily and handle the increased traffic.
