Lesson Description
The "Availability: Primary, Replica, & Snapshots" Lesson is part of the full, Backend System Design course featured in this preview video. Here's what you'd learn in this lesson:
Jem explains database availability, focusing on uptime, replication, and preventing data loss. He covers replication approaches and trade-offs, and emphasizes combining strategies to ensure resiliency and performance.
Transcript from the "Availability: Primary, Replica, & Snapshots" Lesson
[00:00:00]
>> Jem Young: All right, so now we all know the difference between partitioning and sharding; you can amuse your friends, your database administrators, at parties. Very popular. Let's talk about availability. We talked about consistency, and relational databases are pretty, pretty consistent. But availability in general: how do you get those 5 9s, 99.999% uptime? Run the numbers and it means in a year you can only be down for about five minutes to hit 5 9s of availability.
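As a quick sanity check, that downtime budget is simple arithmetic; here's a minimal sketch in TypeScript, with the helper name invented for illustration:

```ts
// Downtime budget implied by an availability target.
// 99.999% uptime leaves a 0.001% downtime budget.
const MINUTES_PER_YEAR = 365.25 * 24 * 60; // ~525,960 minutes

function downtimeMinutesPerYear(nines: number): number {
  const availability = 1 - Math.pow(10, -nines); // 5 nines -> 0.99999
  return MINUTES_PER_YEAR * (1 - availability);
}

console.log(downtimeMinutesPerYear(3).toFixed(2)); // "525.96" -> roughly 8.8 hours/year
console.log(downtimeMinutesPerYear(5).toFixed(2)); // "5.26"   -> about 5 minutes/year
```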
[00:00:35]
But I should change the slide. Instead of saying what happens when things go wrong, I should say things are going to go wrong. Guaranteed, hard drives will fail. Even if hard drives have, I don't know, a 99% success rate, the 1% that do fail take out gigabytes of data. It's going to happen; we know it's going to happen. And this is real life. We're very dependent on computers, but a lot of the time our approach is not to think about availability. Where's my data?
[00:01:10]
How am I backing it up? There was a fire, I think it was in South Korea, a while back, and they had all the data storage for the government in a single room. And there was a fire, the fire suppression system didn't work, and it wiped everything out. All the data is gone, just like that, boop. And you're like, oh, surely they had a backup. They did. Guess where it was. In the same room, yeah.
[00:01:42]
So anybody who's had things go wrong for them, or any SRE, a site reliability engineer, whose job is to keep things up and running, probably has stories. If you ever want to have respect for what SREs do, go to an SRE conference or something and just hear the things they have to recover from: recovering from failure, recovering backups, the dumb things that have happened to break their networks. You'd be surprised.
[00:02:12]
A fiber optic cable in the sea gets cut by an oil tanker dragging its anchor, and all that stuff you were transmitting is gone. Things fall apart. They will break, guaranteed. The CAP theorem tells us that partitions are going to happen. So how do we fix that, generally? And remember, if we design our system correctly, we don't have a lot of state in the servers. So if a server goes down, we're not losing too much, because we're not storing a whole lot in memory other than whatever's happening in that particular nanosecond of time.
[00:02:49]
But generally, how do you think you protect yourself from catastrophic failure? Off-site backups, yeah, in a different region. Replicas, backups, or multi-region failover, yeah. Because really, if we design the system correctly, we're not storing much on any given server; everything's going to a database. So we just have to back up that data in some way. There are different strategies for how to do that, and generally we call that replication.
[00:03:29]
So you have a database, and you always have a replica somewhere. It's often implied in the system diagram, but sometimes you have to be explicit about it. Replication is exactly what it sounds like: you're making a copy of your data across multiple servers or locations. Generally, if you're aiming for availability, you want to copy your data to a different region, depending on what you're trying to do.
[00:03:58]
But if you do things well, you always want to have copies in some other region somewhere else in the world, if you can, if you're using cloud computing. And you say, well, no, I'm using AWS, rock solid. AWS goes down. US East goes down, as it has in the past, and large parts of the internet stop working. Why? Because people had replicas, but they were all in the same region, so when the region goes down, it's all gone.
[00:04:31]
Again, we don't think of this until it's catastrophic, and this is the problem. It's why a lot of companies get hacked, they get hit with ransomware, and they're like, oh, well, we have backups. Where are the backups? Right next to the other ones. And their backup strategy was to immediately replicate everything, so when something got infected, everything got infected. That's something you have to think about.
[00:04:57]
Do you need real-time backups? Do you want to do snapshotting instead? Should you do a combination of them? Yeah. But we always have replicas of our data, always. So one architecture is called primary replica, historically also called master-slave, though we don't really use that term anymore: primary replica. All writes go to the primary server. So what's the advantage here? You're already seeing there's a performance piece to it.
[00:05:33]
I don't know the exact details, but I know if servers are fighting over who's allowed to write, you have to deal with contention. Yeah, we'll talk about that strategy in a bit, but there is a performance piece here, because we talked about how a lot of applications are read-heavy. So in this case, we have one database that's getting all the writes, the replicas are just pulling from it, and all the reads come from those replicas.
[00:05:59]
So the one primary database is doing all the writes, and there are no reads to slow it down, which improves performance quite a bit. And then if the primary fails, a replica can just take over; you switch over and spin up a new replica, a pretty common strategy. You also have less complexity with a single primary doing the writing, because you don't have to worry about concurrent writes between different nodes.
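Here's a minimal sketch of what that read/write split can look like in application code; the `DB` interface and the router are hypothetical, not tied to any particular database driver:

```ts
// Read/write splitting: all writes hit the single primary, reads fan out
// across replicas, and a replica can be promoted if the primary dies.
interface DB {
  query(sql: string): Promise<unknown[]>;
}

class PrimaryReplicaRouter {
  private next = 0;

  constructor(private primary: DB, private replicas: DB[]) {}

  // Writes go only to the primary, so there's no write contention.
  write(sql: string): Promise<unknown[]> {
    return this.primary.query(sql);
  }

  // Reads round-robin across replicas, keeping read load off the primary.
  read(sql: string): Promise<unknown[]> {
    const replica = this.replicas[this.next];
    this.next = (this.next + 1) % this.replicas.length;
    return replica.query(sql);
  }

  // Failover: promote one of the replicas to be the new primary.
  promote(index: number): void {
    this.primary = this.replicas.splice(index, 1)[0];
  }
}
```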
[00:06:31]
Yeah, there is less complexity in this particular one, especially compared to something like peer-to-peer, which we'll talk about. However, when you see primary, think sharding too. These are probably sharded as well, so you have multiple primaries, each partitioned on a shard key, and you might have a load balancer somewhere in here as well. This is a very simple diagram, but it's the general architecture.
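To show how sharding layers on top, a toy shard router might hash the shard key to pick which primary (and its replicas) owns a record. The hash and names here are invented for illustration:

```ts
// Hypothetical shard router on top of primary-replica pairs: a hash of the
// shard key decides which shard (primary + replicas) owns a given record.
function shardFor(shardKey: string, shardCount: number): number {
  let hash = 0;
  for (const ch of shardKey) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple 32-bit string hash
  }
  return hash % shardCount;
}

console.log(shardFor("user:42", 4)); // "user:42" always routes to the same shard
```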
[00:07:03]
But what's something we're inherently going to miss here? This one is subtle, but think about the speed of computing. Say you're doing, I don't know, I'll make something up, 1,000 writes a second. What's the problem here? If you bog down your primary too much, your replicas aren't getting up-to-date data all the time, which may be okay, maybe not; it depends on that latency. One example I had in real life was search, Elasticsearch.
[00:07:43]
I think it was Lucene at the time. It was just killing our database trying to build all those indexes and such, so we actually had to isolate it a little bit. Yeah, you're onto something there, though. What's the latency here between the write and the read? Is there any? It might depend on where you're coming from and where the servers are located. Locality, yeah. Let me ask it a different way: are the replicas always going to be consistent with the primary?
[00:08:21]
No. So if the primary goes down in the middle of a random workday, those replicas are almost consistent, but you can't guarantee they're consistent. There would still be a risk of data loss, right? There's still a risk of data loss there. Granted, it's a lot smaller than if you didn't have any replicas at all, but this isn't a foolproof strategy for guaranteeing availability. It's a pretty good one, and like you're saying, Michael, it's pretty straightforward to reason about what's happening, but it's not a perfect guarantee.
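To make that lag concrete, here's a toy model of asynchronous replication, with everything invented for illustration. The window between the primary's acknowledgment and the replica's apply is exactly the window where a failover loses data:

```ts
// Toy replication lag: the primary acknowledges writes immediately and ships
// them to the replica after a delay, so the replica may be missing the most
// recent writes at any given instant.
const primary = new Map<string, string>();
const replica = new Map<string, string>();

function write(key: string, value: string): void {
  primary.set(key, value); // acknowledged right away: fast, but...
  setTimeout(() => replica.set(key, value), 100); // ...the replica trails behind
}

write("balance:alice", "100");
console.log(primary.get("balance:alice")); // "100"
console.log(replica.get("balance:alice")); // undefined: the lag window is the data-loss window
setTimeout(() => console.log(replica.get("balance:alice")), 200); // "100", eventually
```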
[00:09:00]
So there's something where the guarantee is a little bit stronger: primary-primary. All the servers are reading and writing, and the data is synchronized between them. That works as well, and it's conceptually even simpler than the primary replica strategy. However, the complexity here is: how do you do conflict resolution between servers? How do you do that negotiation? That's difficult.
[00:09:28]
And generally you might have a service actually sitting between these two primaries that reconciles the data and then posts it to a full replica somewhere. But it's not straightforward. You say, oh, I want to make a copy of my data, how hard can it be? There's a lot to reason about, especially when you're talking about millions and billions of records. Stuff goes wrong. How much are you going to lose?
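One common, and lossy, way such a reconciling service can resolve conflicting writes is last-write-wins on a timestamp. A sketch with invented names; note that clock skew between primaries is one of the things that makes even this simple rule tricky in practice:

```ts
// Last-write-wins conflict resolution between two primaries. Each write
// carries a timestamp; on conflict the newer write survives and the older
// one is silently discarded -- that silent loss is exactly why
// primary-primary is harder than it looks.
interface VersionedWrite {
  key: string;
  value: string;
  timestamp: number; // e.g. Date.now() on the originating primary
}

function reconcile(a: VersionedWrite, b: VersionedWrite): VersionedWrite {
  return a.timestamp >= b.timestamp ? a : b;
}

const fromPrimaryA = { key: "cart:7", value: "2 items", timestamp: 1700000000500 };
const fromPrimaryB = { key: "cart:7", value: "3 items", timestamp: 1700000000900 };
console.log(reconcile(fromPrimaryA, fromPrimaryB).value); // "3 items" wins; "2 items" is lost
```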
[00:09:51]
How important is that? And then you have peer-to-peer, which is, think decentralization. Crypto is a good example of peer-to-peer: one peer does something and then updates, and all the other peers get it. Crypto is actually probably the perfect example of peer-to-peer. Obviously this is very complex, but it does allow you to distribute your data really easily around the world, because you know it will be eventually consistent.
[00:10:18]
Now, that latency, who knows. If one peer goes down, there's a chance you lose whatever transactions were happening at that moment. But on the other hand, all the other peers have a copy of the data to some degree, so you get really easy backups here. It's a trade-off on complexity and how you want to manage that. So those are different architectures for replicating your data.
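A hand-wavy sketch of that peer-to-peer shape, all invented for illustration: a write lands on any peer and gossips outward, so every peer holds a copy eventually, with no single node authoritative:

```ts
// Toy peer-to-peer replication: any peer accepts a write, then gossips it to
// every other peer. Every peer eventually holds a full copy (easy backups),
// but there's no guarantee about when -- eventual consistency.
class Peer {
  data = new Map<string, string>();
  peers: Peer[] = [];

  write(key: string, value: string): void {
    this.data.set(key, value);
    for (const peer of this.peers) {
      setTimeout(() => peer.data.set(key, value), Math.random() * 100); // gossip async
    }
  }
}

const network = [new Peer(), new Peer(), new Peer()];
for (const p of network) p.peers = network.filter((q) => q !== p);

network[0].write("tx:123", "signed-transaction"); // any peer can take the write
setTimeout(() => console.log(network[2].data.get("tx:123")), 200); // "signed-transaction"
```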
[00:10:49]
And I'll have cool diagrams for these. But now for the strategies you want to apply, because as we said, in primary replica the replicas are still not fully consistent. You can fix that with a transactional strategy, where a write or update is not done until all of the copies have the same data. So if we apply transactional to primary replica, the primary says, hey, I'm the publisher, I got a write, I'm going to publish this data to all the replicas, but my write is not done until all of the replicas respond back that they've written it to the database.
[00:11:32]
That's how you get a highly consistent system across replicas. What's the problem with that? I shouldn't say problem; what's the trade-off? Performance, yeah. That's slower. If you have to wait for every single database to acknowledge, hey, I've written it, and post back to the publisher, the primary, that's going to be slow. That's the trade-off you're making: now you have a highly consistent system, but it's less available.
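A sketch of that publisher-style transactional write, with the `replicate` call standing in for a real network round trip (all names are hypothetical): the write only resolves once every replica has acknowledged.

```ts
// Transactional (synchronous) replication: the write isn't confirmed until
// every replica has acknowledged it. Highly consistent, but only as fast as
// the slowest replica.
type Ack = { replica: string; ok: boolean };

// Stand-in for a network round trip to one replica.
function replicate(replica: string, key: string, value: string): Promise<Ack> {
  return new Promise((resolve) =>
    setTimeout(() => resolve({ replica, ok: true }), Math.random() * 50)
  );
}

async function transactionalWrite(key: string, value: string, replicas: string[]): Promise<void> {
  const acks = await Promise.all(replicas.map((r) => replicate(r, key, value)));
  if (acks.some((a) => !a.ok)) throw new Error("write not committed: missing ack");
  // Only now does the primary report the write as done.
}

transactionalWrite("user:42", "jem", ["replica-1", "replica-2", "replica-3"])
  .then(() => console.log("committed on all replicas"));
```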
[00:12:05]
By less available, I just mean it's slower; when you want to do something, it may not have responded yet. It's a trade-off you're making. Or we could do snapshotting, which is: instead of replicating every write in real time, we just snapshot the entire database at some interval. Why would you do that instead of real-time replication of all the reading and writing? We talked about this.
[00:12:44]
It prevents infection, yeah. You get good resiliency. What else? Backups, yeah, and it's cheaper. It's not write-heavy, because you're only taking snapshots at some interval. It gives you a better guarantee that if the data gets corrupted in some way, you can roll back to that snapshot, which was the last good state, and you can snapshot at whatever granularity you want. So that's something to consider too: snapshotting, not just, hey, I want to keep everything in sync all the time.
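A sketch of that interval idea in code, with all names invented; the point is that snapshots are periodic and cheap compared to per-write replication, and the retained history is what you roll back to:

```ts
// Periodic snapshots: copy the full database state on a timer and keep the
// history, so corruption or ransomware can be rolled back to a known-good state.
const db = new Map<string, string>();
const snapshots: { takenAt: number; state: Map<string, string> }[] = [];

function takeSnapshot(): void {
  snapshots.push({ takenAt: Date.now(), state: new Map(db) }); // full copy
}

function rollbackTo(index: number): void {
  db.clear();
  for (const [key, value] of snapshots[index].state) db.set(key, value);
}

setInterval(takeSnapshot, 60_000); // once a minute; no need to rush
```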
[00:13:16]
Then there's merge. That's a strategy, a little more complicated, where everybody's doing reading and writing, and then you eventually merge into one larger database, and that happens at some periodic interval. That's a different strategy you can apply. The truth is, you're probably applying a variety of these strategies to replicate your data; you're probably doing some combination of them.
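A rough sketch of that merge step, entirely illustrative: every node keeps a local write log, and a periodic job folds all the logs into one consolidated database, newest write winning per key.

```ts
// Periodic merge: each node keeps its own write log; at some interval all
// logs are sorted oldest-to-newest and folded into one larger database.
interface LoggedWrite {
  key: string;
  value: string;
  timestamp: number;
}

function mergeLogs(logs: LoggedWrite[][]): Map<string, string> {
  const merged = new Map<string, string>();
  const ordered = logs.flat().sort((a, b) => a.timestamp - b.timestamp);
  for (const w of ordered) merged.set(w.key, w.value); // newest write wins per key
  return merged;
}
```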
[00:13:39]
You might do primary replica or primary-primary, and you want to snapshot those at a certain interval to somewhere else, and you want that distributed around the world. A snapshot's really good for that, because snapshotting means you're not in a rush; you know, hey, I'm only taking a snapshot every minute or so, that's fine, I can afford to wait that long. So you always want to have different strategies and combine all these strategies for replicating your data, not just one.
[00:14:12]
It's when you do just one that it gets people. But why do people do just one? Why do they keep backups in the same room as the other backups? Because it's easier and it's cheaper. And that's something we have to think about as engineers or systems engineers: if you're not familiar with failure, or you're not paranoid about losing your data, everything's viewed as a cost. Oh man, why do we have 20 databases here?
[00:14:36]
They're all replicas of one. Why do we need 20 replicas? That doesn't make any sense, it's expensive, let's take it down. And that's kind of the good fight people fight. But there is such a thing as being overly redundant, so you've got to balance it out. We could always replicate all our data in real time, all the time, but that's going to be expensive as well. Generally, if you're talking about a system design interview, all you need to know is, hey, here are different replication strategies, and here's why I would pick one or the other.
[00:15:15]
Transactional for obvious reasons, or, hey, our to-do app doesn't have to be super consistent; it's okay if we lose data in between, because it's faster, it has more availability. Any questions on data storage before we move on to the next exciting one? Do snapshots keep snaps of the entire database, or is it iterative, keeping only the unique changes made since the last snap? Yeah, it could be both, depends on what you want to do.
[00:15:40]
You can say, hey, actually, Git is a good example here. Git isn't copying the whole tree over, generally, unless it's a new tree; you're just saying, what's the difference between what happened here and what happened there? That's the only thing you're saving. So that's one strategy. Or you could say, I don't trust this data, I'm going to snapshot it in full over time, and that way I have a bunch of different snapshots that I can roll back to, yesterday or any given point, and I know that data is safe.
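A sketch of the iterative version, analogous to the Git idea above; the shapes are invented for illustration, and deletions are ignored for brevity:

```ts
// Incremental snapshots: one full base snapshot plus a chain of diffs that
// hold only the keys changed since the previous snap. Restoring replays the
// base, then each diff in order.
type State = Map<string, string>;

function diff(prev: State, next: State): State {
  const changes = new Map<string, string>();
  for (const [key, value] of next) {
    if (prev.get(key) !== value) changes.set(key, value); // changed or added keys only
  }
  return changes;
}

function restore(base: State, diffs: State[]): State {
  const state = new Map(base);
  for (const d of diffs) for (const [key, value] of d) state.set(key, value);
  return state;
}
```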