Backend System Design

Relational Databases

Jem Young
Netflix
Backend System Design

Lesson Description

The "Relational Databases" Lesson is part of the full, Backend System Design course featured in this preview video. Here's what you'd learn in this lesson:

Jem explains the differences between relational and non-relational databases, including data consistency, schema flexibility, migration costs, and scaling. He emphasizes choosing the right database early to avoid expensive changes and to scale effectively.

Preview

Transcript from the "Relational Databases" Lesson

[00:00:00]
>> Jem Young: The important thing about relational and non-relational, because it's really easy to get tripped up, especially if you look this stuff up, because it'll be a rabbit hole online, people be like, oh, this and this, throwing information at you. Um, one is going to be stricter than the other. That's the way I think of it. If I need a strict consistent data storage scheme, uh, you know, I use relational.

[00:00:23]
If I need something that's like I need some sort of storage, I'm not sure, I'm going to use something non-relational, and I can always switch later, especially a non-relational. The cost of migrating. I won't say it's free, but it's, or easy, but it's not as difficult. Relational, if you get your schema wrong upfront because it's so strict, like, let's say you spent a year writing and creating these tables and it turns out, oh, we actually missed a column or something, or the relationships weren't correct.

[00:00:52]
Uh, it defeats the whole purpose. Fixing that is very expensive. It's very expensive. Anybody ever gone through that? I can do a database migration. All the time. We inherit a lot of things. Like a tiger team of uh debugging over there. It happens. The thing you know about non-relational is it horizontally scales really well. Uh, in most cases it does it automatically, which is awesome. Relationals do not scale automatically.

[00:01:27]
You have to decide how to do that and we'll talk about how to scale a relational. Remind me what's the difference between horizontal scale and vertical scale? The horizontal scale adding multiple ovens and the vertical scale is getting a bigger oven. Yes, that's great, yeah. And when it comes to scaling, a relational, you always have to scale, uh, vertically. There's different ways to structure that, but it's, that's probably a good mental model to have is relational databases can only get bigger, and then you have to slice the data, and then you can create another server, but you can't just automatically, I'm going to horizontally scale.

[00:02:12]
Why is that in a transactional style? Because the transactions. Yeah, if you can't just load balance a relational database very easily because what if you're say, doing collaborative document editing, and then someone's writing to one database, the other person's writing the other database, but they're both the same document, which one wins? And, you know, again, it's a spectrum, so you can do that, you just have to rectify that later, which is expensive, or maybe it's OK if your volume is low.

[00:02:41]
I know, I'm pretty much just saying the senior engineer, well, it depends, um, but again, there, this isn't binary, uh, when it comes to all this. Most of these you can make work in some way, but the default is usually to go with the relational database in the, in the beginning and then scale up from there. It's a lot easier to go relational, and then go to non-relational than it is to go to non-relational to relational.

[00:03:00]
That's, that can be expensive. Any thoughts or questions? Do you all, did I evoke any emotions saying relational databases? I have to use both, so. Maybe it's just me. I, yeah, sequel, yeah. If you're going from non-relational to relational, it would just be really difficult because you don't have typically a super strict schema, non-relational, so you're kind of adding a bunch of like could be null columns to bunch of tables, I guess, which I don't really know why you'd want to go from not relational to relational, but.

[00:03:41]
The example we have is um data warehousing are, we primarily do Mongo, but, and that works great for the platform in general. But when you're dealing with massive data sets over a long time and a lot of data teams, the data warehouse became helpful. So, we have jobs that ETL data into data warehouses for those purposes. That on the human side, we had a challenge because we went from SQL to no SQL.

[00:04:08]
Everybody at our company understood relational. Nobody understood non-relational at that time. So, uh, if you don't take a step back and learn how to optimize your data, you shoot yourself in the foot pretty good. Yeah, something interesting in there about relational or non-relational. Again, relational, object oriented, very intertwined there. Uh, relational databases go back to the 70s, whereas non-relational are, you know, 2000s or so.

[00:04:40]
So, very, very different way of thinking about data, which is why I think as times change, we move more towards non-relational because they can do a lot of stuff relational can do with the right database. Uh, Mongo is a good example of it can, it can do transactional data if you, if you really wanted to. Um, it does it fine. But we still use relational as to go to because, again, it's been around so long, that's what a lot of people grew up using and that's what we use by default.

[00:05:07]
So, yeah, computer science is a weird thing, um, we're not free of our own biases there. Like I know you said like you haven't done a ton with databases, but do you have any recommendations for working with non-relational databases and just making sure it's not the complete wild west? Yeah, um, I'll get to that in like 2 slides or so. I'll have the, here's my general advice, um. You'll be OK if you had to pick one of these.

[00:05:41]
Uh, so we said relational databases, they, they excel when your, when your data has a clear relationship to each other. Uh, you know, we've got customer. They need a pizza. What's the relation between them? In order. So every customer is going to, every order is going to have a customer ID, a pizza ID, and that's the relationship between them. And you know, why does this matter? What's the benefit of strong relationships?

[00:06:13]
Preferential integrity. Yeah, referential integrity, yeah. What else though, this is the secret to inquiry performance. Yeah, you can do really complex queries quickly, uh, with a structured data, because it all, it's all structured so that the searching and the querying is really fast. And that's the benefit you get from a relational database, versus uh a document store like Mongo or something like that, where the data is unstructured, that the query is going to be slower because it's got to go field by field.

[00:06:38]
Whereas this, you're saying like, hey, I'm looking for someone's name. I don't need to search every single record for that. I know exactly where it is, and I know exactly the column that it's at, and I can extract that very, very quickly. And that's the benefit you get with relational databases. If you, if you have your indexing. Yes. What is indexing? Uh, there's different types of indexing, but I guess in this case it would be probably non-clustered index, um, which is typically a B-tree, so you have different ranges of like, I mean you could think of it like as an array that you would look up that ID or that to that index and it would tell you exactly where to go rather than sequentially just going looking through every row for that data.

[00:07:33]
So it's like way faster. Yeah, uh, I look at indexing, it's like a hack, almost, um, instead of looking at the entire database, I know exactly where to go. It's right there. Um, and the, in the background, uh, most of the relational databases like, uh, Postgres, they're already doing the indexing for you. You don't have to think, you don't have to do any of this stuff. It does it automatically. It's optimizing.

[00:07:58]
And that's what you get with structured data is you can do, understand really complex relationships. Can you do that with a NoSQL database? Yes, you can. Will it take longer? Yes, it will, because again, you're comparing apples to a Ferrari. Uh, what's the relationship there? Don't know, so we've got to do something really complex to figure that out. They're both red. I don't know, maybe. Whereas structured data, you know, instantly you already know that relationship.

[00:08:37]
And structured relational database are ACID compliance. So, anybody ever heard that phrase before? Mostly around transactional data. Yeah, yeah. So ACID is, uh, the commit is, uh, the transaction is atomic. So that means it either works altogether, the entire flow or it doesn't. So let's say you have, uh, need to do a read and a write, and then 2 reads in a single transaction or something like that.

[00:09:04]
All of those have to work. And if one doesn't work, the whole thing fails. So it's a guarantee that it's going to succeed or you'll know about it. So that makes, that makes the relational database very consistent. It's consistent in that, you know, the success happened or it didn't happen. There's no, oh, we actually, we had some downtime and that right didn't get actually put in the database. That happens because you can't guarantee it with non-relational databases as easily.

[00:09:36]
But we always know that, uh, in a relational database, it's consistent in that way in terms of it's going to be atomic, it's going to guarantee to write or fail. Um, but also, we know that every single transaction follows the same set of rules. There's no prioritization. We also guarantee that two transactions at the same time will not interfere with each other. So they're always going to happen sequentially in a way that, you know, some order that makes sense.

[00:10:00]
So generally, we're going to see something like, um, it's ways of doing transactions. I don't know, like date time, which one came in first? There's always going to be a winner there. And we know that there's 2 transactions that happened, they can't interfere with each other. Why? Because we're writing to these different tables or the same table, and it makes it really, really easy because the data interrelates already, but it doesn't have to talk to the other tables to do anything.

[00:10:29]
And we also know that we guarantee with ACID that the transaction is durable. So once that transaction is in the database, it's permanent, nothing's going to happen to it. Nothing, another transaction can't come in and overwrite that by, by mistake. Let's say it's like a race condition that doesn't happen with uh relational databases. Any, any questions on ACID compliance? Yeah, so I do have one question about consistency.

[00:10:53]
Like, is that the same consistency in the sense of the CAP theorem where, like, I'm just wondering if you choose a relational database, does that sort of make the choice for you? Yeah. Yeah, relational databases are consistent. That's our guarantee there with ACID compliance, yeah, yeah. So basically like if I choose an SQL database more often than not, I'm basically going to be giving up availability out of the gate.

[00:11:21]
It's a, it's a spectrum. Um, I'd say no, like you can, you can make it available, but if you're on the margins, which one is it going to be? It's going to be more consistent than available. Yeah. And the reason for that is, there's a few, but one is because we have this guarantee, it can actually lock, lock a table up until the write is done, which actually be really slow because you can have a long buffer of transactions waiting to happen, and they start backing up into the queue for the database, which can be slow, because you're like, why is my database read going so slow?

[00:11:55]
Well, it's in the queue and there's a bunch of writes coming, and then your reads there. Oh, OK. So that's why it's not as available, and that's what we mean. But it's still available in that sense. Uh, I think something I read on consistency was the transactions move the database from one consistent state to another. So that means you can't ever put a read or any sort of update that's going to throw your tables out, out of whack.

[00:12:18]
Uh, which is an interesting concept. You can't make a bad query essentially and mess up the whole database. That's a guarantee you have. So if it's a bad query and you're trying to update something that doesn't exist, it just won't work. That's not the case with non-relational databases. You can make, put in weird data. I don't know, maybe it's unformatted or something. I don't know how to take down a database.

[00:12:49]
Kayla, you have a look on your face. I know how you can take down Mongo. I know how you can avoid it. Um, like the one we hit. Indexes matter a lot with Mongo and there's some nuance into how you design your schema. You don't design it like relational at all. In fact, you probably denormalize, which really bothers people and then, you, they have a really great course on this too, actually. But at some point, someone will put in an index and forget about it and then their data changes if you're not watching it.

[00:13:23]
And we have scripts and alerts to watch that. Uh, regex is usually the thing that bites people because you can do regex queries. Interesting. You said denormalization, what is that? Oh boy, so when you deal with the SQL stuff, everybody always talks about like how to get consistent data and very solid data, and they talk about normalizing your data, which is deduping data, so you have more relations and it kind of sprawls out.

[00:13:51]
And then denormalizing is, is the idea that that doesn't, you don't do that as much and you might be OK with duplicating data. So there's a 16, I think it's a 16 megabit limit on Mongo document size. So if you have a lot of data, you'll have to figure out how to break your documents up. And you get better performance on reads if you just put everything in one document, but at a certain point, you might have to break it up.

[00:14:24]
So you look for where you can do that. Yeah. It's a, it's a hard problem you can get into, um, and you have to think, what am I trying to do? Am I trying to optimize for searching? If so, there, there's a, there are NoSQL non-relational databases that are optimized for that, but with relational database, you know the search is going to be fast. You can, you can guarantee that because there's a relationship there, um.

[00:14:52]
Well, yeah. Just think relational, simple. First thing you go for transactional data. You got that and then you're pretty golden on relational databases. The most popular ones are going to be MySQL and Postgres. And there's a debate, which one should I use? It doesn't matter. Honestly, it probably doesn't matter that much. MySQL is the most popular one, not, not because it's the best, but one, it's free.

[00:15:20]
2, everybody's used it before, um. 3, it's open source, uh, is there 4 there? No, it's not 4, it's probably the most popular because it's part of the LAMP stack. Um, now it's a throwback. I don't think the LAMP stack's a thing anymore, is it? It changed, I think. What is it now? Wasn't it like MEAN or MERN or I don't know. I can't keep up with it. I think they just make this stuff up. It's Blairville.

[00:15:46]
Uh, Linux, Apache, MySQL, and PHP. Maybe it's still around. Maybe there was more on top of it. LAMP seems very dated. Like that seemed, that's like a very early 2000s stack, yeah, because I think my understanding was that MERN, or like, sorry, MEAN came out around like 2012 or so. That was the Mongo one, yeah, Mongo Express, Angular, Node, and then React got popular enough that they figured, OK, like let's swap the A out for the R and make it MERN.

[00:16:22]
That's what it was. Who makes this stuff up? Front image, Twitter, Twitter, TikTok, yeah. They're catchy. Uh, so MySQL is usually a pretty good choice, but, you know, if I, if I'm being forced to pick, probably Postgres. It's a more up-to-date one, it's more capable, um. It can be, it's a relational database, but it can actually act as a non-relational database if you need to. It can do a lot of stuff.

[00:00:00]
Postgres is very capable, um. Don't, don't, uh, pooh-pooh because it's not in the NoSQL category of things. But really, most of the time it doesn't really matter. MySQL is going to be a little bit faster for reads, Postgres is going to have more features, read more feature rich in general. But you can't go wrong with either.

Learn Straight from the Experts Who Shape the Modern Web

  • 250+
    In-depth Courses
  • Industry Leading Experts
  • 24
    Learning Paths
  • Live Interactive Workshops
Get Unlimited Access Now