Backend System Design

Non-Relational Databases

Jem Young
Netflix

Lesson Description

The "Non-Relational Databases" lesson is part of the full Backend System Design course featured in this preview video. Here's what you'd learn in this lesson:

Jem introduces non-relational (NoSQL) databases and outlines four main families, including document stores, key value stores, and column databases. He highlights common examples like MongoDB, Redis, DynamoDB, Cassandra, and HBase, explaining their strengths in flexibility, speed, caching, and handling less structured data.


Transcript from the "Non-Relational Databases" Lesson

[00:00:00]
>> Jem Young: Non-relational, also called NoSQL. Not quite accurate because some of them do use a query language, but I don't know, I think people that hated SQL were like, what can we do? What can we name something to espouse our dislike of this, the stack over here. So they're often called NoSQL databases, which is, you know, it's fine. Not always accurate, but it's okay. There's a lot of families of non-relational databases, so we can just talk about the top four.

[00:00:32]
So you have your document stores. So when you think document, think something like JSON. It's kind of a blob. There is some structure to it a little bit, but it's not going to be as strict. Not every JSON is going to have the same fields. If you're doing any sort of like a catalog of something that may change over time, doesn't have a really strict schema, you know, a document store is a pretty good way to go.
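To make the "not every JSON has the same fields" idea concrete, here is a minimal sketch of a document-style catalog in plain Python. The `catalog`, `insert`, and `find` names and the sample items are hypothetical, invented for illustration; this is not the MongoDB or Couchbase API.

```python
import json

# Hypothetical in-memory "collection": each document is a JSON-like dict
# stored under an id, and no two documents need the same fields.
catalog = {}

def insert(doc_id, document):
    # Round-trip through JSON to show each document is a serializable blob.
    catalog[doc_id] = json.loads(json.dumps(document))

def find(field, value):
    # Return ids of documents that contain the field with a matching value;
    # documents that lack the field are skipped rather than causing an error.
    return [
        doc_id
        for doc_id, doc in catalog.items()
        if field in doc and doc[field] == value
    ]

# Two catalog items with different shapes living side by side.
insert("book-1", {"title": "Example Book", "author": "A. Writer", "pages": 300})
insert("shirt-1", {"title": "Logo Tee", "sizes": ["S", "M", "L"], "color": "black"})

black_items = find("color", "black")
```

The book has no `color` field and the shirt has no `pages` field, and neither insert needed a schema change first, which is roughly the flexibility being described.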

[00:01:01]
The main one is going to be MongoDB, very powerful. That's going to be your go-to. I think Couchbase is another one that's pretty popular as well. You have key value stores. Key value, what do you think that is? Redis, it's always Redis, so that's the first one you go to. DynamoDB is also one. Key value stores are very, very fast, very fast. Why? Because there's only two things you're writing, the key and the value.

[00:01:33]
So those are typically append-only, right, key value implementations? I don't know, you can update Redis if it's there, but your cache eviction might make that not common. Yeah, if you're using it as a cache, which frequently a key value store is used as a cache. Why? Because the reads are very, very, very, very fast. There is no lookup. It's already indexed for you. What's the key right there. Yeah, it's always going to be Redis though.
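The cache behavior discussed above (fast reads by key, updates allowed, old entries evicted) could be sketched like this. `TinyCache` is a hypothetical class, and least-recently-used eviction is just one possible policy chosen for the example; it is not a description of how Redis implements eviction.

```python
from collections import OrderedDict

class TinyCache:
    """Hypothetical in-memory key-value cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def set(self, key, value):
        # Writing an existing key overwrites it and marks it most recently used.
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            # Over capacity: drop the least recently used entry.
            self._data.popitem(last=False)

    def get(self, key, default=None):
        if key not in self._data:
            return default  # cache miss
        # A read also counts as a use, so move the key to the "recent" end.
        self._data.move_to_end(key)
        return self._data[key]

cache = TinyCache(capacity=2)
cache.set("session:1", "alice")
cache.set("session:2", "bob")
cache.get("session:1")            # touch session:1 so session:2 is now the oldest
cache.set("session:3", "carol")   # exceeds capacity, so session:2 is evicted
```

Every read and write goes straight to the key with no scan over other entries, which is why this shape suits caching and session storage.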

[00:02:09]
If someone's like, hey, what key value store would you pick, you just pick Redis, unless you're in like a really deep domain specific interview or something like that. But caching, session storage, anything you need a really fast lookup, key value store. Then you have your column databases. So when we think about the design of a data store table, what does it look like? It's a table, right? Yeah, yeah, yeah.

[00:02:44]
So you have your rows and you have your columns. What is the column in a data store? Typically it's the schema field. Yeah, the column in a table is going to be similar data. This is the name, name column. It's all going to be names. So you have databases like Cassandra or HBase, which are optimized for searching because everything's in a column, so like, hey, I need everybody's name, cool. But I don't need a strict structure.

[00:03:16]
I can use a column database. So they're very fast as well. And you know, you get the benefits of being a little less structured. I think you can also encode them differently because there's a lot of repeated names, so it actually reduces the amount of storage because if you have a repeated name, you can just have like that encoding and then something that represents how many times that repeats, yeah.
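The encoding idea mentioned here, storing a repeated value once along with how many times it repeats, is commonly called run-length encoding. Below is a minimal sketch with made-up sample rows; it illustrates the general technique only and is not the on-disk format of Cassandra or HBase.

```python
from itertools import groupby

# Row-oriented view: each record keeps all of its fields together.
rows = [
    {"name": "Ana", "city": "Lima"},
    {"name": "Ana", "city": "Quito"},
    {"name": "Ben", "city": "Lima"},
    {"name": "Ben", "city": "Lima"},
    {"name": "Ben", "city": "Oslo"},
]

# Column-oriented view: one list per field, so "give me everybody's name"
# reads a single list instead of touching every record.
columns = {field: [row[field] for row in rows] for field in rows[0]}

def run_length_encode(values):
    # Collapse each run of consecutive equal values into (value, count).
    return [(value, len(list(group))) for value, group in groupby(values)]

def run_length_decode(pairs):
    # Expand (value, count) pairs back into the original list.
    return [value for value, count in pairs for _ in range(count)]

encoded_names = run_length_encode(columns["name"])
```

Five names collapse into two pairs here. The saving depends on equal values sitting next to each other, which is more likely when a whole column is stored together.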

[00:03:45]
Yeah, good point, yeah. So you'll see column databases. I think Cassandra is the big one. I think my company uses Cassandra, a lot of companies do, pretty popular. And then you have graph databases. I've never used a graph database. I'm just not in that domain. They're pretty specific. But it's where you want to map relationships between two things, but not in a relational way, just in a connection sort of way.

[00:04:16]
So what's an example of something I would need a graph? This one should jump out to you. Social networks, yeah, that's what, you know, a lot of times graph databases are used for, or fraud detection is another one. Where, hey, what's the, you know, I'm getting a lot of traffic from this one, from, I don't know, this transaction or a set of transactions. How is it relating to these other transactions, and you make these connections, you're like, oh, so when this happens, this happens, this happens, this happens, that's probably a fraudulent case.
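As a rough illustration of "making these connections", the sketch below links transactions in a plain adjacency map and walks outward from one of them. The transaction ids and the rule for what links two transactions are assumptions made up for the example; a real graph database such as Neo4j has its own query language for this kind of traversal.

```python
from collections import deque

# Hypothetical edges: two transactions are linked when they share
# something suspicious, such as a card or a device.
edges = [("tx1", "tx2"), ("tx2", "tx3"), ("tx4", "tx5")]

# Build an undirected adjacency map from the edge list.
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def connected(start):
    # Breadth-first search: collect every node reachable from start.
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

cluster = connected("tx1")
```

Starting from `tx1`, the walk reaches `tx2` and then `tx3` but never `tx4` or `tx5`, so the first three form one cluster worth a closer look.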

[00:04:54]
And that's what a graph database could be good for. But I'll admit I've never used a graph database. I don't think most people have. It's very, very specific. One of our engineers used Neo4j to graph out all of our dependencies in our platform, which was super cool. Yeah, Neo4j is a, it's a cool name. I think another one would be Amazon Neptune, that's a good graph database. But again, I think if you're at that stage where you're using a graph database, there's probably someone on your team who knows a lot more about it.

[00:05:33]
But these are going to be your primary types, so document, MongoDB, key value, Redis, column, Cassandra, graph, Neo4j. That's really all you need to know. You can get lost in the nuances and details and things like that, but just that basic knowledge in your back pocket, you'll probably ace most interviews, unless it's something very specific.
