Check out a free preview of the full Complete Intro to Databases course

The "Terminology" Lesson is part of the full, Complete Intro to Databases course featured in this preview video. Here's what you'd learn in this lesson:

Brian defines a database as a place to store data, explains that a schema is used, explores the various types of databases, and discusses when it is best practice to use them. The ACID acronym and database transactions are also explored in this segment, particularly around their usefulness relating to queries.


Transcript from the "Terminology" Lesson

>> Terminology. There's a lot of like terms that people around databases like to throw around. And hopefully we can demystify some of them. Sometimes it's hard for me to even know what terms I'm using cuz I'm so used to hearing them so frequently. So if I'm saying words that you don't understand, just stop me so that we can make sure that everyone gets these.

So database is really just a repository of data, right? If you can imagine like a program that you're writing you have like an object that stores like your state like a Redux store or something like that, right? This is basically that just on a much, much larger scale.

So instead of storing a hundred things, you're storing a million things billion things, right? That's what databases can do. These are meant to be giant scale things that can scale across the entire world that can have, resiliency and store very important things on it, right? That's what databases are for.

What's fun about them is that there is a variety of different ways that they can work and it depends on what you want them to do. So that's what I'm gonna teach you here. I'm gonna teach you four different kinds of databases. That's why I chose these four different databases that we are gonna be looking at today is because these are exemplary of, different types of databases.

It's actually funny I was working with our artists to kind of come up with this picture here. And she was like, can you explain to me a little bit what a database is and I came up with this analogy and then she loved it so much she actually put it into the artwork.

And I asked her, what's the best way to get from point A to point B if that's only a mile a way. If you have a bicycle, you have a car, you have a boat, and you have an airplane, right? Well, I think the answer is, it really depends what's in between the two, right?

If you're trying to get from one side to the of Amsterdam to the other, having a Ferrari is not gonna help you very much, right? That city is so jam packed that having a bike is gonna be the best thing you could ever want, right? But if you're trying to get from point A to point B and it's the middle of the ocean, you probably want a boat, right?

And so it depends on your, your job requirements of which vehicle you're gonna choose. That's kind of the same thing here with databases, they're all gonna get you from point A to point B. But one of them is gonna be significantly better for the problem that you have, right?

That kind of makes sense. So you can see here she kind of ran with that there's like a NEO4J, hot air balloon. There's a Redis car, there's a MongoDB boat and there's an elephant cuz everyone likes Postgres, elephants, okay? So that's kind of why we have all these different kinds of databases is cause they're gonna work better for different kinds of things.

So when you have these like massive stores of data and you wanna be able to get certain resources out or update or delete certain things, you need some sort of query language, right? Some things that you can hand to it and say, here's an action that I would like to take.

And so all of these databases have various different forms of query languages that we're gonna have to learn. So you're actually gonna end up learning four different query languages today. But most of them are intuitive enough that you should be able to pick them up pretty quickly. Okay, so that's a database.

It's a repository for lots of data. And then that's another good term to pick out of there's query where I'm gonna be using that term a lot. It's basically just something that you're gonna send to a database, some action or request or something like that. That's, a query.

Another good one to know is schemas. So a schema is basically what's the shape of your data look like, right? So I had put an object in here. If I have this object here that says name, city and state the schema of that data would be the name and the city and state.

So I could have five different objects and they would all have the same schema. They would all have name, city and state despite the fact that the values of those would be different, the keys would always be the same. Another good way of thinking about that is like think of a spreadsheet like Excel or numbers or something like that.

The columns would be the schema of your Excel spreadsheet. And you'll notice I'm gonna refer to Spreadsheets a lot during this course, it's because a spreadsheet is a database, right? Of some variety, it's a way of storing data in a structured fashion, it's a database. I've actually seen way too many people use like Google sheets directly as a database.

So it's actually possible to use a spreadsheet as one. Not suggesting it, not teaching it, but it is possible. So schemas are important because some databases like Postgres are gonna require us to be very strict about our schema. We're gonna have to tell Postgres upfront, here's every key that I will ever give you.

And if you try and give it another key, it's gonna be, I don't know what to do with that. And that just fails the entire query altogether, right? And you can contrast that against mango, mangoes and be like, give me whatever you want. I don't care. It's schemaless, right?

So we'll get more into that. That's a word that you'll hear me thrown around a lot in this course as well, which is schema. We're gonna be talking about document, relational graph, and key value store databases throughout this course. But I wanted to let you know that there are actually a lot more than just those four.

Those four might cover maybe I'm gonna guess 90% of use cases. But there are a lot of other ones as well that you should know about. One of them is search engines, not like Google, but more like something called like Solar. Or Sphinx is another one that's really popular that allows you to basically create a search engine for your database.

A lot of these databases don't have the ability to do what's called full text search. Which is to say, here is three words, here's I wanna search for local beer inside of my database needs to span across all the different columns. You do something like a search engine to do that.

Some of them had that ability just built in like Postgres and MongoDB do but some don't, so you'd use a search engine. Another one is called wide column databases, these are fairly new type of database if you've heard of what is it, Google's big table. Google's big table's kind of the famous version of a y column database, another one is Apache Cassandra, which is actually grew out of Facebook and how Facebook stores all of their data.

I don't really totally get how to use them. So I'm not gonna spend a ton of time. It's described as a two dimensional key value store and if that doesn't mean anything to you doesn't mean anything to me. So that how they describe it. Message brokers actually might come back and do another course sometime on message brokers cuz I think it's really important and really cool.

This is gonna be something like Kafka or rabbitmq. It's the ability to pass messages between your programs. I think it's really underutilized by node developers cuz it's really a really is conducive to how to write node apps. And then another one's called multimodal databases, which is a basically a database that knows how to act.

Maybe both as a relational and non relational database at the same time, a good one is Azure, Cosmos dB. That's the one that I work a lot with obviously because it's an Azure but another one would be like a Rango dB. Those are multi paradigm cuz they can work in different ways.

That's not all of them as well, but that's probably like 95% of them. Now that I've described to you. But just go look at the Wikipedia article of databases. You'll see that there's a ton of types of databases. ACID, let's talk about that one for a little bit.

So it's an acronym that gets thrown around a lot as databases. And I just wanted to make sure that you understand why that gets thrown around. It stands for atomicity I think is how you say that, the, the atomic nature of databases, consistency, isolation, and durability. And it's kind of like the qualities that a database can have, not all of the databases that we're gonna be looking at today, have that.

And some of them do, but you need to understand like the trade-offs that you're making, when you choose one of these databases. So the first one, admittedly is does this query happen all at once? And so it's atomic in nature, so it means you literally cannot divide the query into multiple parts.

It is a single query. That's important for things like you can imagine big bank tracks transactions, right? If I'm gonna subtract $10 from my account and add $10 to your account, you want that to be one transaction because if that happens in multiple transactions, that means. What happens if one happens and then the server goes down?

I mean, someone lost $10 and someone didn't gain $10. So that means you lost $10 altogether, right? So you need that to be an atomic transaction where the loss of $10 here and the gaining of $10 here is all one atomic transaction. Does that make sense? It ends up being extremely important, these are called transactions.

So yeah that's really important, you'll see it down here, we'll talk transactions again as well. The next one is consistency. A good example is as you always have multiple database servers running at the same time. So if I have five database servers go down, and I have a sorry, if I have five running and I have one go down, the other four need to kind of pick up the slack.

And if your database isn't consistent, that means that they're gonna be out of whack a little bit. So, you don't actually know which one knows the truth. And even worse, you might actually lose data. So if your main database goes down, and you have four others and they're lagging behind the main one.

That means you could lose data in the process which again for like banking stuff, that's a disaster, right? Someone could be losing millions of dollars just because your database went down at a bad time. You'll notice a lot of these times I frame this in terms of money.

It's because banks have been really driving databases for the past 50 years, right? So a lot of these features exist because banks have made them that way. Isolation. Isolation is just kind of, we don't necessarily talk about this one as much. But if you have a query that's gonna be doing a bunch of things all at once.

It basically means that you can run them concurrently, so you can run the transaction broken down into pieces, all at the same time. Or it can run it serially. So, do this one, then this one, then this one, then this one, right? And everything will happen the same way.

So it just basically means you can decompose it into parts, running it in different parts and that's still gonna work okay. And then lastly durability that just says, if your server crashes, you can restore it back to where it was, right? And that just means it's writing to disk successfully.

And it's means your data is durable to a crash, okay? So that's ACID. So if you hear database described as ACID or if you hear a query is described as being ACID, that's what it means. It means all of those boxes are being checked. So you might be thinking right now is like well, I want everything to be ACID, right?

I want everything to be totally done this way, but the thing is you're making a trade off, right? ACID is safe, but it's slow. If you're reading through these something like durability means if your query is durable, it means that it's not only writing to the main server, but it's actually going to be writing out to all of the other servers and they're all going to be writing to disk.

And that you're gonna have to wait for all that stuff to happen before the query is marked as successful. That is very slow, particularly if your database is spread across the entire world, right? So instead of waiting, five milliseconds, you're now waiting two seconds for your query to totally complete.

That might be worth it again, if you're moving money around, but if you're just updating a profile picture, it's probably okay if that's not totally consistent and totally durable, right? There's acceptable loss in data. So that's what I'm just saying is know trade offs that you're making, know which data is acceptable to lose, know what data is totally unacceptable to lose.

And you're gonna have to make choices on databases based on that. Transactions the one, I just wanna call this out specifically. Sometimes one query can't accomplish all the things that you wanna do. So you're gonna have to send multiple queries to do accomplish the thing that you wanna do.

But again, you can't break it up. Most of these databases, in fact, I think all of these databases allow the ability to do transactions. Which is I'm gonna send you five queries, but I just don't want you to do anything else besides those five queries, right? So it actually makes those five queries act as one query and that's considered a transaction.

And you're just guaranteeing this either all happens or none of it happens. Because I can't have it partially happen, right? Which going back to the money thing, if you're gonna take $10 from this account, you must add it to this account. Don't do anything in the middle. I can't have that inconsistency in my queries.

If you have a multi threaded database, you're sending a query that can be broken down into multiple threads. That query needs to work the same way in parallel it then if it ran sequentially, which basically just means this query needs to work the same way every single time, no matter what you send it.

No matter how the database is processing it, the query needs to work the same way. That's probably a better way of putting. And to be frank, I don't really understand isolation as much because it's not something you ever talked about. It used it used to be important and now basically everything checks that box.

>> So a transact transactions relate to a person a all databases?
>> No transact transactions relate to almost every database. It's how's databases get around the problem of I have a thing that must be atomic, but I don't have the ability to do this in one query, right?

It's like I have to say this idea in one sentence, but I have the grammar structure doesn't allow me to do it one sentence. Therefore I'm gonna be using semi colon to get this into one sentence. Does that kinda make sense? So it's related o the A, I guess you would use it to fulfill the A.

Yeah, okay, I guess they are kind of tied to each other.

Learn Straight from the Experts Who Shape the Modern Web

  • In-depth Courses
  • Industry Leading Experts
  • Learning Paths
  • Live Interactive Workshops
Get Unlimited Access Now