Transcript from the "Firestore & NoSQL Overview" Lesson
>> So this entire time we've been very general. I've been like well I'm gonna create with the emulators, I created a Firebase project, I'm gonna do this. I wanted to show a very general workflow of what the Firebase ecosystem looks like but now you have a good idea of when we're using what tools and how we deploy websites and how we connect out to databases and sign in users.
[00:00:22] I wanna get very focused so from here on out we're going to be very focused on individual Firebase services and I wanna start out with a very big one. This section is all about Firestore. So Firestore is a NoSQL database with real time and offline capabilities. And whenever I teach NoSQL databases I always am a little bit hesitant because most developers come from an SQL background.
[00:00:51] And what I've learned is cuz I too came from an SQL background, is that NoSQL can be a little bit jarring at first to SQL developers because they're very different. And it's not that one is better than the other, they're both two totally different tools that are great for different purposes.
[00:01:13] It's that they prioritize different things. So what I wanna do to explain Firestore is I wanna talk about how SQL databases work at high level and then sort of map those concepts on to NoSQL databases. And now just a quick show of hands or maybe in the chat, who here comes from background whether at work or whether from just learning, that you primarily use SQL or SQL databases?
[00:01:44] Okay, who here is NoSQL databases? We actually got a good mix here, wow. When I first started doing this, that was not the case. It was about 90 to 100% of people. So in the SQL world, we have all of our data formatted and structured in tables. And with tables we have two aspects of them.
[00:02:10] We have rows and we have columns. Columns are individual data types. So first name is a string, high score here is a number. And some columns are more important than others, well all columns are very important but some are more significant I would say than others. And in this case the UID column is very important because it uniquely identifies a user and that is called a primary key.
[00:02:35] So having that primary key I can find a user row which is what we usually retrieve back into our system and hydrate into our objects. Now this is just a table of users but what if I wanna store data that this user owns? Well I could go out and if I'm making an expense tracking app, I could create a table called TBL expenses and it has its own primary key of ID.
[00:03:04] I can store the cost of the expense, the category, the date but then to assign the users ownership of that record I create another column of UID that shares or relates that user back to TBL users, this is known as the foreign key. And what's great about this is that all my data, just looking at it, it's nice, it's squeaky clean, there's null columns, everything is very well structured.
[00:03:31] There's no duplicated data everywhere, it's highly what we call normalized. And if I wanna get this data back I just create this little query where I say, select these fields from TBL expenses where the user ID equals this user. Now one of the things about columns with SQL is that all columns must exist on all rows, you can't have a conditional column.
[00:04:00] And so let's say that we're building this expense tracker app and there's a new feature that's gonna come out to it. And someone says, hey, expenses, they need to be approved but not all expenses, only business expenses have to be approved. And so we need to store a field that's shows whether that's the case.
[00:04:24] So I could say all right, let me add an approval column but if it's true the expense has been approved, if it's false it's been not approved. But what do I do about the cases where it's a personal expense and it doesn't have to be approved? Well I guess I can make it nullable so it doesn't apply to those categories.
[00:04:47] But now I'm I kind of in a situation I don't like because this column type is Boolean but it has three states. It has true, false and null and if the a column has three states is it really a Boolean? Is a nullable Boolean still a Boolean? True, false and null, I don't know.
[00:05:08] Those are the kinda questions that I think keep you up at night. So maybe I wanna rethink my data structure differently and I say okay, well what if we go and let's just create another table called TBL business expenses, and then that has an approval table and TBL expenses doesn't.
[00:05:26] And now if someone says get me all of the expenses together, I can write a query for that. And I say okay, let me select the ID, the cost, the UID from the TBL expenses, from this user and then let me union it from the business expenses and so this all comes back.
[00:05:45] But now as my data model changes I might end up having to do more and more things like this. And so the pro here is that the data integrity I have is really high. As long as I keep my data normalized, as long as I keep structuring everything out that way, all my columns they make sense, I can be very sure of the data that's in there.
[00:06:05] But some of the cons are is that as the data model grows our queries become more complex and the more complex of a query is the harder it becomes to scale that query and get it to respond in a very fast nature. And these highly normalized data structures can lead us into writing slower queries.
[00:06:29] And to scale this up traditionally, in SQL databases you have to scale what is referred to as vertically. So the two sort of main ideas or methodologies of scaling within Cloud computing is you have vertically scaling and horizontally scaling. Vertically scaling is when you say okay, on SQL databases I'm going to just get a bigger better machine.
[00:06:52] I actually am not a hardware person so I don't know if they're bigger per say but they're more expensive, they're more powerful machines and so as my dataset grows, as my queries need to become more powerful I need to get better machines to scale up. Whereas with NoSQL data bases the data model is different.
[00:07:12] The way the database forms is a lot different so I can actually scale what is referred as to horizontally. And this means that instead of getting bigger, better, faster machines I can just get more machines in general. I can just reproduce my data across multiple machines and the way that Cloud providers works and specifically to with Cloud Firestore is that scaling up happens kind of just completely transparent to you.
[00:07:38] Is that you are just getting data and maybe you are spiking through a ton of traffic or your data is growing at an incredible rate. Well in the background it's horizontally scaling for you and you don't have to make any of those decisions. So when it comes to looking at SQL verse NoSQL is that in the NoSQL world is that we prioritize reading data over writing data.
[00:08:05] So we prioritize getting a very very fast read from the database, rather than making sure that all of our data is written in a very great and normalized format. And to think about this go think about the top five apps that you use on your phone and how often would you say that you consume content from those apps versus enter in data into those apps.
[00:08:29] So think about any social media app, how often are you scrolling and seeing every post versus how many times are you actually posting yourself or even just entering a like on the post or something that? It's usually magnitudes higher that we are consuming, so reading the content from any given system than we are actually writing data into it.
[00:08:54] So as we go through this section we're going to move away from but not entirely forget the principles of normalization, having high integrity but rigid models and even things like joints. And what we're going to get out of it in return is we're gonna get fast queries, denormalization, we're gonna learn some principles of that.
[00:09:15] And then all of our queries that we're going to operate are all gonna work in real time. So we saw how data is structured in SQL databases, how is it structured in Firestore? Everything in Firestore is done with hierarchy. We have collection, document, collection, document and it all flows down a tree like that.
[00:09:37] You can't store documents at the top level, everything starts with collections. So if we think about tables, while we always move from tables to collections they're not exactly the same but we can kind of think about them in that same idea. So in this case we have an expenses collection and then within here we have two documents with generated IDs.
[00:09:57] And the documents have IDs and we can think about them as primary keys. And within there we have data which is sort of in this object format and this is the cost UID category, we refer to these as fields instead of columns. And what we noticed here is that in the case of our business expenses versus our regular expenses is that I can actually say approved true on the second document rather than on the first document and that's fine.
[00:10:26] Documents do not have to conform to any type of schema, column format or anything. You could have a collection called grab bag and every single document could follow a completely different schema and different structure entirely, there's nothing stopping you from doing that. There are ways which we'll see what security rules, to actually enforce that documents do follow a certain structure and data types but at the database level there's nothing preventing you from just putting whatever data structure you want.
[00:10:58] So you have total flexibility now and you don't have to worry about adhering to different columns as your data model grows. And then as far as thinking about data and rows and a table, while rows are individual documents within the table. So we have this document ID and then all that document data and just like with columns in SQL is that all of our fields can have complex types.
[00:11:25] It's not just JSON, it's not just strings and numbers and Boolean, you can store timestamps and geo coordinates and binary data and lots of complex types inside of a Firestore document