Transcript from the "MongoDB Ops" Lesson
>> Something I just wanna put in a section for all the various different databases here. So I wanted you to be a little familiar with how these get run from the operations perspective. Most of you, I'm assuming that you're a developer writing code to access a database. You have to remember there's also someone on the operation side that's not writing code or writing data to databases.
[00:00:22] They're just keeping the servers running, and spinning them up, and spinning them down. Making sure they're resilient, making sure that they're copied and backed up and all that kind of stuff. You as a developer, most of time, this should be relatively transparent process to you shouldn't really have to know or care how it's being run.
[00:00:38] However, there is some good things to keep in mind, we're just gonna talk a second about that. So primary, secondary is replica sets, these are terms that you'll hear fairly frequently in association with MongoDB. Every replica set, which is a group of MongoDB servers has a primary, which is the thing that you can write to, right?
[00:00:58] And it has secondaries, which are the secondary servers that are running, basically, real time copies of the primary. So you'll write to the primary, and then the primary will then flush out that data to the secondary, so that they're all relatively kept in sync. The thing to keep in mind is not fully consistent, so remember our acid, right?
[00:01:19] A-C-I-D, it's not fully consistent, which means that if I write to primary, and then Mongo says, okay, I accepted your query, it has not yet gotten to the secondaries. This is a term that is known as eventually consistent, which means it fails the C part of the acid test.
[00:01:38] Unless you tell this query must only be accepted once it's been written not only to the primary but to all the secondaries. And then you achieve that see part of the the acid, that's much slower and usually not what you need, right? Usually eventual consistency and when I say eventual it's usually milliseconds, usually not more than seconds, probably not minutes, right?
[00:02:04] It just depends on how overloaded your servers are, how busy they are, blah, all those kind of things. But that's the point here is that you have primaries, you can only write to the primary, that's a big key here. Only the primary will accept writes, you can read from secondaries, right?
[00:02:19] Because the secondaries are just eventually consistent copies of the primary, so it's okay to read from secondaries. It's a good idea if that's an acceptable trade off too, because the secondaries don't get as much load as the primary, right? The primary gets a lot more load because it's handling all of the writes.
[00:02:35] And so a group of primary, primary and secondaries is called the replica set. These are all running things called the MongoD process, the Mongo, Damon demon, however you prefer to pronounce that. And yeah, that's a good thing to, but let's talk about arbiters and elections. So what happens if your primary goes down?
[00:02:59] Well, if your primary goes down MongoDB I'm not joking, this is what is called, is holds an election, right? So each server, it gets a vote and they vote on who they think that the new primary should be. And it does that through a bunch of different factors like which one has the lowest latency, which one Is the most up-to-date it does a bunch of stuff to decide.
[00:03:20] This is gonna be the new primary, what it'll do is it'll immediately reassign that role to that new primary. And then it becomes a new primary and then your ops person can go around and kick up the old server and say, okay, you're back in the pool now as a secondary, you can also give them priorities.
[00:03:38] So let's say you have one really beefy computer and five smaller computers are inside of your replica set. Because your primary is gonna take a lot more resources, so what you actually might do is you might have two really beefy computers inside of your replica set. And then a bunch of smaller ones, and then assign a higher priority to the beefy secondary so that when the election happens, it's gonna be that server because it has more resources, right?
[00:04:02] So you can kind of play some of those games there. Something that can happen is if you don't have enough servers, you can have arbiters which are basically servers that don't actually, that are not secondaries. All they do is vote their entire purpose in existence is just a vote in prime these kinds of elections.
[00:04:22] Usually don't need those, but they do exist and let's talk about sharding. So let's say you have five, let's say petabytes of data, right? 5000 terabytes of data, you can't hold that on one server system, we don't have storage arrays that are that big yet. So how would you handle that?
[00:04:45] Well you would have multiple replica sets and then you would shard the data. So you'd split up whole junks of data that would go to various different replica sets. And then you have a process called Mongo s, Mongo sharding, right? That you would write a query to the Mongo s, and then that would get shorted out to one of the replica sets.
[00:05:07] So that's what sharding is it allows you to handle even larger amounts of data. And then lastly, I told you all these things, just so that hopefully you can ignore all these, I always use a managed cloud version again, I work for the cloud. So I'm a bit biased in the matter, but there is a bunch of managed versions, we would have to worry about any of this.
[00:05:28] All they do is like, hey, give me a Mongo server, and then it'll give you back a connection string and then it handles sharding, collections arbiters, all that kind of stuff for you. You just pay him you're their money and they give you fully scaled model of MongoDB.
[00:05:47] So MongoDB Atlas that's from the official people that make MongoDB that's one of them. Microsoft Azure, Cosmos DB, that's the one that I work with and there's Amazon Web Services Document DB. I'm sure Google has their own as well there's a bunch of these that exist out there.
[00:06:02] I mean they're expensive, databases are expensive, but it means you don't have to worry about ops, which is worth a lot to me anyway. It's what I mean, it's what I do if I was gonna start a new company today, that's would exact that's exactly what I would reach for is a managed service.
[00:06:15] That's the last thing I was gonna say is you can write to like document DB and Cosmos DB as if they were MongoDB. They are totally compatible with a Mongo API, despite the fact that they're not actually running Mongo. What skills should we consider managing our own database instead of using a managed service?
[00:06:32] It's a great question. Here's my opinion on the matter, if you have your own operations team at that point, you can probably consider running your own replicas and things like that. But I'm not even sure if that's 100% true even when you have your own operations team, let me refer.
[00:06:55] I would not consider it until I had my own operations team, and even when I got my own operations team, I would let a certified expert in operations make that decision. And me the developer, I would never make that decision. Things like Cosmos, things like MongoDB Atlas, this scale to enormous sizes, right?
[00:07:19] Like these companies entire businesses based on this, so they're not gonna crumble under the pressure, it's their business, right? The problem that you're gonna have with is it's gonna get too expensive. So if you're running massive clusters and doing a lot of reads and writes, at some point] it will become cheaper for you to not only manage your database but also pay the team to manage the database.
[00:07:41] And at that point, that's when you might consider switching over.