Transcript from the "Wrapping Up" Lesson
>> So asked, is database migration a common occurrence? Yes, the answer is yes. When I worked at Reddit, we did a lot of database migration. So let's talk about database migration for just a second. I guess, particular in a different question. So I'm going to answer the question that I had my head first and then I will go back and ask her pratiques question.
[00:00:22] So, database migrations from the perspective of I have one schema in my SQL database, and I need to have a different schema in my database. This is called a migration, which is why I was thinking about that, it's expensive and it sucks, and if you can avoid it, you should.
[00:00:38] So with SQL, it's really good for you to anticipate your schema needs upfront because when you have to move and do alter table commands, it's unfunded, unfunded is the correct term for what that feels like. So there's a Django that's what we used to write at Reddit. Django has an ability called South which they've now renamed to something else which will actually do ALTER TABLE statements for you.
[00:01:01] And it's messy, it takes a long time, so it's possible, but it's better if you can anticipate your needs upfront politics. Actual question is about migrating from one database to another like wholesale, right? If I need to move from Postgres to MongoDB what's that look like? I don't see that necessarily as much, I've never been party to that we actually did consider when I was at Reddit moving from MySQL to Postgres to get some of that JSON goodness.
[00:01:35] And we ended up not doing it because it's expensive. It's expensive in the fact that you have to write a lot of code that takes data out of one database normalizes and sticks into another and it just never typically goes the way that you want to. I would only say you need to have a very good reason, and even if you've that good of a reason, it still might not be worth it to move from one database to another.
[00:02:03] Migrating databases is really tough. So I would only do it if you had catastrophic reasons that you had to move off one of them. Mina is asking we had a lot of trouble with subscription using pub sub Redis. Any advice to optimize it? Unfortunately, I don't have enough experience with the pub sub mechanisms of reticence, to give you any sort of useful advice.
[00:02:27] But when you find the answer, I would love for you to tweet me at and tell me what your experiences because I haven't heard of a lot of people using it in production. So, best of luck, I think it's what I have to say, [LAUGH] Congratulations on making this far.
[00:02:43] This is a very long course about a lot of information all at once. So just congratulations, I'm impressed with myself that I made it this far, right, I know these things. I just want to give you a some like parting advice on like how to choose the correct database because I think that's kind of one of my points of this course is I want you to know when you're going into these decisions, what trade offs you're making, right.
[00:03:07] I want you to have the pros and cons here so that whatever problem you're tackling, you choose the right tool for the job. So I'm going to walk you through my process of how, if I'm starting a new startup and I have a new app, how I'm going to choose my database.
[00:03:23] The first thing is like what's the paradigm of your data? Do you have highly relational data? Do you need to do a lot of joins? Then you probably need to use a relational database, that's the most obvious one. Do I have highly unstructured data where I have collections of related objects, but might have shifting schemers over time.
[00:03:43] Then something like MongoDB is gonna be really good for that. Do you have something where you have to describe a lot of relationships between various different nodes in your database? In that case, you probably need a graph. Do you just need simple needs of setting and getting, then something like Redis can do that for you.
[00:04:01] And one thing we didn't talk about but one that you should probably be aware of is Kafka. And Rabbit is, do you have a lot of messages that are being passed around? Do you need to broker messages between the different parts of your app? In that case, something like RabbitMQ or Kafka can be really useful for that.
[00:04:18] So your answer is probably some combination of all of the above, right? It's usually not, clear-cut, it's definitely that it's usually like I got this middle area here. So how am I going to choose beyond that? And specifically when you come to something like Postgres versus mySQL versus MongoDB, that line's pretty blurry, right?
[00:04:38] They all can kind of accomplish similar things. So yeah, consider your data needs, that's probably the first thing I'm going to tell you to do. Something else for you to consider, and this is going to be actually more towards data architecture rather than which database. But we talked a lot about reading and writing, but usually your application has either a lot of reads or a lot of writes.
[00:05:05] Let me give you an example, an application like Netflix, you're in general reading from the database, right? You don't constantly go in and change your Netflix profile every day, you generally write that once, and then you leave it alone for a really long time. But that data gets read a lot, right?
[00:05:24] Because it's gonna read from your preferences, it's gonna read from which movies you wanna watch, and all that kind of stuff. That's a very read-heavy situation, so in that case you want to architecture data so it's optimized to be read. So you're probably gonna write all of your data generally in one place because you don't want to be doing 17 joins to get all the correct data together, because that's expensive.
[00:05:46] Joins are not cheap. Again, I'm referencing my friend that works at MongoDB Harry, he told me that like, it's okay to do joins in MongoDB. As long, you're not doing it a whole lot so you can actually have multiple collections and do joins in MongoDB As long as you're not doing it a lot right because then it's even if it's an expensive query if you're running it once a day doesn't matter.
[00:06:13] So that's always something you want to keep in mind is read heavy versus right heavy do I will not be optimizing my data for being read from or is it do I want to be optimized idea so it's because I'm rewriting it to it a lot, right? So Twitter is a good example of something it's going to be very right heavy because people are tweeting all the time you're writing, going to your database, right?
[00:06:31] So make those trade offs. Okay, so let's say I've I decided that my I have a multi paradigm need for my database, which one do I choose next? Generally, I just go talk to my teams all right, raise your hand if Mongo, raise your hand if Postgres, and if everyone in the room knows Postgres, and no one knows Mongo, I think you pretty well have your answer at that point, right.
[00:06:57] Familiarity buys a lot in terms of experience just knowing how to administrate these kind of things. You don't have to like relearn all of the trauma that comes with learning a new database. So familiarity is a big one, quality of drivers is one that I think often gets overlooked.
[00:07:15] If you're gonna Going to be writing you know to Postgres but you're writing elixir. I don't know the quality of the Postgres connector to elixir, it might be good, I'm not saying it's bad or anything like that. But you might want to just go check out the quality of the drivers for the language that you've chosen.
[00:07:32] Operations costs, you want to look at this from all perspectives like what does it cost me to run this database in production in terms of like just the dollars moving out of your bank account, times you have to spend writing code for it. Cost of downtime, but you also need to look at how expensive is it to hire someone that can do this?
[00:07:52] So again, I'm going back to like, how many elixir engineers know how to write to Neo4j might be zero, right? So hiring for those kind of things are very expensive. So look at it in terms of hiring, an expertise that exists in the field as well. And like this is probably like the least fun advice I'll give you today, which is be boring, right?
[00:08:18] Usually it's a good idea to just go with boring tech choices. And when I say boring tech, like maybe tomorrow, there's a brand new like Super DB that comes out that does everything perfectly and everyone's on Hacker News is saying that like, this is the greatest thing ever.
[00:08:35] It solved all my data needs. It also cured my arthritis, right? Like it just does everything. I'm gonna say, let everyone else suffer through all of the growing pains of super DB and wait like five years for it to become boring too. That's my advice to you, I'd rather spend my time on the product and like working on finding product market fit than I do want to spend getting super DB to play nice with my configuration of terraform or something like that.
[00:09:05] In general when you're making technical choices be as boring as possible. All right, let's talk about caching for just a second. My very strong advice for you is use caching sparingly and use it only when you need it. In other words, don't prematurely optimize your application, and why do I say this?
[00:09:27] Caching adds a lot of an extra layer of indirection to your application. My good example of this is that, if you make a call to an end point, you don't actually really know, typically did that come from cache or did that come from the database? And so when you have a bug is a problem with the cat, is there a problem with the database?
[00:09:50] Is there a problem with the code? Is there problems to the code the rest of the cache has a problem with the code that writes to the database. You added this like extra layer of debugging to your app, that sucks debugging the cache is the worst. And that's why I'm just telling you, as some of you are probably my future coworkers.
[00:10:06] Please wait to use the cache until you absolutely need it. Make sure that your caching is solving a problem. Don't just cache because it seems like a good idea, right? Just my general mantra with writing code is have a problem to fix. Don't try to solve a problem that you don't have yet because you usually don't end up solving the right problem.
[00:10:30] Yeah, and just like a big reason I say that is, while databases are expensive databases today, like Postgres and Mongo DB are designed to handle enormous amounts of traffic, have a little bit of faith in them. Like, let them show you that they can actually handle quite a bit of capacity before you actually say like, I don't believe you.
[00:10:49] Here's a cache in front of you, so it's a very, very sharp sword. It's and it definitely cuts both ways. So just be careful with caching. Yeah I mean that the joke is there's two things that are hard in computer science, caching, cache invalidation, naming things and off by one, right?
[00:11:13] One of those is cache invalidation, and there's a damn good reason for it. It's hard to figure out how long to cache data. It's not there's not normally a clear answer to that question. Okay, I'm just going to throw out here, there's a bunch of really great courses in front of masters that cover some or all of what I covered today.
[00:11:35] So I'm just going to throw out some two of them that I think from me that I will personally benefit you if you like this course, are the complete intro to Linux in the CLA talk. I have a lot of good stuff in there about how to work on Linux stuff and how to like understand the CLR better even for people that know Linux pretty well.
[00:11:54] And another one here is complete intro to containers. If you were interested in the Docker stuff that we were running, I go Really in-depth probably more in-depth than anyone wants to go, but I like it, right. So this will teach you everything that we did today with containers, and then from some other amazing structures on front and masters, Scott Moss personal here in mind always kicks my ass at video games.
[00:12:18] Intro to MongoDB. That one uses Mongoose, so if you want to learn stuff about ORM specifically with MongoDB, hop right into that one. And then API Design with MongoDB, it goes into more Node code with Mongo, A plus teacher, good friend of mine, I like Scott a lot, he also has some courses on GraphQL.
[00:12:39] So GraphQL from a server perspective, GraphQL from a client perspective and then advanced GraphQL. If you'd liked that stuff that we did today, Scott Moss has good courses on that as well. And then another one of my favorite courses is from my ex coworker at Netflix, Jem Young.
[00:12:58] He does a full stack engineering for and that's another really good course, and that's going to be more on the management side. I kind of gave you like little footnotes here and there, Jem's going to get really deep into like, let's start up a server. Let's talk about DNS.
[00:13:12] Let's talk about all that kind of lower level stuff. Okay, thanks all for sticking with me. You unlocked the achievement of listening to me for too many hours, and working with some databases, hopefully this was helpful to everybody. Please tweet at me your success stories, that just warms my cold, dead heart if you found any issues, I've actually noticed a ton of issues in my course notes here.
[00:13:38] So I'm going to go fix a bunch of those, open a pull request. I'm deeply grateful for that kind of stuff. All right, everyone, I really hope you enjoyed the course. Thanks for sticking it out with me, hope y'all have a good rest of your week and I hope your your weather is better than mine.