Check out a free preview of the full Introduction to Backend Architectures course
The "Maintenance" Lesson is part of the full, Introduction to Backend Architectures course featured in this preview video. Here's what you'd learn in this lesson:
Erik discusses the importance of maintaining a system after it has been deployed and how it can impact the overall design. He also covers best practices such as using a modular approach, planning for scalability, ensuring robustness and flexibility, addressing security concerns, keeping up with technology trends, and incorporating user feedback to continuously improve the system.
Transcript from the "Maintenance" Lesson
[00:00:00]
>> Erik Reinert: However, [LAUGH] this goes back to what I kinda mentioned before about deploying and putting it out there is only part of it, right? The other part of it is learning how to maintain it. And sometimes you're really gonna never know how to maintain a system until it's in production.
[00:00:18]
Until you've actually got users using it, you've got actual data going through it, this can be things, user data sometimes and most of the times is only really accurate in prod, right? At Hippo where I work we have something called prod copy. And it's a literal copy of production that is closed off still, but it has backups from nightly databases of the production environment.
[00:00:51]
And so that makes it so that we can test maintenance, we can test deployments and things like that against real data. And until you get to the point you are maintaining the architecture in a way that you regularly updating it, maintaining it, ensuring that is effective, accommodating any changes or things like that.
[00:01:09]
This part, is really going to tell you if you built something that is easy to maintain. And this part of alone, can make it so that you might end up going back and changing a lot, right? Testing something and maintaining it are two very different things. Testing it is to make sure that it works, maintaining it is making sure that it's up all the time, right?
[00:01:37]
And so maintenance around this can be very learning to how you actually end up with the final solution.
>> Erik Reinert: So I do have some best practices that I'm gonna go over, which are kind of, again, just reiterations of some of the stuff that we've talked about before. You should really involve all stakeholders from the beginning and going back to your question about should developers be involved or not?
[00:02:07]
There might be an opportunity where developers should be involved, right? Maybe you've gone through challenges in the past at a company where you've dealt with, maintainability being difficult and you wanna improve on that. Maybe they didn't like the type of design or the type of technologies that we were using the past so that you can hear those needs and those expectations, right?
[00:02:28]
Which phase do you think this should be in?
>> Speaker 2: That should be in a research.
>> Erik Reinert: Yep, research exactly, yeah, this should be in the research stage. It's never too late or too wrong to reach out to somebody and ask them, what do you think? And it never hurts to ask that question either, and so the more people that you get as a stakeholder in what you're building, like I said, it's easier to get that out.
[00:02:58]
It's easier to get more backing behind it, it's good to get more people involved in the project you're working on.
>> Speaker 2: Question.
>> Erik Reinert: Yeah.
>> Speaker 2: Would you say that it would have value or I guess what it makes sense to incorporate things what Google does with SLEs you have your SLOs as your SLIs.
[00:03:15]
>> Erik Reinert: Yeah.
>> Speaker 2: In where would you put in, I guess in here?
>> Erik Reinert: So that would be in the research stage, you'd wanna think about any uptime requirements, any SLAs, anything like that. Anything that exists already that could impact the design or how it works should happen in the research phase, because, I mean, again, as I'm sure you understand what's SLAs, it could change a lot [LAUGH].
[00:03:42]
That can be a very simple system to being, well, we need not many four nines, four nines is tough, that is a very tough thing to provide as an SLA. I mean, almost literally, I think four nines is one day of downtime a year, right? And that's crazy, and if you have that requirement for that system, then it may be more necessary to look into it, but it might also just be good to ask because maybe you don't.
[00:04:12]
When you talk about building a modular system, it doesn't mean that everything needs to have a high SLA, right, it may mean that, well, yeah we have a high SLA. But you know what? You don't even have to worry about it, it's really just for the frontend. We just want to make sure that users can still access the page.
[00:04:26]
We don't have to be worried about, this deep, deep low level system that, only really gets fired whenever a specific event happens. So yeah, SLA would definitely be something you'd want to investigate, early on into the research. Again, best practice, use a modular approach like modularity is really good.
[00:04:48]
As I kind of mentioned before, when you have that modularity, it makes it so that specific components can be easier, updated. Modularity is good, but you just, there's like it, It's an art [LAUGH], it really becomes an art, how much do you abstract, how much do you modularize?
[00:05:06]
And if you feel like you have to be honest with yourself, like again, going back to that first workflow I showed you of Quirk V1, I can look at that now and be that is way overkill [LAUGH]. And you need to be able to be honest with yourself and just ask yourself the question.
[00:05:24]
Is it too complex, is it not modular enough? Don't look at these as, I would look at them as Legos, do they fit well together? If they don't fit well together, then it's a good idea to figure out how to make them fit better. Again, plan for scalability it should definitely be something that you can think about.
[00:05:45]
I would say that plan for scalability doesn't necessarily have to be massively in the research phase, but it should be in the testing phase, right? In the testing phase, cuz that's really when you're gonna start asking or answering questions around, can it scale, do we have good performance through things?
[00:06:05]
And that doesn't really start getting answered until you start hitting it and actually trying, yeah.
>> Speaker 3: Yeah, how do you think about scalability in terms of, I don't know, time horizons, I guess, I don't know if you've heard like YAGNI you ain't gonna need it [LAUGH].
>> Erik Reinert: Yeah, I've heard it a lot.
[00:06:21]
>> Speaker 3: Yeah, that I think it's really easy to be, we need to add a bunch of complexity so that it's more scalable. But you're, considering scalability at a level that your company's not gonna achieve for 10 years, in which case everything's gonna be different anyways, like where do you find that balance [LAUGH]?
[00:06:36]
>> Erik Reinert: So, okay.
>> Speaker 3: Theoretically [LAUGH].
>> Erik Reinert: No, it's great that you're asking that because I work at a publicly traded company, right? Our value is described entirely by what we're building and whatever. So you're never gonna be, there is a terrible thing to have for a publicly traded company [LAUGH], you eventually want to get there, right?
[00:06:57]
You wanna have your stock go over a hundred to $200, right? They have good intent, but they're not a good approach, to say, we'll never get there just means to say, okay, well, you telling me the company's never gonna grow?
>> Speaker 3: Yeah, I think there's something there though about, okay, maybe you're not never gonna need it, but are you gonna need it on a time horizon where it's relevant to worry about it, now?
[00:07:27]
>> Erik Reinert: So I think a good way of kinda thinking about this, and don't know if this will come off the right, cuz I'm doing it off the top of my head, but think about loading dock for boat, right? You can easily just drop a boat in the water, but it's really hard to get that boat out of the water if there's no, ease in, right?
[00:07:48]
So I think the goal there is, is to have that ease in, sure, we don't need that yet, but can our auto scaler support it at least, right? So that if we get there, we don't have to rebuild everything, right? Or, can this service handle this? So here's a good example, Quirk V2 is built entirely to be serverless.
[00:08:13]
And the reason for that is for and I was gonna talk about this later, but I can talk about it now, too, is for costs. I wanna sell Quirk at a very low cost because content creators are cheap, [LAUGH] it's the easiest way to put it. I'm cheap. I don't wanna have to spend a lot of money on a bot or something like that.
[00:08:31]
But the value I'm providing with that bot is also hopefully enough to pay for and make people pay for it. And I had a conversation with a really good friend of mine who works on probably one of the most known and scalable bots on the Twitch platform called Fossabot.
[00:08:51]
And I was, yeah, we're going to build it serverless to start and he goes, I could never do that. And I was, well, what do you mean? And he's, dude, I process hundreds of thousands of requests, if I pop that through Lambdas my cost would go through the roof [LAUGH].
[00:09:07]
And I was, well, we're not there yet, so should I worry about this right now or should I not? And so I decided, well, maybe if I can build the system to modular enough to where just that one component that I'm worried about could be replaced with a service instead of a Lambda.
[00:09:25]
Then it's easier to prepare for that future and I don't have to be as worried about it. And that's going back to the whole, plug and play of the bot engine, why we built it that way. It was to make it so that, for example, discord can't be serverless.
[00:09:42]
Discord needs a WebSocket connection, we can't get around that. And so we already even know that we're going to be building services versus serverless and so we can take this time over here where we know it's gonna be valuable, to build that loading dock and get comfortable. So once I get to a point where, I'm connecting this to a huge streamers platform, I can just say, okay, rip out the logic out of the Lambda, put it into service, plug it into the API or to the gateway and then bam, now we have that scalability.
[00:10:15]
So make sure you have an out basically, just make sure you have some kind of path to where you can get to there, yeah.
>> Erik Reinert: And honestly, this is kinda similar to what we were just talking about, but you wanna make sure that the system is robust and flexible, right?
[00:10:32]
Think about, what kind of deep problems you're dealing with today, while also making yourself a pathway to solving or dealing with other problems in the future. Like I said, I knew at some point, Lambdas are gonna be costly, every single time we literally fire Lambda for every message in chat.
[00:10:55]
And for me, who's, smaller streamer, I stream 150, maybe max 200 people at a time, that's not too bad. But when you're streaming to 25,000, 50,000 people and chat's just, [LAUGH] what I mean? That's your Lambda bill, baby, and that's gonna shoot real quick. So you have to be prepared for that robustness and that flexibility, and like I said, we knew that, but what's also nice is you might even find really awesome opportunities.
[00:11:23]
So for example, this was brought up in my chat the other day and they were absolutely right, which was, well, couldn't you have both? Couldn't you have dedicated things for huge streamers, but still offload other things to Lambda and still have that cost effectiveness. And we could, yeah, we totally could, now that we have that modularity, we could even do something and say, okay, well, if you're on this plan, then you use the Lambdas, right?
[00:11:50]
And that's because you're low traffic, we don't have to worry about the cost there. But if you're on this plan, then we'll actually route you to the running service so that everything is going through this, fire hose and we don't have to worry about your traffic affecting other people's traffic.
[00:12:06]
And so that's robustness, that's flexibility that we were able to get because we said, no, we're gonna do this first, but we're also gonna make it so that the system can be kind of plug and play with other things. Again, security is heavily, heavily important, making sure that you're constantly addressing security if you have a SecOps team do involve them in the research part.
[00:12:32]
I would involve any teams like security or anything like that in the research because I don't know how it works at your guys' companies, but the moment security walks in and says no, its a hard no [LAUGH]. I cannot move much when our CTO or not our CTO, our CSO, comes in and is, hey, no, that dude lays the law.
[00:12:56]
And so you don't wanna go through building a design and then having your CSOs, or your security team being like, no, [LAUGH] we can't do this. So those are a good part to include in the original design. Again, keeping up with latest technology trends and stuff like that to make informed design decisions.
[00:13:18]
It's even good to keep up with these after you've deployed, right? Being able to say, okay, well I see that this is out there, but you know what? Our system is still good that can help validate what you've already built. Or there could be opportunities for change even after I'm saying, well, you know what?
[00:13:34]
There's this new thing that we can easily drag and drop and it'll work pretty well. Let's give it a shot, let's go back to the research, let's change the design a bit, and let's make our changes where we want. I would say, the hardest part about architecture design is the initial design.
[00:13:51]
After you've got something that you're sitting on top of, that you're comfortable with, that you've already got in production. Again, this is why companies like Netflix, Google and Amazon are actually able to iterate so well, is because they've solved a lot of these problems already. They can go, okay now I'm just gonna tweak just a little bit, tweak a little bit here and there.
[00:14:14]
And again, I have friends who work at Netflix and their sole job was to just tweak that one lever, just make that one thing much faster. They didn't have to worry about the entire system, because a lot of that was already solved for, or it was taken care of.
[00:14:29]
And that's where, you get those kind of engineers who have worked on these crazy, weird situations that are, very unique, but also very hard to solve, that laser focus can help a lot with that. And again, incorporating feedback from users to improve the system this should be in all phases.
[00:14:49]
This should be in research, this should be in test, this should be in development, this should be in deployment and this should be in maintenance. It should be in all of those phases. If there's ever a point where you feel like, you don't understand what problem this is solving for the user, then you need to take a step back and understand what it is, and why you're doing it.
[00:15:10]
It can remove from that scope, it can realign your focus and can help either slim the design or even make it more complex.
Learn Straight from the Experts Who Shape the Modern Web
- In-depth Courses
- Industry Leading Experts
- Learning Paths
- Live Interactive Workshops