Check out a free preview of the full Introduction to Backend Architectures course
The "Distributed Pros & Cons" Lesson is part of the full, Introduction to Backend Architectures course featured in this preview video. Here's what you'd learn in this lesson:
Erik highlights the benefits of independent development and deployment in service-oriented architecture. He also discusses the challenges and complexities that come with distributed systems, such as the need for developers to consider how other systems work and the difficulties in development, testing, and data management.
Transcript from the "Distributed Pros & Cons" Lesson
[00:00:00]
>> Erik Reinert: Okay, so let's talk about the pros of service-oriented systems, right? Now, again, my goals are not to just be reiterating this to you guys, but it's to hope that by now, you see what I'm talking about. You understand, and it's just like obvious. You probably didn't think this was obvious at the beginning of this course, but it should now, it should be a lot more obvious.
[00:00:23]
So one of the first pros of doing service-oriented design is independent development. Each service can be developed independently by a team that is focused on that service. This is one of the biggest reasons why companies move to service-oriented architecture. Because they wanna be able to say, this team, you focus on this, this team, you focus on that.
[00:00:45]
And at Hippo, we do the same thing. We have teams that focus on customer portal. We have teams that focus on lender agreements. We have teams that focus on different parts of the system. And we don't have to worry about them. Now, why you should watch my other courses is because I show you guys how you automate and deploy those systems and serve those systems for those teams, because that's another challenge in its own.
[00:01:10]
Although you want these teams to be able to independently develop, the cycles should be the same, right? We should all deploy services the same way. We should all configure services the same way. We should all have expectations around how services run the same way. Again, unless the architecture is different for some reason, that's how we build off the self-solutions.
[00:01:32]
And again, that's why I really do recommend watching those courses, because all of this plays together, right? You design it, you build it. You maintain it, but then you also figure out how to automate it, how to deploy it, how to scale it, right? And all of it kind of glues together in this cohesive architecture that you get to build and maintain.
[00:01:53]
And if you do it right, you enjoy it [LAUGH]. You actually don't hate it. And that is what DevOps is for and that's one of the main reasons why it is tough. It is tough, I know a lot of people who have built systems or have worked on systems with distributed services and gone this is a nightmare.
[00:02:12]
Like this sucks, there feels like there's no design thought around this. We just kind of threw a bunch of services out there. And even at Hippo, we definitely have some of that technical debt as well. And we're trying to get better at improving it. But if you do it right, the benefits of independent development is remarkable.
[00:02:33]
It's a game changer for a company. That means that teams can easily work separated and they don't have to think about each other and what they're doing. This one, however, is huge, okay? Now, I have a story for this, independent deployments. Services can be deployed independently. Up until about a year ago, and if I give you guys PTSD, I'm sorry, but this is what I went through.
[00:03:03]
Up until about a year ago, Hippo had to, every two weeks, get together on a call with every team, and we had deployment nights. And we would sit on a call for about five to six hours, and we'd have a deployment leader. It was like a raid in a dungeons and dragons thing.
[00:03:22]
Somebody led the group into battle and we went through the systems one by one. Okay, go deploy your system. Okay, go deploy your system. Okay, go deploy your system and I hated it. I absolutely hated it. The fact that we had our system so tightly coupled meant that every time we had to deploy, we had to get together.
[00:03:45]
Otherwise, if we didn't, we could have cascading failures. We wouldn't know where they came from. And so it made more sense for us as an organization to all sit on a slack call and just go through the whole process of deploying like 40 different services from 10 to 20 different teams all at once.
[00:04:02]
It was terrible. It was terrible. And these are like the days that shall not be spoken of because we hated it so much. But what we implemented to fix this was something called FBR, which is feature-based releases. And the idea behind this was is we already kind of had this service-oriented design, but because we didn't have the confidence in deploying it, we needed to fix that.
[00:04:32]
And so what we did was is we kind of developed this process of like a bus. The bus is going around the school every day. It's picking up kids, it's dropping kids off. And you can hop on that bus anytime. And when you hop on that bus, that bus is gonna go all the way to production, and anybody on that bus can go with it.
[00:04:50]
And so, this feature-based release made it so that we moved from a, okay, everyone deploy all at once to, okay, teams, you can deploy as many times as you want during the day, and you will not impact other systems. And this independent deployment made our, God, it's so much nicer.
[00:05:09]
It is, it's so much nicer. It made a lot of people happy, just genuinely happier because you don't have to sit on a phone call until 8 o'clock, 9 o'clock, this was a thing. Nobody has to stay up late anymore. Nobody has to be worried about if a deployment is gonna happen or if it's gonna be bad.
[00:05:26]
And also what's really nice is it made our deployments happened earlier in the day. We didn't have to think about, well, let's deploy that after hours just to be safe. That disappeared. A lot of that really disappeared. And so now developers and teams can easily deploy whenever they want.
[00:05:45]
And that deployment is just that one little change of that commit. Like it's literally just the commit and the Docker build of that commit. And they just drag it all the way to production. And it's literally like taking the container and going, okay, you're in dev. Okay, that looks good.
[00:06:02]
You're in staging, okay? You're in prod copy, okay? You're in prod. And that separation of independent deployment is really what made Hippo be able to move a lot faster. And honestly, even though the system itself is something that we took probably like a year in itself to be like, okay, let's try and do something better.
[00:06:24]
It's still works fantastically. And going back to the Kubernetes, me trying or my team trying to get us to move to Kubernetes. It was a hard sell to be like, let's change this again, because we already found a solution that works so well, that it was hard to be like, well, let's find something even better, right?
[00:06:43]
And so independent deployments, if you're at a company that has to do that, that has to get together every week and do deployments, this should be one of your top priorities, right? You do not wanna have to worry about other teams deploying with you and are you gonna break somebody else's components?
[00:07:01]
That's an architecture problem that needs to be solved for.
>> Erik Reinert: And as we noted before, fault isolation, right? So we talked about services and microservices being distributed. The nice thing about it, as well, is that you can Isolate failures and things like that to specific services and your outages don't become massive.
[00:07:29]
Again, if we talk about the monolith, what happens if your front end goes down? Everything goes down yeah, it's everything. It's not just your front end, it's your back end. It's everything, right? And so when we talk about having to worry about fire drills at two o'clock in the morning, right?
[00:07:49]
If we're not worried about one specific component, you know what? Let it be broken until the morning. We'll fix it or we'll wait until the other team in Poland or whatever hops on and fixes it. It's not as big of a fire, right? And these two, oops, sorry.
[00:08:04]
Independent deployments and fault isolation can make your developers live so much better, right? This is not just architecture, but this is also developer experience. You don't have to make your developers go through the mud. You can make, I like to say, you can have nice things and we can have nice thing.
[00:08:25]
So this is why distributed is so valuable, right? But again, I want you guys to keep in your mind when we talked about what and when. It's still doesn't mean that monoliths are valuable, right? But hopefully this gives you a better understanding of the real differences between the two, right?
[00:08:44]
And then, this is really awesome as well. Say you do have that one rogue team that just wants to build Rust. Yeah, you know what? Go for it. As long as you guys can maintain your own systems and support it, and you guys can plug into our existing stack.
[00:09:02]
Sure, go for it, right? Nobody at a C level is gonna give a crap if you're using Rust or Go. [LAUGH] Nobody at the corporate level is gonna care. Nobody's gonna argue that. So if you have a good argument, you have a good solution and you have something that works well and plays nicely with everything else, then distributed systems also gives more freedom to the developers to do what they want.
[00:09:31]
And this can mean that you can try innovating with new things easier, right? In the scenario where you say, you know what, let's try this one system, maybe we'll try Elixir, or maybe we'll try this other thing, right? And I know for a fact, like again, I know friends who work at Twitch and other large companies like I do, and they've talked about this as well, that there's always this one weird service that exists that is weird.
[00:09:57]
But they tried it, and now we're just keeping it around. And that's okay. Those are welcome in this kind of architecture as long as they're built with the considerations around them. So distributed definitely allows you to have a much more diverse technology stack, for sure. So let's talk about cons, right?
[00:10:18]
And we've already kind of talked about this, but distributed system complexity, right? Developers must deal with the additional complexity of co-creating a distributed system. This means that developers might still need to consider how other systems work. They might need to also think about how their systems plug and play into the other systems.
[00:10:40]
Again, going back to the video of the guy talking about everything jumping around and all the complexity there, this is a prime example of that, right? If somebody adds something, and then somebody adds something else, and then somebody adds something else, and nobody thought about the other three things that happened, and they just add their own thing, now you've got four levels of complexity, right?
[00:11:03]
Excuse me, one of the things that we did at Hippo that I actually think was a bit of a mistake is we made it so that developers can pretty much create their own services whenever they want. But to the point where we kind of stopped doing like approval processes and things like that because we just wanted developers to be agile and now we've got a lot of challenges with that, like for starters, development environments, right?
[00:11:29]
How do you create a development environment when you've got 150 distributed services that all need to be online, talking to each other, know about routing, know about service discovery? That becomes really challenging. And so in our case, we actually saw the provisioning of development environments in the cloud go up by like 20 minutes.
[00:11:49]
Just because it's got to go through every service, restart every service, make sure it's got the latest Docker image and all these other things. And this is a big reason as to why we wanna move to Kubernetes. Because in Kubernetes, we can at least describe them in namespaces, we can just focus on containers versus ECS, which is you need to have hosts, you need to have clusters for it, and all these other things.
[00:12:11]
And so, distributed system complexity is out of the box, gonna be a challenge you're gonna deal with right away with distributed systems. So being aware of that scale as it's happening Is important.
>> Erik Reinert: Also with distributed systems, development and testing becomes a lot more difficult. Like I just mentioned with things like developer environments, testing locally.
[00:12:39]
Because we have such a distributed system, we also have tons of repos. For an example, at Hippo, we probably have around 200 to 300 active repos, which is a lot. And we've built tools that I even talk about in the DevOps for developers course, like something we call Hippo Build.
[00:13:03]
So think about this, right? Say you're a developer and you wanna create a new service, but you wanna create that new service exactly like other services, meaning you get the same deployment pipelines. You get the same configuration files, you get the same workflows and everything, but you wanna build it a different way.
[00:13:18]
Cool, just create a template, no big deal. Awesome, what happens when you need to update it? What happens when you need to update 50 of those repos at once, right? This is where that complexity really becomes a challenge because you need a tool that knows how to go out to all those 50 repos, update them, update their build configurations.
[00:13:38]
Here's a good example. We were talking about Nix earlier, Nix mentioned, we were talking about Nix earlier, right? What happens when I wanna move all of our development flows to Nix? That's 200 repos I have to update with flakes and other things to make that happen. And so, how do you solve for that?
[00:13:57]
How do you batch update repos? How do you batch update tests, right? How do you batch up data other things, how do you make sure that all of these components are being properly tested, right? What we ended up doing in Hippo was is we started creating integration repositories for service repositories.
[00:14:13]
So you'd have a service repository like billing service, but then you could also create a billing service integration tests repo. And so the billing service is in its own repo and coupled with all of its logic, but then the integration test repo is all of the end to end testing, reaching out to environments, doing API calls and stuff like that.
[00:14:33]
But that means that everything doubled [LAUGH]. So now we've got 80 repos for 40 services, right? And so again, that complexity becomes basically exponential with how many services and how many things you have. So it definitely does make development and testing much more difficult just because of the fact that it's so distributed and you have more things to think about.
[00:14:59]
Again, let's think about monoliths, and then think about what I just described, right? Monolith's way easier. You go one place, update one thing, and you're done, right? This is a bit more. And again, data management is also a big challenge here. We talked about the databases and how if they don't share databases, how do we make sure that the data schemas match, that they have the proper representations.
[00:15:23]
Well, what happens when you need to first update a user table and then update a payment table and the payment request fails? What do you do? How do you fix that data inconsistency, right? It's things like that become much more challenging when you have multiple services that are distributed and this is also something we deal at Hippo.
[00:15:45]
As a matter of fact, I'm just giving you all of Hippo secrets. As a matter of fact, one of the biggest challenges we're dealing with today is we have integration repos that talk to the same services while they run. So some integration repos talk to the same services as other integration repos and they have version mismatches.
[00:16:06]
And we literally have this error now showing up a lot in our integration reports where it's saying version mismatch, version mismatch, version mismatch. The reason for that is because we're running those things at the same time. We're running both of those integration tests at the same time, and they're both trying to update user data or mock data, but it's the same mock data.
[00:16:27]
So you have to be considerate of that. You have to think about like, okay, is this service going to touch or impact the other service as it's working as well? And that consistency can be really challenging.
Learn Straight from the Experts Who Shape the Modern Web
- In-depth Courses
- Industry Leading Experts
- Learning Paths
- Live Interactive Workshops