The "Load Balancers" lesson is part of the full Full Stack for Front-End Engineers, v2 course. Here's what you'd learn in this lesson:

Jem defines what load balancers are and explains how they use scheduling algorithms to determine which server traffic goes to.


Transcript from the "Load Balancers" Lesson

[00:00:00]
>> Jem Young: All right, we talked about containers and orchestration and elastic cloud computing. Someone remind me what elastic cloud computing is. It's what it sounds like.
>> Jem Young: Yes.
>> Speaker 2: It expands and contracts based on your needs over time.
>> Jem Young: Exactly, and it's useful for handling demand that changes over time. I think the example I gave was that during the holidays there's a lot of streaming.

[00:00:23] Another example would be Frontend Masters. At midnight, there might not be a lot of people watching, so they can take down some of the resources. And during the day, when more people are watching, they bring them back up. But we do that by knowing the load of the servers, measuring it, and saying, hey, this thing is at 90% load.

[00:00:42] We need to route traffic somewhere else. Or we're at 10% load, and we actually don't need these servers. A load balancer is what routes all the traffic to the appropriate cluster. And it's something that you don't think about until you're at scale, but it matters as soon as you have even two servers.

[00:00:58] We'll take two servers as an example. One server might get all the traffic, because that's just the way the traffic's getting routed, and the other server is just not doing anything, just wasting cycles. But the problem with running at 90% load is that the server gets a bit slow.

[00:01:12] Whereas, the other server is not doing anything. That's where the concept of a load balancer comes in. It makes sure that traffic is split evenly between all of your servers. Cuz it doesn't matter if you have 10,000 servers, if three of them are doing all the work, the rest of them aren't doing anything.

[00:01:30]
>> Jem Young: Load balancers work with a concept called a scheduling algorithm. I know, this sounds dull, but to me this is one of the most interesting parts. All right, what do you think some of these scheduling algorithms do? They're kinda in the name. IP Hashing is a little tricky, so I don't expect you to guess that one.

[00:01:48] So think if you had ten servers and I have a request coming in. How does the load balancer determine which server it goes to? That's what a scheduling algorithm does. So, what do you think Round Robin does?
>> Speaker 3: Just goes to the next one in the sequence.
>> Jem Young: Exactly, that's exactly right. That's the default strategy for most load balancers.

There's also IP Hashing, which hashes your IP. That means it runs your IP address through a hashing algorithm, converts it to a value, and uses that value to map you to a specific server. So the benefit of IP Hashing is that you're guaranteed to go to the same server every time, based on what your IP address is.

There's also just random. And what's funny is, we've put a lot of time and research into scheduling algorithms, but Random Choice actually works pretty well; just picking a random server works okay. It's not the default, but it's actually not a bad strategy. There's also Least Connections. That means the load balancer needs to know how many connections every single server has, and it's saying this server has 20 connections, this one has 10.

[00:02:52] I'm gonna go to the one that has 10. There's also the concept of load. So if you have really process-intensive applications, the server with the least amount of load is gonna get more of the traffic until it balances out. And like I said, Round Robin is the default on most load balancers.
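To make those strategies concrete, here's a minimal sketch of Round Robin, Least Connections, and IP Hashing in plain JavaScript. The server list, connection counts, and the toy hash function are made-up illustrations, not code from the course:

    // Illustrative only: a made-up server list with fake connection counts.
    const servers = [
      { host: "10.0.0.1", connections: 20 },
      { host: "10.0.0.2", connections: 10 },
      { host: "10.0.0.3", connections: 14 },
    ];

    // Round Robin: hand out servers in order, wrapping back to the start.
    let next = 0;
    function roundRobin() {
      const server = servers[next];
      next = (next + 1) % servers.length;
      return server;
    }

    // Least Connections: pick the server with the fewest open connections.
    function leastConnections() {
      return servers.reduce((least, s) =>
        s.connections < least.connections ? s : least
      );
    }

    // IP Hashing: hash the client IP so the same IP always maps to the
    // same server (a toy 32-bit hash, not a production algorithm).
    function ipHash(ip) {
      let hash = 0;
      for (const ch of ip) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
      return servers[hash % servers.length];
    }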

[00:03:10] So let's take a look at what our server load is right now, let's run top.
>> Jem Young: So to run top, we can say top.
>> Jem Young: And this gives us a real-time view of what's happening on our server. It tells us the number of threads we have available, which is probably gonna be one cuz we only have one CPU.

[00:03:34] How much RAM there is, things like that, and it's real-time, so it's always keeping track. You don't wanna run top all the time, because top itself takes resources [LAUGH] to run. And on our single tiny little server, if you're running top all the time, you'd see the resource consumption go up.

[00:03:51] But honestly, this isn't that user friendly. I could break it down, I could write a script that extracts everything this is telling me in detail. But let's install something a little bit cleaner. Let's install htop. And to quit out of top, press Q.
>> Jem Young: And let's install htop, which is a little bit nicer.

[00:04:17] So sudo apt install htop.
>> Jem Young: And it's actually already installed, I think. So let's just run htop now. That's much cooler. This is one of those things you can leave up on your computer and people'll walk by like, what are they working on? They must be really smart, look at all those numbers!

[00:04:38] But it's really the same thing as top, it's just in a prettier format. And it tells us our CPU load, our memory consumption. Funny enough, if we look at what's using up all the CPU, htop is near the top. [LAUGH] It's using a lot of CPU just to run this.

[00:04:53] And if we go to Fn+F7, wait, no I want the tree mode, F5. Remember I said PM2 can help with clustering? It'll actually spin up multiple instances of your Node application for you and it adds and subtracts them for you. We can see that here, that PM2 is running node but it's also running all these different instances of node and there's all these different instances of PM2.

[00:05:23] So that's just tree mode.
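The PM2 clustering he's referring to wasn't typed out on screen, but as a rough sketch, starting a Node app in PM2's cluster mode looks something like this (app.js is a placeholder filename, not a file from the course):

    # Start app.js in cluster mode, one worker per CPU core.
    pm2 start app.js -i max

    # Add two more workers later if the load grows.
    pm2 scale app +2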
>> Jem Young: But really, if your system is running slow, top or htop is a good thing to pull up to ask, what exactly is using up all my CPU cycles? The Mac equivalent would be Activity Monitor. And I can pull up Activity Monitor on my Mac in a much cleaner UI.

[00:05:42] And we can see that Activity Monitor is actually [LAUGH] using up all my CPU, but it's generally gonna be Chrome or VS Code as the main culprit. But top is just a good way to know your load. And so when we talk about load balancers, this is what we're talking about: understanding the CPU load, the connection load, the memory load.

[00:06:03] And to quit, I can just Ctrl+C, I think, or actually F10.
>> Speaker 4: What are the color codes in the memory section, do you know?
>> Jem Young: I don't know authoritatively, but I wanna say green is you're still okay. I think we're in blue, so we're getting close to halfway.

[00:06:18] And I think when you get a little further, it's gonna change colors. Wait, I might have that backwards. I couldn't say authoritatively what the color codes mean exactly.
>> Jem Young: I wanna say red would be bad, but sometimes this goes in the red, which I don't know what that means.

[00:06:37] Maybe it means the CPU is being used or not. If anybody knows, you can tweet it at me. And we're just gonna quit out of here, so F10, and we're back.
>> Jem Young: And earlier I said Nginx is the jack-of-all-trades. Well, you know what? We can use Nginx as a load balancer, too, because it's just that cool.

[00:06:57] We're not gonna implement one today, because we only have one server, so there's really nothing to balance. But if we had two or three servers, we'd add this upstream backend block and then proxy_pass all connections to it. So what we can do is run a Node application on different servers or different clusters.

[00:07:17] Then we proxy_pass to the upstream, and Nginx decides, based on the strategy we're using, which server to go to. That's pretty nice. So out of the box, I could probably run 1,000 servers with one Nginx load balancer and it's just gonna handle the traffic perfectly.
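Here's a minimal sketch of the kind of config he's describing; the backend addresses and the app port are placeholders, not values from the course:

    # Inside the http block of your Nginx config.
    # Placeholder backends; Round Robin is the default strategy.
    upstream backend {
        server 10.0.0.1:3000;
        server 10.0.0.2:3000;
        server 10.0.0.3:3000;
    }

    server {
        listen 80;

        location / {
            # Nginx picks one of the upstream servers per request,
            # using whatever scheduling strategy the upstream declares.
            proxy_pass http://backend;
        }
    }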

[00:07:35] We can set the strategy in the upstream backend. If you ever get to the point where you have a very resource-intensive service, we can say least_conn. That just means we're gonna use the Least Connections strategy. Which is awesome that Nginx does this for you. Cuz it automatically connects to a server, gets the load, and reports back every, I don't know, couple milliseconds or so.

[00:07:58] And it reports back how many connections you have every single time. We can also do Round Robin. We could do IP Hashing, things like that. I don't know if I talked about sticky sessions. Honestly, I love Nginx, it's great software. It does all these things for you. Literally, I can run entire web applications with hundreds of servers with just Nginx and it would work seamlessly.
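Switching strategies is a one-line change at the top of the upstream block. A sketch, reusing the same placeholder servers as above:

    # Least Connections instead of the default Round Robin:
    # Nginx routes each request to the server with the fewest
    # active connections.
    upstream backend {
        least_conn;
        server 10.0.0.1:3000;
        server 10.0.0.2:3000;
        server 10.0.0.3:3000;
    }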

[00:08:21] But here's a question for you: what about session data? What if I log into a server? I'm authorized on this one server. How do I know that I won't get routed to a different server where I'm not logged in? And this is a real problem that I've run into.

[00:08:39] It's mostly solved now, but what do you think would happen there?
>> Speaker 5: Well, it depends on if you're using sticky sessions.
>> Jem Young: What's a sticky session?
>> Speaker 5: It sticks you to a particular server so you don't get bumped between them.
>> Jem Young: Exactly, it's something you don't think about until you have to implement some sort of login or authorization function.

[00:09:00] Be like, yeah, we'll throw in load balancers, I'll spin up 20 servers, I'm good. But then I log in, hit a link or refresh the page, and I'm no longer logged in. That's frustrating, because the load balancer can route me to a different server every single time.

[00:09:14] One way of solving that would be IP Hashing. That means when I hash the IP, I'm always gonna go to the same server every time. Another way would be implementing an authorization layer before we even hit Nginx, one that makes sure I'm authorized completely. If I'm not, it just bounces me out.

[00:09:28] If I am authorized, it just goes to the server, and it doesn't matter if I'm logged in on that particular server or not, because we know we passed through that firewall. It's something to consider: we can't just blindly implement things. As easy as [LAUGH] Nginx makes it, these are considerations we have to make.
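In Nginx terms, the IP Hashing approach he mentions is again a single directive; a sketch with the same placeholder servers:

    # ip_hash pins a given client IP to the same backend server,
    # which is one way to get sticky sessions for logged-in users.
    upstream backend {
        ip_hash;
        server 10.0.0.1:3000;
        server 10.0.0.2:3000;
    }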