Lesson Description
The "Asynchronous Tasks & Components" Lesson is part of the full, Backend System Design course featured in this preview video. Here's what you'd learn in this lesson:
Jem introduces asynchronous workflows, explaining how tasks can be queued and processed without blocking the main system, improving responsiveness and scalability. He covers components like message brokers, queues, and worker pools, and emphasizes the importance of notifications, back pressure, and careful use to avoid added complexity.
Transcript from the "Asynchronous Tasks & Components" Lesson
[00:00:00]
>> Jem Young: Last core concept: asynchronous workflows. Originally, when I was putting this together, I was trying to think of how to logically group these, and I had a different chapter I was going to call chapter 6, Performance, where I was going to put sharding, replication, caching, and asynchronous workflows. But I decided to group caching and sharding and all that with the data store. Asynchronous workflows, though, along with caching, are something that is, what's the cool term, OP.
[00:00:30]
Overpowered. You can solve a lot of problems with caching and doing something async. And async is what? You've probably done it before. Explain it to me in simple Jem terms. Well, I was going to ask, when you say async, do you mean like JavaScript async, or like parallelism, multithreading stuff? What is it, what does asynchronous mean? I mean, I guess letting yourself do multiple things at the same time.
[00:01:01]
Yeah, that's what you can do with async, but you don't have to wait for the response to continue. Yeah. Set it and forget it. Hey, here's a job, I assume it'll be done at some point. Let me know when it's done. I'm going to order my pizza from Pizza Factory. I'm not going to sit there and wait for it. Let me know when it's done. I'm not going to stand at the counter watching them make it the whole way through.
[00:01:24]
Consider that an async workflow. We actually touched on this already when we talked about caching, with write-behind caching, remember that one? The service writes to the cache and it immediately returns. It says, hey, your update's good, we can move on very fast. But the cache is still responsible for getting that write to the database in an asynchronous way. So we touched on this a little bit already.
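Here's a minimal sketch of that write-behind pattern in TypeScript, assuming an in-memory Map as the cache and a stand-in `dbWrite` for the slow database call (both names are illustrative, not from the lesson):

```ts
// Write-behind cache sketch: the caller's write returns as soon as the
// cache is updated; the database write happens later, off the hot path.
const cache = new Map<string, string>();
const pending: Array<{ key: string; value: string }> = [];

// Stand-in for a slow database write (hypothetical helper).
async function dbWrite(key: string, value: string): Promise<void> {
  await new Promise((r) => setTimeout(r, 250)); // simulate latency
  console.log(`persisted ${key}`);
}

function set(key: string, value: string): void {
  cache.set(key, value);        // fast: the caller sees success immediately
  pending.push({ key, value }); // defer the expensive write
}

// Background flush: drain pending writes to the database asynchronously.
setInterval(async () => {
  while (pending.length > 0) {
    const { key, value } = pending.shift()!;
    await dbWrite(key, value);
  }
}, 1000);

set("user:42", "updated-profile"); // returns instantly; persistence follows
```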
[00:01:49]
I like to use the big fancy term asynchronicity. Throw that on your resume somewhere and people are like, this is a person who knows what they're talking about. Asynchronicity. Hey, I don't need to wait for things to get done, I can just set it and forget it. That's the phrase I like to use. When we talk about this in the context of system design, or really any programming, it means we can work with computationally expensive tasks, but they're no longer the bottleneck.
[00:02:23]
We say, hey, we started a task. I assume it's going to get done at some point, but I'm not going to wait for it anymore. That is something JavaScript and Node do really well, Node especially with the event loop, where we do async programming all the time. Hey, async, cool, I'll let this run; the event loop comes around, comes around, it's done, and now I'm ready to move on to the remaining part of the call stack.
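In Node terms, that "set it and forget it" is just starting a promise without awaiting it. A tiny sketch (the `resizeVideo` name and timing are made up for illustration):

```ts
// Fire-and-forget: the call stack moves on immediately, and the event loop
// picks the completed work up later.
async function resizeVideo(id: string): Promise<void> {
  await new Promise((r) => setTimeout(r, 2000)); // stand-in for real work
  console.log(`video ${id} resized`);
}

void resizeVideo("abc123");                // not awaited: we don't block
console.log("responding to the user now"); // logs first
```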
[00:02:47]
So, for people familiar with that sort of programming, we can do the same thing in system design on a much larger scale. Here are examples of things that take a long time, that we shouldn't have to wait for, or that, if we had to wait for them, it would just not be a great experience. It would decrease our, starts with an A. Availability. It decreases our availability. Remember, availability isn't just whether our system is up or down, it's how responsive our system is.
[00:03:25]
Is it taking a long time to respond to requests? So, if we had to wait for, say, a large video to upload and finish, that would decrease our availability, because now every other request is blocked behind this thing that's doing a lot of computationally expensive work. Generating a report is a really common expensive task. You've seen those buttons, like generate PDF and we'll get back to you, we'll notify you.
[00:03:49]
I've never done PDF generation myself, but I hear it's actually an expensive task. That's why it's always put in a queue somewhere. Payment processing: there's a lot of verification that has to happen, often across many systems, which just takes time. Any sort of image processing or video processing is going to take time. What other computationally expensive tasks can you think of? Asking an LLM something.
[00:04:27]
Yeah, that's a good one, that's an expensive query. Anything else? I was going to say probably image generation specifically. I don't know if we're going to get to it, but potentially just generating new video frames, especially if you're something like Runway that does all that in the browser. Yeah, if you see anything doing something with an image or a video, automatically think that's a lot of data.
[00:04:57]
It's going to be computationally expensive; we should make it async. If you weren't streaming video, it would get really slow, more of a download, I guess. Yeah, or downloads in general. Downloads. Music processing. If you're not doing it on the client, preparing a zip file is computationally expensive. Another one you're going to run into a lot: geolocation, or GPS, or anything that you're calculating in real time.
[00:05:35]
The satellites and whatnot. Am I missing something? I mean, maybe. Tell me, tell me more. I mean, that would be computationally expensive, right? If you're constantly pinging something and then, I don't really know how GPS works, I'm sure there's some graph underneath it, but. I can see that, yeah. The key thing is you don't want these to slow your system down. Fortunately, we have a way of avoiding that, at least in some ways.
[00:06:10]
So, the components behind an asynchronous workflow. Let's jump here for now. How it works is there's some sort of service or server, and it says, hey, I need to do a long-running task. I want to resize this video, that's going to take a while, so let me put that in a job. The job says, hey, here's a video, here's a task I need to do, here's the path to that code, and we pass that in a message.
[00:06:38]
That gets passed to the message broker. The message broker is kind of the overall system managing the queues, and the message broker will put that message and job into a queue. The queue is exactly what it sounds like: a long line of jobs. They just queue up, just like standing in line at the coffee shop to get a matcha. That's where Mark's gone, casually, yeah. And then you have a worker pool.
[00:07:11]
So when a worker's free, it says, cool, give me the next job, let me pull it off the queue. And that's how async workflows work. Pretty cool when you think about it. It's like handing someone a message and saying, hey, make sure this gets to this person across town. Or mailing a letter: I don't have to worry about it anymore. I put it in the mailbox, there's a whole system that's going to handle it, and it's going to get it to the right place eventually.
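Here's a toy, in-process version of that flow, just to make the moving parts concrete. A real system would use RabbitMQ or Kafka rather than an array, and the job shape here is invented:

```ts
// A queue of job messages and a pool of workers that each pull the next
// job when free.
type Job = { id: number; task: string; payload: string };

const queue: Job[] = [];

// The "message broker" routing a message into a queue.
function enqueue(job: Job): void {
  queue.push(job);
}

async function worker(name: string): Promise<void> {
  while (true) {
    const job = queue.shift(); // a free worker pulls the next job
    if (!job) {
      await new Promise((r) => setTimeout(r, 100)); // idle: poll for work
      continue;
    }
    console.log(`${name} running job ${job.id}: ${job.task}`);
    await new Promise((r) => setTimeout(r, 500)); // simulate the work
  }
}

// A pool of three workers, plus two jobs queued by the "service".
["w1", "w2", "w3"].forEach((n) => void worker(n));
enqueue({ id: 1, task: "resize-video", payload: "/videos/1.mp4" });
enqueue({ id: 2, task: "generate-pdf", payload: "report-42" });
```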
[00:07:36]
The downside here is the "eventually": you don't know when it's going to be done. So the one thing we have to remember with async workflows is you need some sort of notification at the end. The worker, or the message broker, needs to be able to say the job is done. Then the server can come and say, OK, let me pick up that video file from some location and notify the user, something like that.
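One simple way to sketch that notification piece in Node is an event the worker emits when it finishes and the server subscribes to (the event name and result path are illustrative):

```ts
import { EventEmitter } from "node:events";

const notifications = new EventEmitter();

// The server side: react when a job finishes.
notifications.on("job:done", (jobId: number, resultPath: string) => {
  console.log(`job ${jobId} finished, result at ${resultPath}`);
  // ...pick up the file from that location and notify the user
});

// The worker side: announce completion after finishing a job.
notifications.emit("job:done", 1, "/results/video-1.mp4");
```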
[00:08:02]
So the message broker does what it says: it brokers messages. It routes and manages the messages and the message queues. The queues hold jobs until there's a worker ready, and the workers are actually the ones doing the work. Just like in real life, the workers do all the work. So you're probably familiar with some of these terms. Maybe you didn't know what they meant, but you've probably heard of them: Kafka.
[00:08:33]
Kubernetes. RabbitMQ. Yeah, familiar terms, you're like, what does that mean? You know, Kafka can do a lot of things on Kubernetes, but RabbitMQ is definitely going to be a messaging service for async workflows. The tricky part here, the part that took me a while, is that these workers can be specialized, or they can just be generic workers of some sort. So you're actually going to have a lot of message brokers and a lot of different worker pools, and you need something like Kubernetes to manage the worker pools.
[00:09:04]
So you say, hey, 90% of my workers are occupied all the time doing these jobs, we need to scale this up, or we need to scale this down. That's the advantage of async workflows: they scale up and down really, really easily, and you can scale your queues up and down really, really easily. There's this concept in system design, or any sort of design, called back pressure, which is the idea of: what happens if the jobs are taking longer and longer and there aren't any more workers to pull from?
[00:09:36]
But the jobs are still queuing up, they never stop. The queues fill, fill, fill, fill, and what happens? Your system can crash: the jobs have nowhere to go, the server is still sending jobs and just getting errors back, and that takes down the whole system. So you have this idea of back pressure, which is, hey, we actually build this into the service itself. Let's slow the rate of jobs down a little bit.
[00:10:00]
Let's let all the servers know, hey, we're not accepting any more jobs. We throw some sort of error on the client, but it won't crash the service. We wait for the queues to clean up a little bit, and then we finish the jobs. Super powerful concept, super powerful. It's kind of a cheat code; you can get away with a lot of stuff with this. You're like, oh, it'll get done eventually.
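A back-pressure sketch along those lines: a bounded queue that refuses new jobs when full, so producers get a clean error (say, an HTTP 429 to the client) instead of the whole system crashing. The depth limit is an assumption:

```ts
const MAX_DEPTH = 1000; // assumed limit; in practice, tune to worker throughput
const queue: string[] = [];

// Returns false when the queue is full, signaling the caller to back off.
function enqueue(job: string): boolean {
  if (queue.length >= MAX_DEPTH) {
    return false; // e.g. respond 429: "not accepting more jobs right now"
  }
  queue.push(job);
  return true;
}
```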
[00:10:26]
Not my problem anymore. And it's really nice, everybody has their own job here. What do you think is the most common mistake when it comes to async workflows, because they are so powerful? Not watching your queue depths. Like, if all of a sudden all your consumers are gone, then things will just build endlessly. Or not having TTLs, all kinds of things. Yeah, those are all very real things.
[00:10:55]
Even simpler: what's probably a tendency people have when they understand this concept? I was going to say probably just overusing them, getting to the point where it's like, oh, I can add them for free, and it just makes the system more complicated and maybe even adds extra overhead. Yeah. It's really tempting to make everything async, really tempting, because when you have this, you don't have to worry about it anymore.
[00:11:19]
You know performance is guaranteed, it's going to be fast: when the request comes into the server, it's going to come back immediately, because the service just posts a job saying, hey, we'll let you know when it's done. And it really improves performance, on the surface. Behind the scenes, the job didn't go anywhere, it still needs to be done. But it's really tempting when you first learn about this to put everything into a queue.
[00:11:45]
Especially when you get to a big system diagram, you're like, oh, this is slow, how could I speed this up? Well, I could actually just batch. If we're not talking about a transactional database, just regular, good old-fashioned non-relational, but the writes are still slow, we could speed that up by putting everything in a queue. That way we ease up the load on the servers by backing things up a little bit more, saying it'll eventually get in there.
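A sketch of that batching idea, assuming the data can tolerate delay; `dbBatchInsert` is a stand-in name, not a real API:

```ts
const buffer: object[] = [];

// Cheap per-request call: the request returns right away.
function record(event: object): void {
  buffer.push(event);
}

// Stand-in for one batched database insert.
async function dbBatchInsert(rows: object[]): Promise<void> {
  console.log(`writing ${rows.length} rows in one batch`);
}

// Flush every few seconds; fine for data that doesn't need to be real time.
setInterval(() => {
  if (buffer.length > 0) void dbBatchInsert(buffer.splice(0));
}, 5000);

record({ type: "page_view", ts: Date.now() }); // returns immediately
```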
[00:12:14]
This isn't data that's very important for us to save right now, it doesn't have to be real time. You could do that, but nothing is free. This is expensive to maintain, and it does add complexity when something goes wrong. Let's say one of the queues falls over, and you're like, but I thought that job was complete. No, it wasn't complete, it just got queued up, then something happened to it, and you have to track down: was it the message broker that lost it? The queue?
[00:12:42]
Did a worker crash out while it was doing the job, so the job wasn't saved and didn't get re-queued? It's actually very complicated behind the scenes, but RabbitMQ and Kafka are very good at managing that. Still, it's not free. You only want to use this when you need to. Would you be, um, scaling with that? You could scale different parts of the service up and down with Kubernetes, potentially.
[00:13:11]
You could have this where it's like, yeah, I'm just scaling a really simple workflow, but I need a lot of reads and writes for my tasks. Or you might have a microservices approach where you're using the events to trigger different services, yeah. I guess, uh, that's not really a question, more of a thinking out loud. You're bang on, though. This makes scaling really, really easy, because now you're like, huh, what's the slow part?
[00:13:36]
Is it the queues filling up too quickly? Well, we can add more queues, which is kind of cheating, but it's like opening another cash register, and another line. It's only going to get you so far, but we can add more worker pools really, really easily. We can add dedicated worker pools. The message broker could say, let me do a deep introspection of all the messages in the queue. Oh, they're all of this type of job.
[00:13:57]
Let me open up a dedicated worker pool just to do this kind of job, and we can easily pinpoint the slow parts of our system and then solve this really quickly. And you could do that dynamically on the fly. So, the good thing about having multiple parts is you can scale them up and down really easily. The downside is complexity. This adds a lot more complexity, a lot more to think about, a lot more to reason about.
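A sketch of that routing-to-dedicated-pools idea: the broker keys queues by job type, so a hot job type can get its own workers and be scaled independently (all names illustrative):

```ts
// One queue per job type; a busy type can get a dedicated worker pool.
const queues: Record<string, string[]> = {
  "resize-video": [],
  "generate-pdf": [],
};

function route(task: string, payload: string): void {
  (queues[task] ??= []).push(payload); // unknown job types get a new queue
}

route("resize-video", "/videos/2.mp4");
```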