Backend System Design

Video Upload Diagram

Jem Young
Netflix
Lesson Description

The "Video Upload Diagram" lesson is part of the full Backend System Design course featured in this preview video. Here's what you'd learn in this lesson:

Jem diagrams a video upload system, showing the server, storage, processing services, worker pools, and a notification service, emphasizing stateless, asynchronous workflows and data flow between components.

Transcript from the "Video Upload Diagram" Lesson

[00:00:00]
>> Jem Young: This is easy in theory, right? Then you get there and you're like, ah. What have we learned so far? You've got something to get done, create a service. Call it a day. And then move on, and then when you get to the next step, you're like, oh, that server should actually be a database or something. Totally fine. OK, we have the video. It's now in storage somewhere. We have the metadata somewhere in storage.

[00:00:24]
The user, the client's been notified. The upload is done. The notification service says, hey, the upload is done as well. It's talking to the video service. That is now going to pull from the... That's a way to make the arrow, make it that way. And we can add more specifics if we want to make it clear: we can say the notification service is actually going to notify with the ID of the video or the URL of the blob storage.

[00:00:58]
We never gave it an ID. We'll leave that out for now, but we could say, you know, at this stage, we never associated the metadata with the data, with the video. We do need to do that somewhere, but we'll assume we modified the metadata to include the source video storage, like the URL of that, to make it easy on ourselves and give us some ID. But, you know, we're making a lot of assumptions about what looks like a relatively straightforward process, but it's not that easy.

[00:01:27]
But it's OK for the sake of learning and the sake of getting us to the next stage. OK, yeah, I'm kind of confused, because we have a video storage blob at the bottom, metadata DB at the top. What is the right database for? Oh, this is just bonus, yeah, that was remnants. Yeah. Good call. We left that the whole time. I didn't even see it anymore. We didn't have user, but I just assumed there were other things like the user or whatever, but when we split out the metadata DB, that's assuming that it's only doing metadata and there's something else.

[00:02:09]
We could have a user record, I mean, there's only how many entities? So we could potentially keep any text records in the metadata DB. We'll say, just for our own memory, this is the user ID, metadata, and source URL, something like that. You need multiple URLs, right, for the, um, or when the video processes, are you rewriting the original video? Is it replacing it in the blob storage, or are you keeping both the unprocessed and processed?

[00:02:57]
It'll replace it eventually. We don't want our storage to blow up if we have 10 copies of it already, so there will always just be one copy of the video then. Once we get done with video processing, there'll be multiple copies, but right now there's only one, the source. I think I follow you. So we have the source URL in the metadata DB, but once we've processed, we might want to also store the processed video URLs.

[00:03:25]
Yeah, we will, yeah, we'll have to pull from the DB and say, hey, here's the source URL, and these are now all the other URLs associated with it. But we're not there yet. We will eventually. And let's take off "metadata DB" and just call it a DB, because otherwise we might over-index on the metadata part when it's actually holding a lot of other stuff. Maybe I should have just kept that other database there, like they threw me off, man.
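As a rough sketch of the record being described here (user ID, metadata, source URL, and later the URLs of the processed videos), something like the following could work. The class name, field names, and blob URLs are all assumptions made for illustration; the lesson does not define a schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class VideoRecord:
    """One row in the primary DB (hypothetical shape, not from the lesson)."""
    user_id: str
    metadata: Dict[str, str]
    source_url: str
    # Filled in after processing: format name -> URL of the converted file.
    processed_urls: Dict[str, str] = field(default_factory=dict)

    def is_processed(self) -> bool:
        # True once at least one converted copy has been recorded.
        return bool(self.processed_urls)


# Right after upload there is only one copy of the video: the source.
record = VideoRecord(
    user_id="user-1",
    metadata={"title": "My video"},
    source_url="blob://source-videos/abc",
)
before = record.is_processed()

# Later, a finished conversion adds a processed URL next to the source URL.
record.processed_urls["hd"] = "blob://converted-videos/abc-hd"
after = record.is_processed()
```

Keeping the source URL and the processed URLs on the same record is one way to "pull from the DB" and get everything associated with a video in a single read.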

[00:04:03]
OK, we have a video processing service, uh, what does it need to do? Say process video. Take the source video, OK. If we're splitting audio and video tracks, then I might have to do that and store those processed things somewhere. Transform them into the formats that we want to store them in or serve them in. OK. I'm buying that. So, actually, regret that.

[00:04:41]
Let's write out what we want to do. We could just build services, but let's talk about it just a little bit. All right, so video processing: we need to, we'll say, extract the audio, and we need to convert the source into, say, various formats. And then grab the thumbnails here too, I suppose. We need to create the thumbnails too. Good call. OK. What's the trick here?

[00:05:52]
We need storage for processed audio and video. Yes, we do. We're not there yet though. We haven't done that yet. Or is the video processing service doing all that? That's what I was thinking. It could. We could write it like that. I'm open. Actually, you could have kind of a mediator service that handles the initial step: when it gets the upload, it gets the video, and then it passes that to the other services that maybe handle audio.

[00:06:38]
And it looks like that's where we'll be going, yeah, you could have the video processor do that, yeah. And then so we could just say, hey, the video processing service is converting into various formats. Does that make sense, though? Is that going to be a very expensive task? At a certain point, we probably want to throw some async workers on that thing. Yeah. What?

[00:07:07]
We were just talking about async and we did a video, yeah. So why have one service do that? Because now it's the center point of all the performance issues, when we know the job is actually multiple jobs. We'll say there are 4 formats we need to convert it to. It actually doesn't matter too much. So that's 4 jobs essentially that it needs to dispatch. That should be a worker pool that does this, does that, and then when they're all done, it notifies something to let us know that it's all complete.
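The fan-out idea above (dispatch one job per format to a worker pool, then notify once when all of them finish) can be sketched in-process with Python's standard `concurrent.futures`. This is only a toy under stated assumptions: the four format names, the `convert` stand-in, and the `notify` function are hypothetical, and a real system would use separate worker machines rather than threads in one process.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List

# Hypothetical format names; the lesson only says "4 formats".
FORMATS = ["4k", "hd", "sd", "mobile"]


def convert(source_url: str, fmt: str) -> str:
    """Stand-in for a real conversion; returns where the output would live."""
    return f"{source_url}-{fmt}"


def convert_all(source_url: str, formats: List[str]) -> List[str]:
    """Dispatch one job per format and wait until every job has finished."""
    outputs = []
    with ThreadPoolExecutor(max_workers=len(formats)) as pool:
        futures = [pool.submit(convert, source_url, fmt) for fmt in formats]
        for future in as_completed(futures):
            outputs.append(future.result())
    return sorted(outputs)


def notify(message: str, log: List[str]) -> None:
    """Stand-in for the notification service: just records the message."""
    log.append(message)


notifications: List[str] = []
results = convert_all("blob://source-videos/abc", FORMATS)
# One notification for the whole batch, not one per conversion.
notify(f"all {len(results)} conversions complete", notifications)
```

The point of the sketch is the shape: four independent jobs, and a single "all complete" signal at the end.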

[00:07:42]
Yeah, so Kubernetes time. All Kubernetes. So let's create a, how do I make that fancy? Should the service become the queue, or do we want a service to create and manage the queue? Probably not. You need, you want something to manage the queue. You don't want the server to manage that, because then it's doing a lot: extracting the audio, dispatching the jobs, still creating the thumbnails.

[00:08:16]
You want to delegate that out as much as possible. Rabbit or Kafka. I don't know, call this the conversion broker, and we'll give it a, what do we use for a queue? That one. Put it in the queue. Put a couple of them in there. OK. Pretty good. So now, this is going to pass it some jobs. We'll just list what those jobs are. So it's going to say convert to 4K, convert to HD, something like that.

[00:09:35]
We're cheating a lot, but it's OK, etc., etc. Does the jobs. So now we need some sort of, right, worker pool. OK. The worker's going to pick up the job. It's going to do all the conversion for us. Now, we could get fancy and create different pools. We can say this is the 4K pool, this is the HD pool. Why would we do that? Because these tasks are going to be different sizes.

[00:09:55]
Converting a video to 4K is a very different size of task from converting it to, like, HD, with completely different sizes on the outputs, but we don't have to be too fancy. What else do we need to do? When the worker is done, it needs to notify our notification service, which is doing a lot of heavy lifting at this stage. It's not pretty. Then are you notifying the user for every conversion or just the one conversion, I would hope, or just the whole job, I mean.
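If we did want to "get fancy" with separate pools, one minimal way to picture it is a queue per pool and a small routing function. The pool names and the `route_job` helper below are hypothetical, and the in-memory `queue.Queue` stands in for whatever broker queues a real deployment would use.

```python
import queue
from typing import Dict

# Hypothetical pool names; the lesson only mentions a 4K pool and an HD pool.
POOL_QUEUES: Dict[str, queue.Queue] = {
    "4k": queue.Queue(),
    "hd": queue.Queue(),
}


def route_job(target_format: str, video_id: str) -> None:
    """Put the job on the queue for the pool that handles this format."""
    if target_format not in POOL_QUEUES:
        raise ValueError(f"no worker pool for format {target_format!r}")
    POOL_QUEUES[target_format].put(video_id)


route_job("4k", "abc")
route_job("hd", "abc")
route_job("hd", "def")
```

Splitting the queues this way would let the heavier 4K pool be sized independently of the HD pool, which is the motivation given above.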

[00:10:35]
We are kind of leaning on the notification service to do a lot right now, maybe. I don't know, what do y'all think? Should we have a different one? Because we're only notifying that one of the conversions is done. Like, we're just notifying that one job is done, but to be complete, like ready for the user, all the jobs need to be done. Would the video processing service make sense to handle that in some async manner?

[00:11:12]
It could. I feel like we're leaning on it maybe too much though. We don't have to be frugal here with our services. We can spin up more. And we need to update the database, right, with the paths for those uploaded videos, or those processed videos, yeah. Yeah, because we can, I think. If every job is done and every job just passes back to the notification service saying, like, job 1 is done, job 2 is done, job 3 is done, what's that doing?

[00:11:42]
That's adding more load. Yeah, but it's adding state. It's adding state in the notification service, because now it has to keep track of all these jobs being done. If the service goes down, all this is lost. We have no idea. Workers don't know, they're not tracking that. So we could have a service here that keeps track of that. Actually, wouldn't have a service. Can we just send it back to the server?

[00:12:09]
Still, it's still keeping state. I mean, if you have your conversion broker, whatever, it goes back to the video service, the video service alerts the server, the server writes to the DB that it's done. Back to the video processor. Yeah, so your conversion broker, right, sends off jobs to the worker. It's brokering communication, it gets that response, OK, that job is done, it lets the video processing service know, which in turn lets the original server know, and then the server writes to the database at that point.
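One way to keep the services themselves stateless, in the spirit of "the server writes to the database", is to record each finished job in the database and derive "all done" from what is stored. The sketch below uses an in-memory SQLite database purely for illustration; the table name, column names, and the two expected formats are assumptions, not part of the lesson's design.

```python
import sqlite3

# Hypothetical set of formats that must finish before the video is ready.
EXPECTED_FORMATS = {"4k", "hd"}


def open_db() -> sqlite3.Connection:
    """Create a throwaway in-memory DB standing in for the primary database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE completed_jobs ("
        "video_id TEXT, format TEXT, url TEXT, "
        "PRIMARY KEY (video_id, format))"
    )
    return db


def record_job_done(db: sqlite3.Connection, video_id: str, fmt: str, url: str) -> bool:
    """Persist one finished job; return True once every expected format is stored."""
    # INSERT OR REPLACE makes a repeated "job done" message harmless.
    db.execute(
        "INSERT OR REPLACE INTO completed_jobs VALUES (?, ?, ?)",
        (video_id, fmt, url),
    )
    db.commit()
    rows = db.execute(
        "SELECT format FROM completed_jobs WHERE video_id = ?", (video_id,)
    ).fetchall()
    return {row[0] for row in rows} >= EXPECTED_FORMATS


db = open_db()
first = record_job_done(db, "abc", "4k", "blob://converted-videos/abc-4k")
second = record_job_done(db, "abc", "hd", "blob://converted-videos/abc-hd")
```

Because the progress lives in the database, a service that restarts between the first and second message loses nothing; it simply reads the stored rows again.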

[00:12:50]
Yeah. Which database? The metadata database where the rest of the records are. That makes sense to me. Yeah. But you, but you were thinking about doing another database. I worry, I worry about leaning on the video processor too much for doing all these things, but. Yours makes sense. I'm good with it. You're good with it. I'm good with it. Maybe I just like creating services.

[00:13:31]
That's like my thing. OK, yeah, I'm going to get these arrows right one of these days. Yes, no, no, uh, yeah, uh. Yeah. Because wouldn't the audio be part of the worker pool, so it would just be, like, essentially one direction, I mean. Communicating bidirectionally, but one path, essentially. Yeah. Yeah, we probably should just pass the audio along too as part of the workers, because we actually have to convert the audio into different formats as well to match the videos.

[00:14:17]
Hm, OK. Assuming those workers are the thing that does the work, because, like, if I'm thinking about Rabbit, it's a message on a queue. Some other service is going to pick it up and do the work. In this case, because we're talking about a large amount of data, I would play it on the safe side and just say: audio broker. Take that out. Well, hm. It's not quite right. I think even I made a mistake here.

[00:14:52]
We're still assuming that the video is being passed along here, and that's not what's being passed along, we're just passing along the message to grab the video. So the workers still have to go to video storage individually and grab that video. We didn't draw an arrow for that. We could, but now we have a lot of arrows. Oh, I see. I missed the worker block. Yeah, those are the guys that actually do work.

[00:15:22]
Yeah, because what we're passing here is just: here's a job to do, here's where to find it, and here's who to tell when it's done. They still have to, uh, request the actual video to work on from the blob storage, exactly. Otherwise, we're making the video processing service very, very stateful. Now it's holding the whole video and the audio, and that is a recipe for something to go wrong.
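The three parts of the message named above (a job to do, where to find it, who to tell when it's done) map naturally onto a small serializable record. The field names, URL formats, and JSON encoding below are assumptions chosen for the sketch; the lesson does not specify a message format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ConversionJob:
    """A message on the queue: a pointer to the video, never the video itself."""
    task: str        # the job to do, e.g. "convert_to_hd"
    source_url: str  # where to find the source video in blob storage
    reply_to: str    # who to tell when the job is done


def encode(job: ConversionJob) -> str:
    """Serialize the job for the broker."""
    return json.dumps(asdict(job))


def decode(raw: str) -> ConversionJob:
    """Rebuild the job on the worker side."""
    return ConversionJob(**json.loads(raw))


job = ConversionJob(
    task="convert_to_hd",
    source_url="blob://source-videos/abc",
    reply_to="queue://conversion-broker/done",
)
wire = encode(job)
received = decode(wire)
```

The message stays a few hundred bytes no matter how large the video is, which is what lets the broker and the video processing service avoid holding the video data.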

[00:16:04]
And just to, let's make it super complicated, so many arrows, poop. OK. It's great, it's beautiful. If I had thought this far in advance, I probably would have laid this out in a way where we didn't have so many arrows. This makes sense to me. So now workers are grabbing from the video storage. They're going to convert it. The workers at this stage need to store that video somewhere.

[00:16:24]
So we need another database. We could put it back in the source video storage. But now I'd say probably not do that. Let's separate the two storages just to keep it simple. One of those, you don't want to put all your eggs in one basket sort of thing, but also when we're done, when the workers are done and all the videos are done, we probably would just want to delete this source video because it's going to be huge.

[00:16:57]
We don't want to keep it around too much longer. So let's not muddy up that blob storage with our actual video. And ideally, wherever we put that final cut of the video is where we could serve it from too, whereas the other video storage, we don't need anyone on the big bad internet having access to it. It's so big. I don't know. I'm not creative with names, completed videos or converted videos.

[00:17:29]
I don't wanna be thumbnail guy too much, but, like, let's just say that the conversion broker also makes the thumbnails and dumps them alongside the videos. You know what, I'm going to wear thumbnails, uh, like a badge of honor. You thumbnail. It's yours. We'll let it slide, yeah, yeah. What we could do, because we have to create the thumbnails from the video itself, is have the worker, as part of the job, actually call the thumbnail service and pass along the message, and the thumbnail service grabs the video and processes the thumbnails.

[00:18:04]
The workers convert the videos and then we post them all into its own database. And then what we do at the end is we rectify all that: we update this primary DB with the user ID, the metadata, the IDs of the converted videos, and the IDs of the audios to match those, and the thumbnails go along with it. And then from there, we'd have one more service that is watching all this, and it would hit the notification service.

[00:18:33]
And then we send it back to the client and say it's all done, and here's the links to your videos. Easy peasy, mac and cheesy. By easy, I mean, it's not easy at all. This is a relatively straightforward flow, and you see how complicated it got really, really quickly. One of those is because I'm bad with arrows, and I probably made things more complicated, but it's a lot to think about every stage of the process.

[00:18:55]
Hey, I don't want to keep my server stateful. Where's the data at right now? Especially with async things, how do I know when that job is done? How do I not lose data in between? What if a worker pool goes down? What if the broker goes down, or the queue gets filled up? All these things we didn't really account for. We didn't talk about replication or any of that stuff. We'll leave it out for now.

[00:19:23]
Um, if I were an interviewer, just kind of spitballing, I would probably really wanna be analyzing any cycles in the graph here, because maybe you could get an infinite loop between these services and something could blow up. Is that something that you would agree with, or? Yeah, it is, it is. Ultimately, a video upload flow or conversion flow is going to be a, what is it, a DAG, a di...

[00:19:48]
Uh, what is it, a di... acyclic graph or something like that. I'm forgetting right now. Directed, thank you. Yeah, that's what it's going to be. It's going to be one to one, back, one to one, and it's like that's the whole queue. So really what we do is we segment all of the stuff off into its own kind of process, uh, instead of having so many arrows, and it would just be one arrow going in, and then these are all just doing the work, and they pull it from the right database, so you're right, it is a graph.
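The cycle check the interviewer describes can be sketched with Python's standard `graphlib` module (Python 3.9+), which raises `CycleError` when a graph cannot be topologically ordered. The service names and edges below are a simplified, one-directional reading of the diagram and are assumptions for illustration; the diagram drawn in the lesson has more arrows than this.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, Set

# Each key depends on (receives work from) the services in its set.
PIPELINE: Dict[str, Set[str]] = {
    "video_processing": {"upload_server"},
    "conversion_broker": {"video_processing"},
    "worker_pool": {"conversion_broker"},
    "notification": {"worker_pool"},
}


def has_cycle(graph: Dict[str, Set[str]]) -> bool:
    """Return True if the service graph contains a cycle, i.e. is not a DAG."""
    try:
        # static_order() raises CycleError if no valid ordering exists.
        tuple(TopologicalSorter(graph).static_order())
    except CycleError:
        return True
    return False


pipeline_is_dag = not has_cycle(PIPELINE)

# Adding an edge from the last stage back to the first creates a loop.
looped = dict(PIPELINE, upload_server={"notification"})
looped_has_cycle = has_cycle(looped)
```

A check like this only covers the static wiring of the services; it says nothing about retries or redelivered messages at runtime.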
