
Lesson Description
The "Training Stable Diffusion with DreamBooth" Lesson is part of the full, Open Source AI with Python & Hugging Face course featured in this preview video. Here's what you'd learn in this lesson:
Steve demonstrates utilizing DreamBooth to fine-tune stable diffusion models in a lightweight way, focusing on subject matter rather than style. By training the model with a few images of a made-up term, users can effectively teach the model to associate that term with specific images.
Transcript from the "Training Stable Diffusion with DreamBooth" Lesson
[00:00:00]
>> Steve Kinney: The next one is either incredibly practical or not practical at all. But it's the treat we all earned from listening to me try to explain transformers before, and the name alone should tell you this is gonna be good. I decided that the appropriate thing to do at this point in the presentation was to use the cheesiest stock photography.
[00:00:37]
This is not AI-generated; this is stock photography. And it just felt right. Every time you see a presentation about AI, it's an AI image here, an AI image there. All of these images are stock photography, because I felt like that was a way to show that we were destroying art way before AI.
[00:01:01]
Okay, so DreamBooth is a technique. To ruin the lede here: if the stuff we were doing with LoRA and parameter-efficient fine-tuning a little bit ago was a way to fine-tune a text model in a lightweight way, then DreamBooth is effectively a lightweight way to fine-tune Stable Diffusion.
[00:01:35]
And if we said fine-tuning is for getting the style of something that you want, then this is, I would say, more for the subject matter: a way to train one of these models very easily on a subject that you choose, with very few inputs. And it's effectively a hack, because we know how these things work.
[00:02:04]
This is the prize that we deserved, right? It goes back to your question earlier: what happens if I give it a word it doesn't know? We did not get the reward we wanted by feeding it weird unknown tokens. We just got garbage.
[00:02:20]
If you make up a new term that Stable Diffusion doesn't know and then you show it a bunch of pictures of that term, guess what? You've taught it. You've taught it a new term that it uses when it's denoising things, right? So basically, the canonical example is "sks dog".
[00:02:44]
That's somebody's dog or whatever. But you have to make up some kind of nonsensical term, right? Then you upload some number of pictures of the thing that term refers to and you say, "this is [made-up term]", right? You train it so that when it sees the made-up term, it thinks about those images. And you can get away with like three to ten images and get shockingly close.
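A minimal sketch of that setup, with hypothetical file names, just to make the shape of it concrete: one invented identifier, one instance prompt that uses it, and a handful of subject photos.

```python
# Hypothetical illustration of the DreamBooth recipe: a made-up identifier,
# an instance prompt that uses it, and a small set of subject photos.
INSTANCE_TOKEN = "sks"  # a nonsense term the model has no prior associations for
INSTANCE_PROMPT = f"a photo of {INSTANCE_TOKEN} dog"

# Three to ten images of the subject is usually enough.
instance_images = ["dog_01.jpg", "dog_02.jpg", "dog_03.jpg"]
```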
[00:03:26]
And again, I think this one works on the free tier. It does, but go get a cup of coffee if you're running on the free one. If not, I will tweak it a little bit later. I'm pretty sure it works, but I was trying to refactor it and things went a little poorly.
[00:03:47]
So I definitely tested a lot more on the big boy one, cuz I needed the cycles to go a little bit faster for myself. But then, as long as you reference that term, you will get versions of that subject, right? And so I will show you.
[00:04:04]
Cuz it kind of begins to open up the door. Even the image-to-image stuff is cool, and so is generating the images and piecing these things together, right? I always say that there are some things where you should definitely learn the thing
[00:04:17]
because it's important for your job. And there are some that are important because my hope is that, a week from now, you'll be like, "but what if I took the question-answering stuff and the classification stuff and this other thing?" The interesting part is the overlap.
[00:04:32]
But DreamBooth, basically, is a way to effectively fine-tune one of these models. Let's make our way to the playground. What we're going to do first is install some dependencies. Nothing totally special here; there are a lot of extra little libraries. We're gonna walk through the code step by step. At one point, I did try to comment a lot of the lines, explaining what they do.
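The install cell looks something like this; the exact package list and versions are a guess at what the notebook uses, so defer to the real cell:

```python
# A rough sketch of the dependency cell; exact packages and versions may differ
# from the course notebook.
%pip install -q diffusers transformers accelerate safetensors ftfy
```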
[00:05:10]
But even me reading them to you is gonna be rough. The concept, again, is that a version of fine-tuning with a relatively small data set on a relatively small model gives you a great win, right? So what we're going to do is take... I forget which version of Stable Diffusion I chose at the end of the day.
[00:05:30]
But luckily, it is very clear in the code where we're going. And you can do this too. You can run this. In fact, it'd be cool if somebody wanted to do it on the small one, just to verify. Just don't tell me on camera if it doesn't work; I'll fix it later.
[00:05:42]
Just tell me privately. What we're gonna do is come up with some new term. Like I said before, the canonical one is this "sks dog", because it doesn't mean anything. The only core criterion is that it has to be something ridiculous and you need to use it consistently.
[00:06:08]
This was good enough in a lot of cases, and that term should be interpolated everywhere; if it isn't, I will go find it and fix it. And what we do is use this neat feature of Google Colab where you run a cell, pick files on your file system, and upload them into the notebook space.
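That cell is essentially Colab's built-in file picker, something like:

```python
# Colab's built-in file picker; running this cell blocks until files are chosen.
from google.colab import files

uploaded = files.upload()  # renders a "Choose Files" button in the cell output
print(f"Uploaded {len(uploaded)} file(s): {list(uploaded)}")
```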
[00:06:42]
So if you just try to hit Run All, the only thing that will block you is this one, because it'll be waiting for you to upload pictures until you do it. So yeah, my dependencies are installed. So next, I'm going to run this, and you can see I get this Choose Files button.
[00:06:59]
Here I have ten pictures of my dog. And I think, to make the magic trick land, I'm not trying to show you my dog like that guy who shows you his dog all the time. But this is what Bonnabelle looks like. This is Bon Bon. I'm not looking for pandering.
[00:07:18]
She's cute, we all know that. The point is: commit this dog to memory. It will be important later, okay? That's all I'm trying to say. I'm not trying to show off my dog. She's a pit bull, she's fine. There are cuter dogs out there. I've seen them, a lot of them.
[00:07:34]
There are huskies; there are tons of dogs cuter than my dog. But that's my dog, and that's important for the story that I'm telling right now. Then we have some amount of just preparing the data. I will show you. It's mostly setting the size based on the properties, mostly creating a data structure that has everything I need in it.
[00:07:56]
Arguably, this is where I thought I should simplify this all down, and that's when I broke things, so mostly I didn't. Okay: go get all the images in the right directory. Just some non-glamorous stuff: given the directory, go look in it, get all the files, only the ones that are images, not subdirectories, all that kind of fun stuff.
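A small sketch of that directory scan, with a hypothetical folder name:

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}

def collect_images(directory: str) -> list[Path]:
    """Return the image files in `directory`, skipping subdirectories."""
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )

instance_image_paths = collect_images("./instance_images")  # hypothetical folder
```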
[00:08:20]
We do use this thing in this technique which I think is interesting, which is that you can also get better results by priming it with prior preservation, right? So I gave it ten pictures of a pit bull, right? A trick is to then also have it generate a bunch of pictures of the class, especially for a dog.
[00:08:48]
If you had a very unique favorite action figure or something like that, this could be a little bit easier. But for a dog, you know, priming it with more pictures of pit bulls doesn't hurt. Totally optional step, right? In fact, at one point I did a cat first, and the cat was still in there.
[00:09:07]
I still got good results. I grabbed Stable Diffusion 2, and then this setting is how strong you want that prior to be, right? Clearly, I want my own pictures to matter more; I'm just trying to prime the pump a little bit. Then more descriptions of stuff we already talked about before: that U-Net is basically about how much noise we add, and the denoising, all of the fun stuff that we talked about in previous slides.
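The prior-preservation knobs look roughly like this; the names mirror the flags in the diffusers DreamBooth training script, but treat the values as placeholders:

```python
# Prior preservation (optional): alongside your subject photos, the model also
# trains on generic "class" images so it doesn't forget what dogs look like.
with_prior_preservation = True
class_prompt = "a photo of a dog"  # the generic class your subject belongs to
num_class_images = 100             # class images to generate or supply
prior_loss_weight = 1.0            # how strongly the class images pull the model
```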
[00:09:26]
And the denoising, all of the fun stuff that we talked about in previous slides. So then what we're going to do is we're going to load the model and as you can see effectively we're pulling in that stable diffusion model. As you can see with this pre-trained model, name of path, pre trained model, name of path we're pulling in stable diffusion 2, I think I picked.
[00:09:50]
Yeah. And you can try out different models; they're here for you. Especially if you are trying to do this on the free-tier GPU, definitely grab a smaller, older model. You won't get better results, but you might (a) actually get results, or (b) avoid trouble, because depending on the images and their size and stuff like that, you will be coasting right at that GPU memory threshold.
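If you do swap in a smaller checkpoint, the change is essentially one line, plus a memory-saving trick; the model ID here is just an example:

```python
import torch
from diffusers import StableDiffusionPipeline

# Example of a smaller, older checkpoint for constrained GPUs.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
).to("cuda")

# Attention slicing trades a little speed for a big drop in peak GPU memory.
pipe.enable_attention_slicing()
```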
[00:10:18]
So you might need to shrink your images or something like that. But yeah: find a pet, a sock puppet, a favorite action figure, a guitar, your MacBook, yourself, I don't care, right? Find three to ten pictures, and that's all we're gonna need for this magic trick.
[00:10:34]
We are going to find three to ten pictures, we are gonna upload them, and we have... no, don't embarrass me. Did I not run the right cell? Cuz you've got to run these cells in order. Yeah, I didn't run that cell. So we'll download those real quick.
[00:10:57]
And then we'll set this one to follow up right afterwards. So yeah, three to ten pictures; you don't need many more than that. Every picture you add is going to add more time, but you will get better results. Three is about all you need, though obviously the closer to ten, the better. It's gonna process those and do all of the...
[00:11:17]
As you were asking before, it is effectively going to associate those pictures with your made-up term. The other thing is: yes, your term needs to be a word that does not exist. So if you upload a bunch of pictures of your computer and you say "computer", you're not going to get the results that you want.
[00:11:34]
You need to pick a word that does not exist in the network at all. And we're gonna basically do the lightest-weight fine-tuning of this image model that we could possibly come up with. All right, those models are downloading. We've got these models coming in. Again, all the training parameters.
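One quick sanity check you could run on a candidate term, sketched against the model's CLIP tokenizer (the exact outputs will depend on the checkpoint):

```python
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained(
    "stabilityai/stable-diffusion-2", subfolder="tokenizer"
)

# A common word maps to a token the model already has strong associations for;
# an invented string should break into rare sub-tokens with little prior meaning.
print(tokenizer.tokenize("computer"))
print(tokenizer.tokenize("sks"))
```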
[00:11:56]
I'm not gonna explain them all one by one, but again, the pretrained model path is that one we picked earlier. These are just data structures that we're going to pass in there. We do want to center-crop the images when we make them smaller. Lots of little fun things. Again, depending on the size of the GPU you're running, or if you're running on CPU on your own computer, you may need to turn that number down.
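The parameter block amounts to something like the following; these names and values are illustrative, loosely modeled on the diffusers DreamBooth example, so trust the annotated notebook over this sketch:

```python
from dataclasses import dataclass

@dataclass
class TrainingArgs:
    pretrained_model_name_or_path: str = "stabilityai/stable-diffusion-2"
    instance_prompt: str = "a photo of sks dog"
    resolution: int = 512        # images get resized to this
    center_crop: bool = True     # crop from the center when resizing
    train_batch_size: int = 1    # turn this down on smaller GPUs or CPU
    learning_rate: float = 5e-6
    max_train_steps: int = 400   # more steps: better results, more time

args = TrainingArgs()
```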
[00:12:19]
Go for it. The higher the numbers, the more time everything's going to take, but this seemed to be the sweet spot for testing it. And yeah, I annotated every last one of them. Then we'll define a training function, which simply passes everything to that one Hugging Face library that we saw before, with some error catching.
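The shape of that wrapper is roughly this; `training_function` is a stand-in for the notebook's actual training loop, and the launch goes through accelerate:

```python
from accelerate import notebook_launcher

def train(args):
    """Wrap the training loop so one bad run doesn't take down the notebook."""
    try:
        # `training_function` is a placeholder for the notebook's real loop.
        training_function(args)
    except RuntimeError as err:
        # The most common failure on small GPUs: running out of memory.
        print(f"Training failed: {err}")
        raise

# Launch the wrapped loop; accelerate handles device placement.
notebook_launcher(train, args=(args,), num_processes=1)
```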
[00:12:45]
The idea is that, with this made-up word that does not exist in the network, what we should be able to do... I didn't... go away, hide yourself, hide yourself. I didn't even really want to show you. We're gonna run this training process, right? Now I'm gonna hit Run. There you go.
[00:13:06]
It will go in, we'll train the network with it, and then hopefully, if the live-demo gods love us, what we'll see is the ability to then use that model to create pictures of the thing you uploaded. So if it's your dog, you should be able to see...
[00:13:22]
You'll be able to create pictures of your dog in space, right? Your dog underwater, right? Cuz again, you'll take that model, you'll fine-tune it with your special thing, right? You'll create a new model on top of, again, three to ten images. And then this is the part that takes long.
[00:13:39]
So: three to ten images, and then you will see whatever it is that you wanted, whether it's your kid, your pet, your favorite action figure, whatever, in different contexts, using that term to prime the network. So yeah, I'll show you a few examples, like the one from when we ran it earlier.
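Inference with the tuned model is the same pipeline call as before, just with the made-up term in the prompt; a sketch, assuming `pipe` now wraps the DreamBooth-tuned weights:

```python
# After training, the made-up term steers the denoising toward your subject.
prompt = "a photo of sks dog on the moon"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("sks_dog_moon.png")
```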
[00:13:54]
Technically true. We have the one from when we ran it earlier, and then I'll show you examples of the other ones that they have as well. And you can kind of see... let's see, is this the one with the examples? No. Let me see if I can find it.
[00:14:08]
There are a bunch of them. It's interesting to watch the other ones, based on sock puppets or whatever. I might pull that up in a second, because I thought it was gonna be easier to find off the top of my head than it actually was. But the point is pulling the stuff in.
[00:14:24]
We have the one that came out of the oven, right? If we saw the dog earlier, that is the aforementioned dog on the moon. The submarine did not work out super well, I will say, and the capes were weird for flying. I think the bigger models with more time will 100% work better.
[00:14:42]
But, like, considering the fact that she has a bandana on most of the time... I know, my dog has a bandana. It's weird. It's fine.