
Lesson Description
The "Generating Images with Stable Diffusion" Lesson is part of the full, Open Source AI with Python & Hugging Face course featured in this preview video. Here's what you'd learn in this lesson:
Steve demonstrates text-to-image generation, model selection, and fine-tuning prompts to achieve desired outputs. He also discusses the importance of exploring different models, adjusting parameters, and understanding the nuances of generating images using diffusion models on GPUs.
Transcript from the "Generating Images with Stable Diffusion" Lesson
[00:00:00]
>> Steve Kinney: So I'm going to start with the dependencies. We can look at some of this real quick, though, because again we're going to switch in that other lower-VRAM scheduler and stuff like that. But actually, I want to open this piece that I hid for a second, so I will grab the diffusers stuff instead of the Transformers stuff, where I get rid of the output.
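A minimal sketch of the dependency cell this refers to, assuming a fresh Colab runtime (the exact package list is an assumption, not pulled from the notebook):

```python
# Sketch of a Colab dependency cell (package list is an assumption):
# diffusers provides the text-to-image pipelines; transformers supplies the
# text encoder; accelerate and safetensors help with fast, low-memory loading.
!pip install -q diffusers transformers accelerate safetensors
```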
[00:00:25]
The other thing that I'm doing here is just trying to detect whether or not we have an Nvidia card. Let's see, which runtime am I in? I'm on the free one, the T4, so we have an Nvidia card. We note that it's available, and we're going to use it and set it as the device.
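The detection he describes boils down to something like this minimal sketch:

```python
import torch

# Use the Nvidia GPU ("cuda") when the runtime has one (e.g. Colab's free T4),
# otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
```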
[00:00:50]
We'll see that down here as well: we send it to the graphics card, effectively. So there's some stuff in here that is maybe not interesting but arguably important, or vice versa. For the model ID, I think later there are ones where you can change which one we use, but really any of the text-to-image models on Hugging Face are fair game, as are the ones I had on that previous slide.
[00:01:19]
I will seek to turn that into a drop-down later. So instead of the AutoTokenizer.from_pretrained you saw before, now it's AutoPipelineForText2Image. We give it the model ID that we want, tell it to use the safetensors weights, keep CPU memory usage low, so on and so forth, and swap idle layers out to the CPU, absolutely.
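A hedged sketch of the loading cell being described; the model ID here is a placeholder, and the exact options in his notebook may differ:

```python
import torch
from diffusers import AutoPipelineForText2Image

model_id = "stabilityai/stable-diffusion-2-1"  # placeholder; any Hub text-to-image model

pipe = AutoPipelineForText2Image.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # half precision to fit free-tier VRAM
    use_safetensors=True,        # load the safetensors weights
    low_cpu_mem_usage=True,      # stream weights in rather than loading twice
)

# "Swap idle layers with the CPU": offload submodules to the CPU when idle,
# so the whole model never has to sit in GPU memory at once.
pipe.enable_model_cpu_offload()
```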
[00:01:53]
Again, we're just trying to slice it up here. We've got a pygmy hippo watching Instagram on a phone. I can't remember if that was one that was going to go well for us or if that's just where I left off playing around with it. I need to grant access to that token from earlier; that's the one that lives over in here.
[00:02:18]
Once you do that once, you're good, but I restarted the playground, so it's technically a new notebook, and for some of these it will become a lot more interesting to watch. I will say I tried, in a lot of cases, to account for the way that we saw that you run one cell and then that variable is accessible in memory.
[00:02:45]
A lot of these are not like running a plain Python script, where it would run, you'd have your image, and you'd be done. There are some times where, obviously, if you run the same thing over and over and over again, it doesn't necessarily free up all that memory in the same way.
[00:03:03]
So if you truly end up in a place where you're getting out-of-memory errors, just go to Runtime, hit Disconnect and delete runtime, and start it from scratch. In some cases, I did start going out of my way to trigger garbage collection. And then I stopped and asked myself, how many times are people gonna run these things over and over and over and over again, other than me rehearsing?
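The manual garbage collection he mentions looks roughly like this, assuming `pipe` is the pipeline variable from earlier:

```python
import gc
import torch

# Free the old pipeline before loading a new one, then clear CUDA's cache.
# Worst case: Runtime -> Disconnect and delete runtime, and start from scratch.
del pipe
gc.collect()
torch.cuda.empty_cache()
```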
[00:03:26]
And this is pretty fast on the free GPU. We have a baby pygmy hippo in, plausibly, New York City, not quite sure. He could be on a phone; we don't know what's going on down there. But it kind of used the built-in layers on these GPUs, watching Instagram.
[00:03:54]
There's also the question of when this was even trained. Not bad, all things considered, and fairly quick, using again these same basic concepts. Again, this is an open-source model, and other than a pip install, this is presently all of the code involved. And most of this, as you can see from my own notes to myself, was to get the memory down on the free tier; if you didn't have those restrictions, you could theoretically do even more.
[00:04:41]
And again, the number of inference steps is how many times you would like it to go; the guidance scale is how seriously it should take the prompt. So you can play around with these numbers as well. For instance, I don't know what happens if we turn that to a 20. I should, while I'm teaching, go to the fancier GPU.
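A sketch of the generation cell with those two knobs exposed (the prompt is paraphrased from the lesson; the numbers match the 50/20 defaults he mentions later):

```python
prompt = "a baby pygmy hippo watching Instagram on a phone in New York City"

image = pipe(
    prompt,
    num_inference_steps=50,  # how many denoising passes to run
    guidance_scale=20,       # how seriously to take the prompt
).images[0]

image  # in a notebook, the PIL image renders inline
```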
[00:05:05]
This is not taking all that long anyway, so later when I show you something that will like, make your day, I will go to the fast GPU. Hey, it's something plausibly looking like an iPhone.
>> Student: [LAUGH]
>> Steve Kinney: Again, this is like I have the cats walking across the top of the screen at this point.
[00:05:26]
When my wife came in and said, "Would you like to spend time with me?", I said, "Honey, I'm working," and she believed me because she's the best.
>> Student: She gave you that look?
>> Steve Kinney: Yeah, I mean, she also loves Moo Deng. That was the inspiration for this. I was pandering; I knew what I was doing.
[00:05:45]
But so these are in here right now with some of these tools. Again, number of inference steps: lower is faster, higher is better. Let's play with it for a second, since things seem to be pretty fast today. What happens if I do five? I should keep looking at this number up here.
[00:06:11]
That was pretty fast, but the results speak for themselves, right? Because again, if we think about the process, it started with random chaotic noise, and it kept trying to peel away that random chaotic noise. As you can see, with only five steps, that didn't work. What is guidance scale? It's how closely you want it to stick to the script.
[00:06:42]
>> Student: Is higher better or lower?
>> Steve Kinney: I think it's like temperature. So this is one round of denoising; from the chaos, it's not much of anything. It is kind of cool, though. [LAUGH] Yes, it does not look like a pygmy hippo in New York City on an iPhone, noted.
[00:07:04]
I don't hate it. I like contemporary art, modern art if you will. What if we gave it 100?
>> Steve Kinney: And I'm not quite sure where we're going with this one. Again, I strangely don't hate it, but that was with the guidance turned all the way. All the way down.
[00:07:34]
What did we have before?
>> Steve Kinney: But this, this is why I'm saying enough to be dangerous. And you know what? You're gonna really learn how to tweak these by tweaking them. And you know what? I get it, I get it, I get it, the text stuff wasn't really as rewarding, but I couldn't have started with the images and then gone to the text.
[00:07:59]
>> [LAUGH] Whoa, what is that?
>> Steve Kinney: I don't know, it's got the six toes, though. [LAUGH] What else did I have in my actual guidance scales?
>> Student: Can you go backwards?
>> Steve Kinney: To previous versions?
>> Student: No, from an image to a prompt?
>> Steve Kinney: There are image to text classifiers.
[00:08:24]
>> Student: That's not Stable Diffusion, that's generating images.
>> Steve Kinney: I think I don't remember exactly which, because I've not done a lot of the image-to-text stuff. There's that famous Silicon Valley episode: is it a hot dog or is it not a hot dog? Those are trained effectively the same way, though; all these things kind of learn the same way. You show it a picture, that picture is labeled hippo, all those things.
[00:08:49]
It's the same way that, if you pull out your phone and type in dog, you will get those pictures of a dog. Although actually, that uses a different technique, where it kind of looks for the edges and all the shapes and stuff like that.
[00:09:06]
It makes them super high contrast and just goes based on the pixels next to this pixel, and over time, the knobs get turned to show dog, hot dog, what have you. That's terrifying, though. I didn't like that one.
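He doesn't name a specific model, but a hedged sketch of going from an image back to text with the transformers pipeline might look like this; the BLIP checkpoint and the file name are assumptions, not from the lesson:

```python
from transformers import pipeline

# Assumption: BLIP is just one example of an image-to-text model on the Hub.
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

# "hippo.png" is a hypothetical local file; a URL or a PIL image also works.
print(captioner("hippo.png"))
```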
>> Steve Kinney: I talked about this, which is basically that we do need to figure out if we can send it to a GPU, which you saw in the initial block; that's effectively happening in the pipeline. I send it off to the GPU, effectively.
[00:09:41]
So the device is set up here: the device goes to CUDA, which is Nvidia's thing, if it's available; otherwise, stick to the CPU. And so we start with the pipeline and send it to the device. And again, you can go up to a higher card and turn these optimizations off, tweak all the numbers; everything is fair game. This is code you can use and have, because you should play with it.
[00:10:16]
>> Steve Kinney: I think I set that at some point because I wanted some amount of stability; it's where we start the randomness. The forward process takes a clear image and gradually turns it into noise, and reversing that process is what Stable Diffusion does. It's all the stuff I had in the slides earlier, the prompt structure I had in the slides earlier.
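That "some amount of stability" is a fixed seed. A minimal sketch, assuming the `pipe`, `prompt`, and `device` from earlier (42 is an arbitrary seed):

```python
import torch

# Fixing the seed pins down where the random noise starts, so the same
# prompt and settings reproduce the same image.
generator = torch.Generator(device=device).manual_seed(42)

image = pipe(prompt, generator=generator).images[0]
```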
[00:10:42]
Here's one where I've got, well, this is with negative prompts, and this is where I was smart enough to add the ability to swap out the models. So you can pick any one of these. In a Jupyter notebook, this weird comment is what makes a variable controllable from these form fields, so if I literally...
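The "weird comment" is Colab's form syntax; a sketch, where the model IDs in the list are examples rather than the ones in his notebook:

```python
# In Colab, the #@param comment turns this variable into a dropdown form field.
model_id = "stabilityai/stable-diffusion-2-1"  # @param ["stabilityai/stable-diffusion-2-1", "segmind/tiny-sd"]
```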
[00:11:03]
Unfortunately, I'm in the playground now, but I could do it if I pasted this up here. No, now you don't like that.
>> Student: Whatever, leave me alone.
>> Steve Kinney: It's just that this isn't one of the ones on that list. It's like, now my variable is wrong; now it's right.
[00:11:35]
>> Steve Kinney: So now I could theoretically swap this one out. Anyone remember what my original numbers were? I think it was like 50 inference steps and 20 on the guidance scale. Yes, seems to be. So now I can pick a different model. Obviously, I'm going to pay a cost to download that model, so let's go talk about something else while that downloads, and then we will hopefully be pleasantly surprised when we scroll back up.
[00:11:59]
So again, smaller models will be faster, but time doesn't seem to be totally bothering us at this point. I am just going to say show me the code, or hide the form, for a second so we can see everything. So here we've got the prompt, the positive one: ultra-realistic cinematic photo of a pygmy hippo jaywalking through a neon-lit downtown street at dusk. You can tell I took the original one and put it in ChatGPT.
[00:12:27]
I was like, make this more detailed, at some point, because I didn't have the patience to write this one, and I definitely wouldn't have thought about a Kodak Portra 800. And then there are the things I don't want: blurry, grainy, low-res, overexposed, watermark, text, logo, extra limbs, cars, people.
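Roughly, the cell with both prompts (wording paraphrased from the lesson):

```python
prompt = (
    "ultra-realistic cinematic photo of a pygmy hippo jaywalking through a "
    "neon-lit downtown street at dusk, shot on Kodak Portra 800"
)

# Things we do NOT want to see in the output.
negative_prompt = (
    "blurry, grainy, low-res, overexposed, watermark, text, logo, "
    "extra limbs, cars, people"
)

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    guidance_scale=20,
).images[0]
```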
[00:12:47]
This one, I definitely was angry; that one definitely looks like I wrote it from scratch. This one, I wasn't feeling up to it. So we'll run that one as well. Let's see, is my other one ready? No, we're almost there. Almost, almost. We're gonna have the big reveal all at the same time.
[00:13:10]
>> Steve Kinney: I don't know if I'd fully give that Studio Ghibli, but I also probably said photorealistic in here or something, no, I don't know. But it's worth playing with the different models and seeing the different results. Again, I don't hate it. Is it what I wanted? No. Do I hate it? Also no.
[00:13:30]
Scrolling back down to the negative prompt again, we shouldn't be seeing too much of that stuff. I like that one.
>> Student: It's the best one yet.
>> Steve Kinney: Yeah, I am using a slightly different model on that one, no, it chose.
>> Steve Kinney: I wonder if I said like
>> Steve Kinney: Because if you say something like Renaissance oil painting, you owe it to yourself to do that, to be clear.
[00:14:05]
I feel bad doing this, because the creator was really salty when Sam Altman changed his profile picture to one made by, like, 4o or whatever. And I should probably just do oil paintings or something like that instead. But at least these aren't very good, with this cheap model that will run on a free GPU with all the memory stuff that I'm doing, so I don't feel that bad with that model.
[00:14:39]
Let's just do one more and let's grab.
>> Steve Kinney: Let's actually do that one so we don't have to pay; when I say pay, I just mean pay in time to download a new model. We'll say, like...
>> Steve Kinney: But it's worth playing around with both the prompts and the negative prompts to kind of figure out the feel of getting what you want, and the rest of these too, like making it smaller.
[00:15:11]
Play around with the guidance scale and the number of inference steps. The reason I stopped at 28 is that some of these are the end result of the tail end of an evening of preparing all this stuff, when I was just like, I deserve to make pictures of hippos for a little bit.
[00:15:33]
>> Steve Kinney: New favorite. Not a Renaissance oil painting, though. So, talking a little bit about attention slicing: if it's too big to fit on the table, instead of doing the whole page at once, it'll try to break it into smaller pieces, do each chunk separately, and pull it back together, because, again, it's trying to denoise it effectively.
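In diffusers, that's a one-liner on the pipeline:

```python
# Compute attention in smaller slices instead of all at once, trading a
# little speed for a lower peak memory footprint on the free GPU.
pipe.enable_attention_slicing()
```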
[00:15:57]
So there are various ways to make it a little bit more memory-friendly. And if you're wondering, what are all these models? They honestly were just four or five different ones from me scrolling around. I went to text-to-image on Hugging Face and just picked lightweight ones and went for it.
[00:16:11]
>> Student: Can you try watercolor?
>> Steve Kinney: Yeah.
>> Steve Kinney: And sometimes with watercolor, I would probably have better results saying something like watercolor painting, you know what I mean?
>> Student: Yeah.
>> Steve Kinney: And the parts of live coding, or whatever you would call what I'm doing, that are not super interesting to watch, we will definitely get through. Absolutely.
[00:16:43]
Like, if you've ever played with something like Midjourney before, or even ChatGPT's 4o, you'll find, even more so with text, that there is definitely a science to getting the prompt right. So let's try that out. And again, these are on the free GPU, to be clear, with older models that I picked. Now, are they watercoloring super well? Does that look like a walrus? It's not important.
[00:17:18]
Might I, at some point, go and turn off some of the optimizations and grab a bigger model, and we might see something better? I might, but I want to show you some other interesting things that we can do.