
Lesson Description
The "The Fine-Tuning Process" Lesson is part of the full, Open Source AI with Python & Hugging Face course featured in this preview video. Here's what you'd learn in this lesson:
Steve demonstrates of fine-tuning a language model using a library from Hugging Face with 355 million parameters. He walks through tokenization, training loops, and the impact of fine-tuning on model behavior, showcasing how a small dataset and quick training can significantly alter model outputs.
Transcript from the "The Fine-Tuning Process" Lesson
[00:00:00]
>> Steve Kinney: Cool. And so we know how many parameters we have. Of those 355 million, we're only gonna train some of the layers. If you were to get the tiniest open source model today, it would be, like, [SOUND] 7 billion, and that's a weak model, right? Yeah, like the tiny ones. I'll show you LM Studio in a little bit.
[00:00:30]
You can see the big LLMs on Hugging Face that you can download and easily run on your MacBook today are in the billions of parameters. This one is 355 million parameters. But also, does anyone wanna watch me do this for six hours? And we're only gonna train a subset of those, right?
[00:00:52]
With our extra data. So we've got this library from Hugging Face where we can pull in this parameter-efficient fine-tuning (PEFT) with that model and this configuration. The task type is that causal LM we saw before, plus some other little settings that we can talk about if we're interested.
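For reference, here's a minimal sketch of the kind of PEFT setup being described, assuming a LoRA-style configuration; the specific rank, alpha, and dropout values are placeholders, not the lesson notebook's exact settings.

```python
# Minimal sketch of a PEFT/LoRA setup for GPT-2 Medium (355M parameters).
# The r / lora_alpha / lora_dropout values are assumptions, not the lesson's settings.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

base_model = AutoModelForCausalLM.from_pretrained("gpt2-medium")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,  # the causal LM task type mentioned above
    r=8,                           # rank of the low-rank adapter matrices (assumed)
    lora_alpha=16,                 # scaling factor (assumed)
    lora_dropout=0.05,             # dropout on the adapter layers (assumed)
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction of the 355M is trainable
```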
[00:01:17]
Then we tokenize. Basically, we pad each example to whatever the maximum length is for that tokenizer, so we want them all to be the same length. So whatever the quote is, I don't care how long the quote is; the quote is however long the quote is.
[00:01:33]
And then we pad it with that extra space so they're all equal. And so we put that in place, and this is what the first record is going to end up looking like. So we can see that these are our blank tokens at this point, right?
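A rough sketch of that pad-to-max-length tokenization; the column name, max length, and the quotes dataset variable are assumptions.

```python
# Sketch of tokenizing each formatted quote and padding everything to a fixed length.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token, so reuse EOS

def tokenize(example):
    tokens = tokenizer(
        example["text"],        # assumed column name for the formatted quote string
        padding="max_length",   # pad every example out to the same length
        truncation=True,
        max_length=128,         # assumed cap
    )
    tokens["labels"] = tokens["input_ids"].copy()  # causal LM: labels = inputs
    return tokens

tokenized_dataset = dataset.map(tokenize)  # `dataset` is the quotes dataset from earlier
```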
[00:01:55]
So all the stuff we saw before. So this quote from Oscar Wilde is going to be "Quote by Oscar Wilde", colon, space, open quote, the quote, close quote, end-of-sequence token, right? And so you can see the tokens. It breaks it up into the various tokens. It's like, cool.
[00:02:14]
It was this long. We don't actually care about the rest of them. They're all these filler padding tokens, which are slightly different for each one. If we chose a different model and a different tokenizer, the "I don't care" token is probably different. This is the padding token in this case.
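To see what that first record looks like, something along these lines would print the token IDs, the decoded text, and the padding token; a sketch, assuming the tokenized dataset from above.

```python
# Peek at the first tokenized record: the quote's tokens up front, padding after.
ids = tokenized_dataset[0]["input_ids"]
print(ids)                                          # the "number version" of the string
print(tokenizer.decode(ids))                        # decodes back to the text plus padding
print(tokenizer.pad_token, tokenizer.pad_token_id)  # the "I don't care" token for this tokenizer
```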
[00:02:31]
And so that is effectively the number version of this string, because this is the text that we made when we mapped everything. And then we just say, basically: here are all the fancy numbers, I'm going to feed them into you, and you're going to go for it.
[00:02:50]
So the Trainer API has a training loop where we evaluate, we log, we do some checkpoints along the way. These are all the arguments we're going to take. We're gonna write it out to a directory called gpt2-medium-quotes. We're gonna do one full pass through the data. You can do more full passes if you want.
[00:03:10]
Obviously doing more full passes is good, with a law of diminishing returns, but then we gotta wait for that and I don't want to. So play with this, of course, right? We don't necessarily need larger batches cuz everything's small. Yeah, I think I was just tweaking stuff, trying to make sure the run wasn't particularly long, because again, if you are doing this for production or for something you're actually trying to build, letting fine-tuning take 30 minutes is not the end of the world when you get to keep the result forever.
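A sketch of that Trainer setup; the output directory and the single epoch come from the lesson, while the batch size and logging/checkpoint cadence are assumed small-run defaults.

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2-medium-quotes",  # where checkpoints get written
    num_train_epochs=1,               # one full pass through the data
    per_device_train_batch_size=8,    # assumed; small data, small batches are fine
    logging_steps=50,                 # assumed logging cadence
    save_strategy="epoch",            # assumed checkpointing cadence
)

trainer = Trainer(
    model=model,                      # the PEFT-wrapped GPT-2 Medium
    args=training_args,
    train_dataset=tokenized_dataset,
)

trainer.train()
```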
[00:03:46]
But when people are watching you on a Friday and Dustin would like to go home at some point, we do the shorter version of the fine-tuning. How does the model know which parameters are tunable? Well, that's the cool part: we're freezing the model, right? We're not actually fine-tuning the model.
[00:04:05]
We're putting, like, a few effectively zeroed-out layers on top of it and fine-tuning those, right? And that's the nice part: you don't have to do the whole model, you don't even have to do a subset, and the underlying model doesn't change, right?
[00:04:21]
And so that's what makes it, like, pluggable. We can take that off, and you don't have to store a whole second version of the model; you can kind of just keep that piece of it. Yeah, so we train this. And again, I think it took like five or ten minutes when I did it on the free one.
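A quick, hypothetical way to confirm that the base weights are frozen and that saving the PEFT model only writes the small adapter, not a second copy of GPT-2 Medium; the directory name is an assumption.

```python
# Count frozen vs. trainable parameters on the PEFT-wrapped model.
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"frozen: {frozen:,} | trainable: {trainable:,}")

# Saving writes only the adapter weights -- the "pluggable" piece you keep.
model.save_pretrained("gpt2-medium-quotes-final")  # directory name is an assumption
```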
[00:04:40]
But for our own sanity, I bumped up to a bigger one, where it took two minutes. And if you are truly trying to train a model, to get stuff in a different format or style, whatever, honestly, three hours would be fine, half an hour would be fine.
[00:04:53]
Right, like, whatever. But for the purpose of all of us in a room today, I went with a relatively small model with a relatively small dataset on a relatively fast GPU. But I did this on that T4 and, yeah, go make yourself a cup of coffee or something and it'll be done; not too bad.
[00:05:12]
And you could also, if you've got a little more patience, and again, if you've got like a gaming machine, go do it on your computer, right? On an M2 Pro, I didn't have the patience for that, but you can do it, right? Like, easily. But there are also many, many services out there that will rent you a GPU these days.
[00:05:35]
Turns out that is a widely invested-in part of the ecosystem, for reasons. So we did train this model and it's saved, you know, where we have this gpt2-medium-quotes-final, whatever. As you can tell, I'm still a JavaScript engineer, because I did not use underscores in my naming convention, because I don't like them.
[00:05:58]
So that's the telltale sign. That, and the fact that I can't tell the difference between integers and floats in most languages is another way to tell. Okay, so what I'm gonna do is: I've got the base model, where we're gonna have a pipeline, a base generator, and we're gonna say text-generation.
[00:06:15]
We're gonna give it gpt2-medium and the tokenizer, which I probably could have left out, but I didn't. And we'll look at the result for one sequence, we'll look at the response, and then we'll go to our fine-tuned model, where I pass it in directly here with a few more settings.
[00:06:37]
Low CPU memory usage. Again, you can play around with some of this. You'll obviously get better results if you don't do all of the "please don't blow through the free tier" stuff that I did, but also I did that for you, so whatever. Cool, cool, cool. And yeah, we grab the fine-tuned model.
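Putting that together, a sketch of the two generators being compared; gpt2-medium, the text-generation task, and the low CPU memory flag come from the lesson, while the saved adapter path and variable names are assumptions.

```python
from transformers import pipeline
from peft import AutoPeftModelForCausalLM

# Base GPT-2 Medium, pulled by name from the Hugging Face Hub.
base_generator = pipeline(
    "text-generation",
    model="gpt2-medium",
    tokenizer="gpt2-medium",  # probably could be left out, as noted above
)

# Fine-tuned version: load the saved adapter on top of the base model and pass
# the model object in directly. The path is an assumption.
finetuned_model = AutoPeftModelForCausalLM.from_pretrained(
    "gpt2-medium-quotes-final",
    low_cpu_mem_usage=True,
)
finetuned_generator = pipeline(
    "text-generation",
    model=finetuned_model,
    tokenizer=tokenizer,  # the GPT-2 tokenizer from earlier
)
```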
[00:06:58]
Yeah, because I'm loading it from memory here; when you put the string in, the Hugging Face Transformers library grabs it from Hugging Face, it knows how to do that. Obviously I could theoretically, if I wanted to, push this model that I made up to Hugging Face and you would have it. My model didn't seem worth it.
[00:07:15]
Happy to do it if somebody really cares. But it could just be a model on Hugging Face, and that's the GitHub aspect of Hugging Face, where you could fine-tune a model to something. You could push it up, somebody else could fork it, fine-tune it in a different way, right?
[00:07:32]
It's like the fun days of GitHub, right? Not that GitHub's not fun, but GitHub is very much a utility. We all take it for granted but it seemed new and interesting that you could fork code and repos. You can still do that. It just doesn't seem as amazing anymore just cuz we're used to it, we're dead inside.
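If you did want that GitHub-style sharing, pushing the adapter up is roughly this; the repo name is hypothetical and you'd need to be logged in to the Hub first.

```python
# Hypothetical: publish the fine-tuned adapter and tokenizer to the Hugging Face Hub.
# Requires `huggingface-cli login` (or a token) beforehand; the repo name is made up.
model.push_to_hub("your-username/gpt2-medium-quotes")
tokenizer.push_to_hub("your-username/gpt2-medium-quotes")
```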
[00:07:53]
That's a pre-generated one; that's from the last time I ran that. That's my newest, latest-and-greatest fine-tuned one. So let's rerun that and we'll see the difference. That's the original one where, okay, it starts with, again, "Quote by Bob Dylan", which was my initial prompt.
[00:08:15]
So it's picking up from there. So "Quote by Bob Dylan" comes from me. Everything after that comes from the model, where you can see, like, okay, it doesn't know about the colon. I mean, it plausibly seems like a quote, but it's not keeping that format. And it's kind of rambly and going on, probably until it hits the max tokens, if it ever hits the max tokens.
[00:08:46]
And then, like, I don't even know what it's talking about, right? Versus the one that came out of my fine-tuned model, where, if you think about the string that I formatted, that first Oscar Wilde quote that we saw, it's like "Quote by Oscar Wilde", colon, open quote, a reasonable-size quote, close quote, end-of-sequence token, right?
[00:09:09]
Even before, I think with some of the stuff we were doing, there weren't a lot of those end-of-sentence, end-of-whatever, end-of-sequence tokens. So even when I said, hey, listen for the end-of-sequence token and stop talking, GPT-2 was still rambling.
[00:09:22]
But now it has been pretty well trained. All those quotes are pretty short, 'cause they're just quotes that you would see on an inspirational poster. And then they all have that end-of-sequence token, which causes it to stop. And so you can see that two minutes of training on 2,500 lines of an open source dataset radically changed how that relatively small model behaved.
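The before/after comparison being described boils down to something like this; the prompt comes from the lesson, while the generation settings and exact punctuation are assumptions.

```python
prompt = "Quote by Bob Dylan:"  # trailing colon matches the training format (assumed)

# Base model: plausible-sounding text, but no format and no clean stopping point.
before = base_generator(prompt, max_new_tokens=60, num_return_sequences=1)
print(before[0]["generated_text"])

# Fine-tuned model: short, poster-style quote that stops at the learned EOS token.
after = finetuned_generator(
    prompt,
    max_new_tokens=60,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
print(after[0]["generated_text"])
```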
[00:09:54]
All right, so who's tired of text? Is everyone bored with text? Dustin's bored with text. Everyone's bored with text. Who wants to make images? Dustin wants to make images.