
Lesson Description
The "Model Contents & Text Generation" Lesson is part of the full, Open Source AI with Python & Hugging Face course featured in this preview video. Here's what you'd learn in this lesson:
Steve discusses the contents of models from Hugging Face, which consist of tensors, weights, and numbers representing text. Models like GPT-2 generate text by using the previous words to predict the next word or punctuation. Parameters like temperature can be adjusted to control the creativity of the generated text.
Transcript from the "Model Contents & Text Generation" Lesson
[00:00:00]
>> Male: I had one other question. Just what is the model format? Or when you're downloading it from Hugging Face, what does that actually look like on disk?
>> Steve Kinney: I mean, you can see. Can we, actually? I don't know where it hides it. Here it is: in a lot of ways, it's a bunch of tensors and weights and stuff like that.
[00:00:14]
It's not a file that you want to go digging through yourself, per se. And you probably saw, if I scroll up, I don't see it anymore because I cleared the console, but you'll see when it's downloading stuff that it pulls down these things called safetensors, which are files that have been scrubbed for arbitrary code execution and stuff along those lines.
[00:00:35]
But as we get into how some of these models work, they're also just a series of numbers, because all this text is getting turned into numbers and then back out to text again. So yeah, you can go poke around in there, depending on where it puts it on your machine.
[00:00:52]
You can also, literally, go to the Hugging Face website. We go into Models, I'm pretty sure. Let's just grab a random one and go into Files and versions. You can go poke around and see. Some of these are probably huge, but you can do this without even touching your machine.
[00:01:10]
You can go in for a given model and see all of the files that would have gotten downloaded onto your machine, so on and so forth. Obviously, Qwen at 30 billion parameters is that sweet spot: obviously not as big as whatever OpenAI or Anthropic is using, but the sweet spot where, if you are a developer with the laptop that most developers might have, with a decent amount of RAM and a decent amount of disk space, this is the size of an open source model that you could 100% run on your machine.
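If you'd rather poke at those files from code instead of the website, here's a minimal sketch, not from the lesson, using the huggingface_hub library; "gpt2" is just an example repo id.

```python
# Minimal sketch (not from the lesson): list a model repo's files without
# downloading anything, the same view as "Files and versions" on the site.
from huggingface_hub import list_repo_files

for filename in list_repo_files("gpt2"):  # swap in any model id you're curious about
    print(filename)
```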
[00:01:46]
>> Male: And it looks like it's four gigs. Is this like a packed format where it'll unzip, or...?
>> Steve Kinney: No, a lot of times, like that one, maybe it is probably four gigs. Some of them are quantized. And quantized effectively means, you know, there's both how many parameters you have and then, effectively, how big the floats are, right?
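As a rough, hedged sketch of that parameters-times-float-size idea: the torch_dtype option below is a real transformers argument, and the byte math is just back-of-the-envelope.

```python
import torch
from transformers import AutoModelForCausalLM

# Same weights, different float sizes: fp32 is ~4 bytes per parameter,
# fp16 is ~2 bytes, so halving the precision roughly halves the memory.
model = AutoModelForCausalLM.from_pretrained("gpt2")                                  # default fp32
model_fp16 = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.float16)  # half precision

params = sum(p.numel() for p in model.parameters())
print(f"{params:,} parameters")
print(f"~{params * 4 / 1e6:.0f} MB at fp32, ~{params * 2 / 1e6:.0f} MB at fp16")
```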
[00:02:07]
So they'll change in size, but a lot of times they take a lot more RAM than that. Disk space is usually not going to be the thing that you're crying about; it is going to be RAM. Especially if you have a Mac, where you don't have a GPU per se.
[00:02:20]
You have that integrated CPU and GPU and they share the memory. You just think about it in terms of RAM; that will be where your pain is felt. And you can change different settings and do stuff to tweak that as well. So, moving down to text generation, which is the second one we talked about: again, they are all effectively that same method.
[00:02:42]
I actually did not necessarily have to import it a second time, but I think in the beginning it made sense to do that. If we just give it the first argument as text generation, again, we don't necessarily need to give it that model. It will choose a default, but I kinda want it to also expose what the defaults are.
[00:02:59]
So instead of sentiment analysis, I actually wanna do text generation. We can use that same function, just with a different first argument, and you will get a text generator function. At this point, there are all sorts of little knobs you can tweak. We'll tweak some of them later.
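In code, that looks roughly like this; naming the model is optional and is only there to make the default visible. GPT-2 is what this lesson ends up using.

```python
from transformers import pipeline

# Same pipeline() function as before, just a different first argument.
# Leaving the model off picks a default; naming it makes that explicit.
text_generator = pipeline("text-generation", model="gpt2")
```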
[00:03:15]
But showing you every knob in the beginning felt rude. So here we've got kind of like partial sentences, right? And you will see that we will have to do some stuff over our time together to make this do the thing that we want, because it gives you a little bit of appreciation for how some of these other tools work.
[00:03:36]
So here are a bunch of sentences. And again, text generation is the one that we think about when we think about Gemini and Claude and ChatGPT, but all it does is, based on the previous words, guess what the next word should be, or punctuation, or should I stop talking?
[00:03:51]
Or something along those lines, right? And again, the wild thing is, using GPT-2 gives you a deep appreciation, I think, for how far some of these things have come. So we kind of create that pipeline, this text generator function, and then, when we call it, we have some options that we can pass in.
[00:04:13]
And again, when you see that equal sign, it's very much like having the last argument in JavaScript be an object, where you just define the other properties, with the first one being the first argument, so on and so forth. We will then take our array of prompts, go through each prompt, print it, console log it, first, and then we're going to say, give me about 100 new tokens.
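A rough sketch of that loop, assuming the text_generator pipeline from above; the prompt strings here are placeholders, not the notebook's exact list.

```python
# Placeholder prompts, not the exact ones from the notebook.
prompts = [
    "The future of artificial intelligence is",
    "Once upon a time,",
]

for prompt in prompts:
    print(prompt)  # print the prompt first, then the generated continuation
    result = text_generator(prompt, max_new_tokens=100)
    print(result[0]["generated_text"])
```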
[00:04:35]
And you're like, what is a token? It's kind of like a piece of a word. We will talk about tokenization at length later, but the good-enough-for-right-now version is that it is a piece of a word. And then we talked a little bit before about that temperature, right?
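Before we get into the temperature knob, a quick, hedged illustration of that "piece of a word" idea using GPT-2's own tokenizer; the sentence is just an example.

```python
from transformers import AutoTokenizer

# Tokens are pieces of words: common words stay whole, rarer ones get split.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
print(tokenizer.tokenize("Tokenization splits text into pieces of words"))
```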
[00:04:51]
Temperature starts at zero, and basically, if you give it a temperature of zero, it will always pick the most likely next word, no matter what, right? The higher you bump that number up, and we can bump it up and play around with it a little bit, the more creative it will get about this, right?
[00:05:13]
And there are some other ways to tweak that as well. Do you want it to pick from the 50 most likely words? How exactly do you want it to choose? Some of those are other parameters that we will definitely see and talk about as well.
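Those knobs look something like this, again assuming the text_generator pipeline from above; do_sample turns sampling on, top_k is the "pick from the 50 most likely words" setting, and temperature is the creativity dial. The values are just illustrative.

```python
# A sketch of the sampling knobs; values here are illustrative, not prescriptive.
result = text_generator(
    "The future of artificial intelligence is",
    max_new_tokens=100,
    do_sample=True,    # sample instead of always taking the single most likely word
    temperature=0.7,   # higher = more adventurous picks
    top_k=50,          # only consider the 50 most likely next tokens
)
print(result[0]["generated_text"])
```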
[00:05:28]
But let's go. Let's play around with it, let's tweak some stuff in here, and let's see how it goes for us. So we'll go ahead, we'll hit play. It is gonna pull down the model. Arguably, I should have hit that button already, but it's a tiny little model.
[00:05:40]
It'll take about a second or two. Cool, cool, cool. I mean, even this model, GPT-2, is 548 megabytes. So that's interesting. And you can see that they are not super good, right? And we will learn how to make them better, and we'll learn about fine-tuning if you want it in a certain format, and all of that stuff.
[00:06:03]
But for our first flight into this, you can see, one, that the sentence will just end when it hits that 100-token limit. That one happened to work, but otherwise it will just kind of stop mid-generation as it goes through. So, yeah, "the future of artificial is at hand."
[00:06:25]
"And if it makes a breakthrough, it will have a huge impact on everyday life. The world will be far more connected and more connected." It is not good, particularly, one, because of GPT-2, and two, because we have not done any of the important work that we'll do in a little bit.
[00:06:40]
But it is kind of interesting. You can probably see what some of the source material was there as well. The clickbait blog posts that were definitely fed into this thing are all very apparent, particularly with the early models. It's a lot of fun, honestly, just to see sometimes how bad it will be.
[00:07:08]
But we'll learn how to make it better too. Cool. So for text generation, we can tweak some of these. What happens, for instance, if we turn the temperature down to, like, zero? So I can just do that. I can hit play. It's angry with me. Zero might have been too much.
[00:07:28]
It needs to be a float. Can you tell that I'm usually a JavaScript engineer and just treat all numbers as the same? Give it 0.1. And now you can see that, no matter what, it'll hit the hundred tokens, but it is going to give me effectively the same thing over and over again.
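For reference, the fix looks something like this: a small positive float rather than a literal 0, since in my experience transformers rejects temperature=0 when sampling is on, which is the error being hit here.

```python
# temperature must be a strictly positive float when sampling; 0.1 is the
# "almost greedy" setting used here, so the output barely varies between runs.
result = text_generator(
    "The future of artificial intelligence is",
    max_new_tokens=100,
    do_sample=True,
    temperature=0.1,
)
print(result[0]["generated_text"])
```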
[00:07:53]
It will show no creativity whatsoever. Each time. [LAUGH] That's gonna be my new motto from now on, by the way: "The future of artificial intelligence is uncertain, but it is a possibility." That is going to be my official stance when my in-laws ask me about AI.
[00:08:15]
So as you can see, though, other than when it hits the max token limit, it shows no creativity whatsoever. It's not picking a different word each time. Given the same initial sentence, it will go ahead and just say the same thing over and over and over again. Versus if, let's say, we turn this up to a 1.0, you're like, well, I want creativity.
[00:08:42]
With creativity comes hallucinations, right? The more creative liberty you give it, the wider the range of what the next word could be. Because if you think about it, it's guessing what the next word is going to be, and then, for the word after that, it's using the previous words, including that word it just guessed, to come up with the next one.
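To make that compounding concrete, here's a hand-rolled sketch, not the lesson's code, of the loop the pipeline runs for you: each sampled token gets appended and fed back in, so an adventurous early pick changes every guess that follows.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The future of artificial intelligence is", return_tensors="pt").input_ids
temperature = 1.0

with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids).logits[0, -1]               # scores for the next token
        probs = torch.softmax(logits / temperature, dim=-1)   # temperature reshapes the odds
        next_id = torch.multinomial(probs, num_samples=1)     # sample rather than take the argmax
        input_ids = torch.cat([input_ids, next_id.unsqueeze(0)], dim=-1)  # feed the guess back in

print(tokenizer.decode(input_ids[0]))
```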
[00:09:01]
So the more creative license you give it for the one word, you can see how this gets exponentially out of control. Let's see; the fun part about this is you never actually know what's going to happen when you hit the button. So very quickly it starts to turn into a motivational graduation speech.
[00:09:22]
In this case, "the future of artificial intelligence is only a small part of the future. The rest is to come," and it moves on to "I am honored to be part of the new generation of entrepreneurs that I know and love." So at this point, I guess there were enough words that it generated randomly that it started to go off the rails and believe that it was giving some kind of graduation speech.
[00:09:41]
Here it's in the middle of a list article, so on and so forth. So as you can see, you can tweak a lot of these, and there are implications for both as you go along. And that's true when we get to images too, right? You can see how much you want to deviate from a given image or a prompt to get to different things.
[00:10:00]
And what's the answer? The answer, in these cases, is always: it depends. Unfortunately, when you have a thing where the knob that you're turning is effectively randomness, experimentation is effectively the only way as we go through. So yeah, one difference is that these just keep generating, versus just telling you whether
[00:10:21]
it is positive or negative. It can handle most things, depending on what it's trained on. And like I said, we think a lot, when we talk about using AI, like, I'm gonna grab Claude, I'm gonna grab Gemini. But sometimes the right answer might be taking a very small model and fine-tuning it.
[00:10:41]
And we'll see how to do that in a little bit: fine-tuning it to just give you the things that you want. It's got some base knowledge, and then you can add in the rest of the stuff that you want, for exactly how long the strings should be, what format they should be in, so on and so forth.
[00:10:57]
The nice part is you can do all of that with these tools incredibly easily. And again, we are going to constrain ourselves until I get impatient towards the end of this, but everything we do, we can do on the free tier. Everything we're gonna do is effectively
[00:11:16]
either things you could run on your laptop, or, if you had patience for the image stuff, you could run it on your laptop too, just, I don't know, go out for the day or go to bed. But generally speaking, all of this is stuff that is relatively low resource, not very resource intensive.
[00:11:35]
So, some of the parameters that we had in there: the max length, how many tokens, and that includes the original prompt, right, in all of these cases. Because otherwise, if you do not give it one, it will just keep going, so on and so forth, right? Then how many sequences we want it to respond with.
[00:11:55]
We didn't set that one. We did say what we wanted it to end with, and we'll see what these are later. There are special magical tokens in there that are not just words, like end-of-sentence markers and stuff along those lines. So yeah, we have got some of those put in place.
[00:12:12]
But yeah, you can say how many tokens you want, the number of possibilities you want to generate at a given time, and the temperature, which is that kind of creativity. Low values are gonna give you the words that you think are gonna come; medium values will give you some amount of creativity in there.
[00:12:32]
And after you go past the 1.0 mark, now you're just having fun; you're not being responsible, you just wanna see the world burn, and that's fine.
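Pulling those knobs together, here's a rough recap sketch, again assuming the same text_generator from above; max_length counts the prompt while max_new_tokens counts only the added text, and the pad token line is the usual GPT-2 idiom for that "special magical token." The specific values are illustrative.

```python
# Recap of the knobs discussed above; values are illustrative, not prescriptive.
results = text_generator(
    "The future of artificial intelligence is",
    max_new_tokens=100,          # how much new text to add (max_length would count the prompt too)
    num_return_sequences=3,      # how many completions to generate at once
    do_sample=True,
    temperature=0.7,             # low = predictable, ~1.0 = creative, >1.0 = chaos
    top_k=50,                    # only pick from the 50 most likely next tokens
    pad_token_id=text_generator.tokenizer.eos_token_id,  # GPT-2 has no pad token of its own
)
for result in results:
    print(result["generated_text"])
```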