
Lesson Description
The "Transformers Q&A" Lesson is part of the full, Open Source AI with Python & Hugging Face course featured in this preview video. Here's what you'd learn in this lesson:
Steve answers students’ questions about how token IDs work, the difference between fine-tuning and inference, how models handle memory, and how tools like vector databases can provide external context, such as a code base, without retraining the model.
Transcript from the "Transformers Q&A" Lesson
[00:00:00]
>> Speaker 1: So when you tokenize it, you're assigning a number to a word or token, right? But then through the transformer block, it's almost like you're developing like another layer of meaning to each token, given the context.
>> Steve Kinney: Yeah.
>> Speaker 1: Is that called a vector that-
>> Steve Kinney: Effectively, right?
[00:00:22]
>> Speaker 1: The vector-
>> Steve Kinney: The vector is the number, right?
>> Speaker 1: But it's just like another number.
>> Steve Kinney: Yeah.
>> Speaker 1: Right.
>> Speaker 3: Like bank and bank, when you tokenize them.
>> Steve Kinney: They're the same ID, always, right?
>> Speaker 1: One number, but then now you've got like a different set of numbers given context or something.
[00:00:36]
>> Steve Kinney: Yeah, so in that neural network, for instance, whatever token has the ID of 35 is not necessarily closer in meaning to the one that has 36, right? They're effectively just numbers, right? And then it goes into this giant neural network where, at the start, the relationship between any two of those IDs is completely random, right?
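To make that concrete, here's a quick sketch, assuming the Hugging Face transformers library and the bert-base-uncased tokenizer (neither is named in the lesson; any open tokenizer shows the same thing): the token ID for "bank" is one fixed number, no matter which sentence it appears in.

```python
from transformers import AutoTokenizer

# bert-base-uncased is an illustrative choice, not the lesson's.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

river = tokenizer("the bank of the river")["input_ids"]
money = tokenizer("rob the bank downtown")["input_ids"]

bank_id = tokenizer.convert_tokens_to_ids("bank")
print(bank_id)           # a single fixed ID for "bank"
print(bank_id in river)  # True
print(bank_id in money)  # True -- same ID in both sentences
```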
[00:01:04]
And you start training it, which is like, hey, I'm gonna give you two words. Guess what the next word is? You have the third word already, right? Nope, nope, nope, nope, nope. So it keeps twisting the knob to figure out what it needs to do to get there, right?
[00:01:17]
And so those initial numbers, the token IDs, aren't meaningful on their own. Just because they're numbers doesn't mean they carry any organization. The question becomes, what function did the input have to go through to come out with the right answer for the next token? And all the weights and the math get tweaked through brute force, effectively.
[00:01:36]
It's like, after 50 years of AI research, we went, what if we just brute force it? What if we start everything random and just hammer at it, right, until the weird math emerges? Cuz it's not like any human is tweaking that math by hand. It's just brute force.
[00:01:55]
You keep turning the dials in the giant neural network to the point where it's like, cool, we've hammered at it so hard that all those dials, from randomly tweaking them at the scale of billions of parameters, land somewhere useful. I can't do that math [LAUGH].
[00:02:17]
And you end up with all of them settling into place. So effectively, there's no PhD student writing the math for how you tell robbing banks from the banks of the Mississippi River. You just feed it a whole bunch of strings, right, along with the desired output, and it takes a guess.
[00:02:39]
You slap its wrist when it's wrong, it tweaks some knobs randomly. You keep slapping its wrist until it figures out its own algorithm for not getting its wrist slapped.
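A toy sketch of that loop, assuming PyTorch and the transformers library, with gpt2 standing in for any small open causal language model: the model guesses each next token, the loss is the wrist slap, and the optimizer twists the knobs.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# gpt2 is an illustrative stand-in for any small causal language model.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

batch = tokenizer("the robber went to the bank", return_tensors="pt")

# Passing the inputs as labels makes the model guess each next token.
outputs = model(**batch, labels=batch["input_ids"])
outputs.loss.backward()  # the wrist slap: how wrong were the guesses?
optimizer.step()         # twist the knobs (weights) a tiny bit
optimizer.zero_grad()
```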
>> Speaker 1: So just like you can get the tokens and convert them back to strings, can you get those vector numbers out?
[00:02:59]
So it's gone through, it's processed it. Now it knows if it's the bank of a river or a regular bank.
>> Steve Kinney: Yeah.
>> Speaker 1: Can you get those numbers out? So I could say, okay, now this word represents this number.
>> Steve Kinney: You can see all the tensors. If it's an open source model, you can.
[00:03:14]
No, you can't for GPT-4.5, but for the open-source models, you can. Can you look at them? Yes. Will your brain make any sense of them? Probably not.
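For an open-source model, here's a sketch of pulling those context-adjusted vectors out (bert-base-uncased is an assumed choice): the same token ID goes in, but a different vector comes out depending on the sentence.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    inputs = tokenizer(sentence, return_tensors="pt")
    position = inputs["input_ids"][0].tolist().index(
        tokenizer.convert_tokens_to_ids("bank"))
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden[0, position]

v1 = bank_vector("the bank of the river")
v2 = bank_vector("rob the bank downtown")
# Same token ID went in, but the contextual vectors differ.
print(torch.cosine_similarity(v1, v2, dim=0))
```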
[00:03:37]
So when we get to fine-tuning, what's interesting is that the idea of fine-tuning is you take a pre-existing model that has already been trained and you just pick up where it left off, right? You start feeding it more strings and you say, this is the output I expect to get, right? And you tweak those knobs a little bit more. But you don't technically need an open-source model, because OpenAI has an API where they will allow you to give them money to have a layer of fine-tuning over theirs.
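A minimal fine-tuning sketch along those lines, assuming the transformers and datasets libraries; gpt2, my_strings.txt, and the hyperparameters are all placeholders for whatever model and data you'd actually use.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")  # pick up where it left off

# my_strings.txt is a placeholder for your own training text.
dataset = load_dataset("text", data_files={"train": "my_strings.txt"})

def tokenize(batch):
    out = tokenizer(batch["text"], truncation=True,
                    padding="max_length", max_length=128)
    out["labels"] = out["input_ids"].copy()  # next-token prediction targets
    return out

train = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1),
    train_dataset=train,
)
trainer.train()  # tweak those knobs a little more
```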
[00:03:56]
>> Speaker 1: I had this argument with my boss about the fact that we're using Cursor and it has access to our code base.
>> Steve Kinney: Yeah.
>> Speaker 1: And I'm like, my understanding was like, okay, well now we're just taking our code base and we're uploading it to the cloud and it's going to train the model and you know, it's going to influence.
[00:04:14]
He's like, no, no, it's not like that. It's sort of a temporary thing that doesn't actually train the model. I don't actually trust him.
>> Steve Kinney: Yeah, those are tricky, cuz on one hand, Cursor wraps OpenAI and Anthropic, but at the end of the day, bytes are going over a wire.
[00:04:35]
To be clear, there are privacy modes and stuff like that. As somebody who previously worked on an open-source project, I never had to flip any of those switches, so I can't speak to all their various privacy modes and stuff along those lines.
>> Speaker 1: We use some particular privacy mode, but I don't understand how it can ingest our code and adjust itself, readjust the weights or whatever for what comes next, right?
[00:05:01]
>> Steve Kinney: Yeah.
>> Speaker 1: And not be changed. Unless you had your own private VPC in the cloud that nobody else could access, but I don't think it's like that.
>> Steve Kinney: Well, those are different things. With the pre-training or the fine-tuning, slash post-training, you are adjusting the weights. But when you go to ChatGPT, for instance-
[00:05:22]
>> Speaker 1: Yes.
>> Steve Kinney: They're not changing the weights at that moment. That is inference. That is, I'm putting this stuff in, and it's coming out the other end, right? They can then take a bunch of those strings later and train on them. But as you use it, no one's doing that.
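A small sketch of that distinction, with gpt2 again standing in for any model: generating text is a pure forward pass, and you can verify the weights don't move.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()  # inference mode: we only run the model forward

before = model.transformer.wte.weight.clone()  # snapshot the token embeddings

inputs = tokenizer("the bank of the", return_tensors="pt")
with torch.no_grad():  # no gradients, no knob-twisting
    out = model.generate(**inputs, max_new_tokens=5)

print(tokenizer.decode(out[0]))
# The weights are untouched: inference reads them, it never writes them.
print(torch.equal(before, model.transformer.wte.weight))  # True
```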
[00:05:36]
>> Speaker 3: If you start a conversation with ChatGPT, how does it remember what you just typed in like five prompts ago?
>> Steve Kinney: Because you're sending the whole thing back and forth every time.
>> Speaker 1: You're sending the whole thing. You're sending it the entire conversation every time.
>> Steve Kinney: Yep, that's when you start blowing up the context window.
[00:05:57]
If a chat gets too long, Claude will actually be like, it's time to start a new chat, right? Cuz everything's going over the wire every time. It's effectively a stateless protocol, right? You are sending the entire chat back and forth every time. Yep, so it's not remembering.
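Here's a sketch of why that works, using the OpenAI-style chat API shape (the client setup and the model name are assumptions): the client keeps the history and re-sends all of it on every turn.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment
messages = []

def ask(user_text):
    messages.append({"role": "user", "content": user_text})
    reply = client.chat.completions.create(
        model="gpt-4o",      # placeholder model name
        messages=messages,   # the ENTIRE conversation, every single turn
    )
    answer = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    return answer

ask("What kind of car should I buy?")
ask("What did I just ask you about?")  # only "remembers" because we re-sent it
```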
[00:06:20]
Now, granted, play a game with ChatGPT later, cuz OpenAI, and you can turn this off, by the way, has a memory feature, which we'll come back to when we're talking about retrieval-augmented generation, where they will also store your stuff and then find it later. ChatGPT knows what kinda car I drive, right? That's not cuz the model knows it. The model doesn't technically know. They are also storing that information on top of the model and feeding it in with my prompt, right?
[00:06:47]
>> Speaker 1: So it's just stored off on the side.
>> Steve Kinney: Exactly.
>> Speaker 1: And then it's like, here's context.
>> Steve Kinney: Exactly.
>> Speaker 1: So how big is the context window? Like how many bytes?
>> Steve Kinney: Depends on the model. Gemini and Claude Opus and o3 are like 2 million tokens, right?
[00:07:04]
Gemini was the big dog for a while there with the 2 million. And then Opus 4 and o3, I think, are up there too. And wait for the next generation, they'll be bigger, right? So there's a ton of context they can load in there as well.
[00:07:23]
But yeah.
>> Speaker 1: But it's also interesting, if you could run it locally, right? If you had enough GPU to just run it completely locally, then you could actually train it as you go. It wouldn't have to feed in the context all the time.
>> Steve Kinney: Yeah.
>> Speaker 1: It could just be learning as you go.
[00:07:37]
>> Steve Kinney: And I mean, you can technically do that with a ChatGPT or a Claude, which is, again, one of the tricks that people use. It's not really a trick, it's just a thing, which is you do basically your own version of that OpenAI memory thing, right? Where let's say you have a bunch of data that's important to whatever you're working on, right?
[00:07:59]
You vectorize all of that. So you turn it into those numbers, right, and put it in a vector database, like Pinecone or LanceDB or a thousand other ones, right? You take your text, you vectorize it, you turn it into those numbers, and you store it in this database, right?
[00:08:21]
And then let's say you are gonna use ChatGPT, but you want it to know more about your information, because you're trying to query your own data, right? What you do is, I'm gonna type something to ChatGPT. Before it goes to ChatGPT, you grab it, you vectorize that prompt, you find all of the strings of your data that seem relevant, and then at the end of your prompt you go, and here's some context for you, and you jack your stuff in there, right?
[00:08:51]
And then it generates its answer based on the other stuff in the prompt, right?
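A rough sketch of that whole flow, assuming the sentence-transformers library for the vectors; the documents, the model name, and the prompt format are all illustrative, and a plain list stands in for a real vector database like Pinecone or LanceDB.

```python
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Placeholder documents standing in for "your own data".
docs = [
    "our billing service retries failed charges three times",
    "deploys go out every Tuesday at noon",
]
doc_vectors = embedder.encode(docs)  # vectorize your data up front

def build_prompt(question, top_k=1):
    q_vec = embedder.encode(question)
    scores = util.cos_sim(q_vec, doc_vectors)[0]   # relevance scores
    best = scores.argsort(descending=True)[:top_k]
    context = "\n".join(docs[i] for i in best)
    # jack your stuff in at the end of the prompt
    return f"{question}\n\nAnd here's some context for you:\n{context}"

print(build_prompt("how often do we deploy?"))
```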
>> Speaker 1: Okay, so going back to my original question, which was, do they tokenize our entire code base and then use that as context every time we ask it a question?
>> Steve Kinney: Yeah, they tokenize it and they send it along with that prompt.
[00:09:14]
Whether or not they're storing it is a different question, because they have to do something with it. For a given question you're asking, it will scan that file and the related files, and there's a whole set of heuristics, like the first 250 lines of any given file, so on and so forth. It will vectorize that, figure out which parts are relevant, and send it along to get you an answer, yeah.
[00:09:37]
Whether or not they're training the model on that, well, they're not technically their models anyway, so no. But is it scanning your code base, doing stuff, and sending it over the wire? Absolutely, yeah. So, yeah, and then we guess what the next word is. And then, also, again, there are mechanisms in these models.
[00:09:58]
Right, in the underlying architecture, that are also going to try to figure out how important one given word is in relation to another one, right? And so, the is probably not super meaningful to river, unless you're talking about Bruce Springsteen. But generally speaking, certain words are going to pull the meaning of other words closer together than others.
[00:10:26]
The is probably not a strong word, but, again, thief and bank, or river and bank, are probably gonna pull the meaning of bank toward a given meaning, right? That's that mechanism.
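You can actually peek at that mechanism in an open model. A sketch, assuming bert-base-uncased: asking for the attention weights shows how strongly each word attends to "bank".

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("the thief robbed the bank", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# One (heads x seq x seq) matrix per layer; average the heads in the last layer.
attn = outputs.attentions[-1][0].mean(dim=0)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
bank = tokens.index("bank")
for token, weight in zip(tokens, attn[bank]):
    print(f"{token:10s} {weight.item():.3f}")  # how hard each word pulls on "bank"
```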