Check out a free preview of the full Build an AI Agent from Scratch course
The "LLMs Overview" Lesson is part of the full, Build an AI Agent from Scratch course featured in this preview video. Here's what you'd learn in this lesson:
Scott explains what a large language model (LLM) is and how it differs from other AI models. He also introduces the idea of transformers and tokenization and token limits in LLMs.
Transcript from the "LLMs Overview" Lesson
[00:00:00]
>> Scott Moss: What is an LLM? Well, I guess it's right there. I was gonna ask, does anybody know what an LLM stands for? It's right there, it's a large language model. But what does that actually mean? So I'll kinda talk through how I understand it, and what's the difference between a large language model versus all the AI that's been happening before the large language models come to present.
[00:00:26]
And how is that different? So essentially, a large language model is a, first of all, what is a model? If you're kind of new to AI, a model is just, it's a file that's a set of statistics. It's a set of basically, some trained system that has a set of probabilities based off of some input that somebody trained it on.
[00:00:51]
So a good example might be a classifier that can detect cancer inside of X-rays. So, someone will create a system that takes these images of X-rays and they will label these images as like, this has cancer, this doesn't have cancer. And then they would train that on an AI, and then that AI can correlate all these different pixels, they turn that image into a matrix of pixels and RGB values.
[00:01:20]
And it can correlate those RGB values and pixels to whether or not this image has cancer or not, and then it can be corrected. So if the AI gets it wrong, you can introduce things like punishing it. So it can go down a different path on the probability.
[00:01:39]
So if you think about decision making, you can go left or you can go right. So like, if you say that, hey, that was wrong, then maybe next time, when it gets to that decision, it'll go left instead of right, and then it'll see if that's right. And just does that over and over and over again until it reaches a point where you're not punishing it anymore.
[00:01:54]
And then now you have this file of these probabilities that have some form of accuracy on the data that you train. And I would say that's a very specific set of AI, but that kind of goes in towards what you would see with LLMs and diffusion models and things like that.
[00:02:11]
And obviously, there's different types of AI, but a model is mostly just, represents some statistics, some probabilities. There's this other thing inside of it called a neural network. We're not gonna talk about that, nobody knows how that works. Even the people that make neural networks have no idea what's inside of them, which is kind of trippy, if you think about it.
[00:02:29]
But an LLM is a variation of that. So what they discovered was, they can get pretty good at having a model predict what the most probable next word in a sentence should be. At the end of the day, that's all an LLM is, it's really good at predicting the next thing.
[00:02:49]
And it's not something new, we've tried to do this in the past, and there have been tons of models that predict things for you. Like, imagine typing into Google on search, and it shows you some prediction of what you might search. That's always kind of been there, but that is way different than what an LLM does.
[00:03:06]
An LLM, what it does is, it takes an account of everything you've typed so far up until this point, every single sentence, and then that allows it to keep more context as it's trying to predict what the next character is. We haven't had that before. And we'll kinda talk about that architecture a little ways down on this page, but high level, that's kinda what an LLM is.
[00:03:28]
It's a model that's really good at predicting the next thing. So sometimes it feels sentient sometimes, but in reality, it's just predicting the next thing, which as you know, if it has to get good at predicting the next thing, then has to be trained on a lot of stuff.
[00:03:47]
So that's where the word large comes from. It's very large, cuz it has to be trained on billions of parameters, words, sentences, for it to understand what the next thing is. So, that's my rant on LLMs. The part about the LLM keeping track of context and knowing what to do, that's something called transformers.
[00:04:11]
So I'm just gonna go down to that part because I think it's interesting. So, here's a white paper, created by some folks, I think they were at Google at the time when they made this, and it's called, Attention Is All You Need. Now, this is super deep, but I do highly recommend people looking into this.
[00:04:29]
You do not need to know this at all to do anything that we're gonna be doing. And unless you're gonna be a research scientist, literally, this doesn't matter. But the reason I'm recommending you read this is because of how much I benefited from just understanding what transformers were.
[00:04:44]
And it helped me understand the limitations of what an AI or an LLM can do. But basically, they describe this concept of transformers. And you're gonna hear that so much, that if you ever heard of GPT, the T in GPT is transformers. So attention is something that they describe as what you get from transformers.
[00:05:04]
So I highly recommend reading this white paper. If you get deep into the AI space, you'll be waking up and reading white papers every single day. So if you're really serious about that, you can just start now. But this is probably the first one I recommend reading. They describe how this transformer architecture is really all you need to have some type of GPT-based model that is really good at predicting stuff.
[00:05:26]
And from there the bet is like, let's just throw more data at it and then it'll get smarter. And that's kind of where we are right now as a society is, throw more data at something that's transformer based and see how better it can get at predicting things.
[00:05:38]
And we haven't really deviated from that yet. So, I highly recommend reading that it. The trick is, if this just sounds crazy or whatever, take it, throw it into ChatGPT or Claude and say, tell me about this. [LAUGH] I'm a software engineer, I make React apps, tell me what this means, and it'll tell you, right?
[00:05:56]
So, use AI to to learn more about this stuff. Highly recommend it. There's some more stuff in here about like training processes and stuff. I don't think I'm gonna dive into this because it's, I don't think you're ever gonna do any LLM training in your life. Unless someone in here or someone online or whoever's watching this is gonna go be, maybe they're doing LLM ops.
[00:06:15]
Well, even LLM ops, probably not doing training foundational models from scratch. But like, yeah, I don't think anyone's gonna be doing a lot of training and stuff, just know that you need a lot of information to train a model, yes. What are tokens? Basically, tokens, depending on the model and how you encode it, are three to four characters.
[00:06:34]
So if you have a sentence, every three or four characters is a token. It's broken down that way because the AI is trying to understand things like, I'm gonna get really deep here. [LAUGH] So there's a numerical representation of all the words that the AI understands, and those are called vectors and embeddings.
[00:06:56]
And they represent the semantics, the meaning of a word in a numerical way. So like you might have, depending on how many dimensions that embedding is, let's say it's 1536. So you got 1,536 numbers inside of an array, and those numbers will represent the meaning of one token or something, right?
[00:07:20]
So the tokens are just the the alpha numeric way of representing those vectors and embeddings, but the AI sees it as numbers. And the reason why is because then it could do math to kinda figure out how things are similar. So like, if you imagine, if you had, I don't know, an array of three numbers and another ray of three numbers and you threw them on a plot, a three-dimensional plot.
[00:07:43]
You could do math to figure out how close those things were in relation. Okay, that's very similar to how some of the stuff works with AI, but across many dimensions, not just three dimensions, but 1,500 dimensions, right? So it's like, you can't even plot that out in your head.
[00:07:57]
What does that even look like? Yeah, we don't know, but all you need to know is, tokens are every three or four characters, that's a token. And the reason you need to know that is because every LLM is charging you based on how many tokens you give it and how many tokens it gives you.
[00:08:12]
So you need to understand that token count, and that is the only reason you need to know tokens, yes. Does the size of the token change model to model or API to API? Yeah, good question, so, token limits are determined by the model that you're using, and some models have what they call, it's called a context window size.
[00:08:36]
It's the size of how many tokens you can give this thing before it dies. So every model is different, there's generic models that strive to have the biggest context window so you can fit as many tokens as you want. There's very specific models that are hyper-focused on one thing that don't really have big windows for tokens.
[00:08:56]
So yeah, look at the documentation for the model that you use and it'll tell you how many tokens it can handle on the input and the output. And if you go beyond that, it'll just break. It'll just tell you, sorry, I can't do that. And the reason that is, is for a lot of reasons, but it's like, it could be a physical thing too, right?
[00:09:13]
When you think of all these tokens, this stuff has to be held in the GPU memory. So I'm not sure how much you might know about GPUs, but GPUs have memory, they have RAM associated with them, and you can only host so much of that in memory before you run out of memory.
[00:09:31]
So there's that part, and then there's also just the limit of that model and how it was trained, and everything that can be fit into its process, its neural network. So there's a lot of limits on that. But people are working hard to create million plus token size foundational models that solve problems and stuff.
[00:09:52]
But at the end of the day, yeah, you're going to run into a token problem and there's no way around it today. No good way. We'll talk about different strategies.
Learn Straight from the Experts Who Shape the Modern Web
- In-depth Courses
- Industry Leading Experts
- Learning Paths
- Live Interactive Workshops