Check out a free preview of the full Build an AI Agent from Scratch course
The "Types of LLMs Overview" Lesson is part of the full, Build an AI Agent from Scratch course featured in this preview video. Here's what you'd learn in this lesson:
Scott discusses different types of LLMs such as GPT, Claude, and Llama. He explains the concept of weights in AI models and how they are essential for the model's functionality. Scott also mentions the limitations of LLMs, including hallucinations and context window constraints.
Transcript from the "Types of LLMs Overview" Lesson
[00:00:00]
>> Scott Moss: Types of LLMs: we've got base models, or you'll hear them called foundational models. GPT, that's everything from OpenAI. That's been around since GPT-3, when they released that. Well, I guess really 3.5 is when ChatGPT came out. That's what took the whole world by storm. But they've been doing GPT for a while: GPT-1, GPT-2.
[00:00:17]
You can actually go build those in Python in a couple hundred lines of code. It wasn't until GPT-3.5 that they took a bet and said, let's train this thing with hundreds of billions of parameters and see how good it can get. And guess what, it was pretty good.
[00:00:33]
Claude is another one. Some people left OpenAI and made another company called Anthropic, and they have a thing called Claude, which is very similar to ChatGPT. It's actually pretty good; it's the main one I use when I'm interacting with AI. Llama is Meta's version of this. I hesitate to say open source, cuz it's not really open source: you don't have access to the data and training code that made the model, but you do have access to the model and its weights.
[00:00:59]
You can do whatever the hell you want with it, and in that way it feels open source. And they have many different sizes, many different parameter counts. Most models are shipped in different sizes. So you'll have something like a Llama 2B, and that means this is a model that was trained with 2 billion parameters, right?
[00:01:14]
You might have another one that's 12B, meaning it was trained with 12 billion parameters. The more parameters, the bigger the model. It might be many gigabytes on your computer, but that also means it's probably better. So it really just depends, and fine-tuning can complicate that comparison. But generally speaking, the more parameters, the better it is, and the bigger the model.
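To put rough numbers on that relationship, here's a quick back-of-the-envelope sketch in TypeScript. It assumes 16-bit weights at two bytes per parameter, which is just one common case; quantized formats come out smaller.

```ts
// Back-of-the-envelope: a model's size on disk is roughly
// (number of parameters) x (bytes per parameter).
// 2 bytes/param assumes 16-bit weights; quantized models take less space.
function approxModelSizeGB(parameters: number, bytesPerParam = 2): number {
  return (parameters * bytesPerParam) / 1e9;
}

console.log(approxModelSizeGB(2e9));  // ~4 GB for a 2B-parameter model
console.log(approxModelSizeGB(12e9)); // ~24 GB for a 12B-parameter model
```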
[00:01:35]
For instance, GPT-4: no one outside OpenAI knows how many parameters it was trained with. Estimates are maybe trillions, who knows? [LAUGH] And the data, it was trained on the whole internet plus more, we don't know. And it could take months of GPU compute and hundreds of millions of dollars to train.
[00:01:52]
And then most foundational models are really just good for generic things, cuz a lot of these models are trained on generic data sets. That can be good or bad: you can use one to do something generic, but because it's not specialized at one thing, it might not get that very specialized thing right.
[00:02:13]
So sometimes it's best to not use a generic tool and switch to a specialized one. Instruction-tuned models, these are models built on top of those foundational models, right? So ChatGPT, Claude, the products themselves. ChatGPT is an app built on GPT, and it has instructions associated with it, and guardrails.
[00:02:34]
It's all these things to help the model be good. Same thing with Claude, stuff like that, yep.
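As a rough sketch of what "instructions layered on top of a base model" can look like in code, the system message below plays the role of those baked-in instructions. This assumes the openai Node package, an OPENAI_API_KEY environment variable, and an illustrative model name.

```ts
// Minimal sketch: an app's "instructions" layered on top of a base model
// via a system message. Assumes the `openai` npm package and an
// OPENAI_API_KEY environment variable; the model name is illustrative.
import OpenAI from "openai";

const client = new OpenAI();

const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    { role: "system", content: "You are a concise, friendly cooking assistant." },
    { role: "user", content: "How long do I boil an egg?" },
  ],
});

console.log(response.choices[0].message.content);
```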
>> Participant: What are the weights in AI models?
>> Scott Moss: The weights are the set of probabilities that go into the model, right? So without the weights, the model is basically useless. So the weights are the output of the training.
[00:02:54]
So I guess the best way I can think about it is, the model itself is kind of the brain, and the weights are the thought process for how to actually use that brain, right? Without the weights, it's pretty much useless. The weights are the values decided by the outcome of the training that you apply to the model so that it's actually useful, or at least that's how I understand it.
[00:03:22]
I'm not a research scientist and can't break that down any further than what I just explained, but that's the way I describe it, that's the way I see it. So yeah, most models have the weights already in the model. They're already there; they're part of the model.
[00:03:39]
But yeah, when you're training and building these things, they're kind of separated: you can have the weights over here and the model architecture over there, things like that. But yeah, the weights are those numbers. They're called weights because they're a numerical representation of how much weight to put on a decision as you're going through a layer of a neural network, right?
[00:04:01]
A neural network is these different decision layers, where there might be many nodes in the network. And as you go deeper, the nodes start to collapse. And the weights determine which direction one node flows to another node on another layer. So those weights are the instructions for that, which again are trained through correction, human in the loop, and all these different things.
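As a toy illustration of that idea (every number below is made up), a single node just multiplies its inputs by its weights, adds them up, and passes the result through an activation to decide how strongly it feeds the next layer.

```ts
// Toy illustration of weights: just numbers, learned during training,
// that scale each input on its way from one layer to the next.
// All values below are made up purely for illustration.
const inputs = [0.5, 0.8, 0.1];   // activations coming from the previous layer
const weights = [0.9, -0.3, 0.4]; // learned weights for this node
const bias = 0.05;

// Weighted sum, then a simple ReLU activation deciding how strongly
// this node "fires" into the next layer.
const sum = inputs.reduce((acc, x, i) => acc + x * weights[i], bias);
const activation = Math.max(0, sum);

console.log(activation); // ≈ 0.3
```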
[00:04:25]
So you really need the weights; otherwise, you just have a neural network that has no idea what anything is. It's like, all right, cool, you wanna show me a picture of a dog and you expect me to know it's a dog. I don't know what a dog is, you haven't told me that.
[00:04:37]
So the weights are that. Think about it like a kid going into kindergarten and through elementary school: the weights are the representation of the things they learned at school. That's a great way to think about it, yes, I'm gonna steal that. [LAUGH] You have domain-specific models; Code Llama is one for coding.
[00:04:59]
You have medical ones. These are LLMs that were trained either from scratch, or fine-tuned later, on a very specific set of data to do a very specific thing, right? So if you want something that's really good at writing code, sure, you can train it on human language and stuff, but you probably also wanna train it on a bunch of code so it understands the intricacies of that.
[00:05:25]
So those are domain-specific models, and you're starting to see a lot more of these pop up. We'll talk a little bit about context windows; some other limitations you might find are hallucinations and just errors in general. If you've never heard of hallucinations, it's basically when the AI is lying.
[00:05:41]
Or I guess it's not that it's lying, cuz lying means you know you're not telling the truth. The AI, so to speak, thinks it's telling the truth; it just doesn't know any better. And again, because it can't really think, it can only predict the next thing. So when you look at a hallucination, you look at what it output, and you think to yourself, I can see why you said that.
[00:06:01]
That grammatically makes sense. And if I didn't know the fact myself, I might think it was right. But I do know the fact and I do know that this team didn't win the NBA championship in 1992, so that's wrong. But there's nothing wrong with that sentence, so you can see why it predicted that.
[00:06:19]
So you'll get a lot of that, and that's probably the number one problem with LLMs today: they hallucinate. And every company that's building a product to help you with AI is probably trying to solve this in some way. So it's a massive problem. Also, context window constraints, we'll talk about those.
[00:06:35]
Computation is huge. I mean, GPUs, they're gold, right? If you thought crypto made GPUs valuable, that pales in comparison to what AI is doing to GPUs. It's not even close. I would imagine OpenAI and Anthropic probably use more electricity than some countries, just for their GPUs.
[00:06:59]
I forget the exact quote I saw, but it was something like every 100 calls to OpenAI is a sip of water, or something like that, because of all the resources it uses. So it's like, is it really good for the environment?
[00:07:14]
Who knows? But there's a lot of compute needed. And really, the last thing you need to know here is that we don't need to know anything about setting any of these models up, cuz we're just gonna use an API that lets us reach some AI in the cloud, somewhere across the world, and it's gonna come back with our results.
[00:07:36]
And that's all we need to know, right? So we're gonna be good there.
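For a sense of how little setup that really means, here's a sketch of what the call boils down to: one HTTPS request to a hosted model. It uses OpenAI's chat completions endpoint as an example and assumes an OPENAI_API_KEY environment variable and an illustrative model name.

```ts
// Under the hood, "AI in the cloud" is just an HTTPS request to a hosted
// model. Sketch using plain fetch against OpenAI's chat completions
// endpoint; assumes an OPENAI_API_KEY env var and an illustrative model.
const res = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Say hello to the class." }],
  }),
});

const data = await res.json();
console.log(data.choices[0].message.content);
```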