The "Intro to LLMs" lesson is part of the full Build AI-Powered Apps with OpenAI and Node.js course featured in this preview video. Here's what you'd learn in this lesson:

Scott introduces Large Language Models (LLMs) and explains what they are and how they work. The role of these models in the AI world and the use cases for LLMs, such as writing, content creation, customer support, research, and education, are also discussed.

Transcript from the "Intro to LLMs" Lesson

[00:00:00]
>> Anyone know what an LLM is, wanna give it a shot?
>> Large language model?
>> Large language model, exactly. It's exactly what it sounds like. So let me kinda break it down. Large language model. So in the AI world there are these things called models. You can think of a model as really just this gigantic file that represents weights.

[00:00:22] And basically, it represents statistical choices that have to be made by some system, and it kinda guides it throughout those choices. It's just a bunch of numbers. And that's the output of something that has been trained on some sort of data. And those files, depending on how much data went into them, can be large files, they could be smaller files, things like that, but that's what a model is.
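To make the "bunch of numbers that guide statistical choices" idea concrete, here is a minimal sketch, not how any real model is stored. The three words and their scores are made up for illustration, and `toProbabilities` and `pickNextToken` are hypothetical helper names. It turns raw scores into probabilities with softmax and then picks the most likely next word.

```typescript
// A toy "model": nothing but numbers (weights) that score possible next words.
// The words and scores are invented for illustration; real models hold billions of weights.
interface Candidate {
  token: string;
  score: number;
}

const toyWeights: Candidate[] = [
  { token: "cat", score: 2.0 },
  { token: "dog", score: 1.0 },
  { token: "car", score: 0.1 },
];

// Softmax converts raw scores into probabilities that sum to 1.
function toProbabilities(candidates: Candidate[]): number[] {
  const scores = candidates.map((c) => c.score);
  const maxScore = Math.max(...scores);
  // Subtracting the max keeps Math.exp from overflowing on large scores.
  const exps = scores.map((s) => Math.exp(s - maxScore));
  const total = exps.reduce((sum, e) => sum + e, 0);
  return exps.map((e) => e / total);
}

// The "statistical choice": pick the highest-probability token (greedy decoding).
function pickNextToken(candidates: Candidate[]): string {
  const probs = toProbabilities(candidates);
  let best = 0;
  for (let i = 1; i < probs.length; i++) {
    if (probs[i] > probs[best]) best = i;
  }
  return candidates[best].token;
}

console.log(pickNextToken(toyWeights)); // prints "cat"
```

Real systems usually sample from the probabilities instead of always taking the top one, which is part of why the same prompt can give different answers.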

[00:00:47] A large language model is something that is going to be useful at understanding and creating human language. And that's why it's called a large language model. Large meaning that it was trained on, in this case, billions of tokens of training data, and has billions of parameters. Yeah, that's basically what that means.

[00:01:12] So you can think of them as, I mean, if you've ever used ChatGPT, that's an LLM, right? Using GPT 3.5, or 4 if you switch that over, they feel very smart. They almost feel conscious, right? How do they feel that way? That basically comes down to one big discovery not too long ago actually, something called transformers.

[00:01:42] Up until transformers were created, the way that people built LLMs, like some of the early versions of GPT and this other thing called BERT, was different. When those LLMs tried to make predictions on text, they wouldn't consider the characters that came before it, and the text and the words and the semantics behind those characters.

[00:02:04] They would only try to predict what would come next based off of where you are now. They didn't have this recursive graph understanding of everything up until that point. So the context was very limited on what it needed to be able to predict something. So something called transformers with attention basically changed that, so this kind of upgraded the LLMs to be a little more aware of the intent, or the meaning.

[00:02:34] Or the semantics behind whatever language was coming in as the input, which therefore exponentially upgraded the quality of the output of those LLM systems. So now because they know so much more about what is being said, they have more context on what they can predict. It's basically like, if you were listening to me talk right now and you didn't hear anything I said until just this last word, how would you predict what I was gonna say next effectively?

[00:03:01] It'd be really tough, but if you heard everything I said up until this point, you might have a better chance of predicting what I might say next. So that's the difference on a very small scale. So yeah, pretty effective technology. So today's LLMs are mostly based off of transformers.
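For readers who want to see what "attention" computes, here is a minimal sketch of scaled dot-product attention for a single query. It is a toy under stated assumptions, not a transformer: the 2-dimensional vectors are invented, and `attend`, `earlierKeys`, `earlierValues`, and `currentQuery` are hypothetical names. The point is that the current position scores every earlier position and blends their information by those scores, instead of looking only at where it is now.

```typescript
type Vector = number[];

function dot(a: Vector, b: Vector): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function softmax(scores: number[]): number[] {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((sum, e) => sum + e, 0);
  return exps.map((e) => e / total);
}

// Scaled dot-product attention for one query:
// weights = softmax(query . key_i / sqrt(d)), output = sum of weights_i * value_i.
function attend(
  query: Vector,
  keys: Vector[],
  values: Vector[]
): { weights: number[]; output: Vector } {
  const scale = Math.sqrt(query.length);
  const weights = softmax(keys.map((k) => dot(query, k) / scale));
  const output = values[0].map((_, d) =>
    weights.reduce((sum, w, i) => sum + w * values[i][d], 0)
  );
  return { weights, output };
}

// Made-up vectors standing in for three earlier words in the context.
const earlierKeys: Vector[] = [[2, 0], [0, 2], [1, 1]];
const earlierValues: Vector[] = [[10, 0], [0, 10], [5, 5]];
const currentQuery: Vector = [1, 0];

const attention = attend(currentQuery, earlierKeys, earlierValues);
// The first earlier word matches the query best, so it gets the largest weight.
console.log(attention.weights, attention.output);
```

In a real transformer the queries, keys, and values are learned projections of the token embeddings, and this runs for every position and across many heads and layers.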

[00:03:19] The T in GPT is transformer. And attention is the technique that it uses. So some of the ways that you might see LLMs crushing it today, you have things that you probably know, like writing and content. So if you need a blog post written or something like that, it's very helpful for that.

[00:03:37] For me, I use it to help me make sense of some of the code that I was writing. And this was actually one of the first times I used an LLM to help me figure out the best way to deliver a curriculum, in the way that I wanted to present it to you, versus me just doing it myself.

[00:03:53] So that was a new experience for me. Customer support is a big one. Research is very reliable up until a point and we'll talk about that. Education is huge, using LLMs to learn things a lot more effectively. When I was actually going back and learning a lot of those math concepts, I used GPT to help me make sense of the things that I was reading in a textbook because it didn't make sense to me.

[00:04:17] So I'd always ask it, give me a real world analogy of what the hell this means, because this explanation means nothing to me. And those analogies actually helped me see it a lot better cuz I need to visually see things. And then, obviously, automation, because you now have this system that can understand unstructured content like human text.

[00:04:38] Well, now you can automate things that you couldn't previously have done before. So there's a lot of human only tasks that you couldn't really automate efficiently enough to make it worth it that you can now, so we'll talk about some of those things as well. And if you've ever used Copilot or anything like that, that's what I'm talking about.

[00:05:00] Yeah, so here's this, some other stuff about why they are super cool, if you wanna read a little bit. But basically it just talks about how the predictions are made, how it's trained on tons of data, things like that. I don't think you really need to know any of this.

[00:05:17] I'll just put this in here as context, if you really wanna get into some of the more details about how the LLMs are created and stuff. I think the most important thing to know is that it is not conscious. [LAUGH] It is not, or as far as we know, I mean, do we know?

[00:05:32] I don't know, as far as we know, it's not. But then again we don't really know how the inner layers of a neural network work. As far as we know, we could also be a neural network, who knows? I didn't say that, but yeah, it's not conscious, it's just predicting things based off the context.

[00:05:51] So it feels very powerful, and that's because it's been trained on a bunch of data, which requires tons of compute that you just don't have sitting around. You need millions of dollars worth of compute and tons of time to be able to train one of these things, and even then, it's not good enough.

[00:06:08] There's still more training after that, that needs to happen. And then, as far as the LLMs that we're gonna use today, like I said, we're gonna use OpenAI. OpenAI, let me see if they have a list of all of theirs somewhere. They have tons of LLMs, actually. It should be in their playground. There we go.

[00:06:28] They have tons of them. Here, if I click on Show More Models, these are basically all the different models that they have. They start off at like GPT 3.5 Turbo, which is the default one. If you use ChatGPT, that's the one that you're talking to. It's their fastest one, it's pretty good, it sets the standard of what things are.

[00:06:49] Since then, they've made so many more releases. The way that this works is that each number after it is basically, you can think of it as like the higher number here just means a newer version. There's more that goes into it, but that's basically it. And then sometimes you might see something like 16K or something like that.

[00:07:07] That describes something called the token limit. And we'll talk more about the token limit and context window when we get into some of the examples. I don't wanna get into it too far, but this just means that this thing can handle more text than the thing that doesn't have this. By default, it doesn't handle 16,000 tokens of context.
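As a rough illustration of what a token limit means in practice, here is a minimal sketch that estimates how many tokens a piece of text uses and checks it against a limit. Everything in it is an assumption for illustration: the "about 4 characters per token" figure is only a rule of thumb for English text, the two limits are approximate stand-ins for a default model and a "16K" variant, and `estimateTokens` and `fitsInContext` are hypothetical helpers. An exact count would need the model's actual tokenizer.

```typescript
// Assumption: a rule of thumb of about 4 characters per token for English text.
const CHARS_PER_TOKEN = 4;

// Illustrative limits (assumed): roughly 4K tokens by default, roughly 16K for a "16K" model.
const DEFAULT_TOKEN_LIMIT = 4096;
const LARGE_TOKEN_LIMIT = 16384;

// Very rough estimate; a real tokenizer splits text differently.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function fitsInContext(text: string, tokenLimit: number): boolean {
  return estimateTokens(text) <= tokenLimit;
}

// Joining 8001 empty slots produces "word " repeated 8000 times (40,000 characters).
const longPrompt = new Array(8001).join("word ");

console.log(estimateTokens(longPrompt)); // prints 10000
console.log(fitsInContext(longPrompt, DEFAULT_TOKEN_LIMIT)); // prints false
console.log(fitsInContext(longPrompt, LARGE_TOKEN_LIMIT)); // prints true
```

Keep in mind that the limit generally covers the prompt and the generated reply together, so a prompt that barely fits leaves little room for an answer.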

[00:07:28] And then you have the next version of that which is GPT-4, so better. Right now, GPT-4 is probably the best LLM out today, there really isn't anything that comes close to how good it is, but there are things that are more than good enough, including 3.5. So most of today I'll be using 3.5, you can switch over to 4 if you wanna spend some money. When I say spend some money, I'm talking like a penny or less, it's still very cheap.

[00:07:54] But I'll mostly be using 3.5 because it's not gonna be that big of a difference per se, but you can switch out to whichever you want when we run the examples. And then, of course, there's other ones you have, like Claude from Anthropic, which is a very good alternative, it's basically the same thing.

[00:08:13] They have an API, they have really cool stuff. I think the models that they currently have right now are mostly all better than 3.5, but not as good as GPT-4, so somewhere in between. And then there's Cohere, and there's also a huge list of just open-source ones that you can find if you go to, like, Hugging Face.

[00:08:37] I know it's such a weird name right? I love it though. Hugging Face is like the GitHub for AI, basically. And there's just tons of models on here and the biggest one might be called Llama 2. This is probably one of the biggest open-source ones made by Facebook, that if you train it a certain way, it can be just as good as 3.5, if not better.

[00:08:57] But probably not as good as 4, depending on how many parameters you want. So this version is Llama 2, 7 billion parameters. And they have another one that might have like 20 billion parameters or something like that, that makes it better. And that's another difference: Claude, OpenAI, those things that are hosted, you don't need any infrastructure.

[00:09:14] You just hit an API, you get back a response. The open-source ones, you have to deploy those somewhere or run them locally, which means you need a GPU. So it's open-source, but you still need the infrastructure to run it. You can't really run these things on a CPU yet efficiently, they'll be really slow.
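"You just hit an API, you get back a response" looks roughly like the sketch below for a hosted model. It assumes OpenAI's chat completions HTTP endpoint and a runtime with a global `fetch` (Node 18 or later). `buildChatRequest` and `askModel` are hypothetical helper names, the response interface lists only the fields this sketch reads, and `"YOUR_API_KEY"` is a placeholder. Treat it as a sketch under those assumptions, not the course's implementation.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface HttpRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Only the response fields this sketch reads; the real payload has more.
interface ChatCompletionResponse {
  choices: { message: ChatMessage }[];
}

// Builds the HTTP request for a single-turn chat prompt.
function buildChatRequest(
  apiKey: string,
  prompt: string,
  model: string = "gpt-3.5-turbo"
): HttpRequest {
  const messages: ChatMessage[] = [{ role: "user", content: prompt }];
  return {
    url: "https://api.openai.com/v1/chat/completions",
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model, messages }),
  };
}

// Sends the request and returns the text of the first reply.
// Not called here, since it needs a real API key and network access.
async function askModel(apiKey: string, prompt: string): Promise<string> {
  const req = buildChatRequest(apiKey, prompt);
  const res = await fetch(req.url, {
    method: req.method,
    headers: req.headers,
    body: req.body,
  });
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  const data = (await res.json()) as ChatCompletionResponse;
  return data.choices[0].message.content;
}

// Inspect the JSON body that would be sent ("YOUR_API_KEY" is a placeholder).
console.log(buildChatRequest("YOUR_API_KEY", "What is an LLM?").body);
```

Switching to a different hosted model in this sketch is just a different `model` string, which is the appeal of hosted APIs compared with deploying an open-source model yourself.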

[00:09:31] And that's just because, like how a GPU runs compared to a CPU, the CPU doesn't run as many processes in parallel as a GPU does. The GPU was built to show frames on the screen very fast, so it's built in such a way that it can do a bunch of smaller tasks in parallel way better than a CPU can, where CPU might do a few things in parallel.

[00:09:55] But those few things are very high compute, like some algorithm or something like doing Fibonacci in one process and crunching numbers in another process. Whereas a GPU is like, I'm throwing frames on the screen. So it's great for something that needs a lot of work done quickly, in the case of generating text.

[00:10:16] So that's why you need a GPU to run a lot of these things, and that's why there's a GPU crunch right now. That's why NVIDIA is one of the biggest companies in the world right now. Because everyone needs a GPU to run AI. That's kind of where we are right now.
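To get a feel for the infrastructure an open-source model needs, here is a back-of-the-envelope sketch of how much memory the weights alone take. It assumes each parameter is stored as a 16-bit float (2 bytes) unless told otherwise, and it ignores everything else a running model needs, so treat the numbers as a floor. `weightMemoryGB` is a hypothetical helper.

```typescript
// Assumption: 16-bit floats, so 2 bytes per parameter.
const BYTES_PER_PARAM_FP16 = 2;

// Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes).
function weightMemoryGB(
  parameterCount: number,
  bytesPerParam: number = BYTES_PER_PARAM_FP16
): number {
  return (parameterCount * bytesPerParam) / 1e9;
}

console.log(weightMemoryGB(7e9)); // prints 14 (a 7-billion-parameter model at 16-bit)
console.log(weightMemoryGB(7e9, 4)); // prints 28 (the same model at 32-bit)
```

So even the 7-billion-parameter model mentioned above needs on the order of 14 GB just to hold its weights at 16-bit precision.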

[00:10:29]
>> Any thoughts on AI for personal usage, like diet plans, calendar setup?
>> I use it all the time for that. I mean, there are people who are creating apps around that to help you but yeah, you can go into ChatGPT. I've literally went into ChatGPT and I was like, here are my goals, create me a fitness plan.

[00:10:49] And here's what I'm good at, here's what I'm bad at, here are previous injuries that I've had. And it'll create me something, I don't take its word as fact because, again, it's just predicting what it should say next. But for me, it's a great place to start. For me, like, yeah, that's okay, I can see that, and wait, that actually doesn't make sense, I'm not gonna jump rope for three hours.

[00:11:10] Why did you say that? That sounds stupid, but I get it. Maybe I should jump rope for 20 minutes, though. The benefits of why you said I should make sense. So it's just like a great place to start, so I highly would recommend. Basically, as far as personal usage, bringing in something like GPT where you would typically go Google something, I'm gonna go look this up.

[00:11:33] Cuz if you think about how that works, you're looking something up, you're hoping Google responds back with some article, or something that you can read, that you can then like try to make a decision off of. You can skip a lot of that by just going straight to GPT because it was probably trained on those same articles that you're eventually gonna click on anyway, so it's just another additional research tool.

[00:11:56] And then even Google and Bing have AI results built into the search engine now that, I don't know, can be useful sometimes. I would say one out of ten, the Google stuff that I get, let me see, what is a car? Right, I can click Generate. And yeah, I'll get some AI stuff here from Google, right?

[00:12:22] So it can be useful, but, yeah, for me, I think of it as more of, it's just another tool to research. So take it with a grain of salt. All right, so everybody knows what an LLM is. Everybody's educated on that? And just to be clear, LLMs are not way different, but they're mostly different from things like, if you've ever used Midjourney in Discord, that's not an LLM, that's a diffusion model.

[00:12:44] Very similar to an LLM, it's just that the output is an image. And that's through some diffusion process that we're not gonna cover today. But I actually do have a blog post that goes into that very scientifically that I can link to later if you wanna check out how diffusion models work.

[00:13:01] I actually broke down a few of the scientific white papers and tried to understand it from that level and explained it in a way. So that one's way different as far as the output, but very similar as far as it being able to understand human language and then outputting something.