Transcript from the "Setup Node & OpenAI API" Lesson
>> Okay, enough with the talking, let's write some code, right? Okay, so like I said, we need node, you need your API account. Let's do some Hello world stuff. So let's just hop right in. So like I said, I already have the repo here, I'm just gonna make another one.
[00:00:18] mkdir ai-course, and I'll make this bigger. So I got enough. Cool, okay, start project from scratch. So I'm gonna use my npm here. And that, there we go. And then the one thing that you wanna install is gonna be openai, like that. So make sure you install that.
[00:00:47] And for my editor, I'm using Cursor. Anyone ever use Cursor before? No, okay, so if you use VS Code, it's the same thing, it's actually just a fork of VS Code. It's the exact same thing, all your plugins work, your keyboard shortcuts, everything works. The only difference about Cursor is that it has AI features built into it that are, in my opinion, way better than Copilot, it's a million times better.
[00:01:16] And I think they chose a fork cuz it just makes them feel more official, but they could have made a plugin too, and it would have been just fine. I might use some of those today, maybe not. Just an example, but yeah, I use Cursor. If you're wondering like, what the hell is this?
[00:01:29] Yeah, it's just an editor, it's free. You can add your own GPT key to it and not pay them anything, or you can pay them and they'll use their GPT version for you, but that's what I'm using. If you're wondering why I have this thing, that's what this is.
[00:01:44] Let's make a new file here, I'll just call it index.js. Also wanna make sure I'm using modules, so I'm gonna go to my package.json and add a field here called type and put module. That way, I can use ES6 imports and exports, and not have to use require and module.exports and stuff like that cuz I'm not going back to that, [LAUGH] I'm done with that.
[00:02:08] No more require, I'm over it. If I was using Bun, I would just use TypeScript. So, cool, now let's make something. Okay, so the first thing we wanna do, we installed openai, so let's import that. So we can say, import OpenAI from openai, like this.
[00:02:30] Actually, I'm gonna make a prettier file so I can get my linting and my fix. I don't like semicolons, so I always put that on false, and I only like single quotes, I'll put that on true. It really bothers me, so there we go. So we got OpenAI, and then what we wanna do is we just wanna make a new instance.
[00:02:54] So I'll say openai = new OpenAI, like this. And right here, this is ready to pass in your API key. So you can pass in your API key like this. You can also just expose it as an environment variable with a specific name, it will pick it up from there.
>> That's actually the way that I'm gonna do it cuz we're gonna be using this a lot and I just don't wanna be writing it a million times. So I'm just gonna make a .env file, like this. And I'm gonna say OPENAI_API_KEY=, and then I'm gonna go get my OpenAI key.
[00:03:35] So if you're following along, you might feel the urge to copy my API key that I'm about to get, but I'm gonna remove it, eventually. So, good luck. So here we go. I'm just gonna call this demo-key, create myself a key. I'm gonna copy it, and I'm just gonna paste it in here.
[00:04:03] Okay, got my OpenAI API key, and I just need to load up the environment variables into my environment. And the best way to do that is just to install something called dotenv. So npm i dotenv, one word, all lowercase, dotenv, like that. I'll install that, and at the top of my file here, I'll just import dotenv/config.
[00:04:31] And what that's gonna do is it's gonna find a .env file in this repo, take all the environment variables out of it, load them up in the environment, and then move on to the next line. So by the time my code runs, the environment variables have already been added to the environment.
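To make the loading step concrete, here is a rough sketch of the idea behind `import 'dotenv/config'`. It is not dotenv's actual implementation (it ignores quoting, escaping, and multiline values), and `parseEnv` and `loadEnv` are hypothetical names used only for illustration.

```javascript
// Simplified stand-in for what loading a .env file involves.
// Assumption: plain KEY=VALUE lines only, no quotes or escapes.
function parseEnv(text) {
  const vars = {}
  for (const line of text.split('\n')) {
    const trimmed = line.trim()
    // Skip blank lines and comments
    if (!trimmed || trimmed.startsWith('#')) continue
    const eq = trimmed.indexOf('=')
    if (eq === -1) continue
    vars[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim()
  }
  return vars
}

// Copy parsed values into the environment without overwriting existing ones
function loadEnv(text, env = process.env) {
  for (const [key, value] of Object.entries(parseEnv(text))) {
    if (env[key] === undefined) env[key] = value
  }
  return env
}
```

In the real project you would not write this yourself; the single `import 'dotenv/config'` line at the top of index.js does the job before the rest of the file runs.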
[00:04:46] So I don't have to do that manually. Okay, there we go. And we can just not even pass an object there because, again, this is gonna read from the environment variable. All right, so let's actually just try to, I don't know, talk to the AI and see what happens.
[00:05:02] So the way we can do that is we can say openai.chat.completions.create, and this takes an object. And this object takes a few things. So the first thing is model. So what model do we want? And if you see, here are all the models that this SDK supports, there are tons of different models here.
[00:05:32] For this example, it doesn't matter which one you pick, they all have different costs associated with them, or 3.5 and 4 have different costs associated with them. And then between those two, like I said, the higher the number, usually, the later the release. And then the number with the k on it just describes something called context limit, as in you can send it more and it can send you back more before it hits a limit.
[00:05:57] That's just what that number means. So you can see they're up to 32k on one of these models. Claude has one that's 100k, but there's problems with that, it's not always the more, the better. And we'll talk about that when we get into the chat example or the semantic search, one of those, I'll talk more about it.
[00:06:14] But for now, I'm just gonna pick 3.5-turbo, it doesn't really matter which one you pick. So I'm just gonna pick that one, and then we're gonna say messages. messages is gonna be an array. And the way messages work, it's an array, so they're gonna be in the order in which they are in the array.
[00:06:34] The format is very specific. So a message is an object and there's always a role, right? And then for the role, it's one of these four things, it's either an assistant, a function, the system, or user. The two main ones are user and assistant. Assistant is like the AI wrote this, this is a response from the AI.
[00:06:57] A user is this is you talking to the AI. So if you make a new message, you have to make sure the role says user. If the role says assistant, that's a message you got back. System is, if you've ever used ChatGPT, there's a way for you to kind of, what's called, steer the system and kinda give it a role.
[00:07:16] Like, hey, you are a pirate that only talks in pirate slang and you're educated on physics. So you're talking to the system as a whole and not so much the assistant. It's like instructions for the entire system. So you typically only do that at the beginning of your chat history, and first, we'll do that now.
[00:07:35] So I might say role system, and then I'll say content, which is the message I wanna send. And I'll be like, You are an AI assistant, answer any questions to the best of your ability, right? So that's a system message. This is not me talking to the AI, this is me kind of priming the system in which the AI operates.
[00:08:06] I like to think of it as the background story for the AI. This is its background, this is its origin story. That's the way I like to think of it. Like I said, you typically only do it at the beginning of the message. And then from there, for instance, let's send a message.
[00:08:23] We're gonna say, role: user, right? So user, and then we'll send a message that says, or I'm sorry, content that says Hi. We'll do that, we'll say Hi. Okay, and then what we wanna do is just wanna get the response back from this. So we save this in a variable, I'll just say results = await this.
[00:08:53] So if you're on version 18 of Node, which you should be, you can do top level awaits. If you're wondering how am I doing await, not inside of an async? You could do this at the root of a file now without having to be an async function. If you're on the latest version of Node, so I'm gonna await that.
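Put together, the request dictated so far looks roughly like the sketch below. The model name and both messages come from the lesson; `sendChat` is a hypothetical wrapper, and `client` is assumed to be an instance created with `new OpenAI()` so the sketch does not depend on a live API key.

```javascript
// Request body as described in the lesson: a model plus an ordered
// array of messages, each with a role and content.
const params = {
  model: 'gpt-3.5-turbo',
  messages: [
    {
      role: 'system',
      content: 'You are an AI assistant, answer any questions to the best of your ability.',
    },
    { role: 'user', content: 'Hi' },
  ],
}

// `client` is assumed to be an OpenAI SDK instance (new OpenAI())
async function sendChat(client, request = params) {
  return client.chat.completions.create(request)
}
```

In index.js, with `"type": "module"` set in package.json, the call would be something like `const results = await sendChat(new OpenAI())` at the top level of the file, after `import 'dotenv/config'` and `import OpenAI from 'openai'`.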
[00:09:09] And then I think we'll just log this for now so you can kinda see what's coming back and then we'll talk about it. So let's do that, and then yeah, let's see if we broke something or not. So now we'll say node index, we'll run that. And you can see, this is what I get back.
[00:09:28] I get back an id, some pretty cool stuff. I think the two things that you really wanna look at are, one, the usage. So this is how many tokens my prompt has. You can think of a token, basically, I think it averages out to every three or four characters.
[00:09:45] You can kinda think of it, softly, as a token is a word. It's not a character, it's not like every character is a token. It's on average, three to four characters, maybe a word. So you see I had 29 tokens, and then the completion tokens are the tokens in which the AI responded back with, so how many tokens it had.
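As a back-of-the-envelope illustration of that "three to four characters" average, a rough estimator might look like this. It is only a rule of thumb, assuming about four characters per token; `estimateTokens` is a hypothetical helper, and a real tokenizer will give different numbers.

```javascript
// Rule-of-thumb only: assumes roughly 4 characters per token.
// The real count comes back from the API in the usage field.
function estimateTokens(text, charsPerToken = 4) {
  return Math.ceil(text.length / charsPerToken)
}
```

The authoritative numbers are the ones in the usage object of the response, so an estimate like this is only useful for a quick sanity check before sending a prompt.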
[00:10:04] And then you combine them all, these are the total tokens. And at least for OpenAI, ChatGPT, you get charged on how many tokens you are generating because that is a linear cost for them. However many tokens their system has to generate, that's more compute that they're charged for.
[00:10:20] That's more electricity, that's more time. So they charge you for that, and they basically pass that cost on. So the more tokens you have as a prompt, and the more completion tokens that are sent back, that's how you are charged on a token basis. So there are products out there that help you figure out how to maintain this cost.
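To see how the usage numbers turn into a bill, here is a small sketch. The `usage` shape (prompt_tokens, completion_tokens, total_tokens) matches what was logged above; `estimateCost` is a hypothetical helper, and the prices are arguments you supply yourself, since this sketch does not assume any real price list.

```javascript
// `usage` mirrors the object returned by the API. The per-1K-token prices
// are caller-supplied placeholders, not real OpenAI prices.
function estimateCost(usage, pricePer1kPrompt, pricePer1kCompletion) {
  const promptCost = (usage.prompt_tokens / 1000) * pricePer1kPrompt
  const completionCost = (usage.completion_tokens / 1000) * pricePer1kCompletion
  return promptCost + completionCost
}
```

Prompt and completion tokens are kept as separate inputs here because they can be priced differently; if they are not, pass the same price twice.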
[00:10:38] How to make a router that sends specific prompts to specific AIs that are optimized for specific token costs. It's crazy, it gets really nuts. So you got that, but the other thing is the choices, and this is where the AI responds. So let's actually take a look at that.
[00:10:55] So I'm gonna log results.choices[0]. Let's run that again. And you can see I get back an object, it has a message, and you can see role is assistant. I mean, this is the AI talking now. role: assistant, and then content, Hello, How can I assist you today? And then finish_reason is stop.
[00:11:16] That just means it had nothing else to say, like I'm done talking to you. This is important later when we talk about functions cuz sometimes it's not finished. It might respond and it might not be done. Because either a, it hit a context, it hit the limit of which it can respond as far as a token limit, and/or it needs you to do something in your code before it can continue and provide more context.
[00:11:37] So it might not always be done even though it responded, and that's where it can get pretty tricky. If you ever use ChatGPT, there's a button that says, or sometimes it will stop halfway through the generation, you have to say, keep going. This is it, it hit a limit on how much it can respond with, but yet it's not done yet.
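The finish_reason check described above could be sketched like this. `readChoice` is a hypothetical helper, and it only covers the three cases mentioned here: a normal stop, a cut-off due to a token limit, and a pause while the model waits on a function call.

```javascript
// Pull the reply out of a chat completion result and flag why it ended.
function readChoice(results) {
  const choice = results.choices[0]
  return {
    content: choice.message.content,
    // 'stop' means the model finished on its own
    done: choice.finish_reason === 'stop',
    // 'length' means the reply was cut off by a token limit
    truncated: choice.finish_reason === 'length',
    // 'function_call' means the model is waiting for your code to run something
    needsFunction: choice.finish_reason === 'function_call',
  }
}
```

Code that consumes the reply can then decide whether to show it as is, ask the model to keep going, or run a function and send the result back.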
[00:11:56] So pretty cool stuff. But anyway, yeah, so we get that back. So if we just log that, .message.content. And then we can just change this, we can be like, Hi! Can you tell me what is the best way to learn how to do maths? I don't know, let's see what it says.
[00:12:26] There it is. So you can see, the more tokens it needed to respond with, the longer it took. This is why the cost goes up, cuz that is a cost for them, right? And yeah, it kinda went crazy here, it really thought about this. I'm not gonna read this, but yeah, the thing to notice here is that if I run this again, what do you think would happen if I run the exact prompt again?
[00:12:50] You think I'll get back the same answer?
>> No, it's similar.
>> Hopefully it's similar, but it's definitely not gonna be exactly the same. So the important thing to remember is that LLMs are non-deterministic, you're never gonna get back the same thing from them twice. And that is probably the hardest part about building apps with these things, is that you're never gonna get back the same thing.
[00:13:12] So if you need some structured data or some consistency around what you're getting back, you're never gonna get back the same thing. So how do you build around? Imagine your database didn't have a schema, and every time you created, it was a different shape every single time, and you didn't know what the shape was gonna be.
[00:13:30] How do you write code against that? There's no contract there. So a lot of the tools and things are trying to solve this problem, and we'll talk about the different solutions around it. But that's usually the most difficult part, is how do you get back consistent, structured data.
[00:13:45] How do you test this? It's non-deterministic. How do you evaluate this? [LAUGH] It's kinda hard, but pretty exciting. We've all talked to it before. If you haven't, what is a chatbot? I mean, basically, it's an assistant in which you can talk to. I think the main difference in what we just did and what a chatbot actually is, is that it has memory, right?
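One defensive way to write code against a reply with no contract is to ask for JSON and reject anything that does not match. This is only an illustrative assumption, not the solution the course goes on to use, and `parseReply` is a hypothetical helper.

```javascript
// Try to read the model's reply as JSON and check it has the keys we need.
// Returns the parsed object, or null when the reply can't be trusted.
function parseReply(content, requiredKeys) {
  let data
  try {
    data = JSON.parse(content)
  } catch {
    return null // not JSON at all, for example a chatty prose answer
  }
  if (data === null || typeof data !== 'object') return null
  const hasAll = requiredKeys.every((key) => key in data)
  return hasAll ? data : null
}
```

A null result is the signal to retry, fall back, or surface an error instead of letting an unexpected shape flow into the rest of the app.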
[00:14:08] The example that we just wrote, right now I can't say hey, my name is Scott and then ask a follow-up question like, what is my name? It won't know because, let me describe it in a better way, right? So if I say, Hi, my name is scott, right, like this.
[00:14:33] And I run this, this is cool, how can I assist you today, right? I can't run this again and be like, hi, what is my name? It's not gonna know. It's like, what? What are you talking about, right? So the only way to guarantee that it knows what you're talking about is you have to feed it the entire history of your conversation forever, [LAUGH] up until it hits a context window.
[00:15:05] So really, ChatGPT is just what we just built. Plus, it's being smart about how it sends the history, because by default, what it'll do is it'll just keep sending the history the whole time, right? So for instance, if I say, Hi, my name is scott? And then if I got back the response from the bot and added it here, and then I did a follow up of what is my name?
[00:15:31] Then it would know, but that's because I included the entire history, right, every single time. So eventually you do hit a limit of how many tokens you can send, and then it can't remember anymore, it ran out of memory. So then you might have hit that in ChatGPT, where it's like, I can't talk anymore, I forgot, I don't know what you're talking about and hit its limit.
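The "send the entire history every time" idea can be sketched as follows. `createChat` and `send` are hypothetical names, and `client` is assumed to be an instance created with `new OpenAI()`; there is no handling of the context limit here.

```javascript
// Minimal chat memory: keep every message and resend the lot each turn.
// `client` is assumed to be an OpenAI SDK instance (new OpenAI()).
function createChat(client, model = 'gpt-3.5-turbo') {
  const history = [
    {
      role: 'system',
      content: 'You are an AI assistant, answer any questions to the best of your ability.',
    },
  ]

  async function send(text) {
    history.push({ role: 'user', content: text })
    // The whole history goes out on every call; the API keeps no state
    const results = await client.chat.completions.create({ model, messages: history })
    const reply = results.choices[0].message
    // Store the assistant reply so the next turn can see it
    history.push(reply)
    return reply.content
  }

  return { send, history }
}
```

With this in place, sending "Hi, my name is Scott" and then "What is my name?" works, because the second request carries the first exchange along with it.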
[00:15:51] So at that point what it tries to do, there's different techniques, it'll try to summarize the entire conversation up until that point to create a summary. So it might miss some details specifically about the conversation, but it will just make, basically let's say you have 50 entries, and then on the 51st one, that's when it hit its limit of memory.
[00:16:11] It'll take all the 50 before that and create a summary and make that one message in its history. So therefore, it just summarized the history before that. So that's kinda one technique. There's different techniques in which it tries to remember, but that's basically ChatGPT in a nutshell. So we're gonna create something similar.
[00:16:29] We're not gonna handle the edge case of running out of memory. [LAUGH] Yeah, we're not doing that, but we're gonna make it so you can chat to it. Cool, so that's basically the limits of it, is the memory, so you provide the whole history. And then yeah, I think the other part of it is just having it stream back the responses versus waiting for the response to come back, you can stream them back.
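The summarizing technique from the 50-entries example might be sketched like this. It is one possible reading of that description, not how ChatGPT is known to do it; `compactHistory` is a hypothetical helper, and `summarize` is a caller-supplied async function (for example, another chat completion call) that turns a list of messages into one string.

```javascript
// When the history grows past maxMessages, fold everything except the
// newest message into a single summary message.
async function compactHistory(history, maxMessages, summarize) {
  if (history.length <= maxMessages) return history
  const older = history.slice(0, -1)
  const latest = history[history.length - 1]
  const summary = await summarize(older)
  return [
    { role: 'system', content: `Summary of the conversation so far: ${summary}` },
    latest,
  ]
}
```

The trade-off is the one mentioned above: the summary keeps the conversation going but can lose specific details from the earlier messages.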
[00:16:55] We're not gonna do that because it's nothing too crazy about doing that, it's really a UI that makes that better. But that's really the only interesting thing we just built in a chatbot, it's mostly the memory aspect.
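For reference, streaming can be sketched as below, even though the course skips it. It assumes the OpenAI Node SDK behavior where passing `stream: true` returns an async iterable of chunks; `streamReply` and `onToken` are hypothetical names, and `client` is assumed to be an instance created with `new OpenAI()`.

```javascript
// Read a reply piece by piece instead of waiting for the whole thing.
async function streamReply(client, messages, onToken) {
  const stream = await client.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages,
    stream: true,
  })
  let full = ''
  for await (const chunk of stream) {
    // Each chunk carries a small piece of text in delta.content (may be absent)
    const piece = chunk.choices[0]?.delta?.content ?? ''
    full += piece
    if (piece) onToken(piece)
  }
  return full
}
```

In a terminal, `onToken` could be something like `(t) => process.stdout.write(t)` so the text appears as it is generated.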