Build an AI Agent from Scratch

Building an AI Agent

Scott Moss
Superfilter AI

Here's what you'll learn in this lesson:

Scott demonstrates how to add a user message to the database, show a loader, get all the messages, run the LLM with the messages and tools, and add the response to the chat messages. Scott also discusses how to handle tool calls and provides an example of creating a fake tool called "get weather" and passing it to the agent.


Transcript from the "Building an AI Agent" Lesson

[00:00:00]
>> Scott Moss: Let's make our agent. Follow along with me; we're gonna transition to pretty much building an abstraction on top of the LLM function that we already made. We're just gonna take a lot of stuff we already did, make an agent function that calls the LLM function, add some UI to it, and then a tool, because that's what kind of makes it an agent.

[00:00:20]
And then we'll stop right there, and you'll see, it's really not complete; we've kind of got to take it a step further, but that's what we'll start with. So let's go into agent, which should be a blank file here, and essentially what we're gonna do is create this abstraction called runAgent.

[00:00:35]
You can see it takes in a userMessage, and from there, it kind of does the same thing we're already doing in the index file, we're just gonna abstract it even more, so let's do that.
>> Scott Moss: runAgent = async (), it's gonna take this user message here for now, and then, okay, I don't want you to do any of that stuff.

[00:01:03]
Get my AIMessage, get my addMessages, get my runLLM, cool. Sometimes AI is good when it does that; those are some good-quality autocompletes there, I like that. But now what it's trying to do is just annoying: it thinks it knows that I want this, and I do, but I don't want it to know that I do, because then it'll keep doing it.

[00:01:22]
So, I'm just gonna ignore it, I don't want it thinking it's right all the time. So, the first thing we wanna do is add the user message to the database, so, let's do that. We can say await addMessages([{ role: 'user', content: userMessage }]), so, we got that. The next thing we wanna do is show this loader; this is the thing in the terminal where it's a spinner.
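As a minimal sketch, that first step of runAgent might look like this; the './memory' import path and the exact addMessages signature are assumptions based on how they're used in the lesson:

```typescript
// a rough sketch of the first step of runAgent; the './memory' import
// path is an assumption, not necessarily the course repo's exact layout
import { addMessages } from './memory'

export const runAgent = async ({ userMessage }: { userMessage: string }) => {
  // persist the user's message before doing anything else
  await addMessages([{ role: 'user', content: userMessage }])
  // ...the loader, history, and LLM call come next
}
```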

[00:01:45]
It's like, I'm thinking, it's just letting you know that it's doing something, so it feels more alive. You can import that from UI, yeah, okay, there we go. Okay, can you stop doing that? Thank you. Okay, so we got that: we add the message, we make a loader by just saying loader = showLoader.

[00:02:13]
Then you can put like the first message you want, I'm just gonna put thinking. You can put whatever you want, whatever you want your AI to say, put an emoji there. Actually, that might be a cool one, let me see, it's like a thinking one, right? Yeah, there we go, that'll be cool, a thinking emoji, that'd be kinda cool.

[00:02:30]
So we could do that. Then we need to get all the messages so we can feed them into our LLM. So we'll do that; we can say history = await getMessages(), like so, call that history. Then we're gonna say response, we're gonna run the LLM, and we're just gonna pass in all the messages, which, in this case, I call history.
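Continuing the sketch, the loader, the history, and the LLM call; the './ui' and './llm' import paths are, again, assumptions:

```typescript
// continuing inside runAgent: show a spinner, load the chat history,
// and run the LLM with it; import paths are assumed from context
import { getMessages } from './memory'
import { showLoader } from './ui'
import { runLLM } from './llm'

// inside runAgent:
const loader = showLoader('🤔 thinking...')   // spinner with a first message
const history = await getMessages()            // full chat history from the db
const response = await runLLM({ messages: history })
```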

[00:03:01]
So we're gonna do that, pass in all the history, and then the other thing we want to do is this thing called tools. So tools is gonna be, actually, let's just dive into it. Essentially, what we're gonna do here is, if we go into runLLM, right now it takes in messages.

[00:03:20]
We're gonna add another argument called tools.
>> Scott Moss: And we're gonna say tools. And for now, you can just make it, actually, I think I have a type for this, let me see, no, I know I don't have a type for it. If you want to do types, you can just make it an any array for now, so it's gonna take tools.
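Roughly, the updated runLLM signature might look like this; the AIMessage type and the './types' path are assumptions based on the course's naming:

```typescript
// runLLM now takes tools alongside messages; any[] is the placeholder
// type for now, and the AIMessage import path is an assumption
import type { AIMessage } from './types'

export const runLLM = async ({
  messages,
  tools,
}: {
  messages: AIMessage[]
  tools: any[]
}) => {
  // ...the chat completions call goes here, with tools passed through
}
```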

[00:03:42]
In this chat completions call, you're gonna just pass those tools in like this, but not before formatting them in the way that OpenAI wants them to be, because we're gonna define our tools as, like, a Zod schema. So if you don't know what Zod is, it's kind of like, I don't know, it's a schema library.

[00:04:03]
You give it an object, you give it a schema, and it'll tell you if the object is validated by the schema or not. It happens at runtime, the same kind of validation you'd do with a database, just in your code. What we wanna do is just pass the tools in here, and then we need to, like, make some tools, and we're just gonna make some fake ones for now, ones that don't really do anything.
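If Zod is new to you, here's a tiny example of what it does at runtime:

```typescript
import { z } from 'zod'

// a schema describes the shape you expect...
const schema = z.object({ city: z.string() })

// ...and validates real objects when the code runs
schema.parse({ city: 'Oakland' }) // ok: returns the typed object
schema.parse({ city: 42 })        // throws a ZodError
```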

[00:04:21]
We'll get into the real cool stuff in a minute, but for now, we'll just pass in the tools like this. And then down here we will pass in those tools. But I also just wanna keep, yeah, sorry, I don't wanna go here, I wanna go to agent, there we go.

[00:04:43]
I want this to take in some tools and we're gonna keep passing them down, so you take in some tools, you take in some tools.
>> Scott Moss: There we go, and now I can just pass those tools here, so basically I'm just making sure our agent takes in a user message.

[00:05:01]
Takes in some tools which is an array right now and it just passes them down to the LLM.
>> Scott Moss: So we'll get to the tools in a minute.
>> Scott Moss: And then while we're still in the LLM we now have to add the response that we got back from the LLM to our chat messages, and then we can just return them.

[00:05:27]
So, I'm gonna say await addMessages, add our response like that.
>> Scott Moss: I now want to stop this loader from showing, so I can say loader.stop() like that.
>> Scott Moss: That will stop it. I also have this thing in here called logMessage, so it'll actually log it out so we can see it.

[00:05:55]
>> Scott Moss: So we can say logMessage with the response here, don't worry about that. It's just freaking out because, no, because I haven't finished this function yet, that's totally fine. And then we're just gonna return, I guess it doesn't really matter what it returns. We're never gonna use this, but I'm just gonna return all the messages here.
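Assembled, the whole runAgent from this lesson looks roughly like this; the import paths are assumptions, but the logic follows the steps above:

```typescript
// runAgent as sketched in this lesson; helper import paths
// ('./memory', './ui', './llm') are assumed, not confirmed
import { addMessages, getMessages } from './memory'
import { logMessage, showLoader } from './ui'
import { runLLM } from './llm'

export const runAgent = async ({
  userMessage,
  tools,
}: {
  userMessage: string
  tools: any[]
}) => {
  // save the user's message first
  await addMessages([{ role: 'user', content: userMessage }])

  const loader = showLoader('🤔 Thinking...')
  const history = await getMessages()
  const response = await runLLM({ messages: history, tools })

  // save the LLM's response, then clean up the UI
  await addMessages([response])
  loader.stop()
  logMessage(response)

  // the return value isn't used anywhere yet; return the full history
  return getMessages()
}
```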

[00:06:22]
>> Scott Moss: So that's everything in runAgent. Now we just need to go create a fake tool, pass it in, and then we need to make sure that runLLM actually uses the Zod thing. So I'm actually going to go to the OpenAI docs. Tool calling, go here, function calling, tool calling, very similar stuff.

[00:06:47]
You'll hear me say that a lot. Zod function, look at that, so we're going to import { zodFunction } from 'openai/helpers/zod'.
>> Scott Moss: And then what we wanna do is just say formattedTools is just mapping over all the tools and converting them to a Zod function like that.
>> Scott Moss: And now we can say tools: formattedTools, and now we have tools.
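That formatting step is basically one line with the SDK helper:

```typescript
// convert the Zod-based tool definitions into the shape the OpenAI
// SDK expects; zodFunction is the helper from the OpenAI docs
import { zodFunction } from 'openai/helpers/zod'

// inside runLLM, before the chat completions call:
const formattedTools = tools.map((tool) => zodFunction(tool))
```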

[00:07:27]
>> Scott Moss: The other thing I wanna do is put this other one here that says tool_choice: 'auto'. What this is telling the LLM is: hey, it's up to you. I might give you 20 tools, you figure it out, I don't know. You could force it and give it, like, a specific tool, or maybe it's just one, not an array.

[00:07:49]
Yeah, you could just give it, like, a tool, or maybe it's an object, yeah, there, it's, like, an object like that, where you say, you gotta call this one. But we want it to figure it out; that's where it gets powerful. And then, just to be sure, I'm gonna turn off parallel tool calls, cuz sometimes when I don't turn this off, it does it.

[00:08:09]
I don't want this to run multiple tools in parallel, as that is a little harder to handle. So we're gonna turn that off for now, and it can still run many tools, just not in parallel; it'll run them one after another. A lot easier to deal with. I don't wanna do parallel tool calls, or we'll be here for a minute.

[00:08:25]
>> Student 1: Debugging those are really fun.
>> Scott Moss: Yeah, exactly. Yeah, I don't wanna do it. Order matters, and I don't want to deal with that order, so we will do that. And then the next thing we want to do is: right now we're returning message.content, but if there's a tool call, there won't be a content here.

[00:08:50]
This content will be null. So let's just return the message for now, because if there's a tool call, it'll be .tool_calls, like this; it'll be, like, an array of tool calls. So let's just return the message like this. So we'll do that.
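Pulled together, the updated call inside runLLM might look roughly like this; the openai client and formattedTools are assumed from earlier, and the model name and any other settings are assumptions, not from the course:

```typescript
// inside runLLM, after these changes; `openai` and `formattedTools`
// are assumed to be in scope, and the model name is an assumption
const response = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages,
  tools: formattedTools,
  tool_choice: 'auto',        // let the model decide which tool, if any
  parallel_tool_calls: false, // one tool call at a time is easier to handle
})

// return the whole message: content is null on tool calls, and the
// tool requests live on message.tool_calls instead
return response.choices[0].message
```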
>> Scott Moss: So now, if we go back into the agent, you can see this thing's freaking out because it's like, there is no content here.

[00:09:15]
How do you wanna do this? So what we can do instead of adding this message like this, we could just put, like, the whole response in here so we can see it. But I kind of just want to log this so you guys can see when it calls a tool, so you can see what's going on.

[00:09:29]
Because even if we save this, it's gonna break, because we're not responding to it yet; we haven't done the loop, so it doesn't really matter. So what I want is, I want to go in here, and I wanna say, if (response.tool_calls), I wanna do something here. So for now, we will do some logging here.

[00:09:47]
That way you can kinda see it. And if you really wanted to save this, just because you really wanted to, you could just throw the response in there like that, and that's totally fine. Let me get back to my notes.
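So the tool-call handling in runAgent, for now, is just a log; a minimal sketch:

```typescript
// in runAgent, after the LLM responds: if the model asked for a tool,
// just log it for now; actually running the tool and looping comes later
if (response.tool_calls) {
  logMessage(response)
}
```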
>> Scott Moss: And then, yeah, we need to go ahead and replace the LLM call that's in the index file with the agent call, and then we can make a fake tool and pass it in.

[00:10:14]
So let's do that. I'm gonna go into index, and I'm gonna get rid of all of this, essentially. I'm gonna get rid of this and this, and now I'm just gonna import the agent like that.
>> Scott Moss: And I'm gonna say the response is runAgent with this message, and these tools that have nothing in them.
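In code, that swap might look something like this; the import path and the argv-based message are assumptions about the course's CLI setup:

```typescript
// index.ts, roughly: the direct runLLM call is replaced by the agent;
// reading the message from argv is an assumption about the setup
import { runAgent } from './src/agent'

const userMessage = process.argv[2]
await runAgent({ userMessage, tools: [] })
```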

[00:10:47]
Okay, so we have that. Now, let's give it a tool, let's make it do something. So, we'll just hard-code one right now. The way that's gonna work is, let's just make one called getWeather, and it's just a function, and it returns, it's hot, it's 89 degrees outside, something like that.

[00:11:15]
That's all it's gonna do. That's the function. Actually, I'll make an object here.
>> Scott Moss: weatherTool, or actually, what we can do is, we don't even really need to make the function, we just need to make the description. So let's just say weatherTool. It's just gonna be an object that has a name, and I'm just gonna say, let's just say get_weather, and then it's gonna have parameters,
>> Scott Moss: and I'm just gonna make it an empty Zod object, like that.

[00:11:50]
So we have a weatherTool, and I can just pass in the weatherTool.
>> Scott Moss: Import z from zod.
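Put together, the fake tool is just a name plus an empty Zod schema for its parameters, wired into the agent like so:

```typescript
// the fake weather tool: a name and an empty parameters schema;
// there's no real function behind it yet
import { z } from 'zod'

const weatherTool = {
  name: 'get_weather',
  parameters: z.object({}),
}

// pass it to the agent alongside the user's message
await runAgent({ userMessage, tools: [weatherTool] })
```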
>> Scott Moss: So now we have a weatherTool, we're passing it into our agent. Our agent has these tools. It's gonna get the messages. It's gonna run an LLM with those tools. If the tool is called, it's gonna do this.

[00:12:21]
It's gonna log it, and then it's gonna save it, and then it's just gonna stop. We're not gonna feed it back to the LLM, so it's not gonna loop here. It's just gonna be like, hey, I ran this thing, did you go run the function or not? So we will try that out.

[00:12:40]
And then in runLLM, let's make sure everything's good here. This looks good. Formatted tools, great. Let's give it a shot. So now, if I go here and I say, what is the weather, let's see what we broke. Got our thinking face, and cool, I literally logged everything.

[00:13:02]
So as you can see right here, here's the tool call; it called get_weather. This was it trying to do it. Let me get rid of all the noise on here so you don't have to see everything. Where am I logging now? All that noise. Is it this? Response, maybe?

[00:13:19]
I don't know where I'm logging all that. I think it's response, yes?
>> Student 1: If you have one of these agents that takes a really long time in Next.js, for instance, how would you handle that?
>> Scott Moss: I've run into that so many times. [LAUGH] There's a couple answers. One, depending on the platform on which you deploy your app, they have different execution times before your function times out.

[00:13:50]
So with Next.js, or with Vercel, if you're, like, on their Pro Plan, you can actually turn the function timeout up. I forgot the exact maximum, don't quote me, but I think it's, like, near or above 300 seconds. So that might help a lot. Two, if you don't want to deal with that race condition, then what you could do is run the LLM as a background job, which is what we do now.

[00:14:15]
So when a user sends a message, we don't run the LLM on that same request and then send a response back. When they send a message, we kick off a job in a whole new process and we respond back immediately. And then that job, it has a queuing system.

[00:14:34]
It's basically using a queuing system, one that you would have to build, to make sure that these things aren't timing out, they're not slowing down, and they can be retried, so you're not having to run the LLM again, which is costing you money. So have, like, a queuing system.
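As a very rough sketch of that shape, with a hypothetical enqueue helper standing in for whatever queue you actually use:

```typescript
// a hand-wavy sketch of the background-job pattern being described;
// `enqueue` is a hypothetical helper, not a real library call
declare function enqueue(job: string, payload: unknown): Promise<void>

export async function POST(req: Request) {
  const { message } = await req.json()

  // hand the slow LLM work to a queue worker instead of awaiting it here
  await enqueue('run-agent', { message })

  // respond immediately; the result comes back out of band
  // (polling, WebSockets, etc.)
  return new Response(null, { status: 202 })
}
```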

[00:14:49]
You can turn on streaming. So we haven't talked about message streaming, but right now we're not streaming; we're waiting for the LLM to get back everything and then sending it back. If we just stream it as it comes, it'll be quicker, or at least perceived as quicker from the user's perspective.

[00:15:03]
So that's another way to do it. Yeah, there really isn't one way, so you probably gotta experiment with a combination of those things. But background jobs, streaming, increasing your limit, that will work. Like, right now we are using Vercel serverless Next.js functions for our stuff, and we don't have any issues, so it's just fine.

[00:15:25]
There's tons of products, I don't wanna get into product recommendations, but I'm just gonna...
>> Student 2: I mean, the key fundamental thing is taking the AI response out of that immediate web request and being able to send it back some other way, out of band of that initial request.

[00:15:42]
>> Scott Moss: Exactly. QStash Workflows is a really good one. They have one specifically for this, which is, let me see, which one is it? It's basically, let me find it, it's one where they show you exactly how. No, okay, here we go, Workflow, I was looking at the wrong one.

[00:16:07]
There we go. They show exactly how to do what you're describing. So if you want to do something that's going to take a very, very, very long time, let's say an AI generation, they have this thing called context.call, and you can see right here, HTTP request with much longer timeout, two hours.

[00:16:28]
So this one will wait two hours plus, or whatever. So if you have an AI call that's taking two hours plus, you probably should rethink your architecture, I think. But yeah, background jobs, things like that. There's a lot of use cases, but you should be fine on Vercel; just increase your function timeout limit.

[00:16:47]
>> Student 1: I mean, you need to really get into WebSockets and set up real-time capabilities. Then if you have some agent that's initiating something, you can just send that out to the-
>> Scott Moss: Exactly, yeah, that's actually the system that we have. It's many different steps, and we use WebSockets to communicate all the steps along the process.

[00:17:07]
So, there never is just one long process happening, because if that one process fails, not only is it a bad user experience, but it'll cost you money cuz you gotta do all those things again. You gotta run it through the LLM again, when maybe every step passed except for the last part.

[00:17:22]
And it's like, man, I gotta run all this again. Where it's like, if you could just save the intermediate steps and everything, it's probably better. Cool, so as you can see here, I said, what is the weather? The AI thought about it, and here's the important part, is this part right here.

[00:17:35]
So the response that it got back from the AI was like, yeah, can you run this function called get_weather? It's telling my system: I need you to run a function called get_weather. That's what it's doing, right? That's this log that I have. Where is it? This one, it's this log; this tool_calls is an array, so it's logging this array.

[00:18:06]
So it's like, yeah, somebody's trying to get the weather, go do that for me. And we're not; I'm just logging it and ignoring it, and then I'm adding it, and then I'm leaving. But that's what we would have to do, right? We would have to go actually run this, feed the answer back to the LLM, and then the LLM could try to answer the question for the user. We're just not doing that.

[00:18:32]
And as you can see, when I said, the content is usually null if there's a tool call, there's also this other property here called refusal. Refusal just means,
>> Scott Moss: You know what? I'm not even gonna try to describe what that means. [LAUGH] We're just gonna look that up right here.

[00:18:54]
>> Scott Moss: This is for structured, that's right, structured output; we're not using structured output. So, this is basically, if you want the AI to do structured output and you say, here's the schema that I want you to output, and for some reason it refuses to do it, it'll say, I refuse to do it.
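Since it came up, here's a hedged sketch of what structured outputs and that refusal property look like with the OpenAI Node SDK helpers; the schema, model, and prompt below are illustrative, not from the course:

```typescript
// a sketch of structured outputs with the SDK's Zod helpers;
// the schema and model here are assumptions for illustration
import OpenAI from 'openai'
import { z } from 'zod'
import { zodResponseFormat } from 'openai/helpers/zod'

const openai = new OpenAI()
const Weather = z.object({ temperature: z.number(), summary: z.string() })

const completion = await openai.beta.chat.completions.parse({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'What is the weather?' }],
  response_format: zodResponseFormat(Weather, 'weather'),
})

const message = completion.choices[0].message
if (message.refusal) {
  console.log('model refused:', message.refusal) // the "here's why"
} else {
  console.log(message.parsed) // typed, schema-validated output
}
```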

[00:19:06]
[LAUGH] And here's why. We're not doing structured outputs, so, yeah. Structured outputs are cool, though; I do use them in production for generative UI, which is really cool. And you can kind of follow this conversation, right? So here I said, what is the weather? The assistant immediately was like, call the weather function.

[00:19:26]
It did a really good job of deciding that it needed to call the weather function. And what's even crazier to me is that, now that I'm thinking about it, I don't think I even gave it a description of that function. Let me go see. I didn't even give it a description.

[00:19:44]
The only reason it knows that this function has anything to do with weather is because it's in the name, that's it. If I put a description here and it's like, use this function to get shoes, I don't think it would call this function, [LAUGH] let's see.
>> Scott Moss: Not the weather.

[00:20:08]
[LAUGH] The name is misleading. If I just describe it like that, and let me just get rid of all my history. I actually don't know what type of logic, I guess it depends on the AI that you have. So, no, it still did it. It's like, yeah, screw you, it's called get_weather. [LAUGH]

[00:20:30]
I don't care what you say, it's in the name, right? It could just be lack of other better options. Right, it could be lack of other options as well. This is where logic and reasoning comes in, and this is why it's hard to build with AI: because it's non-deterministic, you don't know.

[00:20:44]
I might run this again and it might do it. It might be like, actually, I won't do that again. So it really just depends. And this is why you have to have evals and stuff, because you don't really know what the hell is gonna happen here. But yeah, that was really interesting.

[00:20:58]
I didn't know that it was smart enough to just look at the name of the tool. I've never even tried that before cuz it's so dangerous just to do that. But yeah, that's pretty good, I learned something today.
