Lesson Description
The "Coding an Agent Loop" Lesson is part of the full, AI Agents Fundamentals, v2 course featured in this preview video. Here's what you'd learn in this lesson:
Scott demonstrates filtering messages to keep the LLM focused, setting up chat history, and streaming text generation. He shows how to handle tool calls and append responses to maintain the conversation flow.
Transcript from the "Coding an Agent Loop" Lesson
[00:00:00]
>> Scott Moss: OK, so the first thing we want to do is very similar to what we had before with the generate text. We want to do stream text, but now that we're using the messages array versus just passing in the user prompt, we need to do a little bit of setup. So here at the top, make sure you import this helper function that I have called filter compatible messages, and I'll talk about this in a second, but this just ensures that the message objects that we put in the messages array are compatible with the LLM, because what we show the user in the UI and what the LLM sees might be two different things.
[00:00:38]
Like, for instance, in the UI we might want to show rich objects that show statuses and token counts and approvals and all different types of things. Not only does the LLM not want to see that, it would actually error if you showed it that. So you want to get rid of those things that are specific to your UI and instead only show the LLM the messages that it actually cares about. So this function just filters all that stuff out so that we can still show the user the things that are really cool, but also not get errors from the LLM when we try to feed it back to it.
[00:01:13]
In production, you would just have two separate systems: you would have this chat state and the actual conversation state. Those are two different things, right? But we don't have a database, we're not doing any of that. So make sure you import that from system filter messages if you haven't already, and then just make sure you import stream text instead of generate text. It works exactly the same except the output, as you'll see, will be different.
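The transcript doesn't show the helper itself, so here's a minimal sketch of what a filterCompatibleMessages helper could look like, assuming AI SDK v5's ModelMessage type. The UIMessage fields (uiOnly, tokenCount) are made up for illustration and will differ from the course repo.

```ts
import { type ModelMessage } from "ai";

// Hypothetical UI-side message shape: the real course code will differ, but
// the idea is that UI messages carry extra fields the LLM must never see.
type UIMessage = ModelMessage & {
  uiOnly?: boolean;     // e.g. status cards, approval prompts, spinners
  tokenCount?: number;  // display-only metadata
};

// Drop UI-only entries and strip display metadata so only role/content
// (and tool parts) reach the model.
function filterCompatibleMessages(history: UIMessage[]): ModelMessage[] {
  return history
    .filter((message) => !message.uiOnly)
    .map(({ uiOnly, tokenCount, ...modelMessage }) => modelMessage as ModelMessage);
}
```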
[00:01:41]
So first we need to create our working history. This will be our messages array that we keep track of as the loop goes on that we'll keep essentially appending to as new messages get generated, right? So we'll say working history is, let's just say filter compatible messages and what we want to do is just conversation history. This conversation history comes from the UI and again, it's going to have all different types of non-compatible LLM messages in it, so we need to filter it first.
[00:02:18]
That's why we're doing this. And then from here, let's make our messages, all right? And it's pretty simple. It's going to be type model message. Put that on the wrong side. There we go. And it's just a simple array. The first thing we want to do is, in most message arrays for an LLM chat, the first thing that you put in there typically would be the role of system. System is the system prompt essentially.
[00:02:55]
This is like the preamble. These are like the hidden instructions that you tell the LLM like if you were going to give it a personality, you're going to give it some tips, some hints, some context that it needs for the whole conversation, you would put it in the system prompt. So it's usually the first message and the messages array is an object of role system whose content is going to be the system prompt that you want to show the LLM so we already have it.
[00:03:21]
I'm just going to put system prompt there. That's typically always the first message. You don't need a system prompt, but it's pretty great. As new models get better and reasoning models become more common, the system prompt matters less and less, but for now, a system prompt is definitely something that you want to use. The next thing would be the working history, so like the conversation so far that was filtered from the UI.
[00:03:51]
Oops, I don't need an object there. And then last after that would be the new message the user sent us. So this is what we were using for prompt on generate text. Instead we're just going to put it here. In this case, the role will be user, right? And the content would be whatever the user message was. Right, so we just created our chat history, our initial chat history, which is, here is the system prompt first, here's whatever the conversation was up until this point, if you called this and you passed in some previous conversation, and then the new message that the user sent out.
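Put together, the history being built here might look something like this; systemPrompt, conversationHistory, and userMessage stand in for whatever the surrounding course code actually calls them.

```ts
// Filter the UI-facing history down to LLM-compatible messages first.
const workingHistory = filterCompatibleMessages(conversationHistory);

// System prompt first, then the conversation so far, then the new user message.
const messages: ModelMessage[] = [
  { role: "system", content: systemPrompt },
  ...workingHistory,
  { role: "user", content: userMessage },
];
```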
[00:04:32]
And now, because we're streaming, tokens come in as they're generated; tokens are essentially chunks of characters. As they come in, we need to piece them all together. So we're going to create a variable that allows us to just hold all of them and append to, so we'll say full response is just a string, because the LLM is just going to generate these tokens, which are just strings, and we're going to put them all together into the same big string we would have gotten at the end if we had called generate text.
[00:05:03]
We have to build that up ourselves if we want to get the final output. We don't have to. There's definitely some utilities that help us do it, but I want you all to do it manually so you understand what that feels like in the edge cases and things like that. So we're going to build that up ourselves. And then from there, oops, why don't you go over there? And then from there we're going to do our while loop.
[00:05:31]
So while true, I know you're probably like, I don't want to run a while true loop on my computer, but trust me, it'll be fine. It'll definitely stop. We'll have our stop cases. So the first thing we want to do is, let's get our results. And we're going to, instead of awaiting generate text, we're just going to call stream text and we don't need to await this because it is a stream. It's just going to, it's like a generator, right?
[00:05:57]
So it's going to yield new tokens as they come in. We're not waiting for the whole thing to be done. So we're going to call it stream text. It takes in the same arguments that generate text had as well, so it's literally the same thing. We've got our model, or our model name. Instead of prompt, just like we did in our evals, we're just going to pass the messages array. We're going to pass in our tools.
[00:06:30]
And for telemetry, we'll do our experimental telemetry here, if you want that set up. There we go. It's literally the same arguments as generate text. Nothing's different. It's just a different name, stream text. Everything's exactly the same except we don't await it and this result will be a stream that we can iterate over. It's an iterable that we can iterate over and see all the results as they come in.
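As a rough sketch of this step, assuming the AI SDK's streamText and whatever model, tools, and telemetry setup the course configured earlier:

```ts
import { streamText } from "ai";

// Accumulates the assistant's final text across the whole run.
let fullResponse = "";

while (true) {
  // Same options as generateText, but not awaited: the result exposes a
  // stream we can iterate as tokens arrive.
  const result = streamText({
    model,    // the model instance configured earlier in the course
    messages,
    tools,
    experimental_telemetry: { isEnabled: true }, // only if telemetry is set up
  });

  // ...per-step handling goes here (see the following sketches)...
}
```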
[00:07:00]
The next thing is we need to keep track of tool calls, so we're going to do that as well. So we'll say tool calls, it's going to be an array here. We could give it a type, I think it's like tool call info, which is an array like that. Also need to keep track of the current text, which is not to be confused with the full response. The full response text would be the entire response outside of the loop, the current text would be just the current text for just this one step inside the loop.
[00:07:41]
So we're going to keep track of that. And let's keep track of some errors here. This is going to help us with stop cases in case we want to stop for errors, so we'll do that. We'll keep track of any errors here. And now what we're going to do, we're going to make our try catch here. We want to do a try catch because if something breaks, we want to be able to catch it and handle it gracefully. The same reason you would want to do a try catch on any asynchronous operation that involves some user facing thing.
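The per-step bookkeeping described here could look roughly like this; ToolCallInfo is the course's own type, so the shape below is a guess.

```ts
// Per-iteration state, reset on every pass through the while loop.
type ToolCallInfo = { toolCallId: string; toolName: string; args: any };

const toolCalls: ToolCallInfo[] = [];
let currentText = "";                // text generated during this step only
let streamError: Error | undefined;  // captured so we can decide how to stop

// The stream consumption below gets wrapped in a try/catch so a failed
// stream doesn't crash the loop before we can handle it gracefully.
```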
[00:08:19]
So like I said, the result from stream text, we can, it's an iterable because it's just streaming, so we can do a for await of here with a for loop, so we can say const. And typically you'll see something called like chunk. You can call it token or however you want to call it, but usually it's called chunk, and we want to get something called result.fullStream. So as a new chunk comes in from this stream, we want to do this, right?
[00:08:52]
So, first thing we want to do is we want to check the chunk type so we can say if chunk.type equals text-delta. That just means a new token came in, so if the, if the LLM is generating text, a text delta would be one of the tokens in the text. So if it was generating a sentence, this might be like four letters. That's what this chunk might be. So we want to check if it's that first, and if it is, we want to append our current text to be that chunk.
[00:09:32]
So update our current text. And then for UI purposes, this has nothing to do with creating a loop, but just to make sure our UI understands this, we have this callbacks.onToken. This allows the UI to stream the token in the terminal. This is what puts it in the terminal by calling this. So this is just the hook into the UI that I made. So pretty simple, new token comes in, let's check what type it is.
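Inside the try, iterating the stream and handling text deltas might look like this; callbacks.onToken is the course's own UI hook, and the exact delta property name depends on your AI SDK version.

```ts
// Inside the try from the previous sketch:
for await (const chunk of result.fullStream) {
  if (chunk.type === "text-delta") {
    // In recent AI SDK versions the token is on `chunk.text`
    // (older versions call it `textDelta`).
    currentText += chunk.text;
    callbacks.onToken(chunk.text); // course UI hook: prints the token in the terminal
  }
  // tool-call chunks are handled in the next sketch
}
```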
[00:10:06]
Hey, it's a text delta, hey, the agent, the LLM is saying something. Let's capture all of those, right? That's what this is. Yes, question. So is this like, in an AI thread, my second and third messages, or my second and third prompts, are also including that first prompt in the context? Is that what's going on here? Yeah, so let me try to describe in a different way what's happening here.
[00:10:34]
So first we have this first message, the system prompt, right? So this would be like, if you use ChatGPT, there's a settings page you can go to where you can add a background for the agent, give it some custom instructions about you that it can use across chats; that would be the system prompt. That's just like some general information that every chat will see. So every chat starts off with this.
[00:10:57]
That does not show up anywhere in your chat history. You won't visually see that in a chat message. It's like invisible. This would be the chat so far, right? That's the chat so far and then this represents the new message you just sent. Right? Because the whole point of us doing this run is because you sent a new message. So here it is. OK. When an LLM sees a messages array, where the last message in that array is role user, it's telling the LLM that it needs to respond because that's how this conversational style works.
[00:11:33]
It's like, oh, if there's a pending user message, it is now my turn to respond to that. It's a request response model, right? So what we're doing now is we fed this messages array to the LLM here. The LLM saw that the last message was a user, so it's like I have to respond. So what it's doing now is that it is responding, but because we're streaming it token by token, we have to capture each single one and do something with it.
[00:11:56]
So this would be when you're in ChatGPT and you start seeing it responding and then the tokens start writing on the screen, that's what this is. So it's right now it's writing on the screen and we have to collect all of them, and at the same time with this callback, this is me putting it on the screen token by token. So this is literally rendering it on the screen. But because we have tool calling, the chunk type isn't always going to be text delta, so we have to check, hey, but what if the chunk.type equals tool-call.
[00:12:35]
Because as you saw on the last lesson or the lesson before that, the LLM can return, in the case of generate text, an array of tool calls. It's not always going to return text, it can also return an object or an array of objects representing the tools it wants you to execute for it and give it back. So we also have to handle that as well. So in the case of, hey, there was a tool call, this is the whole point of the loop.
[00:13:04]
We need to capture that tool call, get the arguments. Let's just say for now, execute the tool, get the results, put it back into the messages array and let the loop continue, right? So that's something that we want to do here. So if it is a tool call, we need to get the input, which is basically the arguments of that, so we can say, you know, hey, is there an input. And on this chunk, the chunk is an object.
[00:13:34]
If so, let's get the chunk.input. Otherwise, we just default to empty object. And then from there we're just going to collect the tool calls into our tool calls array, so we'll just push here. So we'll just say cool. What's important is that we need the tool call ID. I know I mentioned earlier that the tool call ID is how we reference the tool call that we're responding to when we append back to the messages array.
[00:14:00]
The LLM generates this ID for us. We have to then reuse that ID to respond back to the LLM like, hey, you know that tool call you asked us to do? I did it. Here's the result. And the only way to match those two is with this tool call ID. So if you try to put a message in the messages array that's a tool call response and it doesn't have an ID or an incorrect ID or the ID isn't exactly following the one that was generated by the LLM, you'll get an error from the LLM.
[00:14:29]
It has to be very specific. So we want the tool call ID here. We want the tool name. This is going to help us in our system figure out which function to call by the tool name, so we want that as well. And then lastly, we want probably the most important thing, which are the args. So we want the args as well, and I'll just say any on that. So this will just put some objects in this array where we could just capture, here's the ID of the function that the LLM wants to call, here's the name of it so you can match it with your execute tools function, and here are the arguments that you need to pass into that function.
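Still inside the same for-await loop, the tool-call branch could look roughly like this; onToolCallStart is the course's UI hook, and newer AI SDK versions expose the arguments as input where older ones call them args.

```ts
if (chunk.type === "tool-call") {
  // Default to an empty object if no arguments were generated.
  const input = "input" in chunk && chunk.input ? chunk.input : {};

  toolCalls.push({
    toolCallId: chunk.toolCallId, // must be echoed back with the tool result
    toolName: chunk.toolName,     // lets us find the matching function to run
    args: input,                  // arguments to pass into that function
  });

  // Course UI hook: shows the tool name and a spinner while it runs.
  callbacks.onToolCallStart(chunk.toolName, input);
}
```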
[00:15:10]
And then we're going to loop over those and call those. Outside of that, for UI purposes, I have an onToolCallStart. This shows in the UI the name of the function that you're running and then shows a spinner; that's what this does. So you can see in the UI that the AI is trying to invoke a function, right? It's about to. So we can say chunk.toolName and then the input as well. Here in this catch, we're just going to set our stream error to that.
[00:15:59]
So we can collect that, and it's just for us to try to gracefully handle some errors. So what we can say is, hey, if there is no current text and the streamError.message doesn't include this one string: I noticed with the AI SDK there's this error message it puts out that says something like no output generated, so if there's any error upstream from the LLM, the AI SDK captures it and returns an error that says this.
[00:16:29]
So if we don't see any of that, then we're just going to throw the stream error, because that means something's actually wrong. We actually do want this to break. Like, go ahead and destroy this. So now what we want to do is append the full response with the current text, right? Because the full response is the whole response outside of the loop. We want to append that with what's happened so far, so we're going to do that.
[00:17:05]
So outside of this try catch, we'll say full response plus equals whatever the current text is. And then here we're going to do another error handler where we say, now if there is a streamError and there is not current text, then we could have our AI apologize to the users like, hey, sorry, you know, our engineers are working on it or something's wrong, so you can say whatever you want here. Sorry about that, you know, I'm working on it.
[00:17:29]
This is a way for you to show some deterministic error message, as if the LLM said it, when something went wrong, so you can capture that and do what you want. And because we did hit this edge case where there's an error and we are responding, what we can do is, I have this callback here, we've used this before, it's onToken. We want to send back the token; in this case, the token would just be the full response, so that would just pop up, and then, because we're done here, we want to break the loop.
[00:18:04]
So that's the first edge case here: we hit an error, we want to show the user something, and just break the loop right here. Whereas in this other case, something's going on with our system, some system error. I guess you could technically show the user something here as well. But in this case, we're throwing the error because we messed up somewhere or some upstream thing is messed up.
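Continuing inside the while loop, the catch and the first stop case he describes might look roughly like this; the apology string and the "No output generated" check follow the transcript, but the exact wording is up to you.

```ts
try {
  // ...the for-await loop from the earlier sketches...
} catch (err) {
  streamError = err instanceof Error ? err : new Error(String(err));
  // If we got no text and this isn't the AI SDK's "No output generated"
  // message, something is genuinely broken, so rethrow.
  if (!currentText && !streamError.message.includes("No output generated")) {
    throw streamError;
  }
}

// Back outside the try/catch: fold this step's text into the overall answer.
fullResponse += currentText;

// Stop case 1: the stream errored and produced nothing. Show the user a
// deterministic apology and end the loop.
if (streamError && !currentText) {
  fullResponse = "Sorry about that, something went wrong. We're working on it.";
  callbacks.onToken(fullResponse);
  break;
}
```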
[00:18:34]
Cool. So now what we want to do is collect what's called the finish reason. An LLM, especially if you add tool calls, will keep generating until it produces a message that has a finish reason, and a finish reason can essentially be either "I'm done" or "the reason I stopped is because I need you to call a tool for me." So we want to check for that, right? So what we can say is, hey, first we have to await it because it's a stream, so we can say finishReason equals await the result.finishReason, so wait until the stream is done completely.
[00:19:19]
There are no more tokens being generated. Completely wait for that. And not Finnish like the country. And then from here we want to say, if finishReason does not equal tool-calls and toolCalls.length equals zero, as in we haven't collected any tool calls, that means the AI is ready to answer and it never even needed to call a tool in the first place. So this would be another break case for us where we can just go ahead and show the user the response.
[00:19:59]
So the user asks something. It didn't trigger any tools. There were no tool calls generated by the LLM. It's just ready to respond. So let's go ahead and respond so we can say responseMessages equals, we also have to await the response because it's a stream, so let's get that full response here. We have to append our messages here. If we don't do this, the user won't be able to reply to it in the UI because the LLM won't see this message that it generated, right?
[00:20:24]
It's kind of weird. The LLM will generate a message, but unless we put it back into the array, it won't see it. Like it doesn't know that it generated it. Yes, it came from the LLM, but it doesn't remember it somewhere. We have to put it back into the array so it can see it. So if we don't do that and let's say we show it to the user but we don't put it back in the messages array, the user responds to it, the LLM is like, what are you talking about?
[00:20:46]
I didn't say that. So we have to put it in the messages array. So it's pretty manual, right? But it's also pretty easy to understand. This is why you can manipulate the conversation really easily just by putting whatever you want in the messages array. You can fake a conversation, you can say, no, you did say this, look, I put that there. And if the role was assistant, then it's like, oh yeah, I did say that.
[00:21:15]
Even though that wasn't generated by that LLM, it could have been something you hard-coded or something that came from somewhere else, but if the role is assistant, the LLM is like, oh yeah, I said that. Easy to manipulate. So let's push these response messages, the messages like that, into our messages array, and then, because there's nothing else to do here, we'll just break the loop, so that's our second break case or end case.
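That second stop case might look roughly like this, still inside the while loop; result.finishReason and result.response are both promises on the streamText result.

```ts
// Wait for the stream to fully finish, then check why it stopped.
const finishReason = await result.finishReason;

// Stop case 2: no tool calls were requested, so the model simply answered.
if (finishReason !== "tool-calls" && toolCalls.length === 0) {
  const response = await result.response;
  // Push the assistant's generated messages back into history so the model
  // "remembers" what it said on the next turn, then end the loop.
  messages.push(...response.messages);
  break;
}
```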
[00:21:56]
The first one was: error, we want to respond. The second one was: the LLM is ready to respond, there's nothing else to do here. Now if we get to this next step, then we can assume that there were no errors. And because this didn't hit, then we must be doing a tool call. The LLM must have responded with, like, hey, I need you to run these functions for me right quick. So that's what we're going to do.
[00:22:20]
So we're going to say responseMessages again, await response. We're going to push these in immediately, so messages.push, responseMessages.messages, just like we did above. We're going to put them in there and the reason we want these in here because these are tool calls, so we're just going to go ahead and put them in the messages array and then now what we're going to do is we're going to execute those tools, get the results of them, and then put those in the array.
[00:22:47]
So in the case of a tool call, you're going to append to the messages array at least twice: once for the tool call that the LLM generated, because again, it generated it but it doesn't see it, you still have to put it in the array, and then once again for the result of that tool call after you execute it. Does that make sense? So it's at least twice, but in the case of a parallel tool call, as in the LLM returned an array of tool calls, then you will be placing all of those tool calls in your messages array, and then you would be executing all of those tool calls and placing the results of all of them in the array.
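The tool-execution branch he's describing could be sketched like this; executeTool is a stand-in for however the course dispatches a tool name to its function, and the tool-result message shape shown assumes AI SDK v5 (older versions use result instead of output).

```ts
// Otherwise the model asked for tools. First, record its tool-call messages.
const response = await result.response;
messages.push(...response.messages);

// Then execute each requested tool and push the result back, matched by
// toolCallId so the LLM can pair it with the call it made.
for (const call of toolCalls) {
  const output = await executeTool(call.toolName, call.args); // hypothetical dispatcher
  messages.push({
    role: "tool",
    content: [
      {
        type: "tool-result",
        toolCallId: call.toolCallId,
        toolName: call.toolName,
        output: { type: "text", value: JSON.stringify(output) },
      },
    ],
  });
}
// No break here: the loop runs again, and the next streamText call sees the
// tool results and can keep going or answer.
```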
[00:23:22]
Which I guess you could still do in two pushes, but you get the point. It's at least two. Otherwise, it won't see it. If you don't put the tool call in the array, the LLM won't know that it asked you to do it, and if you don't put the results in, then it'll just ask you again: can you run that tool? I already asked you, but you didn't do it. Well, actually, in the case of OpenAI, if you continue the conversation after a tool call without responding with the result of that tool call, you'll get an error.
[00:23:46]
You'll get an API error. Like, hey, the very next thing that you have to do in this messages array after a tool call is respond to the tool call. And if you don't, I'm done. I'm not doing anything. Like you actually get an error. I don't know the behavior for that for other LLMs, but OpenAI definitely does that. Any questions so far? Yes. If it's expecting you to do something, is there a message you'd send if you decided to abort for some reason?
You just wouldn't add the tool call into the messages array. You would just ignore the fact that it told you to call the tool and not put it in the messages array, and then it won't know that it asked you to do it, right? Because if it's not in the array, even though it generated it, it doesn't know that's what was supposed to happen. Yeah, so the source of truth is that messages array.