Coding the AI Agent Loop

Netflix

Lesson Description

The "Coding the AI Agent Loop" lesson is part of the full Build an AI Agent from Scratch course. Here's what you'd learn in this lesson:

Scott demonstrates how to use a while loop to continuously send messages to the model and receive responses until a specific condition is met. Scott also discusses the importance of saving and retrieving the conversation history within the loop.

Transcript from the "Coding the AI Agent Loop" Lesson

[00:00:00]
>> Scott Moss: Let's make our loop, so it's not gonna be too crazy here. We don't really have to do too much here. We just need to add a while loop that is gonna break our computer because it doesn't stop. [LAUGH] And then the only time we do stop is here, right?

[00:00:18]
We stop here when we see that there's response dot content, as in the AI's done, it only responds with content when it's done. This is the terminal message. I am done. I'm ready to give an answer. There is nothing else coming after this. So we can just stop the loop right here.

[00:00:35]
Otherwise, you're in a tool call and you're continuing to go. There's no third branch here: it's either content or it's tool calls, there's nothing else. So we can safely assume that it will never do anything else. So let's do that, so let's go back into our agent, and here we go.

[00:00:54]
We have this, so what we need to do is we're gonna wrap, pretty much, and another thing we got to do is we got to get, we gotta move where we get the history and I'll talk about that in a minute. So,
>> Scott Moss: Right now, we get the history outside of here.

[00:01:10]
We're gonna have to move all of this inside of a loop cuz we're gonna be saving to history and stuff down here, and also right here. We wanna make sure when it comes back around that it has the new messages that were just generated, otherwise it won't have those messages.

[00:01:28]
So we have to save it and then get it. We have to get the new history at the top of the loop and save the new messages at the bottom of the loop. So when it comes back, it has the new messages that were saved right before it.

[00:01:39]
Otherwise you have to keep track of that in memory, and then save them all at once, yeah.
>> Speaker 2: Did you cut a step five?
>> Scott Moss: I will cut a step five right quick.
>> Speaker 2: I think I've actually seen it in cases with using OpenAI stuff, where it'll send you a content message back before it runs some tools, and then it'll, like, think about some stuff and then send you another content message back.

[00:02:04]
>> Scott Moss: Yeah, and I think we just saw an example of that. So like for instance, when I asked for the weather, it was like, what city? That's a content message. So in that case, it's like, yeah, I don't wanna run a tool right now, but I might be thinking about it.

[00:02:18]
>> Speaker 2: Well, in this case for me, it's like asking you for recommendations about something. The thing I've been doing, and it's like I'll ask for the recommendations, it'll send a message back that's like cool let me look, let me research that.
>> Scott Moss: I see.
>> Speaker 2: And then it'll call a tool or do a rag thing and then it'll come back with the actual answer, but it'll say hey, let me look that up for you.

[00:02:43]
>> Scott Moss: Yeah, what's that property I talked about that I don't use, I wonder what the
>> Speaker 2: The final result, I wonder what it would say.
>> Scott Moss: Yeah, I wonder what it would say in that example, cuz it's like, I have a content thing, but I'm not done yet and I expect you to loop again, so I could continue.

[00:02:59]
I wonder what that would be, that's interesting. I haven't run into that, but that sounds terrifying. I need to [LAUGH] make sure that doesn't happen on my system, because if it does, it's gonna break entirely. So that would be terrible. Okay, so while true, I know everything in your body is telling you not to put that in here, but trust me, it's gonna be fine.
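The case Speaker 2 describes, where content arrives alongside tool calls, would end a loop that stops on any content. One possible guard, sketched here under assumptions, is to check for tool calls first and treat content as terminal only when no tool calls come with it. The `AssistantMessage` shape and the `nextStep` helper are hypothetical and not taken from the course code.

```typescript
// Hypothetical message shape; real SDK types will differ.
type ToolCall = { id: string; name: string };
type AssistantMessage = { content: string | null; toolCalls?: ToolCall[] };

type NextStep =
  | { kind: "run_tools"; calls: ToolCall[] }
  | { kind: "done"; answer: string };

// Decide what the loop should do with a model response.
// Tool calls win over content, so "let me look that up" plus a tool call keeps looping.
function nextStep(response: AssistantMessage): NextStep {
  const calls = response.toolCalls ?? [];
  if (calls.length > 0) {
    return { kind: "run_tools", calls };
  }
  if (response.content) {
    return { kind: "done", answer: response.content };
  }
  // Neither branch matched: fail loudly instead of spinning forever.
  throw new Error("Response has neither content nor tool calls");
}
```

This does not cover a content-only "let me look that up" message with no tool call attached; that message would still be treated as the final answer.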

[00:03:26]
While true, we're gonna just move everything over, one more, let's do that. There we go, so while true, I wanna get the history, get the response, save the response in the database, execute the tool, save the tool. This will just keep looping. But what we wanna do is, if response dot content, we want to stop it, and then we just want to return here because we're done.

[00:04:02]
There's nothing else to do. Let me go back to my notes to make sure I'm being consistent. Yeah, I guess I don't need those logs at the bottom anymore. There we go. So, again, we start, we get the history, we run it through the LLM. The LLM comes back, we save it in database immediately.

[00:04:27]
If the LLM says, I got some content for you, we just stop, we're like, cool, done. So if the LLM comes back with content, we stop, we're done. If the LLM comes back with a tool call, we run that tool, we save it in the database, and then it loops again, and then it comes back up here, it gets the history.

[00:04:46]
The last thing that's in the database now is the response to that tool. So we run it back through, it's like, cool, got the answer now, here's my response, you know what? I'm still not done. Run this other tool, cool. Run this other tool, save the answer in the database, loops back up, gets all the messages again, which now is another answer to the last tool we ran.

[00:05:11]
Go back to the AI, it responds back cool, I'm ready to answer. Got it? We got your answer, stop.
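The walkthrough above can be sketched roughly as follows. This is a minimal illustration, not the course's actual code: `getMessages`, `addMessages`, `runLLM`, and `runTool` are hypothetical stand-ins (an in-memory array instead of a database, a scripted fake instead of a real model), and the message shape is only loosely modeled on chat-completion style APIs.

```typescript
type ToolCall = { id: string; name: string; args: string };
type AssistantMessage = { role: "assistant"; content: string | null; toolCalls?: ToolCall[] };
type Message =
  | { role: "user"; content: string }
  | AssistantMessage
  | { role: "tool"; toolCallId: string; content: string };

// In-memory stand-in for the database that holds the conversation history.
const db: Message[] = [];
const getMessages = async (): Promise<Message[]> => [...db];
const addMessages = async (messages: Message[]): Promise<void> => {
  db.push(...messages);
};

// Fake model: requests the weather tool once, then answers from the tool result.
const runLLM = async (messages: Message[]): Promise<AssistantMessage> => {
  const last = messages[messages.length - 1];
  if (last.role === "tool") {
    return { role: "assistant", content: `The weather is ${last.content}.` };
  }
  return {
    role: "assistant",
    content: null,
    toolCalls: [{ id: "call_1", name: "get_weather", args: '{"city":"Mexico City"}' }],
  };
};

// Fake tool that always reports the same temperature, like the lesson's demo.
const runTool = async (_call: ToolCall): Promise<string> => "90 degrees";

async function runAgent(userMessage: string): Promise<Message[]> {
  await addMessages([{ role: "user", content: userMessage }]);

  while (true) {
    // Top of the loop: re-read history so this turn sees what the last turn saved.
    const history = await getMessages();
    const response = await runLLM(history);
    // Save the model's reply immediately.
    await addMessages([response]);

    // Content means the model is done: stop the loop and return.
    if (response.content) {
      return getMessages();
    }

    // Otherwise run each requested tool and save its result for the next turn.
    for (const call of response.toolCalls ?? []) {
      const result = await runTool(call);
      await addMessages([{ role: "tool", toolCallId: call.id, content: result }]);
    }
  }
}
```

Calling `runAgent("What is the weather in Mexico City?")` with these fakes takes two trips through the loop: the first runs the tool, the second returns the answer.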
>> Scott Moss: That's the loop. Let's see what we broke. So I'm going to just say, actually let me get rid of all this stuff.
>> Scott Moss: All right, and then
>> Scott Moss: Let's just start with hi, and make sure we didn't break anything.

[00:05:52]
>> Scott Moss: Cool, that works, I'm just not logging anything. So let me make sure I log, the thing that I need to log, and that would be log message response, I'll do that here. Let's just log this in here. Log message, response, let's do that. Cool, let's try that again.

[00:06:21]
>> Scott Moss: What is the weather in Mexico City? There you go, it got the weather, cool, so totally did the thing. Now we have a functional agent that doesn't do anything but tell you it's 90 degrees all the time. So we actually need to have it do some cool stuff.

[00:06:42]
So if you're thinking about like, but how would I put this on an API route? The same way you have it now, no difference, you would put it on the API route. Obviously you have different constraints about like, if you're on serverless, I talked about this earlier, execution time, how long the function is on, yada yada.

[00:07:01]
If you're on a traditional server, you don't have this problem, you can do background jobs, there's so many things you can do. For us, I don't even know where to begin to describe how we do this, but basically every turn is its own job that can be replayed if it fails.

[00:07:18]
So at any point, if one of these turns fails, we'll just replay it from that position, and then we lock this process for a user, that way they can't submit another one. So like, if we're already processing a job for them, as in, they were on their iPhone and they submitted a message.

[00:07:34]
And then they had another iPhone, they tried to submit another message before this one got back, we stop that from happening. And then, we broadcast the messages using WebSockets, every time a new message comes back, we send that to their clients. So every time there's a message, you just see it pop up on your phone versus waiting until they're all done.
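The per-user lock and replayable turns described here might look something like the sketch below. It is an assumption-heavy illustration: the in-memory `Set` stands in for whatever shared lock a real deployment would use, and `tryLock`, `unlock`, and `runTurnWithReplay` are hypothetical names. The WebSocket broadcast is left out.

```typescript
// In-memory stand-in for a shared lock; a multi-server setup would need shared storage.
const activeUsers = new Set<string>();

// Returns false if a job is already running for this user.
function tryLock(userId: string): boolean {
  if (activeUsers.has(userId)) return false;
  activeUsers.add(userId);
  return true;
}

function unlock(userId: string): void {
  activeUsers.delete(userId);
}

// Runs one turn, replaying it from the same position if it throws.
function runTurnWithReplay<T>(turn: () => T, maxAttempts = 3): T {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return turn();
    } catch (error) {
      lastError = error; // remember the failure and try the same turn again
    }
  }
  throw lastError;
}
```

A second submission from the same user while the first is still running gets `false` from `tryLock` and can be rejected before any work starts.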

[00:07:51]
So that's kind of how we do it. It works pretty good for now, but there's other ways, there's true streaming where you stream token by token, which is, you know, that's what you would see on any chat app right now. Claude, ChatGPT, they stream the tokens as they come in.

[00:08:06]
I decided not to have that approach because I feel like that's not human. Humans don't type that fast. They don't do that. If I'm chatting with a human, I see a typing indicator, and then a message pops up, and that feels more human to me than like watching tokens streaming.

[00:08:19]
So we decided not to do that approach. We also use structured outputs. So streaming JSON is useless. You can't do anything with it until it's done anyway. So yeah, it doesn't really help us out. Back in the day, the only way people did this was with LangChain.
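The point about streaming JSON can be seen with a plain `JSON.parse`: a partially streamed object fails to parse until the last chunk arrives. The chunk boundaries below are made up for illustration.

```typescript
// Returns the parsed value, or undefined if the text is not valid JSON yet.
function tryParse(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return undefined;
  }
}

// Hypothetical chunks of one structured output arriving over a stream.
const chunks = ['{"city":"Mex', 'ico City","temp', 'erature":90}'];

let buffer = "";
const parsedAfterEachChunk: boolean[] = [];
for (const chunk of chunks) {
  buffer += chunk;
  // Only the complete object parses; every earlier prefix is rejected.
  parsedAfterEachChunk.push(tryParse(buffer) !== undefined);
}

console.log(parsedAfterEachChunk); // [ false, false, true ]
```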

[00:08:35]
And then now you can see why I said you don't need LangChain, it's just a loop. Yeah, of course.
>> Speaker 3: How did you debug this loop? Does something like LangSmith have relevance in practice?
>> Scott Moss: In my opinion, a tool like LangSmith is not for debugging this; it's for improving the output of this, which is way different than finding bugs in here.

[00:09:03]
So if you're trying to figure out how to make the output of this system better, as in tool selection, quality of answers, things like that, yeah, something like LangSmith or any eval tool is gonna help you do it. If you wanna be able to trace how long it takes for you to get a response back, and how many tokens you're using, tools like LangSmith and other tools are going to help you do it.

[00:09:24]
If you're just trying to figure out why your code isn't working, there's no help. [LAUGH] You gotta help yourself. Whatever debugging tools you use in Node apply here as well. I guess, I mean, at the end of the day, this is just an API call. What I like to do is, because if you think about it, everything is driven by this state.

[00:09:45]
It's driven by this state, right? The AI is just driven by this state. So you can put the AI in whatever state you want by coming up with messages. So I have different scenarios, I have literally a folder of all these different message scenarios that I can put through the AI to see what would happen.

[00:10:03]
I'm like what if a user did this and then did this or what if the system was in this state, how would it respond? And I would run through those states, and then I would write the code to handle those states. And that's typically what I do. And then the next level above that is actually writing evals, which is very similar, but it's testing the quality of the output and the accuracy and less about like whether or not your code's gonna break.

[00:10:23]
So those are the two ways that I handle it, and it pretty much covers everything for me.
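The scenario approach described above could be sketched like this. Everything here is hypothetical: the `Scenario` shape, the inline scenario list (standing in for a folder of files), and the `respond` stub that replaces the real model call.

```typescript
type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };
type Scenario = { name: string; messages: Message[] };

// Hand-written conversation states; a real setup might load these from files.
const scenarios: Scenario[] = [
  {
    name: "weather without a city",
    messages: [{ role: "user", content: "What is the weather?" }],
  },
  {
    name: "tool result already in history",
    messages: [
      { role: "user", content: "What is the weather in Mexico City?" },
      { role: "assistant", content: "" }, // placeholder for the model's tool request
      { role: "tool", content: "90 degrees" },
    ],
  },
];

// Stub standing in for the real model call, so the example runs offline.
function respond(messages: Message[]): string {
  const last = messages[messages.length - 1];
  return last.role === "tool" ? `The weather is ${last.content}.` : "Which city?";
}

// Put every scenario through the model and collect what came back.
function runScenarios(
  all: Scenario[],
  model: (messages: Message[]) => string,
): Record<string, string> {
  const results: Record<string, string> = {};
  for (const scenario of all) {
    results[scenario.name] = model(scenario.messages);
  }
  return results;
}

const results = runScenarios(scenarios, respond);
```

With a real model in place of `respond`, the collected responses show how the agent behaves from each starting state before any handling code is written for it.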
