The "Summarizing Messages" Lesson is part of the full, AI Agent: From Prototype to Production course featured in this preview video. Here's what you'd learn in this lesson:
Scott implements summaries and a way to shorten the context size. Whenever the system has more than 10 messages, it will remove the oldest five messages and replace them with a summary of those messages.
Transcript from the "Summarizing Messages" Lesson
[00:00:00]
>> Scott Moss: So follow along as we implement the memory management that we're gonna do.
>> Speaker 1: So our strategy is window sizing, limiting the window size and doing a very naive summarization.
>> Scott Moss: And then you can improve that if you want. I'll give you some tips. So the first thing is we need to go to our LLM.
[00:00:20]
And we need to get this summary. Well, actually, we have to make that function first. So let's start here in the memory. In the memory file, we're gonna make a function that gets the summary. And then we're gonna do this code right here that basically gets the last five messages.
[00:00:39]
Just capping it off at five, you can do whatever number you want; the window I'm picking is just the last five messages. And then it tries not to leave off on a tool response, cuz that would put the LLM in a broken state.
[00:00:51]
So like, we can't leave off on that message, so we grab one more, just to make sure it's not broken. And then the other thing is creating a summary: every 10 messages, I'll create a summary. These are just random numbers that I picked. You can pick whatever numbers you want.
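To make the plan concrete, here's a tiny sketch of the two knobs just described (the numbers are arbitrary picks, and the constant names are illustrative, not from the course code):

```typescript
// The two knobs from this lesson, both arbitrary picks you can tune.
const MESSAGE_WINDOW = 5 // only the last 5 messages get sent to the LLM
const SUMMARY_THRESHOLD = 10 // once 10+ messages are stored, summarize the oldest 5

// Two rules sit on top of these numbers:
// 1. The window can't start on a tool response (the LLM errors on a tool
//    response without its matching tool call), so widen it by one when it does.
// 2. Whenever the stored messages hit the threshold, summarize the oldest five
//    and keep that summary around for the system prompt.
```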
[00:01:07]
So let's do that. Let's head over to our memory, and the first thing is, I'm gonna update these types. So now there's a summary here.
>> Scott Moss: Let's make the summary a string, and then in our default data, I just want the summary to be an empty string, so we got that.
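Here's roughly what that type and default-data change could look like, assuming a lowdb-style JSON store backing the memory file; the type names below are illustrative rather than the course's exact ones:

```typescript
import { JSONFilePreset } from 'lowdb/node'

// Illustrative message shape: the course defines its own message type and
// attaches an id and createdAt when messages are saved.
type StoredMessage = {
  id: string
  createdAt: string
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string
  [key: string]: unknown
}

type Data = {
  messages: StoredMessage[]
  summary: string // new field: rolling summary of the oldest messages
}

// Default data now includes an empty summary string.
const defaultData: Data = { messages: [], summary: '' }

export const getDb = async () => JSONFilePreset<Data>('db.json', defaultData)
```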
[00:01:27]
>> Scott Moss: For the addMessages function.
>> Scott Moss: All we wanna do is check to see if the messages length is greater than or equal to 10, and then we want to get the oldest messages. So the oldest five messages, run this summarizeMessages function that we're gonna create, and then just update the summary to be that.
[00:01:53]
So let's do that.
>> Scott Moss: And we'll make that function and come back to it, so it's not there right now, but we'll make it. So db.data.messages.length greater than or equal to 10.
>> Scott Moss: Then we wanna grab the oldest messages.
>> Scott Moss: Equals db.data.messages.slice, what did I put here, 0, 5, then map. Okay, so, yeah, slice(0, 5), and I need to map over these to remove the metadata from them, because I don't want the IDs and createdAt in there.
[00:02:40]
Actually, maybe you do, maybe you do, cuz they'd end up in the summary. No, we'll leave it. You'd need an eval for that.
>> Scott Moss: And now we'll come back and add the summary, but for now, we'll just say const summary equals an empty string, okay? And then we will say, db.data.summary equals the summary.
[00:03:08]
That way we can write it and when we make the function, we'll come back and do this.
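A sketch of addMessages at this point, reusing the hypothetical getDb and StoredMessage from the earlier sketch and glossing over how the id/createdAt metadata gets attached on save:

```typescript
// addMessages with the summarization hook. The summary is still a placeholder
// here; it gets wired up to summarizeMessages later in the lesson.
export const addMessages = async (messages: StoredMessage[]) => {
  const db = await getDb()
  db.data.messages.push(...messages)

  if (db.data.messages.length >= 10) {
    // Oldest five messages, stripped of metadata so IDs and createdAt
    // don't leak into the summary prompt.
    const oldestMessages = db.data.messages
      .slice(0, 5)
      .map(({ id, createdAt, ...rest }) => rest)

    // oldestMessages will be passed to summarizeMessages once that exists.
    const summary = '' // placeholder for now
    db.data.summary = summary
  }

  await db.write()
}
```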
>> Scott Moss: Cool, and now to get the messages. Pretty simple. We get the last five. We do this logic right here and then we return the last five. So let's say, get rid of this return right here.
[00:03:33]
This instead. This is gonna be our messages.
>> Scott Moss: Our last five are gonna be messages.slice, what did I do, -5. So we got that. And then the logic to do this, basically what we're doing is we're like, okay, take the first thing in this array, which is the oldest thing.
[00:04:04]
Is its role tool? If that's the case, what's gonna happen is if we pass that into the AI, it's gonna be like, you cannot have a tool response without a tool call with the same ID. So we don't want that. It's gonna error. So we then say, okay, we'll go get the one before that, which would be the tool call, and then this should work.
[00:04:29]
That's essentially what this is doing. So I'm just gonna paste that in. And let me update my code to say last five. And I guess I should spell messages correctly.
>> Scott Moss: So, pretty simple, and then we just put them back in the same order. The way it works is, the latest message in a chat is always the one at the end.
[00:04:52]
The oldest message is always the one at the beginning. I guess I don't have to do this, I guess I could just do that [LAUGH] I think that works the same, I don't know what that was, I think it just looked cool.
>> Speaker 1: You need a return at the end for when it's not the sixth.
[00:05:10]
>> Scott Moss: Yeah, I need to return the last 5.
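Here's a sketch of that getMessages logic, with the tool-response guard (same hypothetical getDb as above):

```typescript
// Return only the last five messages, widening the window by one if it
// would otherwise start on a tool response.
export const getMessages = async () => {
  const db = await getDb()
  const messages = db.data.messages

  const lastFive = messages.slice(-5)

  // A tool response can't be the first message the LLM sees; it needs the
  // matching tool call (same ID) right before it, so grab one more message.
  if (lastFive[0]?.role === 'tool') {
    const sixth = messages[messages.length - 6]
    return sixth ? [sixth, ...lastFive] : lastFive
  }

  return lastFive
}
```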
>> Scott Moss: Cool, and then we need to make a function that says get summary.
>> Scott Moss: Get summary. It's gonna be, const db equals, how do we get the db? Await getDb.
>> Scott Moss: And then we're just gonna return db.data.summary.
>> Scott Moss: Pretty simple.
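In code, that's about as small as it sounds (same hypothetical getDb as above):

```typescript
// getSummary just reads the stored summary back out of the db.
export const getSummary = async () => {
  const db = await getDb()
  return db.data.summary
}
```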
[00:06:05]
So now, if we go back to the LLM, we need to do two things. We need to get the summary and put it in here, and then we need to create our summarizeMessages function. So let's just start with summarizeMessages, and then we can bring that back into here and keep moving.
[00:06:22]
So let's go to LLM.
>> Scott Moss: Make a new function at the bottom called summarizeMessages. And this thing is just gonna take in a list of messages. That's it. And then it's gonna have a system prompt, and that's about it, just those messages. So let's say messages, AIMessage, like that.
[00:06:58]
>> Scott Moss: Response is await.
>> Scott Moss: Run LLM.
>> Scott Moss: Gonna pass in those messages, gonna pass in a system prompt. And we're gonna say, your job is to summarize the given messages to be used in another LLM's system prompt.
>> Scott Moss: Summarize it, play by play. So it's like, AI said this, user said this.
[00:07:53]
AI said this, user said this. Otherwise, it'll spit out a paragraph, which might also work. You got to play around with it. You have to eval this.
>> Scott Moss: So we got our messages. I'm gonna crank up the temperature a little bit on this, a little bit of randomness, I think.
[00:08:11]
Why not? Don't do that. Definitely don't do that. These numbers, although they're just decimals, they're very strong. I would never go over 0.4. Just to show you something really wrong: if you do this, you won't get back English. You'll get back a character you've never seen.
[00:08:36]
I didn't even know there was a rendering for that. Yeah, don't do that. Okay, I think that's it. So then we just return the content.
>> Scott Moss: Just in case, fall back here. So we got that. And then while we're in here, we might as well go ahead and pull in the getSummary.
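Putting those pieces together, a sketch of summarizeMessages might look like this, sitting next to runLLM in the LLM file. It assumes runLLM accepts { messages, systemPrompt, temperature } and returns the assistant message, and AIMessage stands in for whatever message type the course uses; the real signatures may differ:

```typescript
export const summarizeMessages = async (messages: AIMessage[]) => {
  const response = await runLLM({
    messages,
    systemPrompt: `Your job is to summarize the given messages so the summary
can be used in another LLM's system prompt. Summarize the conversation play by
play, e.g. "the user said X, the assistant said Y, a tool returned Z".`,
    temperature: 0.1, // keep this low; Scott wouldn't go over 0.4 here
  })

  // Fall back to an empty string just in case the model returns no content.
  return response.content || ''
}
```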
[00:08:58]
So I'll do that. I'll say summary equals await getSummary from memory. Then what I can do is I can just add that to, I'm in the wrong function. I'll add that here. I want to do that up here, in runLLM. So up here I want to say,
>> Scott Moss: Get summary.
[00:09:29]
>> Scott Moss: And the system prompt is gonna be this, plus the summary. So I'll do this.
>> Scott Moss: And I can say, conversation so far.
>> Scott Moss: Like that.
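A rough sketch of runLLM with the summary folded into the system prompt, assuming the OpenAI SDK; the model name, option names, and the getSummary import path are illustrative, and the course's real function also handles tools:

```typescript
import OpenAI from 'openai'
import { getSummary } from './memory'

const openai = new OpenAI()

export const runLLM = async ({
  messages,
  systemPrompt,
  temperature = 0.1,
}: {
  messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[]
  systemPrompt: string
  temperature?: number
}) => {
  const summary = await getSummary()

  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini', // illustrative model name
    temperature,
    messages: [
      // Prepend the rolling summary so the model sees the conversation so far.
      { role: 'system', content: `${systemPrompt} Conversation so far: ${summary}` },
      ...messages,
    ],
  })

  return response.choices[0].message
}
```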
>> Scott Moss: And the last thing we need to do is go back to memory, go to where we're adding the messages, and summary right here should be await summarizeMessages, and it's gonna take in the oldest messages, like that.
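So the placeholder from the addMessages sketch becomes, roughly:

```typescript
// Inside addMessages: swap the placeholder for the real summarization call.
const summary = await summarizeMessages(oldestMessages)
db.data.summary = summary
```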
[00:10:13]
Let's see what it does. What's my database looking like? We're looking good. We're looking good, okay. Let's see if we broke something.
>> Scott Moss: Cool, I don't know. You tell me.
>> Scott Moss: Do it.
>> Scott Moss: Here's a joke for you. What do you call a beehive without the bees? An ee-hive.
[00:10:47]
My goodness, that was so cringe [LAUGH]. Humor can be quite subjective; if you like, I can share a different type of joke. Yeah, let's see. Do it.
>> Scott Moss: Let's see what this one says. Here's another joke for you. Why was the big cat disqualified from the race? Because it was a cheetah.
[00:11:17]
[LAUGH] I hope that one brings a smile. What a jerk. This thing is getting too good. Make me some art for that joke. Yes, I do. I wonder if I put a check here, check this out.
>> Scott Moss: [LAUGH] Putting a check emoji was an approval. That's actually pretty cool.
[00:11:56]
>> Speaker 1: It's not ambiguous.
>> Scott Moss: So let's see what art it made.
>> Scott Moss: Come on. There we go. Come on, come here. My goodness, why does it take so long? [LAUGH] Why is he so sad?
>> Audience: [CROSSTALK]
>> Scott Moss: He's a cheetah. My goodness.
>> Audience: [LAUGH].
>> Scott Moss: This is hilarious.
[00:12:31]
What is this animal? [LAUGH] What is going on? This is hilarious.