AI Agents Fundamentals, v2

Context Window Compaction

Scott Moss
Netflix

Lesson Description

The "Context Window Compaction" Lesson is part of the full, AI Agents Fundamentals, v2 course featured in this preview video. Here's what you'd learn in this lesson:

Scott demonstrates setting up a compaction system using an LLM, including summarization prompts to create concise conversation summaries. He shows filtering messages, converting them to text, and generating summaries to maintain seamless agent interactions.

Transcript from the "Context Window Compaction" Lesson

[00:00:00]
>> Scott Moss: The next thing we want to do: because compaction involves an LLM, we need a compaction system prompt, the set of instructions that tell the LLM we're about to call how to compact. Compaction is just another LLM with its own context window, so we need that. So go to agents, then context. There's a compaction.ts in here. It's got some helper stuff like messagesToText and compactConversation, all the stuff we'll be using.

[00:00:31]
But the most important thing up here is the summarization prompt. You can put whatever you want. I have some stuff in the notes that I'll just copy, because I'm not about to sit here and write it out, but I'll read it to you and tell you what I was thinking when I made it. I said, you are a conversation summarizer, because that's exactly what it's doing. Your task is to create a concise summary of the conversation so far that preserves: key decisions and conclusions reached, important context and facts mentioned, any pending tasks or questions, and the overall goal of the conversation.
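
Put together, the prompt at the top of compaction.ts looks roughly like this. It's a sketch reconstructed from the description here; the exact wording in the course repo may differ:

```ts
// compaction.ts -- sketch of the summarization prompt described above
export const SUMMARIZATION_PROMPT = `You are a conversation summarizer. Your task is to create a concise summary of the conversation so far that preserves:
- Key decisions and conclusions reached
- Important context and facts mentioned
- Any pending tasks or questions
- The overall goal of the conversation

Be concise but complete. The summary should allow the conversation to continue naturally.

Conversation to summarize:`;
```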

[00:01:06]
Be concise but complete. The summary should allow the conversation to continue naturally. And then, "Conversation to summarize:". You'll see a lot of people do this in a system prompt: they leave off on something like "conversation to summarize," and then the next message is a user message with the thing to be summarized. Another thing that can help, though it's not a direct strategy for context window management: you can have a tool or a systematic step in your loop, and I think ChatGPT does this, where whenever the user sends a message, before it gets processed by the chat LLM, it also gets processed by another LLM that tries to determine whether the message contains facts or preferences worth remembering about the user.

[00:01:55]
Like if you had a personal assistant and you said, hey, my favorite color is blue. Or say you asked your personal assistant, hey, can you go pick up my suit from this shop? It's this blue suit, it's for this event I have going on, and I love the color blue, my favorite color is blue, but yeah, can you go pick it up?

[00:02:15]
ChatGPT will remember that your favorite color is blue without you telling it to remember. Every chat after that just knows it. So even though you might use a compaction strategy where everything is summarized, and details like your favorite color might be lost in that summarization, ChatGPT has pulled out personal facts and details about you and stored them somewhere else, and in every conversation it puts those back in, like, here's a list of user facts and preferences. That restores those details.
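
As a rough sketch of that side-channel idea, assuming the Vercel AI SDK's generateObject and a hypothetical model choice, a fact extractor might look like this; storing the facts and injecting them into later conversations is left to the caller:

```ts
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Hypothetical side channel: run on every incoming user message, separately
// from the chat LLM, and persist anything worth remembering.
export async function extractUserFacts(userMessage: string): Promise<string[]> {
  const { object } = await generateObject({
    model: openai("gpt-4o-mini"), // assumption; any small, cheap model works
    schema: z.object({
      // Durable facts or preferences, e.g. "favorite color is blue"
      facts: z.array(z.string()),
    }),
    prompt: `Extract any lasting personal facts or preferences about the user from this message. Return an empty list if there are none.\n\nMessage: ${userMessage}`,
  });
  return object.facts; // store these, then inject them into every new conversation
}
```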

[00:02:42]
So that's another strategy: have important facts about a user automatically derived, stored somewhere else, and injected into every instance, so it feels like the thing learns you over time without you having to say anything, if that makes sense. The next thing we need to do is actually make the LLM call that does the compaction. So we go down to compactConversation. We need to fill it out; it's empty right now.

[00:03:10]
So what we're going to do here is get the conversation messages. We need to filter them, because there might be some UI stuff in here; just like we did in run, we have to filter those out. So we'll take the messages that are passed in and call filter. You get each message, and what we want to filter out, essentially, is the system prompt. We don't want to summarize the system prompt, right?

[00:03:36]
So that's one of the things we don't want to do. We'll say m.role does not equal system, so we get all the messages that aren't the system prompt. It's probably the first one in the array, so I guess we could have just removed the first item, but it might not be, who knows, so just filter it out. And then, why would we be compacting if the length is 0? There's nothing to compact, so we can just return nothing. Let's do that.

[00:04:12]
And then, instead of trying to feed this whole array to the LLM in the format a messages array would want, we can convert it all into one big text blob: here's all the information so far, versus trying to feed it all these JSON objects. The helper function messagesToText does that, so I can just say, here's the conversation text. Just imagine you took a whole conversation array, a messages array, and threw it into a .txt file without any of the JSON stuff.

[00:04:41]
That's essentially what this is: messagesToText. You can see what it's doing up here. It's just taking the role and the content and joining them, with two new lines between each, into one big string, versus all that JSON stuff, right?
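
A minimal version of what that helper seems to do; the repo's exact formatting may differ:

```ts
// Sketch of the messagesToText helper: flatten a messages array into one
// plain-text blob, "role: content" pairs separated by blank lines.
type Message = { role: "system" | "user" | "assistant"; content: string };

export function messagesToText(messages: Message[]): string {
  return messages.map((m) => `${m.role}: ${m.content}`).join("\n\n");
}
```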

[00:05:16]
So we get the conversation messages and get that text. Then we'll get the text, let's call it summary, from await generateText with the conversation messages. Or, let's pick a model first, actually. For this one, you might have to eval this too, right? You have to figure out what model works for you. That's why I let you pass a model in, so you can switch quickly. And in this case, because we've already converted the messages into one text blob, we don't need the messages array.

[00:05:40]
We can just pass in a prompt; it's either/or. You pass in one prompt, because this is not an agent loop that keeps looping and conversing. We just need it to do one thing and be done. So in that case, we can use prompt, which is what we did at the beginning of the course. I'm just going to say the summarization prompt, which is essentially the system prompt, plus here's the conversation text.

[00:06:06]
Not that you have to do it this way. If you want to use messages, you can use messages. Instead of prompt, you'd just build a messages array, right? You'd come down here and say role: system with content: the summarization prompt, and then make another one with role: user and content: the conversation text. It's the same thing.
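
Both call shapes side by side, assuming the Vercel AI SDK's generateText, which accepts either a prompt or a messages array, and the SUMMARIZATION_PROMPT constant sketched earlier:

```ts
import { generateText, type LanguageModel } from "ai";
// SUMMARIZATION_PROMPT is the constant sketched earlier in this lesson.

// One-shot summarization: no loop, no ongoing conversation, so a flat
// prompt is enough.
export async function summarize(model: LanguageModel, conversationText: string) {
  const { text } = await generateText({
    model,
    prompt: `${SUMMARIZATION_PROMPT}\n\n${conversationText}`,
  });
  return text;
}

// The equivalent messages form -- same result, just more ceremony:
//   await generateText({
//     model,
//     messages: [
//       { role: "system", content: SUMMARIZATION_PROMPT },
//       { role: "user", content: conversationText },
//     ],
//   });
```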

[00:06:31]
But why would I need to do that? I'm not having a conversation; I just need it to do one thing and give me the result. So I'm just going to use prompt instead. All right, once we've done that, we need to return the new messages array that the agent will use, the compacted messages array. We're rebuilding the conversation, but now with the compaction, so we'll say compactedMessages.

[00:07:17]
We want the role here to be user. And for the content, we'll just add a hint to the LLM that, hey, this was a summary. So we'll write "Conversation Summary", so it knows what the vibe is here. Put a new line. And from here we can just say, hey, the following content is a summary of the conversation.

[00:07:56]
Of the conversation so far, I should say. Add some new lines. Put the conversation text here, or, sorry, not the conversation text, we want the summary that got generated. And there's really no wrong answer here; you have to eval this and figure out what makes sense. Then I can just say, please continue where we left off, right? And then, just to mind control the agent, I'll add a role: assistant message with some content.

[00:08:54]
It could say: I understand, I've reviewed the summary of our conversation and I'm ready to continue. How can I help? So I'm telling the LLM, that's what you said; you said that when you saw this compaction. Now when the LLM sees it, it goes, oh yeah, I did say that. I do understand. I am ready to help. So there we go. And we want to leave off on an assistant role.

[00:09:16]
We don't want to leave off on a user role, because if we fed that back to the agent, it would generate another message. We leave off here because it's probably the user's turn when this compaction happens, right? And then, yeah, return compactedMessages.
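
Putting the whole walkthrough together, here's a sketch of the finished function, using the SUMMARIZATION_PROMPT and messagesToText sketches from earlier; the course repo's exact types, wording, and signature may differ:

```ts
import { generateText, type LanguageModel } from "ai";

type Message = { role: "system" | "user" | "assistant"; content: string };

export async function compactConversation(
  messages: Message[],
  model: LanguageModel
): Promise<Message[]> {
  // Don't summarize the system prompt; filter it out wherever it sits.
  const conversationMessages = messages.filter((m) => m.role !== "system");

  // Nothing to compact.
  if (conversationMessages.length === 0) return [];

  // One flat text blob instead of an array of JSON message objects.
  const conversationText = messagesToText(conversationMessages);

  const { text: summary } = await generateText({
    model,
    prompt: `${SUMMARIZATION_PROMPT}\n\n${conversationText}`,
  });

  // Rebuild the conversation around the summary, ending on an assistant
  // turn so the agent waits for the user rather than generating again.
  const compactedMessages: Message[] = [
    {
      role: "user",
      content: `[Conversation Summary]\nThe following content is a summary of the conversation so far:\n\n${summary}\n\nPlease continue where we left off.`,
    },
    {
      role: "assistant",
      content:
        "I understand. I've reviewed the summary of our conversation and I'm ready to continue. How can I help?",
    },
  ];

  return compactedMessages;
}
```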
