AI Agents Fundamentals, v2

Adding Context Window Management

Scott Moss
Netflix

Lesson Description

The "Adding Context Window Management" Lesson is part of the full, AI Agents Fundamentals, v2 course featured in this preview video. Here's what you'd learn in this lesson:

Scott shows how to update the run loop for context window management, including importing the context helpers and checking model token limits. He demonstrates compacting conversations when thresholds are exceeded and reporting token usage throughout the code.


Transcript from the "Adding Context Window Management" Lesson

[00:00:00]
>> Scott Moss:All right, we just need to update our loop inside of run to implement this, right? So we'll go into our runner, and we need to import a lot of these things, so we will say import from context index, and then from here what we're going to need is the estimate tokens, we're going to need the get model limits, we're going to need the is over threshold, the calculate usage percentage, the compact conversation that we just wrote, and then some default thresholds, or our one default threshold.

[00:00:52]
Okay, so then what we want to do, before the loop starts inside of our run function, is get the model limits. So let's get the model limits for this model name. And did I have GPT-4o mini in here? I did, okay, cool, so we've got the model limits. Once we have those model limits, before we start this while loop, that's the first place we want to check whether we're over the threshold.
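Roughly, that first step might look like this; the contextWindow field name is an assumption about what getModelLimits returns:

```ts
// Assumed: run receives a model name (e.g. "gpt-4o-mini") and getModelLimits
// returns at least the model's total context window size.
const modelLimits = getModelLimits(modelName); // e.g. { contextWindow: 128_000 }
```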

[00:01:35]
So if we are over the threshold before we start this loop, we want to go ahead and compact, so let's do that. We'll say if isOverThreshold, and we'll pass in, let's just say we're passing the messages here. If the messages are over the threshold... oh wait, hold on, we've missed a step.

[00:02:08]
We need to calculate all the tokens for the current history first, so let's do that. I'm trying not to write this thing twice, I don't want to write this thing twice, but let's just write it twice. Actually, let's do this: we'll say precheckTokens, and this will equal estimateMessageTokens, and estimateMessageTokens is going to take in messages.

[00:02:49]
There we go. And if you go look at estimateMessageTokens, which we did not write, I'm not using the proper way to estimate these tokens. I mean, here I'm just adding these up, but to know how many tokens are actually being used, you would need the model's tokenizer to tokenize your text and give you the exact count.

[00:03:12]
That is async, it costs money, there are APIs involved, and it changes depending on which model you use. There were too many variables I just did not want to put in here, so I assume that, on average, every 3.75 characters is a token. It's not 100% accurate, but I did it for brevity. So if you're wondering, "How did he calculate how many tokens are in it?", well, I kind of just guessed.
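As a sketch, the heuristic described here might look like the following; the role-based input/output split is an assumption based on how the usage object gets read later in the lesson:

```ts
// Rough character-count heuristic: ~3.75 characters per token. A real,
// model-specific tokenizer would be exact, but as noted above it can
// involve extra APIs and varies by model; this trades accuracy for
// simplicity.
type TokenUsage = { input: number; output: number; total: number };

function estimateMessageTokens(
  messages: { role: string; content?: string }[]
): TokenUsage {
  let input = 0;
  let output = 0;
  for (const m of messages) {
    const tokens = Math.ceil((m.content?.length ?? 0) / 3.75);
    if (m.role === "assistant") output += tokens; // model output so far
    else input += tokens; // everything sent to the model
  }
  return { input, output, total: input + output };
}
```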

[00:03:41]
So that's what I did here. Okay, now that we have that, we can check isOverThreshold. And what we want to check is... first of all, let me fix that: precheckTokens. If this total is over the threshold, then... and then we need to pass in modelLimits.contextWindow, the full context window from this model's limits.

[00:04:18]
If the current messages array's total token count is over the context window for this model, then we need to compact. So now we need to say messages equals await compactConversation with the working history and the model name. So that'd be the first place we can compact, at the top, before the loop starts, if we're over the threshold.
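Putting that pre-loop check together, under the same assumed signatures:

```ts
// Before the while loop: if the running history is already over the
// threshold, compact it down. isOverThreshold is assumed to compare a
// token count against the model's context window, and compactConversation
// to return a shorter, summarized message array.
const precheckTokens = estimateMessageTokens(messages);
if (isOverThreshold(precheckTokens.total, modelLimits.contextWindow)) {
  messages = await compactConversation(messages, modelName);
}
```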

[00:05:07]
And then from there, we need a way to report the token usage, so we need a function. I mean, you don't strictly need this, we already did the compaction, but I have a feature in the UI that shows the user how many tokens they've used so far, and we need to be able to report that. So we have to calculate it at every single place where the tokens might change, and then call that.

[00:05:32]
So what we can do for that is make a function here; we can call it report... well, actually, do I want to do this in the loop? I don't want to do this in the loop. Oh yeah, we have to do it in the loop, I guess, because messages is constantly changing. So I'll say const reportTokenUsage, and it's a function: if callbacks.onTokenUsage, so if you passed this callback in, then we will do this.

[00:06:21]
Const usage is estimateMessageTokens given the messages, like this. And then, to just update the UI, we say callbacks.onTokenUsage and pass some stuff here: the input tokens so far, which would be usage.input; output tokens, usage.output; total tokens, usage.total; the context window, which is modelLimits.contextWindow; the threshold, which is our default threshold; and the percentage, which is calculateUsagePercentage given usage.total and modelLimits.contextWindow.
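A sketch of that reporter; the payload field names follow the transcript, while the exact shapes of callbacks and the payload are assumptions:

```ts
// Recomputes usage from the current messages and pushes it to the UI.
// callbacks.onTokenUsage is an optional hook supplied by the caller.
const reportTokenUsage = () => {
  if (!callbacks?.onTokenUsage) return;
  const usage = estimateMessageTokens(messages);
  callbacks.onTokenUsage({
    inputTokens: usage.input,
    outputTokens: usage.output,
    totalTokens: usage.total,
    contextWindow: modelLimits.contextWindow,
    threshold: DEFAULT_THRESHOLD,
    percentage: calculateUsagePercentage(usage.total, modelLimits.contextWindow),
  });
};
```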

[00:07:18]
Cool. So the UI can see all of this and show you cool stuff, so we have that. And then from there, all we have to do is call this function after each significant change to messages, so anywhere we're pushing to messages, essentially. So let's see, right here we could say reportTokenUsage. Wow, I can't spell.

[00:07:53]
Let me fix that: reportTokenUsage. Cool, and then where else? Looks like we're doing it right here too, so we can say reportTokenUsage. Hopefully that works and I didn't mess it up. And let's see, it's going to be hard to reach the token limit, but at least we can see the counter, so let's check that out. I guess I could drop the threshold to 1% and we can see it.
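In other words, every place in the loop that appends to messages gets a call right after the push, for example:

```ts
// Hypothetical spot inside the run loop: after appending the model's
// reply (or a tool result) to the history, refresh the token counter.
messages.push(assistantMessage);
reportTokenUsage();
```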

[00:08:26]
Okay, it's running from the build. That's all of it. Some AGI. Hello. There we go. We can see our tokens, and our threshold is 80%. I didn't really use that many tokens, so it's only at 0%, but let's go change that. So let me go find a bunch of stuff I can copy real quick. What's a good spammy thing? I'll just copy everything, yeah, I'll just copy all of this.

[00:09:14]
I'll say, "What is all of this?" Ooh, what is it going to do? Okay, let's see what happens here. Thinking, thinking, thinking. Web search? Oh no. Why did it do a web search? There are links in here, that's why. Oh no. You see what I mean? I need to write an email. If you see links, that doesn't mean you should be web searching; don't be crawling links.

[00:09:47]
The fact that it knew what I pasted is insane. Okay, gotcha. And then you can see I've used 0.6% of my tokens, right? And you can see how easy it is to prompt-inject yourself, so this thing could have done some crazy stuff. But yeah, you can see our token usage going up.
