Cursor & Claude Code: Professional AI Setup

Thinking Mode & Compacting Context Windows

Steve Kinney
Temporal

Lesson Description

The "Thinking Mode & Compacting Context Windows" Lesson is part of the full, Cursor & Claude Code: Professional AI Setup course featured in this preview video. Here's what you'd learn in this lesson:

Steve explains "Thinking Mode" in Claude Code, a mechanism that allows the AI agent to use extended reasoning and evaluate alternatives more thoroughly before producing output. Instead of immediately generating code, Claude will attempt to understand the complete context of the task. Compacting the context window is also discussed in this lesson.


Transcript from the "Thinking Mode & Compacting Context Windows" Lesson

[00:00:00]
>> Steve Kinney: But they also have this interesting thing. This is actually true of the Anthropic API as well; it just happens to work in Claude Code. In fact, they mention it in the Claude Code docs, but they kick you off to the regular Anthropic SDK docs at this point, where they talk about this thing called thinking mode. Because we saw that, like, o3 and o1 and o4 are reasoning models.

[00:00:22]
They engage in thinking. Any of the Anthropic models can engage in thinking if you compel them to do so, which is a mechanism that puts them into extended reasoning. They'll spend more time thinking about what they're going to do, and about alternatives, before they even try. They'll think about what they're going to do before they do it.

[00:00:44]
We all aspire to that. Sometimes it's tricky. Obviously thinking costs tokens, but so does throwing away what it just did because it was garbage and asking it to do it again. It becomes a fine art. So how do you do it? You just use the word "think" in the prompt.

[00:01:06]
You just say, "think about" this, or "think this through." If you say the word "think," it will use about 4,000 tokens of reasoning, right? Like I said before, Gemini's context window is 2 million tokens and Opus's is 200,000. So 4,000 tokens is not nothing, right? That is roughly 16,000 characters, right?
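Where does that 16,000-character figure come from? A common rule of thumb is that one token is roughly four characters of English text; this little sketch just applies that ratio (the 4-chars-per-token number is an approximation, not an exact property of any tokenizer):

```python
# Rough rule of thumb: 1 token ≈ 4 characters of English text.
CHARS_PER_TOKEN = 4

def approx_chars(tokens: int) -> int:
    """Estimate how many characters a token budget corresponds to."""
    return tokens * CHARS_PER_TOKEN

print(approx_chars(4_000))   # → 16000, the "think" budget in characters
print(approx_chars(31_999))  # → 127996, the "ultrathink" budget in characters
```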

[00:01:31]
"Think hard" will go up to around 10,000 tokens. I didn't know about "megathink" until I reread the docs, but I do know that "think harder" and "ultrathink" do the same thing. Which one do you think I use? Ultrathink. I have seen some people say that ultrathink doesn't do anything.

[00:01:50]
I don't believe that. Try it out. Because ultrathink maxing out at 31,999 tokens seems awfully specific to me, right? The others are round numbers, around 10,000 tokens, around 4,000 tokens, and then 31,999 tokens. I cannot believe that ultrathink does nothing. Maybe I'm wrong, but that seems oddly specific.
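For reference, the budget numbers being discussed, as people have reported them from digging through Claude Code, look roughly like this; treat the exact values as unofficial and subject to change:

```
"think"                        →  ~4,000 thinking tokens
"think hard" / "megathink"     → ~10,000 thinking tokens
"think harder" / "ultrathink"  →  31,999 thinking tokens
```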

[00:02:18]
Now keep in mind, all of those tokens cost more with Opus. I would love to tell you there is an answer to whether Opus without ultrathink outperforms Sonnet with ultrathink. Maybe. I don't know. I couldn't tell you; I am trying them all. I'm getting a feel for it.

[00:02:48]
I don't have a "this always works" rule. You do start to develop a sixth sense for it. And when I'm angry, it's Opus with ultrathink, and then I blow through my limit and I'm sad. But it's a thing. And it is worth thinking about, again.

[00:03:10]
Like the Reddit comment I posted before: a good plan will save you tokens. So do you use the tokens to come up with a good plan? Do you write the good plan yourself? Do you sit there in Gemini in the chat window, come up with a good plan, paste it into the markdown file, and not use any of these tokens?

[00:03:31]
I have done all three in the last three days, yes. I don't know; I'm figuring it out. The lint rules definitely work. But how do you juggle and divide up the thinking budget, times the model, plus the tokens, plus which rate limits apply where? I mean, how I behave when I get the warning that I'm approaching my rate limit is different than how I behave at 8 am.

[00:04:00]
So yeah, extended thinking could waste tokens, it could save you tokens, it could rewrite the whole thing in Elm, we don't know. Okay, we'll pull this up in a second, but I think this is worth talking about before we jump in: Claude Code has this notion of compacting context windows, which I think has some ergonomics that we need to think about.

[00:04:24]
And so Cursor will just say you need to start a new window now, a new chat. Claude will instead warn you: approaching context window limit, we're going to compact soon. It's going to happen. And all compaction is, is it takes the giant running history of everything, right?

[00:04:46]
It sends it back to the model; that's why it needs some headroom to do it. It says: summarize this into a smaller thing, which means you will lose some fidelity, right? And so if you're in the middle of a big thing, okay, you have no choice, right?

[00:05:08]
But what you don't want is this: you started talking about something else, and then you went to some other task, because all of that is going to get summarized together. You're going to lose fidelity on the thing you care about in order to keep context for the thing you stopped caring about hours ago, right?

[00:05:22]
And so yeah, shrinking it down, take a bunch of tokens, make it fewer tokens, it's not that hard to think about. You can call /compact anytime you want, right? One belief system is that you should do it whenever you're between tasks. I would argue there's also /clear, which is like starting a new chat.

[00:05:46]
And if you're truly switching tasks, you don't even need to compact at that point, you know what I mean? That's a your-mileage-may-vary kind of thing, but you can trigger one yourself. If you don't, it will do it on its own, but you're probably better off having some control over when you do it.

[00:06:04]
So when you're switching tasks, or at any kind of natural stopping point, you call /compact and it will take your long context window and smush it, that's a technical term, into a smaller one. Yes, you lose words. Hopefully, it does it well. If you do it yourself, though, you can also give it a prompt, a few words of, hey, this is what I care about.
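Concretely, inside a Claude Code session that looks like passing instructions after the slash command; the focus instruction here is a made-up example, not anything specific from the course:

```
> /compact Keep the open to-do items and anything about the current refactor; drop the earlier exploration
```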

[00:06:26]
And it will take that into consideration. If you don't do that, and it compacts automatically because it's hit the point where it's about to blow through the context window, it's going to YOLO it, right? You might get the thing you want, or you might not.

[00:06:41]
So, generally speaking, compacting it down or clearing it out is the path forward. And again, if you do it yourself, you can at least give it a summarization hint, like: only keep the to-do items, or summarize it down to just the things about this. And it will do that.

[00:06:58]
If you don't do it yourself, you're rolling the dice and hoping it makes good decisions. Alternatively, there's /clear, which is the same as quitting Claude Code and starting it again, which is what I did for the longest time until it encouraged me to hit slash and see what the other commands were. That works as well.
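The rough decision being described, sketched as the two slash commands:

```
# Mid-task, at a natural stopping point: summarize the history and keep going
> /compact

# Truly switching tasks: wipe the conversation and start fresh
> /clear
```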

[00:07:15]
It's great. It will totally wipe it out. There is possibly one reason why you wouldn't want to use this, and I'll show you that in a second.
