Lesson Description
The "Temperature & Top P" Lesson is part of the full, Practical Prompt Engineering course featured in this preview video. Here's what you'd learn in this lesson:
Sabrina explains the controls available to make LLMs more or less deterministic. Temperature ranges from 0 (deterministic) to 2 (random). It controls how often the LLM will pick the next most likely token. Top P is an alternative to temperature and is used to remove potential answers from the dataset.
Transcript from the "Temperature & Top P" Lesson
[00:00:00]
>> Sabrina Goldfarb: So I promised you that we would talk about the things we can do to make LLMs more or less deterministic, right? And here they are. We are not going to be playing with these today, because we're focusing on really practical prompt engineering things: using prompts in chat, and using prompts in Copilot or Cursor or one of those types of tools. But I think it's important for you all to understand temperature and Top P in case you're making AI applications, or maybe using AI applications, and you just want to know about the things that contribute to the randomness of LLMs.
[00:00:40]
So first, let's talk about temperature. Temperature, at its simplest, controls randomness. What does that mean? The default is about 1, and we talked about how these LLMs are pattern predictors that are predicting the next token, right? So the temperature controls how often we will pick that next most likely token. If my temperature is set to 0, the LLM is essentially deterministic, in the sense that it will always pick that next most likely token, right?
[00:01:16]
Whatever token has the top probability, that's what gets picked. If I turn my temperature way up to 2, which is its max, I am going to get something completely illegible. You will not be able to read anything from it. It will not even be broken English, it will just be randomness, right? So we'll never really use 2, because it's just chaos. This is important for us to know, especially if we're building AI applications, because lower is going to be better for factual tasks, code, and data extraction.
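To make that concrete, here is a rough, self-contained sketch (not from the lesson; the tokens and logit values are made up) of how temperature reshapes the next-token probabilities before sampling:

```python
import math
import random

# Made-up logits for three candidate next tokens (illustration only).
logits = {"blue": 2.0, "gray": 0.7, "orange": -0.6}

def sample_with_temperature(logits, temperature):
    # Temperature 0: always take the single most likely token.
    if temperature == 0:
        return max(logits, key=logits.get)
    # Dividing logits by the temperature before the softmax sharpens the
    # distribution when temperature is low (more deterministic) and
    # flattens it when temperature is high (more random).
    scaled = {tok: v / temperature for tok, v in logits.items()}
    total = sum(math.exp(v) for v in scaled.values())
    probs = {tok: math.exp(v) / total for tok, v in scaled.items()}
    return random.choices(list(probs), weights=probs.values())[0]

print(sample_with_temperature(logits, 0))    # always "blue"
print(sample_with_temperature(logits, 2.0))  # flatter distribution, more random picks
```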
[00:01:50]
Maybe I'm building an AI application in healthcare or in banking, and I really want to make sure that my LLM or my AI agent is behaving very specifically how I want it to, right? It's never going to make an assumption. It's never going to guess something that has a very low chance of being the next token. That's when I'm going to dial my temperature down from 1 to maybe 0.5. Higher is going to be better for creative writing, brainstorming, and varied solutions, right?
[00:02:25]
So maybe I want to make an AI application that helps novelists come up with new ideas for books they might want to write, or maybe I want to build a support chatbot where I have the flexibility to make the agent a little more friendly and creative. Well, then I might dial my temperature up towards 1.3 or something like that. Again, we're not going to go to 2, because 2 is just going to be complete chaos.
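As a loose cheat sheet, the ranges mentioned so far might translate into starting values like these (purely illustrative, not official recommendations):

```python
# Hypothetical starting points echoing the ranges mentioned in the lesson;
# a real application would tune these empirically.
TEMPERATURE_PRESETS = {
    "data_extraction": 0.0,    # deterministic: always the most likely token
    "code_generation": 0.2,
    "regulated_chatbot": 0.5,  # e.g. healthcare or banking assistants
    "general_chat": 1.0,       # the usual default
    "creative_writing": 1.3,   # more varied, but still legible (avoid 2)
}
```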
[00:02:57]
Now, we can edit temperature if we're using the APIs for these LLMs, right? So if I go into OpenAI and I want to use their API, or if I'm adding Claude as a model for my AI application, I'll have the ability to control this temperature. I do not have the ability to control temperature from my chat, right? So if I literally just look at my Claude chat, you can see there's nothing over here that says control your temperature.
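For example, here is a minimal sketch of setting temperature through an API, assuming the OpenAI Python SDK (v1+) and an example model name and prompt; other providers, such as Anthropic, expose a similar temperature parameter on their own clients:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[{"role": "user", "content": "Extract the dates from this text..."}],
    temperature=0.2,      # lower = more deterministic, good for extraction
)
print(response.choices[0].message.content)
```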
[00:03:29]
That's totally OK. 1 is the default, it's pretty good, and it's most likely what all of your providers are using, unless you're using an AI application built by someone else who has adjusted it for you. I want to talk about one more thing that controls randomness within our LLMs that's also really important to know, and that's called Top P. Top P is an alternative to temperature, but we can also use it together with temperature.
[00:04:01]
So Top P is a cumulative probability cutoff. What that means sounds really scary and complex, but it's really not. If my Top P is set to 1, which is the max, it is considering 100% of my candidate tokens. When we were talking about what color the sky is today, I said that blue was an option, gray was an option, and orange was an option. Maybe 75% of the time I'm going to get blue, 20% of the time I'm going to get gray, and 5% of the time I would get orange, right?
[00:04:40]
It's in that order of probabilities. If I were to lower my Top P to 0.5, I am only considering the most likely tokens that make up that top 50% of probability, right? So now I've cut out the ability to get gray in my answer, and I've cut out the ability to get orange, because blue alone already covers the top 75%. So I use this Top P, again, to control the randomness of my LLMs, and I can use it together with temperature, right?
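Here is a small illustrative sketch of that cutoff, using the sky-color probabilities from the example (the numbers are the rough ones mentioned above, not real model output):

```python
# Toy next-token probabilities from the sky-color example.
probs = {"blue": 0.75, "gray": 0.20, "orange": 0.05}

def top_p_filter(probs, top_p):
    # Keep the most likely tokens until their cumulative probability
    # reaches top_p, then drop everything else.
    kept, cumulative = {}, 0.0
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving tokens' probabilities sum to 1.
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}

print(top_p_filter(probs, 1.0))  # keeps all three tokens
print(top_p_filter(probs, 0.5))  # only "blue" survives the cutoff
```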
[00:05:13]
So maybe I pull my Top P down to 0.5 to get that 50% cutoff if I'm building a very serious business application, and then I can leave my temperature a little higher, like 0.8 or 0.9, because I know the less likely tokens have already been cut out. So again, not important for today, but if you are building AI applications, these are the first things that you should be adjusting along with your prompt changes, because they are going to make a massive difference in how your LLMs and your AI applications actually behave.
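And here is a minimal sketch of combining the two knobs in one request, again assuming the OpenAI Python SDK (v1+) and an example model name and prompt:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[{"role": "user", "content": "Draft a friendly reply to this support ticket..."}],
    top_p=0.5,            # only sample from the top 50% of probability mass
    temperature=0.9,      # a bit more varied among the tokens that remain
)
print(response.choices[0].message.content)
```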