Maintaining Context in a Conversation
Check out a free preview of the full First Look: ChatGPT API for Web Developers course
The "Maintaining Context in a Conversation" Lesson is part of the full, First Look: ChatGPT API for Web Developers course featured in this preview video. Here's what you'd learn in this lesson:
Maximiliano explains how to preserve the context of previous API requests and how to simulate receiving responses word by word by using the stream request property. The discussion also addresses the current context limitations in ChatGPT, where the earliest messages are automatically dropped if the request becomes too large.
Transcript from the "Maintaining Context in a Conversation" Lesson
[00:00:00]
>> So you know that when you're chatting with ChatGPT, it maintains the context, right? So I can ask a follow-up question. I said, do you know in which city they are located? Let's wait for the answer. And the answer is: I'm sorry, I don't know which city you're referring to, can you please provide more context or specify which city you're asking about?
[00:00:24]
So what happened here? Actually, on every request it's losing the previous context, so it doesn't know that we were talking about Frontend Masters. It answers one question at a time, with no memory. And while we wait, you can see that we receive the whole answer at once, not word by word or token by token, as we're used to in ChatGPT.
[00:00:52]
Can we emulate that? Yes, it's possible. There's a property called stream. Let me show you the difference using the terminal. You can apply more properties here; for example, we can change the temperature. Remember, that's a value between 0 and 1, so I can say 0.5. And I can also set stream.
[00:01:16]
And I can set it to true; by default, it's false. So let's try this stream now in the terminal and look at the difference. Instead of giving me one answer at once, it's sending me partial JSONs, one per word, actually. And the text is no longer a single JSON value, okay?
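For reference, here's a minimal sketch of what such a request could look like from JavaScript; the fetch wrapper and the model name are illustrative assumptions, not the course's exact code:

```js
// Minimal sketch (assumed details: direct fetch call, model name).
// A Chat Completions request with temperature and stream set.
const response = await fetch("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
  },
  body: JSON.stringify({
    model: "gpt-3.5-turbo", // model name assumed for the example
    messages: [{ role: "user", content: "Who is Maximiliano Firtman?" }],
    temperature: 0.5, // a value between 0 and 1
    stream: true,     // default is false; true sends partial JSONs
  }),
});
```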
[00:01:48]
So it's partial JSONs, meaning multiple JSONs over HTTP, and you need to know how to handle that. Can you read that in JavaScript? Yes, but we're not going to spend time on that, because we know we're not going to do this client-side. But there is a way to work with streams, reading the HTTP stream with a buffer.
[00:02:09]
So we can read, in this case, slice by slice, chunk by chunk in JavaScript. We need to get rid of the "data:" prefix and then parse the string as JSON. And we're going to receive, basically, a delta; you can see delta somewhere here. So what's a delta?
[00:02:32]
It's the part of the full message that is arriving right now, and we need to append it word by word while ChatGPT is "typing," okay? But anyway, most of the time, unless you're actually going to build a ChatGPT clone, you don't need the stream. Also, with streaming, the full response is longer, because now we have more JSON to parse.
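As a rough sketch, the chunk handling could look like this in JavaScript, assuming the streamed fetch response from the sketch above (it ignores the edge case of a JSON object split across two chunks):

```js
// Read the HTTP stream chunk by chunk, strip the "data: " prefix,
// parse each partial JSON, and append each delta as it arrives.
const reader = response.body.getReader();
const decoder = new TextDecoder();
let fullText = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  for (const line of decoder.decode(value, { stream: true }).split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length).trim();
    if (payload === "[DONE]") continue; // the stream ends with a [DONE] marker
    const json = JSON.parse(payload);
    const delta = json.choices[0].delta; // the piece of the message arriving now
    if (delta.content) fullText += delta.content; // append while it's "typing"
  }
}
```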
[00:02:59]
And most of the time we don't need that streaming, okay? We just need the answer, so we don't typically do that, okay? But that's just the waiting part. Now, the worse part is that our chat is kind of silly. In fact, let me retry to show it wasn't just random; I will ask about myself.
[00:03:19]
Who is Maximiliano Firtman? It knows me, so we wait for the full answer. And it says he's an Argentinian author, trainer, speaker, blah, blah, blah. Where does he live? I'm sorry, I don't have context, who are you referring to? So, by default, and this is something that was a complete surprise for me when I realized how this works.
[00:03:49]
Every request in a chat must include the previous context, and we need to do it manually. Server-side, OpenAI is not keeping the history, what in the LLM world we know as memory, the memory across the conversation. So, how does it work? We need to do it manually. How? With every new message, we need to send all the previous messages as well.
[00:04:24]
Does it make sense? I'm not sure if it makes sense, but that's how it works. In fact, that's how ChatGPT itself works. Remember that sometimes you have a long prompt and very long answers; every time you send a new question, it's sending all the previous ones as well, and you're paying for those tokens again.
[00:04:43]
Yes, you are. That's why, at one point, you've probably felt that ChatGPT is lost. You say, but we were talking about something, you had it right, and now you're hallucinating. That's because it has a limit of tokens: at one point, it starts removing the first messages in the chat.
[00:05:12]
Makes sense? That's why GPT-4 has a bigger window, a larger window of tokens, so it can hold more context. That's how it works today, at least; we don't have any other option, we actually need to do this. So that's why I already have an array of messages, so we can start collecting them.
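There's no built-in helper for that pruning; a naive sketch of the idea could look like this (the characters-divided-by-four token estimate and the budget are rough assumptions, not the real tokenizer):

```js
// Naive sketch: drop the oldest user/assistant messages when the
// conversation grows past an assumed token budget, keeping the
// system message at index 0. Real counting would use a tokenizer.
const MAX_TOKENS = 4000; // assumed budget for the example

function pruneMessages(messages) {
  const estimate = (msgs) =>
    msgs.reduce((sum, m) => sum + m.content.length / 4, 0);
  while (messages.length > 2 && estimate(messages) > MAX_TOKENS) {
    messages.splice(1, 1); // remove the oldest non-system message
  }
  return messages;
}
```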
[00:05:34]
So, for example, this role can be just the first message that we have here, okay? And by the way, we can now make your computer answer questions as a pirate, or always in Spanish, and you will use that, okay? The system message has the power to shape how it answers: in a funny way, in a confident way, in a short way, or whatever.
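For instance, the array could start with a system message like this (the wording is just an example):

```js
// The system message shapes how every answer is delivered.
const messages = [
  {
    role: "system",
    content: "You are a helpful assistant. Answer like a pirate, and keep answers short.",
  },
];
```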
[00:06:07]
So what we always need to do, actually, is send the whole array that we have there, always. And here, what we need is to add this message to the array. So, messages.push, and we're going to add that one. Then we pass the whole array of messages; in fact, we don't need this, we can just do this in today's ECMAScript.
[00:06:33]
And then this is important. When we receive the answer, we also need to save the whole message, because if not, it won't have the context of what it answered before. So here, not just the content, but also the role. Let me show you that. I'm going to also say messages.push, and we're going to add json.choices[0].message.
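Putting both pushes together, one round trip could look roughly like this; the callChatAPI helper and model name are assumptions standing in for the course's actual fetch call (Node 18+ for global fetch):

```js
// Assumed helper wrapping the Chat Completions call.
async function callChatAPI(body) {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ model: "gpt-3.5-turbo", ...body }),
  });
  return res.json();
}

async function sendMessage(messages, userText) {
  // 1. Add the new user message to the running history.
  messages.push({ role: "user", content: userText });
  // 2. Send the WHOLE array, not just the latest message.
  const json = await callChatAPI({ messages });
  // 3. Save the full reply (content AND role: "assistant"),
  //    so the model keeps the context of its own answers.
  messages.push(json.choices[0].message);
  return json.choices[0].message.content;
}
```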
[00:07:08]
That's the whole message, including the role, assistant. So, let's see if this works better now. Who is Maximiliano Firtman? Blah blah blah, never heard of him. Come on, you're hallucinating. Look at that, that's a hallucination, and it's weird. You may ask, why is it hallucinating? You were all witnesses that it knew me, but now it says it has never heard of him.
[00:07:43]
But also, it's funny, look at that: I bet he's a computer genius. Well, it's a mix of hallucination and humor. So why the hallucination? Well, probably because we didn't set the temperature here, and by default the temperature is 0.5. So if you don't want that, send temperature 0; in that case, the same question receives mostly the same answer every time, okay?
[00:08:13]
Are we gonna remove the funny part? I don't know, we can keep it or not; you can play with this. So let's do that again. We need a clear button so we can start a fresh chat. Who is Maximiliano Firtman? Okay, now we are back to the original answer.
[00:08:36]
And where does he live? I mean, the answer is not that it doesn't have context. It says, I don't know where he lives, but that's fine, because now it knows we're talking about me. Let's try something else. Do you know any books written by him?
[00:09:03]
Anyway, you can tell it's sending the context, okay? Because even here, we can actually check the array of messages; we are accumulating the full chat, okay? And here, we have some books. And I think that one is true, High Performance Mobile Web, that's fine. Programming the, yeah, no, you hallucinated with that one.
[00:09:23]
And that other one, no, but maybe it's a translation, because I wrote one in Spanish, and I have one with a title something like that in Spanish, yeah. Okay, I'm sorry, GPT, for saying that you were a liar. Anyway, so we have our ChatGPT clone, and with this sample, we understood how the ChatGPT APIs work.
[00:09:51]
So we need to make API calls all the time, and we also have to send the context all the time. And of course, think about this: every time you send the context, you're using more tokens. So token usage compounds quickly as the conversation gets longer.
[00:10:10]
And if you want, the number of tokens used comes back in the response, so you can accumulate that and keep some stats. Also, if you go now to the OpenAI website, to your dashboard, you can actually see each call and how many tokens you have been consuming.
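For example, accumulating the usage object that comes back on every non-streamed response:

```js
// The API reports token usage on each non-streamed response.
let totalTokens = 0;

function trackUsage(json) {
  const { usage } = json; // { prompt_tokens, completion_tokens, total_tokens }
  if (usage) totalTokens += usage.total_tokens;
}
```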