Transcript from the "Document QA Query Function" Lesson
>> Now, we need to write a function that allows you to query it. And here's where it gets a little more advanced in the search. The search kinda was just like this. Do a similaritySearch, done, we're done. Okay, well, you can see, we got a lot more to do after that.
[00:00:13] [LAUGH] So let's do it. So we'll say query, like that. We're gonna create a store. So we'll say, this is gonna be async, and we're going to await loadStore, like that. So now we have our store. And then we're going to get the results from the question. And we could probably just log that so you can just kinda see.
[00:00:41] So const results = await. And from here, we just wanna do the same that we did before. similaritySearch, pass in the question, and then I'll say 2 docs. So I'll say store.similaritySearch. Passing the question that we created above, that comes in from the command line or defaults to high.
[00:01:07] And I'll just say 2, right? So let's log that. So I'm gonna log the results here, and then I also want to log, let's log just one of these chunks. It's gonna be too crazy if I log them all. So I'll just say videoDocs
, and then pdfDocs
[00:01:27] And that way, you just kinda see what a vector looks like, what an embedding looks like. I want you to see that. I think it'll make sense if you can see it. And then I'm just gonna run query here, like this, okay? If we didn't break anything, we should at least see some logs.
[00:01:42] So let's try that out. So now if I say no qa, and then I pass in what is the xbox warranty? It's not gonna answer it, right? Because we didn't do that part. I'm just trying to see the logs. Okay, here we go. Here are the logs. So if I go and look, here is, and again, this has nothing to do with the question that I ask because this log is from when I indexed all the data.
[00:02:20] So it went to YouTube, it got the transcripts from that video, and what did I log? I log chunk what? Chunk, I log the first chunk from the video, so basically the first thing that the video picks up, that's what I'm logging. So you can see that this is what a chunk looks like.
[00:02:39] It's not the full thing. You can see why I was like, yeah, the punctuation on the transcript is not proper English, cuz it's a transcript, so it's hard to split on a period or something like that. So you're basically just guessing. So this is the 2,500 tokens that I said a chunk must be.
[00:02:57] And you can see it split in the middle of someone talking, right? And here's the metadata that I told it to add. I said, add all the metadata to the video, so it did, title, all that stuff. And it'll keep doing that forever and ever, but I only logged the first one.
[00:03:15] And then it indexed the PDF and I logged the, I logged the first one there as well. So this is the first thing that shows up in the PDF, is a Product and Regulatory Guide. So if I open that PDF, I'm imagining that's what I would see when I first open it.
[00:03:31] Yeah, that's literally the first thing on the PDF that shows up. So that's why that's the first chunk, right? Cool, yeah, and that's that. So that's why that's there. And then I closed it before it showed me the results of the search, so it didn't show me the results, but we'll keep going and we'll see everything.
[00:03:49] Okay, any questions on that so far? It's actually like, it makes sense why it does that. It's quite predictable in how it will work. And it almost makes it feel like it's not smart. And this is why I think sometimes when people are like, AI is gonna take over the world, I'm like, have you used it?
[00:04:07] This thing needs a lot of hand-holding and a lot of teaching, so I don't know, I'm not for there yet. Okay, so we got that. And now we need to actually send this up to GPT. So let's do that. And we're gonna do a few different things here.
[00:04:23] So let's say results, and this is awaits. Oops, sorry, we have results, let's say response. Equals openai.chat.completions.create. You should know this part by now. We're gonna pass in our model. I'm just gonna use, you know what, I'm gonna splurge a little bit, go a little crazy. I'm gonna do gpt-4-32k.
[00:04:51] I know, big flex. Relax, no big deal.
>> That API key is gonna stop working.
>> Yeah, no big deal, no big deal. If you're using my key, feel free, go ahead.
>> It's on me. [LAUGH] Go ahead. So yeah, you can use any model you want.
[00:05:05] I'm just doing this. But yeah, if there's one that you have access to that has a bigger context, use that one, use a 16k. I think the other ones are still fine, they're within token limit. Just use a bigger one just in case, otherwise you might run into a problem, I don't know.
[00:05:20] Depends on the query you ask, I don't know. So there we go. We got that. The other thing here is temperature. I want this thing to be factual, so I'm gonna put it at 0. I don't want you getting creative. I don't want you flying by the seat of your pants.
[00:05:32] I want you to be for real with me, don't BS me, keep it real. So I'm gonna say 0. I don't need the creative person right now, right? I need you to be real with me. Stop the silliness. That's what this is saying, okay? You probably always wanna do this for a document QA, almost always, just 0.
[00:05:53] And then messages, you know how this works. Let's make a system message here, we'll say role: 'system', content. And this is where it gets interesting. Well, not this part. This was actually pretty easy, just type that. And you can put whatever you want here. I'm just gonna do the same thing I always do, you are a helpful AI assistant.
[00:06:13] Answer questions to your best ability. Easy, okay? From there, we want to type in the user prompt here, and this is where we kinda have to make a prompt. So I'll walk you through this a little bit. So I'm gonna add another message. Its role is gonna be user, as in this is what the user is saying.
[00:06:35] And its content is gonna be, I'm just gonna use backticks here. We have to guide this thing a little bit. So basically, what I have here is I'm saying, hey, answer the following question using the provided context. If you cannot answer the question with a context, don't lie and make up stuff.
[00:06:53] Just say you need more context, right? I'm trying to guide it, like don't. I know I'll put you in temperature 0 but just in case if you wanna get crazy with me. Just cuz you're a people pleaser, you wanna make me happy and tell me things that I wanna know.
[00:07:06] No, no, I want you to just be like, look, I need more context. I don't know how to answer this, right? So that's what I'm gonna put here. I'm actually just gonna copy that, I don't wanna write it again. Okay, and then from here, we're gonna add the context.
[00:07:21] And in this case, the context is whatever the results were from the similaritySearch, right? That's gonna be the context. Well, I guess we should put the question first. Let's do question. So we'll say, the question is literally the same question that we used on the similaritySearch. So that's the thing about document QA.
[00:07:41] You use the question first on the similaritySearch to get back the matching chunks, and then you use the question again on the chat completion to answer the question. So you have to use it twice. Does that make sense? We needed the question the first time to figure out what chunks are semantically closest to it, and now we need the question the second time to ask GPT to answer it with these given chunks, okay?
[00:08:06] Okay, so we gotta put the question back in here. And then we have context. And I don't think there's a right way to do this, but the way that I ended up doing it was I just mapped over all the content and I just put in their pageContent.
[00:08:22] Or I mapped over all the context and I put in their pageContent and I just joined them on a new line. There's really no right or wrong way to do it. I just don't wanna give it an array of objects. I think that all those extra characters in there will mess it up, I think.
[00:08:36] So I'm just like, giving it unstructured text is a lot better than, here's an array of objects and brackets and things like that. It's just extra stuff. Not only does it make my call more expensive, cuz there's more tokens, but now the AI has to learn to ignore those things and just look at the data.
[00:08:53] So I'm just making it easy for it by giving it just the raw text. So that's what I did here. So I'll just do the same thing. I'll say results.map. I'll take a result and I'll say result.pageContent like this. And then I'll just join all these things on a new line.