
Lesson Description
The "Question & Answer Models" Lesson is part of the full, Open Source AI with Python & Hugging Face course featured in this preview video. Here's what you'd learn in this lesson:
Steve explains question-and-answer models, highlighting their extractive approach vs generative systems like ChatGPT. He also mentions the role of context size in producing accurate answers and factors such as confidence scores and filtering out low-confidence results.
Transcript from the "Question & Answer Models" Lesson
[00:00:00]
>> Steve Kinney: Question answering, you might think, is what ChatGPT and Claude do, but it's actually different. You can argue better or worse, but I would argue it depends on what you're trying to achieve. So what's interesting about question answering is you're like, when I ask ChatGPT a question, that's what it does.
[00:00:20]
No, it does text generation, and it's got some nuance where that whole assistant/user thing is not real. It's just special tokens that break up the different pieces based on how it's trained. And when we fine-tune a model, we'll talk a little bit about how that looks, but it's just making up words right now.
[00:00:40]
You can make up words when you are trained on all of the Internet and every book, as, like, Anthropic tore them all up and scanned them in. Google did that a decade ago, so they got that part over with, but they are just generating the next word tuned to a probability of every public word out there.
[00:00:59]
Question answering is a little bit different, question answering will only use the context that you give it, so it's not probabilistically coming up with the next word. It is taking whatever body of text that you handed it and then doing the math to come up with based on that context what the answer is.
[00:01:23]
So like if this is a very short thing on hot dogs, my family's going to Chicago next week and then there was like something, so like hot dogs were a topic of conversation in my house. And so if you ask it something not about hot dogs, it's not gonna go well, or even something about hot dogs that's not in here, which you can argue, well, that's not good.
[00:01:55]
Or it's very good if you just wanted to get an understanding of a particular document. If it is, based on this particular content, I need to know something, and I don't want you to use the world's information to answer this question, just what I'm showing you, then you want question answering over anything that Claude, Gemini, GPT-4o, what have you, will give you.
[00:02:18]
So like again, the answer for all of these things will give you like different answers. I think this is inspired by like there was some clip that came out recently where Barack Obama said no one over the age of eight should eat ketchup.
>> Male: So this seems good for internal company documents or internal company data, where you only want it to use that. You don't want it to hallucinate, make up random stuff.
[00:02:43]
>> Steve Kinney: Exactly, you don't want it to use anything else, not even hallucinate, you just only want to know based on this data that I have. And it will only do that in this case, and it will give you a confidence score as well. So if we say, where did the first hot dog stand appear? These are all things in there, so let's go ahead and run this and see what we get.
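The call being run at this point can be sketched roughly as follows. This is a minimal sketch, not the lesson's actual notebook code: `CONTEXT` is a made-up stand-in for the hot dog passage, and `load_default_qa` and `ask` are hypothetical helper names. It assumes the Hugging Face `transformers` question-answering pipeline, which takes `question` and `context` and returns a dictionary with `answer`, `score`, `start`, and `end`.

```python
from typing import Any, Callable, Dict

# Made-up stand-in for the lesson's hot dog passage (the real text is not in the transcript).
CONTEXT = (
    "The first hot dog stand appeared on Coney Island. "
    "Hot dogs are usually served in a bun."
)


def load_default_qa() -> Callable[..., Dict[str, Any]]:
    """Build a question-answering pipeline; downloads a default model on first use."""
    # Imported lazily so the rest of this sketch runs without transformers installed.
    from transformers import pipeline

    return pipeline("question-answering")


def ask(qa: Callable[..., Dict[str, Any]], question: str, context: str) -> str:
    """Ask one question against one context and format the answer with its score."""
    result = qa(question=question, context=context)
    return f"{result['answer']} (confidence: {result['score']:.2f})"


# Usage (assumes `pip install transformers torch` and network access):
# qa = load_default_qa()
# print(ask(qa, "Where did the first hot dog stand appear?", CONTEXT))
```

Passing the pipeline in as an argument keeps the formatting helper independent of which model gets loaded.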
[00:03:07]
Again, this model, if you look at it coming down off the wire, is 261 megabytes, so not a giant model, and if you need something a little bit more sophisticated, there are slightly bigger ones. But could you have this on a single VM, or connect some storage to a serverless function, and do this rapidly, quickly, on very small things at some cost, but not a lot of cost? Absolutely.
[00:03:37]
And that's the interesting thing I think all of our brains are wrapped around when I think AI, I think one of these big dogs. But actually there's a bunch of very practical, less glamorous use cases that are super easily accessible. And again, I've said it before, I'll say it one more time for all the tech stuff we're doing today, TypeScript SDK support is exactly the same as Python's.
[00:04:00]
It's just when we want to get to all the fancy image stuff, which is fun sometimes, so we're going to do it. Question: where did the first hot dog stand appear? I know the answer to that one because I'm from the New York City metropolitan area. The answer is Coney Island, so on and so forth, and there's a confidence level in there.
[00:04:19]
Now, if you change the question, it doesn't really matter what I say, it will only use the content that it has, and it is not confident about that answer, as you can see. In fact, instead of even trying to format the answer, let's actually just print the full answer.
[00:05:00]
It'll be a little gnarly to look at, but we'll actually see the full answer in this case. The interesting part that I want to draw to your attention here is when it answers, not only is it giving us a confidence score, but it's also giving us the index in that larger string of where it found that answer.
[00:05:31]
And so thereby, if it's not necessarily in there, it's got that low confidence score, but all of its answers are legit coming from that string to the point where it will show you the index of where its answer started from and where it ended. So it is by definition locked into whatever context you give it, which again is either 100% not what you want or absolutely what you want.
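The point about start and end indices can be checked with plain string slicing. This is a small sketch under assumptions: the `result` dictionary is hand-written in the shape the Hugging Face question-answering pipeline returns (`score`, `start`, `end`, `answer`), the score is invented for illustration, and `span_matches` and `highlight` are hypothetical helper names.

```python
from typing import Any, Dict


def span_matches(context: str, result: Dict[str, Any]) -> bool:
    """Return True when slicing the context by start/end reproduces the answer."""
    return context[result["start"]:result["end"]] == result["answer"]


def highlight(context: str, result: Dict[str, Any]) -> str:
    """Wrap the answer span in brackets to show where in the text it was found."""
    start, end = result["start"], result["end"]
    return f"{context[:start]}[{context[start:end]}]{context[end:]}"


context = "The first hot dog stand appeared on Coney Island."

# Hand-written result; a real one would come from the pipeline call.
start = context.index("Coney Island")
result = {
    "score": 0.98,  # made-up value for illustration
    "start": start,
    "end": start + len("Coney Island"),
    "answer": "Coney Island",
}

print(span_matches(context, result))  # True
print(highlight(context, result))     # The first hot dog stand appeared on [Coney Island].
```

Because the answer is described by one start and one end, it is always a single contiguous slice of the context you handed in.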
[00:05:55]
And probably something that unless you have these tools, you probably don't have access to. Luckily, again, tiny little model that took us about 10 seconds to download and ran incredibly quickly on commodity hardware.
>> Male: In a practical setting, would you just filter out low confidence answers and get a canned response?
[00:06:16]
>> Steve Kinney: You would set a threshold. And obviously, should you dump all of this into a database, then you could filter on that at a simple level, even if it was just on ingest. A filter function will work, you know what I mean: if it's above this, store it; if not, do the other thing.
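The threshold idea described here can be sketched as an ordinary filter. This is an illustration, not code from the lesson: the `THRESHOLD` value, the fallback message, and the helper names are all assumptions, and the sample results are hand-written in the shape the question-answering pipeline returns.

```python
from typing import Any, Dict, Iterable, List

THRESHOLD = 0.5  # hypothetical cutoff; tune it against your own data


def keep_confident(
    results: Iterable[Dict[str, Any]], threshold: float = THRESHOLD
) -> List[Dict[str, Any]]:
    """Keep only the results whose score meets the threshold (e.g. before storing them)."""
    return [r for r in results if r["score"] >= threshold]


def answer_or_fallback(
    result: Dict[str, Any],
    threshold: float = THRESHOLD,
    fallback: str = "I could not find that in the document.",
) -> str:
    """Return the extracted answer, or a canned response when confidence is low."""
    return result["answer"] if result["score"] >= threshold else fallback


# Hand-written sample results with made-up scores.
results = [
    {"answer": "Coney Island", "score": 0.98, "start": 36, "end": 48},
    {"answer": "a bun", "score": 0.03, "start": 76, "end": 81},
]

print(keep_confident(results))
print(answer_or_fallback(results[1]))
```

The same comparison works as a `WHERE score >= ...` clause if the results are stored in a database first.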
[00:06:37]
And so all of the programming constructs that you have ready to rock and roll work just as well and become incredibly powerful. A term that I kind of want to stick in your head for the vocabulary test later will be this idea of extractive versus generative. Maybe I'm going to expose that I'm not very smart right now, because I kept hearing the term generative AI and I'm like, isn't that just AI?
[00:07:09]
I see people generating images and they're generating text and they're generating blog posts and they're generating slop that seems generative. And because those big models that we see and know and love depending on the day are all generative, we tend to think that AI is always generative AI.
[00:07:28]
In the case of this question answering set of models and pipeline, it's actually extractive, we're extracting data out of it in useful ways. And for a lot of these, that's what we're doing, we're extracting the sentiment out. We're just taking given things and getting metadata from them that a younger version of me, to be clear, a much younger version of me, a less gray version of me, may have tried to do with a series of regexes that I would have regretted forever.
[00:08:00]
And all of these code snippets are very small and very lightweight, they're going to get gnarly at one point, the last one today. We will not even step line by line through the code because it's wild, but it's more to show a concept more than anything else. They start small, you can get big and complicated with them, but we're at the beginning of our journey, small.
[00:08:20]
But like I said, that span of a start and end will show you where in the text.
>> Male: So for example, if you gave Nathan's first name at the beginning of the paragraph and then you mentioned his last name at the end of the paragraph and you prompted it for like Nathan's full name.
[00:08:36]
Would it be able to do that? Would it be able to say, well, we learned his first name up here and his last name down here? Or would it not, because that's not a direct quote from the passage?
>> Steve Kinney: I think for this one it wouldn't, but like that's definitely, do we have Nathan's full name? [LAUGH] Does anyone know Nathan's last name?
[00:08:56]
I guess it doesn't really matter in this sense, I'm trying to figure out how to even like the hard part is like tweaking the content in a way to do that, I'm like, how would I write that sentence? Let's definitely play around with it and let's see, I don't think it would, I think because it is starting with the starting index, the ending index.
[00:09:13]
I am like at a 0.98 confidence, if I may use the parlance, that it won't, but like that's a question I haven't got.
>> Male: It has to gather text from different parts, it has to be together in the content is what I'm thinking.
>> Steve Kinney: But then you could think about like, could you like, and again, we're just using the pre baked.
[00:09:33]
These again are doing the thing where they are grabbing a model, they're grabbing a tokenizer, they're grabbing all of these things and doing it all. Like if you started to pull apart the various pieces, could you tweak one of these a little bit? My sense is probably yes.
[00:09:54]
Pitfalls: the answer's not in the context, ambiguous questions. Obviously it can only find you the facts. And this is true of the generative ones as well: the longer the context, the more it degrades, because again, the more things it could find, so on and so forth. That's true even for a lot of the larger tools. One methodology that people use is called RAG, retrieval-augmented generation.
[00:10:24]
If you want ChatGPT to know about your stuff, what you do is you take all your stuff and you put it in a vector database, which means you turn it into numbers, we'll see that later. But you always have to do that in small chunks too, and this is what I deal with in my day-to-day life now: what size chunk is the right chunk?
[00:10:46]
I'm working on this thing right now, where for a given webpage, we want to highlight the part of the DOM relevant to the topic you just asked, but like, how much DOM do I want? What is the right size chunk? And then you throw in things like spans and like, the DOM tree is a mess.
[00:11:02]
But getting stuff right for the given thing that you're doing, of the size of the context or chunk versus how you feed it into the system, there's not a known right answer for that. It depends on what you are trying to do, the data you are working with, the questions that you are asking.
[00:11:22]
But if you do hand it an entire book and say, find me one fact in it, your chances of getting what you want are going to get worse. And so do you break it up? Do you break everything up by paragraph, by sentence? Too little context, you have a problem; too big a context, you have a problem.
[00:11:40]
And it really depends on the questions you're asking and the content itself. Do you strip content that we think is meaningless? And we'll see that these models do that themselves in, like, minor ways as well, and then it's on us to do it kind of in the macro ways.
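One way to picture the chunking trade-off is to split the text by paragraph, ask the same question against each chunk, and keep the highest-scoring answer. This is a sketch under assumptions, not a recommended chunk size or the lesson's method: splitting on blank lines is just one choice, `split_paragraphs` and `best_answer` are hypothetical names, and `qa` is any callable that accepts `question` and `context` and returns a dictionary with `answer` and `score`, such as a Hugging Face question-answering pipeline.

```python
from typing import Any, Callable, Dict, List, Optional


def split_paragraphs(text: str) -> List[str]:
    """Split text on blank lines and drop empty chunks."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def best_answer(
    qa: Callable[..., Dict[str, Any]], question: str, text: str
) -> Optional[Dict[str, Any]]:
    """Ask the question against each paragraph and return the highest-scoring result."""
    best: Optional[Dict[str, Any]] = None
    for index, chunk in enumerate(split_paragraphs(text)):
        result = dict(qa(question=question, context=chunk))
        # start/end in the result are relative to this chunk, so record which chunk it was.
        result["chunk_index"] = index
        if best is None or result["score"] > best["score"]:
            best = result
    return best


# Usage (assumes a pipeline built with transformers, and your own document text):
# from transformers import pipeline
# qa = pipeline("question-answering")
# print(best_answer(qa, "Where did the first hot dog stand appear?", document_text))
```

Scores from different chunks are computed separately, so comparing them is a heuristic rather than a guarantee, which is part of why the right chunk size depends on the data and the questions.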
[00:11:55]
Like, what is the best brand of mustard for a hot dog? I don't think it's in there.