Building an AI Agent Q&A

Netflix

Lesson Description

The "Building an AI Agent Q&A" Lesson is part of the full, Build an AI Agent from Scratch course featured in this preview video. Here's what you'd learn in this lesson:

Scott discusses the importance of sending messages in a certain order to prevent the AI from erroring. He also explains how to use the Zod function to describe the purpose of a tool and how to add reasoning to improve the AI's decision-making process. Scott also mentions the use of token counting to manage token limits and the option to set a maximum number of tokens for the response.

Join Now or Start a Free 5-Day Trial

Preview

Transcript from the "Building an AI Agent Q&A" Lesson

[00:00:00]
>> Speaker 1: What if you took the get weather out of there and say get stuff and then put in use this.
>> Scott Moss: Get stuff and then do what.
>> Speaker 1: And then change your description to say use this to get the weather.
>> Scott Moss: Kay [LAUGH] Use this to get the weather I think for sure we'll use it it would for sure use it now so.

[00:00:21]
Let's see I think we have to do like first of all let's not ask it for the weather like hi what happened yeah okay that's great I'm so glad we saw this because I want to talk about this. Maybe I told you if you don't send back once you start getting the tool calling you have to send back messages in certain order or the AI will freak out.

[00:00:43]
This is it so I'm glad we saw this and I didn't delete my stuff yet. So basically what this is saying is, hey, I asked you to run a function. I expect the the immediate next message to be a result for that function, and you didn't do that.

[00:00:59]
So it's freaking out right now. You have to reply back to the function call with the same ID, in this case it's the ID of this, with a response. And I didn't, cuz the next message I sent was what did I just type in? I typed in hi so it's that's not a thing, right?

[00:01:16]
So if I go look at the database, what it's saying is that the last thing I sent you was asking you to call this function called get weather. I would expect you to return back with a message of type tool. And the content is whatever you want it to be.

[00:01:34]
And the tool call ID would be the same tool call ID as this one. If it doesn't see that, it's gonna freak out. So this is where it can get pretty complicated sometimes, especially if you have human in a loop and things like that. But let's try it again with a clear one oops, guess you do gotta put messages there.

[00:02:03]
>> Scott Moss: Cool okay, cool I said hi it didn't ask for weather that's good. Now, you said ask me for weather?
>> Speaker 2: Yeah.
>> Scott Moss: Okay.
>> Speaker 2: How's the weather?
>> Scott Moss: How's the weather?
>> Scott Moss: It didn't do it, no, it did it? No, yeah, I didn't do it, yeah so even though I told it to use this for the weather and then

[00:02:32]
>> Speaker 2: Wait, do we know that descriptions li actually making it into the
>> Scott Moss: Yes, this description should be that's what the Zod let's go see. So this thing right here, this Zod function thing. You can see so it'll take in the name parameters of what you wanna do, and then in those parameters, you can describe to the AI what this tool does, right?

[00:02:57]
So let me show an example of that
>> Speaker 2: There's a top level description on that.
>> Scott Moss: Yeah, that's how
>> Speaker 2: Object two
>> Scott Moss: Looking at is this what you guys were saying?
>> Speaker 2: Yeah, that's what we're saying.
>> Scott Moss: I'm sorry, I thought you meant here.
>> Speaker 2: No, no, no.

[00:03:11]
>> Speaker 1: Okay, okay, okay, okay.
>> Scott Moss: I think you can do that too.
>> Speaker 1: Okay, okay, okay.
>> Speaker 2: Yeah, you might be right.
>> Scott Moss: No, no.
>> Speaker 1: I think doing that too is like, hey, if you need this ID, tell it some context.
>> Scott Moss: I see.
>> Speaker 1: That.
>> Scott Moss: No yeah, no, you guys are right, I'm sorry we were talking about two different things.

[00:03:23]
Okay, so yeah, so this right here, yeah, okay so what do you guys want me to put here?
>> Speaker 1: Just that was there.
>> Scott Moss: Okay, cool.
>> Speaker 1: You're describing the weather tool.
>> Scott Moss: Okay, cool let me delete my database so we don't get screamed at.
>> Scott Moss: There we go and then let me say, let's just do Hi, okay, and then, what do we expect when I hit this?

[00:03:54]
>> Speaker 1: I think it'll call the weather now.
>> Scott Moss: You think I'll call the weather, and that's because
>> Speaker 1: I'll get stuff for the weather.
>> Scott Moss: It'll call get suckers we told us the weather got you. I think so too, Let's see.
>> Scott Moss: It did, there you go it called the weather.

[00:04:12]
So the thing I noticed about the descriptions is you reach a point where you can only put so much. In the description, you can't put a full system prompt in here because then it loses its ability to reason. So once you start having like a lot of tools and a lot of descriptions in there, you have to make something that decides the right tool other than this assistant, right?

[00:04:47]
Because you need another LM in front of this that's again an intent classifier. That's, hey, given this user message and this list of tools that this other agent can do, give me back a list of, I don't know, objects that describe or getting back a list of. All the tools with a confidence score on like which ones you think I should be running and then you can decide what tools to call.

[00:05:11]
Because once you start adding so much stuff in here, three sentences four sentences, it starts to freak out. One thing that I learned that's really good is, if you always add this one reasoning or reason yeah, maybe just reasoning. And you just describe, why did you pick this tool, it forces the AI to come up with a really good solid reason on why I decided to pick this tool.

[00:05:41]
And by having doing that, it sometimes for prevents it from picking the wrong tool. So this is chain of thought so sometimes you'll see this a lot in a lot of people's prompts is think about this or tell me why you did this. And having to do that usually prevents really bad stuff from happening.

[00:05:59]
So I have this on every single tool call that I have made. And we have 40 of them, so [LAUGH] Works really good. Cool, any questions?
>> Speaker 1: Does that get all put in when you run that?
>> Scott Moss: Yeah, it does you wanna see it, let's do it yeah, it does it's kind of funny sometimes, it's right?

[00:06:21]
>> Speaker 1: Reasoning it.
>> Scott Moss: Yeah and it helps you because then you, then you can tune on it if that's what it thinks, then all right, let me correct that behavior yeah.
>> Speaker 1: Like you're fine tuning me now.
>> Scott Moss: [LAUGH] yeah exactly so, where is it, I gotta log it better let me log it better yeah, all right, was it?

[00:06:45]
There's, yeah,
>> Speaker 1: Accurate weather.
>> Scott Moss: Yeah I picked this to provide accurate weather information. Yeah so that makes sense that totally makes sense that's reasonable, so, but sometimes here's the thing, even that can be a hallucination. So it's [LAUGH] You It'll hallucinate a good reason and then do the wrong thing, or it'll do the wrong thing and hallucinate a good reason.

[00:07:08]
And it's the reason was great, but you didn't actually pick the weather [LAUGH] But you said you did. Or you didn't pick the weather, but you said it was a good reason to pick the weather it's very weird. One pattern that I found is use the smaller model to do the stuff, and then use the bigger model to check the smaller model and it works out pretty good.

[00:07:31]
>> Speaker 2: Is there a way to count the tokens or estimate the tokens response and what you're sending before actually sending it?
>> Scott Moss: Yeah there is in Python, there's one called tip token, or something like that but NPM, I'm trying to get the exact name of this one. Let me find it let me check this one.

[00:08:00]
>> Scott Moss: This is the one okay, yeah, GPT, tokenizer yeah, this one's really good. So this one will allow you to take in some texts and return the array of tokens that you have, and then you can just count the length of that array, and you'll see how many tokens you have.

[00:08:15]
So it also has helper functions is within token limit. So you can give it the text, give it the limit of your model, and then it'll tell you if it's within limit or not. We use this one in production to make sure we don't break the chat interface by doing things in rag or anything that we're doing before we save it.

[00:08:35]
Before we attempt to save it, we'll do this. Sometimes we'll just wait until the AI screams at us and we'll detect the error message of like too many tokens and then we'll go back okay, sorry, too many tokens. Let's summarize that answer before we give it to you, and then we'll give it back to you.

[00:08:50]
So we're we're experiment with a lot of different stuff but yeah, counting your tokens is a deal is a good deal. You probably wanna do that, not just for making sure your agent doesn't break, but also for things analytics to see how many tokens people are using on your product.

[00:09:06]
Because then you can correlate that to cost you can go to any you can go to APIs, docs and they'll tell you this is how much it costs per million tokens, input and output. And that way you can track cost amongst your users because you're tracking how many tokens they're sending and they're getting.

[00:09:20]
And then you can see how much users are charge accosting you and you can charge those people money [LAUGH]. Or if your product is usage based, you actually have to track these tokens because you're passing that cost directly to your users, we're charging you guys. I don't know, they might obviously you wouldn't tell them this, but like in your head you might be charging a 20% margin on tokens.

[00:09:41]
So you need to keep track of the tokens so you can accurate 20% margin on top, and that's what you charge them, or whatever it is. So, yeah, keeping track of tokens is a good deal.
>> Speaker 2: I think part of the question is output, an opening, I don't think they have parameters we can say I only want a max output of a certain size, or something.

[00:10:01]
>> Scott Moss: They do almost never use that for some of the things I mean let me describe that. So, yes, if you go here, you can do Max tokens like this but I guess I'll ask, how do you know how many tokens you need? [LAUGH] And that's the hard part, because if you're building a chat interface, you probably don't wanna do max tokens if you're building a one off transactional thing.

[00:10:27]
Yeah, that probably makes sense if I built the thing that takes a blog post and it returns true or false, whether or not this blog post mentions anything about I don't know a specific person or specific anything and it says true or false. Well, I know how many tokens false it's at most maybe two tokens and true is even smaller than that so I could say, yeah, limit that to two tokens.

[00:10:55]
So it's forcing it to either be true or false and that's it. I've had experiences in the past where you think by limiting the token, the AI will rethink what it would say but in actuality, it'll just get cut off. [LAUGH] So it's like you think if I tell you can only do 500 tokens, you'll make sure your answer is within 500 tokens.

[00:11:15]
No, it'll return a 2000 token answer, but cut it off at 500. So now you don't get a full answer and it's, that's not what I wanted you to do. So it's kinda weird, the other thing is I mostly use structured outputs. So for me, I get back JSON from this so and I don't want, and then those JSON characters are counted as tokens.

[00:11:37]
So I don't wanna get a game of counting brackets, and parentheses, and things like that, it's kinda weird right? So I personally don't use max output,
>> Speaker 2: Like limiting the response here is probably a better choice to say, hey, return just this TypeScript object that has a Boolean and,

[00:11:52]
>> Scott Moss: Exactly, yeah, structured output
>> Speaker 2: That way, versus
>> Scott Moss: Exactly, yep, that's what structured outputs is It's here's a just we're giving the tools a JSON schema. You can actually do a response format and give it adjacent schema, and it will return it in that shape too. So we just don't have a use case for that right now but yeah, good question any other questions?

[00:12:19]
>> Speaker 1: Does the max tokens apply to the responders or
>> Scott Moss: Yes, the max tokens
>> Speaker 1: Not the input.
>> Scott Moss: It's not the input, it's the responses the input is the whole entire context. So there's the input that the whole chat can have, and theoretically you could pass all that up in one post request, but then now you're bound by like whatever, HCCP.

[00:12:42]
Limits, [LAUGH] Well, whatever the internet allows you to send, I guess, but yeah, Max Tokens is the output.
>> Speaker 1: It is the output of the tokens.

Learn Straight from the Experts Who Shape the Modern Web

In-depth Courses
Industry Leading Experts
Learning Paths
Live Interactive Workshops

Start a 5-Day Free Trial