Coding a System Prompt

Lesson Description

The "Coding a System Prompt" Lesson is part of the full, AI Agents Fundamentals, v2 course featured in this preview video. Here's what you'd learn in this lesson:

Scott demonstrates creating an LLM-as-a-judge evaluator by defining a schema for the judge's output, including a constrained score and a reason. He shows how the AI SDK's generateObject function produces structured outputs, much like the input schema of a tool call.

Transcript from the "Coding a System Prompt" Lesson

[00:00:00]
>> Scott Moss: All right, so I am going to, I guess I could just stay on this branch. I don't really need to check out, but you can check out to Lesson 5 if you want to be where I left off, or you can keep going with what you have if it's working. Everything is building off of it, there isn't anything introduced that's new. Just so I'm staying up to date, I will check out to Lesson 5. I want to be super consistent with what's in the notes.

[00:00:30]
And, but yeah, as you can see, it's the same thing. Lesson 5 is the same thing we just made. OK, so what we want to do is we want to go ahead and make our evaluator. So if we go over to evals, we go to evaluators. Right now, all we have—we did the, which one did we do yesterday? We did the tool selection score, right? And I mentioned there were other ones in the notes that you can put in, so this branch has the other ones in there.

[00:01:01]
We need to make the LLM as a judge. That's going to be our single evaluator that we don't have yet. So let's go ahead and do that. So at the top here, first thing I'm going to do is I'm going to make our schema. This schema is very similar to the schema you would make for inputs on a tool call, right? So when you're making a tool, you have to give it an input schema to tell the LLM what the input should be to pass into this tool.

[00:01:30]
That's structured outputs, but in the case of tool inputs. We're doing the same thing. We're just making a Zod schema here. So we're just going to say, hey, Judge, we need you to return an object that looks like this that has a score. And it's a number. And we can say the minimum of that number must be 1, and the maximum of that number must be 10, and then we can give it a description. Or describe, and you can just describe what the score is supposed to be, so I could say score from 1 to 10 where 10 is perfect.

[00:02:14]
Right? You might be asking, why don't I just put 0 to 1 instead of 1 to 10? Well, in my experience, and I don't know if it's a Zod thing or some of the LLMs or a combination of the two, it's really hard. You're not going to get really accurate results if you try to do floating points with LLMs, and I don't know if Zod even supports that. So I just have it do whole numbers, right?

[00:02:41]
And then I'll just do the math myself, like just divide by 10, it's fine. It's easier that way, and that's through tons of testing. So, and then I always like to put a reason here. This right here, it's like obviously it's helpful for us to see it, especially like in the eval, we can see the reason why the judge selected this score. But what I've noticed, what it does is by asking an LLM to give us a reason why you did what you did, it makes it think harder about it.
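The "divide by 10" step mentioned above could look something like the sketch below. The name normalizeScore and the range check are assumptions made for illustration; the lesson only says the 1 to 10 score is divided by 10 afterward.

```typescript
// Hypothetical helper: converts the judge's 1-10 score into a 0.1-1 value.
// The range check is an added safety net, not something the lesson describes.
export function normalizeScore(score: number): number {
  if (score < 1 || score > 10) {
    throw new RangeError(`Expected a score between 1 and 10, got ${score}`);
  }
  return score / 10;
}
```

Keeping the conversion in plain code means the model only ever has to produce a small whole number.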

[00:03:19]
That's less and less important with reasoning models, but before reasoning models, this was literally like if you added this one thing to your structured outputs, the quality will go up like crazy just by saying, explain yourself or give me the steps that led up to this conclusion. And it's like, oh, well, now that I got to tell you. And the quality is just so much better. So yes, it's useful to see it, but it's also ensuring that our LLM isn't going to be lazy and now that you have to explain it, you really got to think about it.

[00:04:00]
Brief explanation for the score. Right. Explanation, explanation, there we go. There's our schema. And then LLM as a judge, well, it's not going to be much different than what we've already been doing because we're just talking to an LLM, so we kind of already know what to do. So we'll say export, let's just say LLM judge equals a function here. It's just going to take the output, in this case, it's going to be a multi-turn result, and then it's going to take the original input that was, I'm sorry, it's going to take the target, as in what we expect it to be, which is a multi-turn target.
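As a rough sketch, the schema described so far might look like this in Zod. The names judgeSchema and JudgeResult are assumptions (the transcript only calls it "the judge schema"); the field names, bounds, and descriptions follow what is said aloud.

```typescript
import { z } from "zod";

// Sketch of the judge's output shape: a bounded score plus a short reason.
export const judgeSchema = z.object({
  score: z
    .number()
    .min(1)
    .max(10)
    .describe("Score from 1 to 10 where 10 is perfect"),
  reason: z.string().describe("Brief explanation for the score"),
});

// Static type inferred from the schema, handy for the evaluator's return value.
export type JudgeResult = z.infer<typeof judgeSchema>;
```

Calling judgeSchema.safeParse on a candidate object reports whether it fits: a score of 11, or an object with no reason, fails validation.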

[00:04:51]
OK. Only thing new here is we're going to use something new from the AI SDK called generateObject. So make sure you have that imported at the top if you don't already, but what is generateObject? It's the same as generateText, except it takes a schema and, instead of returning you text, it will return you an object in the shape of the schema: structured outputs. It's quite literally the same thing as tool calling inputs.

[00:05:20]
Think about tool calling inputs. Whatever you put in that schema, that's the shape the object's going to be for the arguments for that tool call. That's also structured outputs. In fact, before structured outputs was a thing, people would just use tool calling input schemas for structured outputs. They would set up tool calls, not for the purpose of executing a tool, but just for the purpose of getting structured outputs as an argument, and then once they got that structured output, they would just never feed that back to the AI.

[00:05:49]
They didn't care. They just wanted to do one single turn and give me that input object because I can give you a schema and you're going to stick to it. And then I'll take that and continue on with my workflow. I don't, this is not a multi turn thing. I'm not feeding the tool back to you. There's not even an execute function. I just want that structured output. So they just went ahead and made, you know, the LLMs do it themselves, right?

[00:06:12]
So generateObject just generates an object versus a string, and the inputs are literally the same thing, plus passing in that schema, right? So a model. In this case, you want to use a bigger model, but you can put whatever you want, whatever your account allows you to do. I'm just going to put GPT-4o.1, the latest one they have here at the time of this course, and then, like I said, you have a schema now, so you can put the judge schema.

[00:06:43]
You also have to supply a schema name. This is just for like inference and different things like that. It's, it doesn't change the behavior of anything, so I'm just going to call this evaluation. And then provider options because I'm using a reasoning model on OpenAI. I want to give this thing reasoning effort high, like that. I also have to give it a schema description, so I need to describe what the schema is.

[00:07:26]
So this is an evaluation of an AI agent response. And then I need to give it the messages that I want it to have. The first one, like I said, is typically always going to be a role of system. This is where the system prompt is. And this is where you can just, you know, kind of describe to the agent what its task is. I already have this one, and I'm not going to write it again, so I'm just going to copy and paste for the first time in this course.
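Pulling the options mentioned so far into one place, the call might be sketched as follows. Several pieces here are assumptions rather than the course's actual code: the model and system prompt are passed in as parameters instead of being hard-coded, MultiTurnResult and MultiTurnTarget are placeholder types because their fields are not shown in this excerpt, and the user message format is invented for illustration.

```typescript
import { generateObject, type LanguageModel } from "ai";
import { z } from "zod";

// Same shape as the judge schema described earlier in the lesson.
const judgeSchema = z.object({
  score: z.number().min(1).max(10).describe("Score from 1 to 10 where 10 is perfect"),
  reason: z.string().describe("Brief explanation for the score"),
});

// Placeholder types (assumption): the lesson names them but does not show their fields here.
type MultiTurnResult = unknown;
type MultiTurnTarget = unknown;

export async function llmJudge(
  model: LanguageModel, // assumption: injected by the caller
  systemPrompt: string, // assumption: the judge prompt is supplied by the caller
  output: MultiTurnResult,
  target: MultiTurnTarget,
): Promise<number> {
  const { object } = await generateObject({
    model,
    schema: judgeSchema,
    schemaName: "evaluation",
    schemaDescription: "An evaluation of an AI agent response",
    // Only meaningful for OpenAI reasoning models, as noted in the lesson.
    providerOptions: { openai: { reasoningEffort: "high" } },
    messages: [
      { role: "system", content: systemPrompt },
      {
        role: "user",
        // Hypothetical message layout; the real prompt is not shown in this excerpt.
        content: `Expected:\n${JSON.stringify(target)}\n\nActual:\n${JSON.stringify(output)}`,
      },
    ],
  });
  // The schema keeps the score within 1-10, so this yields 0.1-1.
  return object.score / 10;
}
```

Taking the model as a parameter keeps the sketch independent of any particular provider or model name.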

[00:07:53]
Right. So let's just walk through this system prompt. You can put whatever you want, there's no wrong answer, but you can see here I'm like, hey, you're an evaluation judge. Score the agent's response on a scale of 1 to 10. Scoring criteria: 10, response fully addresses the task using tool results correctly; 7 to 9, response is mostly correct with minor issues; 4 to 6, response partially addresses the task; 1 to 3, response is mostly incorrect or irrelevant.
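Written out as a constant, the rubric read aloud above might look roughly like this. The wording is reconstructed from the transcript, so the exact text and layout in the course repository may differ, and the name JUDGE_SYSTEM_PROMPT is an assumption.

```typescript
// Reconstructed from the spoken walkthrough; not copied from the course repo.
export const JUDGE_SYSTEM_PROMPT = `You are an evaluation judge. Score the agent's response on a scale of 1 to 10.

Scoring criteria:
- 10: Response fully addresses the task using tool results correctly
- 7-9: Response is mostly correct with minor issues
- 4-6: Response partially addresses the task
- 1-3: Response is mostly incorrect or irrelevant`;
```

The bands line up with the schema's 1 to 10 bounds, so every score the judge is allowed to return has a matching description.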

[00:00:00]
You can put whatever you want here, and then you can even eval this thing, right? Like, you got to figure it out. You got to figure out what works for you.