Check out a free preview of the full Build an AI Agent from Scratch course
The "Improving Agents" Lesson is part of the full, Build an AI Agent from Scratch course featured in this preview video. Here's what you'd learn in this lesson:
Scott discusses the concept of RAG (Retrieval-Augmented Generation) and embeddings, which are numerical representations of text. Scott also touches on topics such as human-in-the-loop, evaluation strategies, safety mechanisms, integration patterns, and specialized agents.
Transcript from the "Improving Agents" Lesson
[00:00:00]
>> Scott Moss: So, taking agents to the next level. We talked about RAG a little bit; I had some folks bring that up. RAG is Retrieval-Augmented Generation. Let me describe it. There's a really good visualization, I think it's OpenAI's embeddings demo on their blog.
[00:00:18]
So let's talk about embeddings and RAG real quick. The way embeddings work is that you take some words, run them through an embedding model, which is another AI model, and it outputs that text as a vector. Each value in the vector is just a number, typically a small decimal, and the whole group of those numbers together is the embedding.
[00:00:37]
So each number is a dimension. An embedding model might put out 1,536 dimensions, so for this one piece of text, you might get back 1,536 numbers. That is a numerical representation of this text, and you save this set of numbers in a vector database, okay? How RAG works is, there's a bunch of information you want your agent to know about, whether it's for question answering or just chat history, whatever information you want your agent to be able to recall.
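To make that concrete, here's a minimal sketch of getting an embedding back, assuming the OpenAI Node SDK and a runtime with top-level await (the model name is just an example):

```ts
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Turn a piece of text into an embedding: one long array of numbers.
const res = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: "canine companions say woof",
});

const vector = res.data[0].embedding; // e.g. 1,536 numbers
console.log(vector.length, vector.slice(0, 5));
```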
[00:01:14]
So what you do, ahead of time or as the data comes in, depending on whether the data is static or dynamic, is convert that data into vectors. The catch is you can only embed so much text at once, so sometimes you have to break those pieces of data up, and that's called a chunking strategy.
[00:01:32]
We won't get into that, but you convert that text into vectors. And then when a user asks a question, you convert their question or their message into a vector too. So now you have the user's message as a vector, an embedding, and you have all the data you want to use to answer that user as embeddings as well.
[00:01:53]
So then what you do is plot them. Here's what it looks like if they were plotted with their words: the closer they are, the more related they are semantically. So "woof" is closer to "canine companions say," because a woof is canine, right?
[00:02:11]
That's how a dog barks. "Feline friends say" goes with "meow," and then all the way over here, "a quarterback throws a football" has no relation to any of this stuff. So if you were to plot those 1,536 dimensions, or however many dimensions you had, this is roughly what it would look like, right?
[00:02:29]
Math-wise, as far as their relation goes. And here's the one I was looking for; this one's actually interactive, it's really cool. You can see they uploaded a data set and you can see the different types of text: they've got animals, athletes, film, transportation.
[00:02:53]
This is on a three-dimensional plot, but like I said, the real embeddings can have 1,536 dimensions or more; they're just representing that to us in three dimensions. So you can see how closely related things are, right? Like animals, I don't know, I guess it's kind of in the middle.
[00:03:10]
It looks like it's pretty close to transportation somehow. Or what's this over here, a village? Anyway, you can see how closely they're related. So what you do is, when a user sends a message in, you also convert that message to a vector, and you put that vector on the same graph as the other data you stored.
[00:03:36]
And then you say, cool, here's the user's message: find all the chunks, the vectors, that are closest to this one and give me back the top K, right? And that's what it'll do, and that's how a vector database works, essentially. You're just plotting them in this space, and there are different search algorithms; most people use cosine similarity.
[00:04:00]
There are other ones, but you just run that function and it'll give you back the ones that are closest, using math. It's literally doing math on these words. It's kind of crazy how it works, but that's embeddings. OpenAI has an embedding model that a lot of people use to do this.
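Here's a rough sketch of what that similarity search is doing under the hood, using a brute-force in-memory "vector database" instead of a real one:

```ts
// Cosine similarity between two vectors: 1 means "pointing the same way",
// 0 means unrelated, -1 means opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Score every stored chunk against the query vector and return the top K.
function topK(
  query: number[],
  chunks: { text: string; embedding: number[] }[],
  k = 3
) {
  return chunks
    .map((c) => ({ ...c, score: cosineSimilarity(query, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```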
[00:04:14]
So RAG is something that is super helpful; you will most likely get into using it. The other thing is different types of memory strategies; I already talked a little bit about those. I also talked a little about Human-in-the-Loop. Human-in-the-Loop is very important. Again, here's a really good example of this.
[00:04:36]
So, for instance, I have a tool here; in our system we call it approval, but in this example I call it ask user. In order for the system to do something, it has to ask a user. It might even say, here's the question I need to ask the user, and here are the options I think they need to respond with.
[00:04:57]
So you can imagine putting this in the UI: a message pops up and there are two buttons they can pick from, this or that, right? And once you click one of those buttons, it replies back as the tool result, like, yeah, I want to do that.
[00:05:10]
And now the AI is like, great, you told me you want the cheaper flight with a layover, that's what you want. Now I can call bookFlight knowing that's what you want to do. So it calls bookFlight with the right parameters. This is human in the loop; you can think of it as the same thing as function calling, except the function that needs to be called is not a function on your server.
[00:05:30]
It's a human having to do a thing: press a button, hit respond. And that can happen at any time. It can happen now, it can happen later, it doesn't matter; a human can respond whenever they want. So this is very powerful. I just don't believe you can build a system today that does destructive or mutative operations, as in it writes to a system, writes to a database, alters something, mutates something, without a Human-in-the-Loop.
[00:05:54]
No one's ever going to trust it. There was an agent that was like, yeah, I'll book this meeting with this person. And you're like, show me first. Or, sure, I'll send this email off. Wait, wait, show me the draft first. You'll never trust it, so you need an approval step, and Human-in-the-Loop is the way you do it.
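To make that approval flow concrete, here's a minimal sketch; the ask_user tool shape and the waitForHumanClick helper are hypothetical, not the course's exact code:

```ts
// Hypothetical tool definition: the model asks the human a question
// and offers the options it thinks they should pick from.
const askUserTool = {
  name: "ask_user",
  description: "Ask the human a question and wait for them to pick an option",
  parameters: {
    type: "object",
    properties: {
      question: { type: "string" },
      options: { type: "array", items: { type: "string" } },
    },
    required: ["question", "options"],
  },
};

// Stand-in for however your UI collects the click (websocket, polling, etc.).
// Here it just auto-picks the first option so the sketch runs.
async function waitForHumanClick(question: string, options: string[]) {
  console.log(question, options);
  return options[0];
}

// When the model calls ask_user, nothing destructive happens until the human
// picks an option. The choice goes back into the conversation as the tool
// result, and only then does the model call bookFlight with real parameters.
async function handleAskUser(args: { question: string; options: string[] }) {
  const choice = await waitForHumanClick(args.question, args.options);
  return { role: "tool" as const, content: choice };
}
```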
[00:06:14]
But when people say Human-in-the-Loop, they mean a lot of different things. This is Human-in-the-Loop as far as tool calling. There's also Human-in-the-Loop as far as giving feedback, and Human-in-the-Loop as far as training. So there are many different meanings; when I say Human-in-the-Loop, I'm referring to this one.
[00:06:31]
Evals. Evals are good. They're like testing the output and accuracy of your LLMs. This is someone's job. Someone's job at a company is just to sit down and do evals all day: define metrics, come up with different scorers, figure out how to get data sets, fake data sets, real data sets, golden data sets, and run evals.
[00:06:54]
And then maybe even improving it, or handing it off to someone else who does the improving of the system. But this is a whole job. If you're taking LLMs seriously in production, this will be more than 20% of your time. And if it's not, you're doing it wrong.
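For a sense of what a bare-bones eval harness looks like, here's a sketch; the dataset and the substring scorer are made up for illustration, and real setups use LLM graders, golden datasets, and proper scoring:

```ts
// A tiny eval loop: run the agent over a dataset and score each answer.
type EvalCase = { input: string; expected: string };

const dataset: EvalCase[] = [
  { input: "Find me the cheapest flight to NYC", expected: "cheapest" },
  { input: "Book a window seat", expected: "window" },
];

async function runEvals(agent: (input: string) => Promise<string>) {
  let passed = 0;
  for (const c of dataset) {
    const output = await agent(c.input);
    // Naive scorer: does the output mention the expected answer?
    const ok = output.toLowerCase().includes(c.expected.toLowerCase());
    if (ok) passed++;
    console.log(`${ok ? "PASS" : "FAIL"}: ${c.input}`);
  }
  console.log(`score: ${passed}/${dataset.length}`);
}

// Usage: runEvals((input) => callYourAgent(input));
```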
[00:07:07]
You should be doing evals a lot. You can't just go off vibes, it's not gonna work. You have to do evals. Trust me, I know, it was bad; we had to do evals. I spend a good part of my time just doing evals all day. Let's talk about specialized agents, things like that, and safety mechanisms.
[00:07:33]
Good luck. [LAUGH] I mean, we talked about Human-in-the-Loop; that's a good safety mechanism. The other thing you can do is validate the inputs that your tools receive to make sure they're good. You can have other LLMs check the inputs of your tools and check the outputs of your tools.
[00:07:50]
So you have this QA-type thing, you have many different steps; there are a lot of different things you can do, and you can get really creative with it. Again, I encourage you to think about it from a human perspective. If I ask some person to do some work, but they're very restricted in their knowledge, I might have another person check their work before it comes to me, right?
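On the input-validation side, a minimal sketch might look like this, assuming zod for schema checks; the bookFlight tool and its argument shape are hypothetical:

```ts
import { z } from "zod";

// Describe the arguments the tool is allowed to receive.
const bookFlightArgs = z.object({
  from: z.string().length(3),        // airport codes like "SFO"
  to: z.string().length(3),
  maxPriceUSD: z.number().positive(),
});

// Stub for the real booking call.
function bookFlight(args: z.infer<typeof bookFlightArgs>) {
  return { confirmation: `booked ${args.from} -> ${args.to}` };
}

// Run this guard in front of the tool instead of trusting the model's output.
function safeBookFlight(rawArgs: unknown) {
  const parsed = bookFlightArgs.safeParse(rawArgs);
  if (!parsed.success) {
    // Feed the validation error back to the model rather than crashing
    // or booking something nonsensical.
    return { error: parsed.error.message };
  }
  return bookFlight(parsed.data);
}
```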
[00:08:08]
Going back to the human analogy: I might have one person give them instructions, have that person do the work, have another person check the output of that work, and then tell me about it. That way, by the time it gets to me, it's good. So you have to think like that when you're developing these systems: how would this work if I were talking to a bunch of humans?
[00:08:21]
Because you're scaling human intelligence. We talked about context window stuff. We're not going to talk about training. Ooh, integration patterns, I talked a little bit about this. I highly recommend using background jobs if you want to be able to replay some of this stuff and not have to burn through a bunch of tokens when things fail.
[00:08:39]
So have a redundant, durable system; there are tons of different products that can help you with that, in my opinion. You have the one I talked about before, which was QStash. This is what we use in production; I think it's amazing. Equally good is something called Inngest.
[00:08:56]
I think it's also really good; it does something very similar. I'd say QStash is simpler to work with, but this is also really good. It allows you to orchestrate steps in code. Each one of these steps is actually a job in a queue, and they can be replayed, retried, and stuff like that.
[00:09:11]
It's a different way of writing code, for sure. The best way I can describe it: if you were to use something like this or QStash, it's like turning your service functions into React components, as far as how they re-render, right? A React component might re-render multiple times, so if you put a variable at the top, that variable is gonna get re-initialized every single time.
[00:09:28]
So you use hooks like state to make sure that doesn't get re-initialized. It's kind of the same thing: this function is gonna get called many, many times, so you can't really put stuff in here. This code has to be idempotent, so it doesn't have weird side effects and stuff like that.
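Here's roughly what that step-based, replay-safe style looks like, sketched with the Inngest TypeScript SDK; the event names and helper functions are made up, and the exact API is worth confirming against their docs:

```ts
import { Inngest } from "inngest";

const inngest = new Inngest({ id: "agent-app" });

// Stubbed helpers, just for illustration.
async function searchFlights(query: string) {
  return [{ id: "abc", price: 199, query }];
}
async function notifyUser(results: unknown) {
  console.log("results ready", results);
}

// Each step.run becomes a retriable job. On a replay, completed steps return
// their saved result instead of re-running, which is why the surrounding
// code needs to be idempotent: don't do non-idempotent work outside a step.
export const researchTrip = inngest.createFunction(
  { id: "research-trip" },
  { event: "trip/requested" },
  async ({ event, step }) => {
    const flights = await step.run("search-flights", () =>
      searchFlights(event.data.query)
    );

    await step.run("notify-user", () => notifyUser(flights));
  }
);
```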
[00:09:42]
It's a little weird how you write it, but it's so great. I highly recommend having durable systems. Obviously, this is nothing new; people have been building durable systems for years, but you had to build them from scratch, and now there are systems as a service that can do this for you.
[00:09:56]
So I highly recommend checking out one of those tools; they're really cool. Everything else is pretty much stuff I already kind of talked about, but there's some more stuff in here if you're super curious.