Check out a free preview of the full Build an AI Agent from Scratch course
The "Types of AI Agents" Lesson is part of the full, Build an AI Agent from Scratch course featured in this preview video. Here's what you'd learn in this lesson:
Scott provides examples of different types of agents, including chat-based agents and task-based agents. He discusses the characteristics of agents, such as autonomous decision-making, task persistence, and goal orientation. He also addresses questions from students regarding validating LLM responses and the potential impact of larger AI models on specialized agents.
Transcript from the "Types of AI Agents" Lesson
[00:00:00]
>> Scott Moss: All right, so the next thing we're going to do is build an agent. Let's get to it. Got a lot to cover, but it's gonna be really cool. So when we talk about agents, what is an agent? Really, I have a lot of bullet points here, I'm gonna be honest.
[00:00:16]
Really, the way I think about an agent is that it's an LLM that has memory, whether it's short-term or long-term, so it can remember things, and it has the ability to do tool calling. You'll hear me say tool calling and function calling; they're the same thing.
[00:00:35]
It has the ability to tell your system what function it wants your system to run. That's not to be confused with a code interpreter, which is an LLM running code for you. That's an internal tool that an LLM might have or a platform might offer.
[00:00:56]
OpenAI has Code Interpreter, and it will run code on their system for you and give you back the result. That's not to be confused with function calling or tool calling, which is something different, and we'll dive more into that in a little bit. But to me, an agent is just something that has memory, short-term or long-term, and tool calling.
[00:01:16]
And because it does have tool calling, it works on a loop, and we'll talk about that loop in a minute. Basically, having those things allows the agent to make decisions. Because if you tell an agent, here are all the functions that you have access to, the agent has to decide if, when, and how to use those functions.
[00:01:37]
You can only try to influence it with different descriptions, but at the end of the day, it's gonna decide what tool is appropriate for it to use. There's a way to be like, you have to use this tool no matter what. But there isn't a way to dynamically say, if this comes in, you have to use this.
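To make that concrete, here's a rough sketch of what tool definitions and a forced tool choice look like with the OpenAI Node SDK. The get_weather tool, its description, and its parameters are all made up for illustration; the description is the main lever you have for influencing when the model picks a tool.

```ts
import OpenAI from "openai";

const client = new OpenAI();

// Hypothetical tool for illustration. The description is how you try to
// influence when the model decides to call it.
const tools: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get the current weather for a city",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  },
];

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "What's the weather in Tokyo?" }],
  tools,
  // "You have to use this tool no matter what" looks like this:
  // tool_choice: { type: "function", function: { name: "get_weather" } },
});
```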
[00:01:58]
You'd have to put another LLM in front of your LLM to classify that intent, and then you've got to make sure that LLM is good. It's just LLMs all the way down, right? So [LAUGH] it just gets crazy. So that's the way I think about an agent. It has autonomous decision-making.
[00:02:15]
It has task persistence. It has tool usage. It's aware of context, and typically, it is goal-oriented. So typically, when you interact with an agent, you're like, can you do this for me, or can you answer this for me? And it creates a goal that it's trying to accomplish, so it'll keep looping and keep calling functions until it's satisfied that it reached that goal, and then it'll respond to you.
[00:02:39]
You won't always just get back a response from an agent. Whereas every time we've interacted with this tool so far, we get back a response immediately, right? So if I say, what is my name? I immediately get back a response. Whereas with an agent, you might not get back a response immediately.
[00:02:57]
They might be like, hold on, let me think about this. Let me go check this. Or, okay, I did check that, I still don't know. Let me go check, let me go do this, that kind of thing. And then it might come back and be like, can you tell me more about this?
[00:03:09]
Okay, cool, let me go check again. So it's trying to figure out what to do before it finally responds. So if that sounds terrifying, then yeah, it's pretty terrifying. I remember when agents first became a thing, two years ago when I started working with them. I was like, what in the hell am I looking at?
[00:03:27]
I set up two agents to talk to each other, to play chess or something like that. It was really entertaining. And I gave one the personality of Snoop Dogg, and the other the personality of his best friend, Martha Stewart, and it was fun. [LAUGH] So that's an agent. Different types of agents, I talked about this earlier, but you have chat-based agents, that's what we're building.
[00:03:51]
The difference between this chat-based agent and a non-chat-based agent is that this one has long-term memory. We just talked about memory, so we know what that looks like, that's it. It's an agent that has a recollection of all your conversations, including the results of previous tool calls. So for functions that it called earlier, the results of those functions are also stored in the conversation, so it knows that too, it's pretty aware.
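As a rough sketch, assuming OpenAI-style message shapes, a stored conversation for a chat-based agent might look like this; the calendar tool call and its id are invented for illustration:

```ts
// The agent's memory is just the saved message history, and it includes
// the tool calls it made and what those tools returned.
const history = [
  { role: "user", content: "Do I have any meetings tomorrow?" },
  {
    role: "assistant",
    content: null,
    tool_calls: [
      {
        id: "call_abc123", // invented id for illustration
        type: "function",
        function: { name: "getCalendar", arguments: '{"day":"tomorrow"}' },
      },
    ],
  },
  // The result of the tool call is stored in the conversation too,
  // so the agent can refer back to it in later turns.
  { role: "tool", tool_call_id: "call_abc123", content: '["Standup at 9am"]' },
  { role: "assistant", content: "Just one: standup at 9am." },
];
```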
[00:04:17]
So use cases are customer service representatives, tutors, mental health support assistants, technical support agents, things like that. Here's some pseudocode that I created just to kinda show you what this will look like. The only thing that's here that's different is this if check. We haven't had this before, where you check to see if the AI responded with a tool call, and then you find the right tool, you run that tool, and then you respond back with the result of that tool.
[00:04:46]
Don't worry, we'll get into this, but this is what makes it an agent. It's this part that I like. The AI is telling you to go do something on your machine and then report back with the results, that's what it's telling you to do, and we have to check for that.
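The slide's pseudocode isn't reproduced in the transcript, but a check like the one he's describing might look like this sketch, assuming an OpenAI-style response shape; callLLM and runTool are hypothetical helpers you'd write yourself:

```ts
type Message = {
  role: string;
  content: string | null;
  tool_call_id?: string;
  tool_calls?: { id: string; function: { name: string; arguments: string } }[];
};

// Hypothetical helpers: callLLM wraps the chat completions API,
// runTool looks up one of your functions by name and executes it.
declare function callLLM(messages: Message[]): Promise<Message>;
declare function runTool(name: string, args: unknown): Promise<unknown>;

async function runAgent(messages: Message[]): Promise<string | null> {
  while (true) {
    const response = await callLLM(messages);
    messages.push(response);

    // This is the if check: did the AI ask US to run a function?
    if (response.tool_calls) {
      for (const call of response.tool_calls) {
        const result = await runTool(call.function.name, JSON.parse(call.function.arguments));
        // Report the result back so the model sees it on the next loop.
        messages.push({ role: "tool", tool_call_id: call.id, content: JSON.stringify(result) });
      }
      continue;
    }

    return response.content; // plain response, no tool call, we're done
  }
}
```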
[00:05:00]
Task-based agents are just one-off things, right? So if I go into Cursor right now and I say, do the thing, that's task-based, it's not chat-based, it's not gonna remember that. Data analysis, like, look at this spreadsheet and do this thing. Research assistance is another one, help me find something that does this.
[00:05:22]
Content creation was probably the number one thing when GPT-3.5 was out. I mean, I would say 99% of the use cases were creating content, like blog posts, Twitter posts, SEO stuff, it was just content creation. Nobody really knew what else to do with it other than to make content, podcasts, videos, things like that.
[00:05:42]
And it's probably still the number one use case. Because obviously, the direct output of an LLM is text, it's content, so it makes sense to just use that. That's the obvious use case, but LLMs are much more powerful than just creating content. Not that creating content is useless, it's great.
[00:06:03]
But that's the obvious solution, and there's a lot of other non-obvious solutions out there, I think. So an example of an implementation of that is, in this example, you have this while loop, we'll talk about that in a minute, but you can see that the LLM basically tries to plan a step.
[00:06:25]
It tries to execute a step. It gets the results, and then it checks to see if the results satisfy the original request. And if they don't, it loops again until they do, up until a max number of attempts before it just says, okay, we need to stop this runaway agent, which is a thing in AI.
[00:06:42]
You'd have an agent just constantly contradicting itself, like, I got the right answer. No, I didn't. Let me check again. I got the right answer, no, that's not right. Let me check again, and it could just do that over, and over, and over. There are so many ways to get around that, but people do max steps and stuff like that.
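A minimal sketch of that task-based loop, with a max-attempts guard against runaway agents; plan, executeStep, and isSatisfied are hypothetical LLM-backed helpers:

```ts
// Hypothetical helpers, each backed by an LLM call:
declare function plan(request: string): Promise<string>;      // decide the next step
declare function executeStep(step: string): Promise<string>;  // run it (tool call, etc.)
declare function isSatisfied(request: string, result: string): Promise<boolean>;

const MAX_ATTEMPTS = 5; // the "max steps" guard against a runaway agent

async function runTask(request: string): Promise<string> {
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    const step = await plan(request);
    const result = await executeStep(step);
    // Does the result satisfy the original request? If not, loop again.
    if (await isSatisfied(request, result)) return result;
  }
  throw new Error("Max attempts reached, stopping runaway agent");
}
```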
[00:06:58]
So, yeah, real world, I talked about some of those already. Some coding ones are really cool, like GitHub Copilot, actually Tabnine, I think they got bought out, there's something else now. But some of these features are examples of that, so I'm not gonna go into all of these.
[00:07:19]
Performance optimizations, we can talk about that later, and then I think this was really interesting, future trends. I wrote this one because I think people just like being able to see what's possible with some of this, it helps when developing it. So you have multi-agent systems. These are actually really cool.
[00:07:40]
Instead of having one agent who can do all these things, you have multiple agents and each agent is responsible for one part of something. So, you might have one agent whose job is to orchestrate other agents, and then, each one of those agents has a single responsibility. This agent is anything related to email, it can do email stuff.
[00:08:00]
This agent, anything related to calendar, it can do calendar stuff. This agent, anything related to research, it can do research stuff. And then you have one agent that orchestrates the communication between all three of them, and its job is to get the result and pass it to another agent that creates the response, right?
[00:08:16]
And you can have all these things working together like a team, it's kinda crazy. So that one's good.
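A sketch of that multi-agent shape, with every agent and helper name here hypothetical: an orchestrator classifies the request, routes it to a single-responsibility agent, and hands the result to a response-writing agent.

```ts
// Hypothetical single-responsibility agents:
declare function emailAgent(task: string): Promise<string>;
declare function calendarAgent(task: string): Promise<string>;
declare function researchAgent(task: string): Promise<string>;
// Hypothetical helpers, both LLM calls themselves:
declare function classifyIntent(request: string): Promise<"email" | "calendar" | "research">;
declare function writeResponse(result: string): Promise<string>;

// The orchestrator's only job is coordinating the team.
async function orchestrate(request: string): Promise<string> {
  const intent = await classifyIntent(request);
  const agents = { email: emailAgent, calendar: calendarAgent, research: researchAgent };
  const result = await agents[intent](request);
  return writeResponse(result); // pass the result to the response-writing agent
}
```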
>> Scott Moss: Improved tool creation, yeah, I think we'll talk about this when we get to tools, cuz it's gonna make more sense. Enhanced reasoning capabilities, that's any new model that comes out.
[00:08:33]
So any new model that comes out usually is an increase in reasoning capabilities, that's usually the biggest benefit of a better model. That's how they score these models: how well they can reason. So typically, if you want better reasoning, for instance, if you have an agent and you're realizing it's just picking the wrong tools no matter what prompts you give it.
[00:08:53]
It's just not picking the right tools, get a better model. The model you're using is not good at reasoning, so you need to find a better one. Or make an intent classifier, which is another LLM in front of your agent whose job is to figure out what the right tool is, versus having an agent that not only figures out the right tool, but also uses it.
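A sketch of that intent-classifier idea, assuming the OpenAI SDK; the model name and prompt are illustrative, and a small, cheap model is usually enough for classification:

```ts
import OpenAI from "openai";

const client = new OpenAI();

// An LLM in front of your agent whose ONLY job is picking the right tool.
async function classifyTool(userMessage: string, toolNames: string[]) {
  const completion = await client.chat.completions.create({
    model: "gpt-4o-mini", // small model, classification doesn't need much
    messages: [
      {
        role: "system",
        content: `Pick exactly one tool from: ${toolNames.join(", ")}. Reply with the tool name only.`,
      },
      { role: "user", content: userMessage },
    ],
  });
  return completion.choices[0].message.content?.trim();
}
```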
[00:09:11]
So you can separate that concern out to another agent, which does make it easier. Better memory management, yeah, right now, all the things that I talked about around summarizing and evicting messages. If the agents themselves could just do that, or if the LLMs themselves could just do that, that would be great.
[00:09:31]
Always having to do it yourself is kind of a pain. And then specialized domain experts, this was really cool, I like the idea of this. Like I said, most LLMs are generic. The big ones are trained on the whole Internet. But if you compare them to a model very specifically trained for one thing, that one is gonna be way better at that one thing.
[00:09:50]
It's gonna be faster, cheaper, and more accurate. So I think we're gonna see a lot of agents that are just experts at one thing, and that's all they do. Versus, there's always gonna be the OpenAIs and the Claudes of the world whose job is to make AGI, to make the biggest, most general intelligence out there.
[00:10:07]
But I think for most people, especially when you start thinking of running AI on a phone where the processing is not that strong, you're gonna need smaller, specialized agents that know how to do this one thing very well versus like, I'm gonna run ChatGPT 6 on my phone.
[00:10:23]
Okay, good luck, it's not gonna be the most accurate thing. So, yes?
>> Speaker 2: In your production uses of AI, have you ever found the need to validate LLM responses with a follow-up LLM query, or is that just generally overkill?
>> Scott Moss: That's a great question. Times are changing. So the question was, have I ever found the need to, like, I got a response from the LLM and now I need to validate that with another LLM?
[00:10:49]
Yes, for many different reasons. So a long time ago, two years ago, I don't know, it's not a long time ago. Before there was JSON output or structured output, where you can give the LLM a schema and it'll put data in that exact schema, right? Before that was a reality, you would have to ask the AI to do it nicely.
[00:11:12]
You had to put that in a prompt, like, can you please just send me back something that looks like this, and then sometimes it would, sometimes it wouldn't. So before you tried to JSON.parse() that thing and destroyed your whole system, you would check it with another LLM. Hey, does the output of this match this schema?
[00:11:28]
If not, run it back, right? And actually, LangChain did a lot of that. It would take a schema you gave it, it would put it in your prompt for you, and it would keep running it over and over again, up until a max amount of retries, until it actually came back with the shape that you wanted.
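That pre-structured-output pattern might look like this sketch; callLLM and matchesSchema are hypothetical helpers, and the retry count is arbitrary:

```ts
declare function callLLM(prompt: string): Promise<string>; // hypothetical helper
declare function matchesSchema(data: unknown): boolean;    // hypothetical validator

const MAX_RETRIES = 3;

// Ask nicely for JSON, then validate before trusting it.
async function getStructured(prompt: string, shape: string) {
  for (let i = 0; i < MAX_RETRIES; i++) {
    const raw = await callLLM(`${prompt}\n\nRespond ONLY with JSON shaped like: ${shape}`);
    try {
      const data = JSON.parse(raw); // don't let a bad parse destroy your system
      if (matchesSchema(data)) return data;
    } catch {
      // not even valid JSON, run it back
    }
  }
  throw new Error("Model never returned the requested shape");
}
```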
[00:11:45]
Now that's built into models. You can just say, return me this shape, and that's what we're gonna use today, so you don't have to do that. Another reason I might wanna do this is sometimes the LLM comes back and it's just not correct.
[00:12:00]
It's not doing the right thing. So there's different strategies. You might ask it to come back with a confidence score. You might say something like, cool, when you send me back this answer, also give me a score from 0 to 1 or 1 to 100 on how confident you think this answer is.
[00:12:15]
And then, if that confidence score isn't past some threshold that you're comfortable with, you run it through another LLM or something like that. So you have this QA LLM. Yeah, I've done that many different times in many situations, and I'm pretty sure I've had some of that in production at some point.
[00:12:31]
But yeah, I do have a lot of that in production.
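A sketch of that confidence-score pattern; callLLM is a hypothetical helper, and the threshold is whatever you're comfortable with:

```ts
declare function callLLM(prompt: string): Promise<string>; // hypothetical helper

const THRESHOLD = 0.8; // pick a threshold you're comfortable with

async function answerWithQA(question: string): Promise<string> {
  const raw = await callLLM(
    `${question}\n\nReply as JSON: {"answer": string, "confidence": number from 0 to 1}`
  );
  const { answer, confidence } = JSON.parse(raw);

  if (confidence >= THRESHOLD) return answer;

  // Low confidence: route through a second "QA" LLM to check or redo it.
  return callLLM(`Double-check this answer to "${question}": ${answer}. Fix it if it's wrong.`);
}
```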
>> Scott Moss: Cool, any questions?
>> Speaker 3: How do you approach insulating yourself from the kind of bigger LLMs just absorbing the functionality you're creating, just becoming ingrained?
>> Scott Moss: Yeah, that's a good question. How do I think about being cannibalized by the big general foundational models?
[00:12:58]
I don't. I really don't. Actually, a couple of weeks ago, we all sat down and talked with Sam Altman from OpenAI about what they're doing at OpenAI and things like that. I think at the end of the day, those companies, they're trying to create AGI.
[00:13:16]
They're trying to create the most general thing. It's literally in the name, artificial general intelligence. And I think even with AGI, it doesn't solve everything, even AGI is gonna have a lot of these problems. So I think there's a lot of value in going after something that's vertical and very specific to a set of people.
[00:13:37]
Because I think ultimately, when you think of the economics of things, okay, let's say AGI is here today. It's gonna cost so much damn money to use it that no one's ever gonna, I don't know, it'll probably cost you $1 to run it once or something like that, extremely expensive.
[00:13:52]
And that's assuming you have the compute to do it. So because of that, there's still gonna be use cases for things that can run today that are also quite good, in tons of verticals. Another thing is, I think it comes down to experience, user experience. A chat interface isn't gonna be the experience for every single AI app, right?
[00:14:12]
In my app, we actually do have a chat experience now, because it's mimicking what you would have with an executive assistant. And you would actually chat with an executive assistant, so it kinda works there. But me personally, I don't think chatting is a good interface for getting Figma designs.
[00:14:29]
That feels kind of weird to me. It's like, I don't wanna chat to make designs. It doesn't feel creative. So I think, because of that, people are still gonna be able to make really unique experiences for very niche problems that no one like OpenAI or Anthropic is ever interested in solving, because it's too specific for them.
[00:14:51]
It's like, I don't care about solving that, we have an AGI. So as we're more of a platform, we wanna empower the people who are making these products, who can go make these unique experiences, because then we win, right? So, yeah, I'm really not concerned about them doing any of that, because at the end of the day, there's a GPT store right now and it has tons of functionality in there.
[00:15:11]
And people still don't really use it for a lot of this stuff. It's also kinda like Zapier. Zapier can kinda do a lot of stuff right now. You can even talk to OpenAI, but I don't see people just throwing away every single app they ever had and just going to Zapier and making all these apps that can automate stuff.
[00:15:26]
Because it's like, yeah, even though it can do it, setting it up is kind of difficult, and I don't like the experience. So yeah, I think there's still lots of opportunity there, if you think about going after something vertical and niche.