Check out a free preview of the full Build an AI Agent from Scratch course
The "AI Agents Overview" Lesson is part of the full, Build an AI Agent from Scratch course featured in this preview video. Here's what you'd learn in this lesson:
Scott provides an overview of the course project which is an agent that can read from Reddit, generate images, and provide random dad jokes. He also discusses different types of agents, such as conversational agents and transactional agents, and provides examples of their use cases.
Transcript from the "AI Agents Overview" Lesson
[00:00:00]
>> Scott Moss: So I'm gonna check out to Main. You don't have to do this, you can just watch me do this, this part here. So again, like I said, we're building an agent. This agent is kinda cool. It has the ability to read the front page of Reddit, generate images and get random dad jokes.
[00:00:17]
So [LAUGH] I don't know, a combination of those three things you can do some pretty wacky stuff. I experimented with it the other day, and the results that I got from asking it yourself, I was just like, what the hell? Why would you make that? It was just crazy, it was crazy.
[00:00:32]
So cuz I mean, throw Reddit in there and it's just like, you don't know what you're gonna get on Reddit, and then you'll see. So the way you use it is, so I have a command here in the package JSON. It should just be start. So you can just do npm start like this, and then you can pass in a message like this.
[00:00:55]
So I'm gonna say show me a dad joke with an image. So I'll do this. You can see that the assistant is thinking it ran the dad joke function. It's now generating an image [LAUGH]. I don't know what we're gonna get. I don't know what dad joke you got, but it's gonna try to make an image for that dad joke, and then it'll give it to us and then we can go look at it.
[00:01:19]
And then we can keep asking questions. So, here's a joke that I got. I used to work in a shoe recycling shop. It was soul destroying [LAUGH] okay? And then here's the image that it generated. [LAUGH] There it is, right? So it's like, if you throw Reddit in there, it's gonna get real crazy real quick.
[00:01:41]
I'm almost hesitant to even do it. I did it one time, I was like, okay, maybe not that [LAUGH]. But as you can see, it can get pretty cool. So, that's what we're gonna be building and then, and you can continue to ask it questions, right? It has memory and knows your whole conversation.
[00:01:57]
So you can be like, okay. What does that joke mean, right? If I literally didn't know what that joke meant, I can just ask it and then it'll be like, okay, cool, idiot. There's a play on words here for soul and soul, that's what the joke is about and it's trying to tell me that.
[00:02:16]
So our agent, it has that power to be able to connect to those things and then have a memory to remember our conversation and do stuff. So that's what you're gonna be building today. There's a lot that goes into this, from chat history, memory running on a loop, executing tools, combining results.
[00:02:36]
There's just so much that kind of goes into it. But once you have the foundation, it'd be really cool. So couple examples of I see of like, agents in the wild. First, you gotta think about two different types of agents, there's conversational agents, which we're gonna be building, and then there's transactional agents, agents that are just like, one off things.
[00:02:55]
Conversational agents, really good use cases for those are the number one thing that I see the most is like. You go on a website and you see like the intercom bubble pop up or like some chat thing pop up. Everybody has AI thing there now, right? Because I can't remember like, I don't know, maybe like 2014 you'd go to someone's website and intercom thing pops up and it's like support teams like offline right now.
[00:03:17]
Okay, not with AI. Their support team is on all the time. So they can take this agent, they can train it on all their customer support documentation, all their product documentation, and then you can just chat with it. So I see a lot of that and that's not just on a website in a chat interface, but it's also on a phone, in a voice interface.
[00:03:38]
It's also over email where you're emailing an agent. Any way that you can communicate via customer service, I'm starting to see it being outsourced to AI. So that's probably the number one use case for agents that are chat based. But you also see other things when it comes to what I call vertical agents, so taking an agent, making it work for this, just this one use case, like a common one was like, talk to this PDF, or talk to this spreadsheet.
[00:04:07]
You can take any document that has data in it, or even a database, and be like, I wanna talk to this database. I wanna talk to this spreadsheet. That's a very common use case for like an agent that's chat based, because it has access to that data source and it can pull from it to answer questions for you.
[00:04:23]
And if you wrote the other side of it, it can write to it as well, which is, I don't know, if you want your agent writing SQL for you, go ahead. Yes, it can do that as well. There's literally nothing it can't do. So if you tell it to push the nuke button, it will push the nuke button.
[00:04:37]
So, don't tell it to push the nuke button. So you see a lot of agents in that way. Then you have like the transactional agents. Those are just like what you would get in a text editor, like a coding thing, right? So if I go in here and I asked this to be like a, make a function that adds two numbers.
[00:05:01]
It's a transactional thing. It does allow me to do like follow up instructions and stuff like that, but it's not gonna remember this. It's not saving that anywhere. It's transactional. So you see a lot of that. It's just like one off task, another good one is like the tools that are, go from design to code or something like that.
[00:05:20]
Those are very transactional. So you see those a lot as well. I would say a lot of those agents are hidden where they're like, the AI is invisible. Like you don't know that it's doing something for you in a background. You press this button in an email client and it organized something for you or like, it's more like the invisible compute.
[00:05:38]
So I see a lot of those things kind of hidden from the user. They feel more magical. You just just experience it. Whereas the chat ones are literally in a chat interface. It's in the chat app. It's in your text messages. It's in Slack. It's in Discord. It's in a pop up bubble on your website.
[00:05:52]
So yeah, tons of use cases for agents, as you can see. Anything you can do as an engineer, that you can write code for ahead of time and set it, the AI can do that dynamically. So what I'm trying to describe is like, if you're gonna write a code that hits the Stripe API that always goes to your Stripe account and gets this information.
[00:06:15]
You can get the AI the same tools to do that, and then it can decide when it should and how it should do that in a dynamic way, whereas like you couldn't do that because you would have to write it ahead of time and it can never change.
[00:06:31]
Whereas the AI could change it on the fly. It can change the parameters of it. It's like, imagine if you took an engineer and you deployed the engineer to AWS [LAUGH]. And then you sent the engineer request, and the engineer did all the work every time you sent it a request on the fly, versus there's always some code that's written ahead of time that's running the same way every time.
[00:06:51]
So the downside of that is that it's non-deterministic, so you'll run the same thing and get back a different answer. Which is something we've never had to deal with in coding before, but now you do. So there's a lot of interesting problems there now, yes?
>> Speaker 2: What's the difference between an agent and an assistant?
[00:07:09]
>> Scott Moss: The difference between an agent and an assistant is an assistant can be a type of agent. But not all agents are assistants. I guess the way that I think about it is an assistant is like, I think if an assistant is just like an LLM in a chat interface, you can chat with it.
[00:07:24]
It has memory, you can chat with it, that's it. Agent is something to me that, and I was gonna get into it later, but essentially it's something that can run on a loop. It has access to tools and they can run on a loop. Not every assistant has access to tools and can run on a loop.
[00:07:40]
Some assistants. If you remember ChatGPT or GPT 3.5 or ChatGPT, it didn't have tool calling or anything. You just chatted with it. It couldn't access the outside world or anything like that. That would be an assistant, in my opinion. It can help you with things, but very limited up to its training data.
[00:07:56]
If you try to ask it something outside of its training cutoff data, it'd be like I don't know that. I've never seen that before, my training date was this summer of whatever. But as soon as you add functions to it where it can think, run a function, get an answer, decide to run another function, maybe get an answer, ask you a question.
[00:08:14]
Then you're starting getting into like an agent, something that's running on a loop, and it's like thinking on what to do. An assistant can be an agent, but not all agents are assistants, because you can also run an agent not in a chat interface. And then at that point, it's not really an assistant to me, it's just a transactional agent.
[00:08:34]
That's my definition, that is not what the internet says. These things are used interchangeably, for sure.
Learn Straight from the Experts Who Shape the Modern Web
- In-depth Courses
- Industry Leading Experts
- Learning Paths
- Live Interactive Workshops