Check out a free preview of the full Build an AI Agent from Scratch course
The "Generate Dahl-E Image" Lesson is part of the full, Build an AI Agent from Scratch course featured in this preview video. Here's what you'd learn in this lesson:
Scott demonstrates how to define the image generation tool, set up the prompts parameter, and guide the AI in generating the prompt. He also explains how to export the tool and integrate it into the tool runner.
Transcript from the "Generate Dahl-E Image" Lesson
[00:00:00]
>> Scott Moss: But yeah, we got Reddit now, so that's good. Let's generate image.
>> Scott Moss: I'm just gonna copy these things up here.
>> Scott Moss: There we go. We're gonna import OpenAI.
>> Scott Moss: OpenAI from AI, make our definition. We'll say, generate image. This one will have an argument, it's gonna be the prompt.
[00:00:40]
>> Scott Moss: So, where do you think this prompt is gonna come from? Do you think the user has to type in this prompt? [INAUDIBLE]
>> Scott Moss: You said what?
>> Speaker 2: The LLM will generate it based on the user's initial prompt.
>> Scott Moss: Exactly, the user is gonna say something that, like, some condition that makes AI think it needs to call the generate_prompt or generate_image tool.
[00:01:07]
And then, because it has a prompt parameter that's required, the AI now has to figure out how to make a prompt. So this is where you can guide it. You can be like, prompt for the image, be sure to consider the user's, got to do back ticks.
>> Scott Moss: Be sure to consider the user's original message when making the prompt.
[00:01:49]
If you are unsure, then ask the user to provide more details. You can say whatever the hell you want here, it doesn't matter.
>> Scott Moss: Okay, we got that. We got our type here. We're gonna infer the generate_image tool definition. And then we're going to export const,
>> Scott Moss: Generate image off of our tool function.
[00:02:39]
>> Scott Moss: That's a lot of work, should have said tab. So then we have our tool args, and these tool args are gonna be type-checked to fit these. Now I got tool args.prompt, so I can do that. I also have the original user message here if I wanted to do something here.
[00:02:57]
Maybe, I mean, there's so many things. Maybe I could go and I could add another thing here of score. And it tells me, how confident do you think the prompt that you made is what the user wanted? And if it's not over a certain score, then I can go down here with this user message and call another LLM, whose job is to make a really good prompt for image generator.
[00:03:22]
Get that thing and then pass it to, like, there's literally so many things you can do. You can do whatever the hell you want, it's just magic. But I'm not gonna do that. So now we're just going to call OpenAI. That was pretty simple, you just call the openai.images.generate, assuming you're using OpenAI.
[00:03:41]
If you're not gonna use this, again, you can try a different one, or replicate, or just not, just return anything here, it doesn't matter. You just won't be able to use it, it's totally fine. But I'm gonna use this, and I'm gonna say, const response
>> Scott Moss: Wait, openai., what is it?
[00:04:04]
Images.generate. Model, dall-e-3, and there we go. We have dall-e-3, the prompt is whatever the AI said the prompt was. N is how many images you want, just generate one, your wallet will thank you. [LAUGH] And the size, I forgot what the other dimensions is, but you can make this smaller, I'm sure it'll cost less.
[00:04:31]
So we're gonna do that. And then the URL here, this will be the URL to a hosted image that it has for us, so we can take a look at that. The next thing is, I'm going to make a new file here, call it index.ts, and I'm just gonna import all those tools and just export them, that did that so quick.
[00:04:50]
That's kind of crazy. So this is just, so I can just pass around my tools as an array, versus one by one, since the agent and the LLM both take an array of tools, it just makes sense to do that. So I'll do that, but not before going back to the tool runner and getting rid of this garbage, we don't need this.
[00:05:08]
We also don't need this, so I'm gonna get rid of that. And then I'm gonna import all the tools one by one. So I'm gonna import generateImage from tools, generateImage, there we go. And get those two like that. Why did you do that? Go away. So now we have that, and then we can go down here and add our switch statement.
[00:05:40]
So I can say, case, generateImage.name. It just did it.
>> Scott Moss: So if the tool call is whatever the name of this is, call the generateImage function. If it's the name of whatever this Reddit one is, call the red one. No, no, this is wrong. I need to get the definition.
[00:06:06]
So, there we go, get the generateImage, and this is hallucination at its best. It almost just killed our app. So, redditToolDefinition, and then the dadJokeToolDefinition. There we go. So now I can say, generateImageToolDefinition.name, and then redditToolDefinition.name, and then dadJokeToolDefinition.name.
>> Scott Moss: And then throw an error here. I could also just like, let's not throw an error in SSA.
[00:06:44]
>> Scott Moss: Never run this tool again, or else.
>> Scott Moss: All right, you really gotta let them know. Hold on. I want this to be in the message chat so it can look back and reflect.
>> Scott Moss: Cool, so I'll tell the AI, stop doing that.
>> Scott Moss: Okay, so we got that, the only thing we gotta do now is, I think in the index is just passing our tool.
[00:07:17]
So, get rid of this. You get rid of that.
>> Scott Moss: So I say tools here, and then, import tools from tools, and that's it.
>> Speaker 2: Should that be the tools or the tool definitions? We already fixed that.
>> Scott Moss: What did I put in here? Yeah, this would have killed us.
[00:07:37]
I need the tool definitions. So, generate tool definition, redditToolDefinition, dadJokeToolDefinition.
>> Scott Moss: And then fix those. So nice.
Learn Straight from the Experts Who Shape the Modern Web
- In-depth Courses
- Industry Leading Experts
- Learning Paths
- Live Interactive Workshops