Build an AI Agent from Scratch

Improving Agents Q&A

Scott Moss


Superfilter AI

The "Improving Agents Q&A" Lesson is part of the full, Build an AI Agent from Scratch course featured in this preview video. Here's what you'd learn in this lesson:

Scott answers student questions about recommended tools and resources, including Hugging Face and Braintrust, and discusses the challenges and considerations when using AI services from providers like OpenAI, Azure, and AWS. The lesson concludes with a brief discussion of fine-tuning models and updating them as needed.


Transcript from the "Improving Agents Q&A" Lesson

[00:00:00]
>> Scott Moss: Any questions on creating an agent from scratch? Because I'm gonna tell you, everything that I just taught you today is literally the foundation of what we're using in production right now. And I can promise you that other tools with hundreds of thousands, if not millions, of users are using that same foundation in production right now.

[00:00:18]
It's the same foundation, and everything else they have on top of it is just guardrails, performance, things like that. But this foundation is really what people start with, and it's just more of the same, so you really do have an example. You've gone through it, you've seen how people build with this stuff today.

[00:00:35]
Because let's be fair, there aren't any experts today. [LAUGH] There's no such thing as an expert. Everyone is just figuring this stuff out as they go. I get on Twitter and there's some person who's like a day ahead of me on something, and that person's on the edge. And I'm like, okay, nobody knows this, we're all just floundering, right?

[00:00:52]
So this stuff's just moving really, really quickly, more quickly than we can keep up with it. So if you're feeling overwhelmed by it, so am I, so is everyone else. But it doesn't mean you can't use it today, it doesn't mean you can't get value from it today.

[00:01:05]
So trust me when I tell you what you have here is literally what everyone is doing right now, yes?
>> Speaker 1: Where do you like to go to stay on the cutting edge? Is that Twitter, usually?
>> Scott Moss: Twitter is really good, but only if you're following the right people. If you're not following the right people, I would say Hugging Face is great, I love Hugging Face.

[00:01:22]
It's like the GitHub of AI models. I highly recommend going to Hugging Face and playing around with some of these models, they're super cool. They're really easy to get started with, and some of them are just APIs you can use directly, which is super great. That, plus Hacker News and a bunch of random Discords and subreddits, is pretty much where I stay up to date.
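For reference, "just using a model as an API" can look something like this minimal sketch with Hugging Face's official @huggingface/inference client; the model ID and token variable here are illustrative assumptions, not something shown in the lesson:

```ts
// Minimal sketch: calling a hosted Hugging Face model as a plain API.
// Model ID and HF_TOKEN are examples, not from the lesson.
import { HfInference } from "@huggingface/inference";

const hf = new HfInference(process.env.HF_TOKEN); // your Hugging Face token

async function main() {
  const result = await hf.textClassification({
    model: "distilbert-base-uncased-finetuned-sst-2-english",
    inputs: "Building an agent from scratch was easier than I expected.",
  });
  console.log(result); // e.g. [{ label: "POSITIVE", score: 0.99 }]
}

main();
```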

[00:01:43]
Yeah, Mark?
>> Mark: Is Google's AI worth looking at or just go all in on OpenAI?
>> Scott Moss: Gemini.
>> Mark: Or even Llama?
>> Scott Moss: Do not use Llama unless you are an LLMOps person and you know how to fine-tune and train, and you have tons of data, and cutting costs while keeping performance up is your goal.

[00:02:09]
If that's not your goal, if you just wanna build an app that uses AI, you should not be using Llama. Llama is the end goal for a company that is already successful and wants to save money while keeping the same performance. It's not something you should start with, unless your product is offline.

[00:02:24]
Like, "we only run on people's computers, and our customers are only people who have extreme GPUs." Okay, sure, go do that. But no, you probably won't do that, because then you've got to figure out where you're gonna host this thing, and GPU infrastructure right now is crazy.

[00:02:42]
Google's Gemini is actually really good, I'll admit it. The problem with it is that it's so hard to set up, it's so hard to set up Google's APIs. You've got to go do this, you've got to set that up, you've got to make an account.

[00:02:53]
It takes 30 minutes to set up an account with Google to start using their APIs, whereas everything else takes like five minutes or less.
>> Mark: Yeah, I saw Levels on Twitter.
>> Scott Moss: He talked about this?
>> Mark: Yeah, today, he was complaining about it.
>> Scott Moss: Wow, [LAUGH]
>> Mark: And Tyler in the chat says that Hugging Face even has some great tutorials that he's been following, so yeah, cool.

[00:03:16]
>> Scott Moss: Yes, Hugging Face has great tutorials. So there's this really cool thing. We talked about transformers, right? So check this out: Transformers.js. Ooh, is this the one? Yes, yes, yes. Hold on, okay, yeah. So there's this really cool library in Python called Transformers.

[00:03:43]
The best way I can think about it is, it's like a framework for combining models together. They have it in JavaScript now, so there are some really cool models out there that can do some really interesting stuff. And now you can use all that same stuff you could in Python, in JavaScript, using Transformers.js, so I highly recommend checking out Transformers.
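As a rough sketch of what that looks like in practice, the Transformers.js pipeline API mirrors the Python library; the task and strings below are just examples:

```ts
// Minimal Transformers.js sketch: pipeline() downloads and caches a model
// on first run, then runs it locally in Node or the browser.
import { pipeline } from "@xenova/transformers";

async function main() {
  const classifier = await pipeline("sentiment-analysis");
  const output = await classifier("Transformers.js makes this really easy!");
  console.log(output); // e.g. [{ label: "POSITIVE", score: 0.99 }]
}

main();
```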

[00:04:02]
I actually use this a lot on my gaming computer to generate images. I train models on my friends' faces, my cat, all types of stuff, and I just generate stuff for fun. So Transformers kind of makes that possible, yes?
>> Mark: Do you recommend any tools for doing evals?

[00:04:21]
>> Scott Moss: I'm going to get into that in the next course; we will be doing evals. But as a preview to that, I do use Braintrust. Personally, I use Braintrust, but it's not something you should use for your personal projects, because it's super expensive. Yeah, I use them to evaluate and log and do everything.

[00:04:39]
But honestly, you don't even need it. It's just a great place for a dashboard and collecting your data; all the real value you can get from an open source tool, just logging it yourself and keeping track of it in your own files. It's basically a UI for that, to be honest, but it is valuable, don't get me wrong.

[00:04:57]
And there are tons of different eval tools, and I'll cover a lot of them. It's just that right now a lot of them are very Python-heavy; it's really hard to get into on the JavaScript side. But I like Braintrust because the founder is really cool, and they're a TypeScript shop.

[00:05:13]
They really love TypeScript. And then the really cool thing is, if we go to their blog, or maybe it's their docs, let's see. Yeah, here we go, the cookbook, this one's really cool. They have a cookbook on how to evaluate certain systems.

[00:05:30]
And there's this one in here from Zapier that's really cool, about how Zapier evaluates tool usage in their chatbot, in TypeScript. It literally walks you through how they evaluate whether or not the right tools are being called in their chat system, and then how they can improve it. It's really cool.

[00:05:48]
I actually use most of this in my eval framework, so these cookbooks have been great, because I'm not a data scientist and this stuff is really hard for me to understand. I don't even know which metric I should be picking; there are so many different types of metrics that are very scientific, and I don't know what any of this stuff means.

[00:06:05]
So this stuff really is very helpful.
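To make that concrete, a Braintrust eval in TypeScript looks roughly like this sketch, loosely modeled on their documented quickstart; the project name, data, and pickTool function are invented for illustration:

```ts
// Rough sketch of a Braintrust eval, loosely based on their quickstart.
// Project name, data, and pickTool() are hypothetical placeholders.
import { Eval } from "braintrust";
import { Levenshtein } from "autoevals";

// Hypothetical stand-in for your agent's tool-selection step
declare function pickTool(input: string): Promise<string>;

Eval("my-agent", {
  // Input/expected pairs; in practice you'd pull these from your logs
  data: () => [
    { input: "What time is it in Tokyo?", expected: "get_time" },
    { input: "Email my assistant the agenda", expected: "send_email" },
  ],
  // The task under evaluation: which tool does the agent choose?
  task: async (input) => pickTool(input),
  // Scorers compare output to expected; Levenshtein is a simple built-in
  scores: [Levenshtein],
});
```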
>> Speaker 2: So back to Llama. One of the use-case scenarios that I've run into is where a company doesn't necessarily want to be using an AI service because of the privacy of their information, or regulatory-type things, or they're a government agency. What kind of alternatives are there?

[00:06:36]
I mean, what's your take on that, and how do you handle it?
>> Scott Moss: I don't blame them, I don't want OpenAI on my stuff either. So yeah, I think there are a lot of levels to it. I think the first level is, if you use something like OpenAI or Anthropic, a lot of them have clauses where you can say, a no-train clause.

[00:06:52]
Well, first of all, I think by default they don't train on anything that's sent in through the API. They only train on things that are sent in through ChatGPT, or so they say. But you can also work with OpenAI and say, don't train on any of my stuff, and they should oblige.

[00:07:07]
If you don't feel comfortable with that, or you just want something that's more on your own infrastructure, OpenAI partnered with Azure, so you can get a lot of their models on Azure. And Anthropic partnered with AWS, so you can get Sonnet 3.5, which in my opinion is really great, on AWS, on your own infrastructure.

[00:07:27]
So you can have that as well.
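As a sketch of that setup, calling Claude 3.5 Sonnet through AWS Bedrock from TypeScript looks roughly like this; the region and model ID are examples, so check what's enabled in your own account:

```ts
// Rough sketch: Claude 3.5 Sonnet via AWS Bedrock's Converse API, so the
// model runs inside your own AWS account. Region/model ID are examples.
import {
  BedrockRuntimeClient,
  ConverseCommand,
} from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-east-1" });

async function main() {
  const response = await client.send(
    new ConverseCommand({
      modelId: "anthropic.claude-3-5-sonnet-20240620-v1:0",
      messages: [{ role: "user", content: [{ text: "Hello from Bedrock" }] }],
    })
  );
  console.log(response.output?.message?.content?.[0]?.text);
}

main();
```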
>> Speaker 2: Still public cloud infrastructure, though, right?
>> Scott Moss: Yeah, I mean, it's still on AWS. You could put it in a VPC, but you're not gonna have it on your own rack; they're not gonna give you the file for it. So that's that level, and then if you move further down than that, yeah.

[00:07:42]
At that point, you do need to go with a Llama or a Mistral, or whatever else is out there, and you're gonna need fine-tuning. You're gonna need a lot of fine-tuning to get it to where you want it to be, which means you're gonna need lots of data.

[00:07:56]
So that's why I think what you wanna do is start with one of the other suggestions and collect the data that you need. Once you reach a threshold, use that data to go train Llama, and then you can throw Llama on whatever hardware, whatever racks you have.

[00:08:14]
If you have the compute for it, then you can do it there, but, yeah, that's what I would recommend.
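That "collect data first" step can be as simple as logging every input/output pair your hosted model handles. A tiny hypothetical sketch; the file name and record shape are made up:

```ts
// Hypothetical sketch: log each input/output pair as JSONL so you have a
// training set ready when you eventually fine-tune an open model.
import { appendFileSync } from "node:fs";

function logExample(input: string, output: string) {
  const record = JSON.stringify({ input, output, ts: Date.now() });
  appendFileSync("training-data.jsonl", record + "\n"); // one record per line
}

// Call this wherever your agent finishes handling a request
logExample("Summarize my inbox", "You have three unread threads...");
```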
>> Mark: Have you used AI services by AWS?
>> Scott Moss: I have, yeah, we have Sonnet 3.5 on AWS, and honestly, it's actually pretty good, I'm not gonna lie. I don't really like AWS, it's kind of confusing and the UI is weird, but they have a lot of really good tools on there, and they have lots and lots of models.

[00:08:44]
Pretty much every model that isn't OpenAI is on AWS, and they have prompt management tools, they have evaluation frameworks. They have a lot of stuff on AWS to help you, and it's just good enough. So, yeah, I haven't had any issues with it, I think it's pretty good, yep.

[00:09:01]
>> Mark: There are questions around fine-tuning a model and then continuing to update the model. Do you have any experience with that?
>> Scott Moss: I do, I have experience fine-tuning models, both LLMs and diffusion models. So if you think of a model, a foundational model, it's already trained. Fine-tuning is just training the last mile, the last 2% of that model.

[00:09:31]
And typically, you would fine-tune if you wanted the output to look a certain way. It's less about accuracy and more like, I wanna save tokens and I want the output to always look a certain way. For instance, an example would be, if we want our AI to respond like an executive assistant would respond.

[00:09:48]
I would go get a bunch of inputs and example outputs. Or I'd get a bunch of inputs, and I would go hire an army of executive assistants and say, respond to this the way that you would respond. And then I would take that and fine-tune a model: this is how you respond, in this vocabulary, this type of stuff.

[00:10:07]
So it feels and sounds a certain way, and that's really the only way to get to what I call SME level with an AI. Cuz right now, OpenAI is just so generic, you can just tell when someone generates something with OpenAI or any API. But if it's fine-tuned, then it gets really blurry, because it's very specific, if it was trained well.
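For a sense of what that training data looks like, OpenAI's fine-tuning API documents a chat-style JSONL format along these lines; the content below is invented:

```ts
// Sketch of one chat-formatted fine-tuning example (invented content).
// Each example becomes one line of the JSONL training file.
const example = {
  messages: [
    { role: "system", content: "You respond like an executive assistant." },
    { role: "user", content: "Can you move my 3pm with finance?" },
    {
      role: "assistant",
      content:
        "Of course. I've proposed 4:30pm to the finance team and will confirm shortly.",
    },
  ],
};

console.log(JSON.stringify(example)); // append to your .jsonl dataset
```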

[00:10:29]
So fine-tuning is mostly for that. And then, updating the model as you go: fine-tuning is really expensive, so it's not something that you would just be doing all the time. There's really no need to if all you need is to give it access to more relevant data.

[00:10:46]
That's what RAG is for, that's what system prompts are for. If you need to update it because the output you're expecting is different than when you originally fine-tuned it, then, yeah, I guess that's fine. But if you're fine-tuning constantly, all the time, I would imagine you need to rethink your architecture so that you don't need to be doing that, and probably rely on something else, like RAG.
