Lesson Description
The "Sandboxed Execution" Lesson is part of the full, AI Agents Fundamentals, v2 course featured in this preview video. Here's what you'd learn in this lesson:
Scott explains sandboxing to run code safely, covering methods like VMs, Docker, and services like Daytona. He demonstrates creating a shell command tool in JavaScript and tests it by running commands through the AI.
Transcript from the "Sandboxed Execution" Lesson
[00:00:00]
>> Scott Moss: In reality, what you would probably want to do is something called sandboxing, which is exactly what it sounds like. You run code in, or you create, some virtual environment where the LLM can do whatever it wants without, you know, potentially destroying your own machine or executing malicious code. You can do that, and there are many, many, many different ways to do that.
[00:00:23]
I mean, you got VMs, you have Docker, which is a VM, you have V8 isolates, you have, you know, the sandboxing in different languages. I don't know it has sandboxing. There's so many things you can do: network isolation, or, for example, running this in, like, Deno, where by default you don't have access to a lot of these things unless you turn them on.
[00:00:43]
That's a sandbox, right? There's many different things and ways you can accomplish this. There are services like Daytona that allow you to execute code and run things in a sandbox with a simple API call. Or, you know, E2B sandbox is another one that is very similar to that, and then recently, oh, Cloudflare has their Sandbox SDK.
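One of the options mentioned above, Docker, can be sketched roughly like this. It is a minimal illustration and not something shown in the lesson: the helper name `buildDockerArgs`, the `alpine` image, and the resource limits are all assumptions. The flags themselves (`--rm`, `--network none`, `--read-only`, `--memory`, `--cpus`) are standard `docker run` options.

```javascript
// Hypothetical helper: builds the argument list for `docker run` so that an
// agent-supplied command runs inside a throwaway container, not on the host.
function buildDockerArgs(command, image = 'alpine') {
  return [
    'run',
    '--rm',               // delete the container when the command exits
    '--network', 'none',  // no network access
    '--read-only',        // root filesystem is mounted read-only
    '--memory', '256m',   // memory cap (arbitrary example value)
    '--cpus', '1',        // CPU cap (arbitrary example value)
    image,
    'sh', '-c', command,  // run the command string through the container's shell
  ];
}

const args = buildDockerArgs('ls -la');
console.log(args);
```

Passing the array to something like `spawn('docker', args)` from `node:child_process`, rather than joining it into one string, keeps the host shell from interpreting the command.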
[00:01:13]
All three of those sites look so familiar, I just realized that, they're so similar. You can just get a sandbox and you can just give it commands, you can do a git checkout, you can do npm test all programmatically, right? Like you can pass this to the agent so it can do things. So like, we've never been doing things like this before, like, what is this?
[00:01:34]
This is insane. But if you pass this to the agent and let the agent do it, it's controlling your computer. It's really powerful. So, there are ways to protect yourself. And of course we're just gonna run on our computer 'cause we don't care, but be safe. Make sure it doesn't do anything you don't want it to do.
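A very small way to "make sure it doesn't do anything you don't want it to do" is to check a command before running it. The sketch below is an assumption, not part of the lesson: the allowlist entries and the function name `isCommandAllowed` are hypothetical, and a filter like this is not a sandbox, since an allowed program can still do damage.

```javascript
// Hypothetical allowlist; the entries are examples only.
const ALLOWED_PROGRAMS = new Set(['git', 'ls', 'cat', 'npm']);

// Characters that let a single string chain, substitute, or redirect commands.
const SHELL_METACHARACTERS = /[;&|`$<>(){}\\\n]/;

function isCommandAllowed(command) {
  const trimmed = command.trim();
  // Reject empty input and anything that could smuggle in a second command.
  if (trimmed === '' || SHELL_METACHARACTERS.test(trimmed)) return false;
  // Only the program name (first word) is checked against the allowlist.
  const program = trimmed.split(/\s+/)[0];
  return ALLOWED_PROGRAMS.has(program);
}

console.log(isCommandAllowed('git status'));           // true
console.log(isCommandAllowed('rm -rf /'));             // false
console.log(isCommandAllowed('git status; rm -rf /')); // false
```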
[00:01:51]
All righty, let's make this shell command then. It's gonna be pretty simple, 'cause we're gonna use a library that's gonna do the heavy lifting, so we don't have to worry about it. Going to check out branches. There we go. OK. We're just gonna make a new tool in the tools, we're gonna call it shell. I mean, you can also call it whatever you want, it doesn't matter.
[00:02:27]
And we're gonna import, you know, the things that we normally import for a tool. From ai, we're gonna get tool; from zod, we're gonna get z. And then this new one we're gonna import is called shell. This is from a package called ShellJS. If you've ever heard of ShellJS, it's exactly what it sounds like.
[00:02:48]
It allows you to execute shell commands in JavaScript. That's it. That's what it does. It's pretty powerful. I've been using it for like almost a decade now. And we're going to have a tool called runCommand, because it takes a shell command and it runs it. So the name of the tool is runCommand.
[00:03:13]
Again, you can call these whatever you want. So we have a runCommand tool. The description is: "Execute a shell command and return its output. Use this for system operations, running scripts, or interacting with the OS." Right. Got our input schema, which is a z.object, which just has the command that you want to run, which will be a z.string().
[00:04:11]
And .describe(), which is "The shell command to execute". There we go. And of course, the execute function, which takes in our command. Here we go. And the way we're gonna do this is pretty simple. We just say result, we say await shell.exec. And exec takes in a string of the command that you would want to run, right?
[00:04:45]
And I guess I need to spell command correctly? There we go. It takes the command that you want to run. And then we want to say silent: true so it doesn't log anything. We'll do that. And then we just need to collect all the output and then return it so the agent can see the output of whatever command it ran, 'cause, you know, not all commands are gonna have output, but some do.
[00:05:15]
So we'll say output, it's just gonna be a string, and then we'll say if result.stdout, then what we want to do is say output += result.stdout. If there was an error (result.stderr), then we want to obviously add that too. And if it died, as in result.code is literally non-zero. So on the Mac, it's non-zero, I guess Linux too.
[00:06:09]
Like if it exits with a 1 or anything non-zero really, then there was an error, so we'll say return "Command failed", exit code, result.code. And then what we can do is just put the original, uh, output from that command after it, so you give a more detailed description to the LLM. Otherwise we can return the output.
[00:06:40]
If there was no output, this command just didn't have any output, but we still need to let it know that the command completed successfully. So then just say "no output". Cool. Oh, wait, I guess it doesn't need await, never mind. There we go, I forgot it's synchronous. Super simple.
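The output handling described above can be sketched as a small pure function. This is an approximation under stated assumptions rather than the code from the lesson: `formatResult` and `makeExecute` are hypothetical names, the exact message strings are guesses, and `exec` is injected so the sketch runs without any packages installed. In the lesson the result object would come from `shell.exec(command, { silent: true })`, which exposes `stdout`, `stderr`, and `code`.

```javascript
// Turns a finished command's result into the string the agent gets back.
// `result` is assumed to have the shape { stdout, stderr, code }.
function formatResult(result) {
  let output = '';
  if (result.stdout) output += result.stdout;
  if (result.stderr) output += result.stderr;

  // Any non-zero exit code means the command failed.
  if (result.code !== 0) {
    return `Command failed (exit code ${result.code}):\n${output}`;
  }
  // Some commands succeed silently; say so explicitly for the model.
  return output || 'Command completed successfully (no output)';
}

// Hypothetical factory for the tool's execute function. With ShellJS, `exec`
// would be something like (command) => shell.exec(command, { silent: true }).
function makeExecute(exec) {
  return async ({ command }) => formatResult(exec(command));
}

// Usage with a fake exec that stands in for a real shell.
const fakeExec = (command) =>
  command === 'git status'
    ? { stdout: 'On branch main\n', stderr: '', code: 0 }
    : { stdout: '', stderr: 'command not found\n', code: 127 };

const execute = makeExecute(fakeExec);
execute({ command: 'git status' }).then((text) => console.log(text));
execute({ command: 'nope' }).then((text) => console.log(text));
```

Keeping the formatting separate from the call that actually runs the command makes the failure and no-output cases easy to check without touching a real shell.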
[00:07:18]
It's a lot simpler than you would think. So we have our shell command, we want to add this to our tools, so let's import that: runCommand from shell. And our runCommand here. And then I'll just make a new, uh, I'll make a new export here that we aren't using, but we could just call it like, uh, terminalTools, whatever you wanna call it.
[00:07:55]
We're not using these, but like I said, it's really great just to have them separated out like that: runCommand, and then we'll export it singularly. There we go, boom. That should now be in our runner, because we import all the tools and it's in there. And now I'm going to ask our AGI if it has the ability to do stuff like that and see what happens.
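The grouping described here might look something like the sketch below. The stub objects stand in for real tool definitions, and every name other than `runCommand` and `terminalTools` is hypothetical.

```javascript
// Stub tools standing in for the real definitions.
const runCommand = { description: 'Execute a shell command and return its output.' };
const readFile = { description: 'Read a file from disk.' }; // hypothetical

// Grouped exports: handy for handing an agent only one category of tools.
const terminalTools = { runCommand };
const fileTools = { readFile }; // hypothetical group

// Combined registry: the single object a runner would look tool calls up in.
const tools = { ...fileTools, ...terminalTools };

console.log(Object.keys(tools));
```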
[00:08:49]
OK. "Can you access my terminal and run commands? Yes or no." All right, "I suggest exact commands", ah, it's very specific, look at that. It's like, no, I actually can't run them, but I can give you a command that you could run. And 'cause I'm not actually running them, so let's see, actually, I'm gonna say run, run git, uh, status and tell me what it's correct.
[00:09:41]
Wait. "Run git status and tell me what you see." Mm, no, I want you to run it. OK, why is this thing not running? Let's see. Did I give you the commands? We have shell here. "Execute a shell..." I'm gonna say "Execute a shell or terminal command and return its output." Let's try that.
>> Student: When you create a tool, does the AI have the ability to look at the code you have for execution, or is it relying entirely on your description?
[00:10:27]
>> Scott Moss: It has no idea. It doesn't even see this. Doesn't even get it, yeah. That's a good question though. Let's see, run. Tools are here. That looks good. Uh, that looks fine. OK, so, yeah, I don't know. Oh, I didn't save it. Let me build that again. Yeah. "Can you run terminal or shell commands?"
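The answer to the question above, that the model relies on the description and never sees the execution code, can be illustrated with a small sketch. The JSON-schema-style `inputSchema` and the helper `describeForModel` are simplifications assumed for this example; they are not how any particular SDK serializes tools.

```javascript
// A tool in roughly the shape described in the lesson.
const runCommand = {
  description: 'Execute a shell or terminal command and return its output.',
  inputSchema: {
    type: 'object',
    properties: {
      command: { type: 'string', description: 'The shell command to execute' },
    },
    required: ['command'],
  },
  // The model never sees this function body, only the fields above.
  execute: async ({ command }) => `ran: ${command}`,
};

// Hypothetical helper: keep only the parts that would be sent to the model.
function describeForModel(tools) {
  return Object.entries(tools).map(([name, { description, inputSchema }]) => ({
    name,
    description,
    inputSchema,
  }));
}

console.log(JSON.stringify(describeForModel({ runCommand }), null, 2));
```

This is why rewording the description is a reasonable thing to try when the model claims it cannot do something a tool supports.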
[00:11:25]
Yes, I can. I don't know why it said no last time. Did I forget to save? I don't know. OK, run git status and tell me what you see. Do not run any other command, or I will fire you. Run command, I did it, ran a git status and there it is. There's a git status. Quite powerful. The shell command, you damn near have an AGI now, web search, shell command, file system.
You can do some pretty crazy stuff. I mean, 3 years ago, this would be a bleeding-edge coding agent. Like, if you did nothing else and released this 3 years ago, you would have $20 million in funding. You would be swimming in bucks, for sure.