AI Agents Fundamentals, v2

Shell & Code Execution

Scott Moss
Netflix
Lesson Description

The "Shell & Code Execution" lesson is part of the full AI Agents Fundamentals, v2 course featured in this preview video. Here's what you'd learn in this lesson:

Scott explains giving an AGI agent terminal access, allowing it to run commands, install packages, and perform tasks efficiently. He also highlights safety considerations, emphasizing supervision to prevent unintended actions.

Transcript from the "Shell & Code Execution" Lesson

[00:00:00]
>> Scott Moss: So, shell tool and code execution. I really believe, and I've said this a few times, that the best thing you can do for an agent is give it access to a computer. And next to the file system and the internet, what other tool could an agent have in its toolbox that enables it to do more powerful things than being able to interact with the terminal, right? We as engineers can do powerful things in the terminal.

[00:00:23]
I live in the terminal. I open up apps from the terminal; there's so much stuff that I can do in it, it's great. Imagine you gave that to an agent. It's even more powerful, right? How many times have you been like, oh, what's that one command, and you ask some LLM what that command is, it gives it to you, and you paste it in your terminal? What if it could just do that itself?

[00:00:43]
It can make files, delete files and stuff. Cool, we have that. It can do a web search, cool, we have that. What if it can run CLIs? What if it can interact with binaries, all different types of stuff like that? It's quite powerful. What if it can help you set up your bash scripts and your profile and make it look cool? I actually used an LLM to help me set this up, right?

[00:01:08]
This is what I have here; I used an LLM to help me set it up. So it's quite powerful in its ability to do that. I think the power of computer access, as strong as it is for us engineers to wield, is equally strong, if not stronger, for an LLM. So to complete that arc, I want to give our AGI agents access to the terminal, so they can run commands like cat, grep, ls, all these really cool things: install packages for you, rebuild, run tests, start servers, run lint commands, all that grunt work that we don't want to do. We can just have this thing do it.
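
As a rough illustration of the idea above (not the code from the course), a shell tool can be as small as one function that runs a command and hands the output back to the model. This Python sketch assumes the standard-library `subprocess` module; the names `run_shell` and `ShellResult` and the 30-second default timeout are made up for the example.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class ShellResult:
    """What the agent gets back after a command runs (hypothetical shape)."""
    exit_code: int
    stdout: str
    stderr: str


def run_shell(command: str, timeout_seconds: float = 30.0) -> ShellResult:
    """Run one command through the system shell and capture its output."""
    try:
        completed = subprocess.run(
            command,
            shell=True,           # lets the model use pipes, globs, and so on
            capture_output=True,  # collect stdout and stderr instead of printing
            text=True,            # decode bytes to str
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # Report the timeout as a result so the model can react to it.
        return ShellResult(-1, "", f"timed out after {timeout_seconds}s")
    return ShellResult(completed.returncode, completed.stdout, completed.stderr)


if __name__ == "__main__":
    result = run_shell("echo hello from the agent")
    print(result.exit_code, result.stdout.strip())
```

Returning the exit code and stderr alongside stdout matters: it is how the model finds out that a command failed and why.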

[00:01:48]
You wouldn't have a sophisticated coding agent without shell access. Sure, I know the file system is the cake there when it comes to coding agents, but being able to run commands is equally great because then it can complete a loop. It can make changes for you, then it can run a command to lint those changes to see if they have linting errors, then it can run a command to start the server to see if it starts without errors, and if there are errors, it can see them and make the changes, all without you having to ask it every single time.
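
The loop described above (change, run a check, read the errors, change again) can be sketched roughly like this in Python. It is an illustration under assumptions, not the course's implementation: `run_checks`, `verify_loop`, and the `apply_fix` callback (standing in for "the model edits files based on the errors") are hypothetical names, and the check commands are whatever your project uses for linting or tests.

```python
import subprocess
import sys
from typing import Callable


def run_checks(checks: dict[str, list[str]]) -> dict[str, str]:
    """Run each check command; return the output of the ones that failed."""
    failures: dict[str, str] = {}
    for name, argv in checks.items():
        completed = subprocess.run(argv, capture_output=True, text=True)
        if completed.returncode != 0:
            failures[name] = (completed.stdout + completed.stderr).strip()
    return failures


def verify_loop(
    checks: dict[str, list[str]],
    apply_fix: Callable[[dict[str, str]], None],
    max_rounds: int = 3,
) -> bool:
    """Re-run the checks after each fix attempt, up to max_rounds attempts."""
    for _ in range(max_rounds):
        failures = run_checks(checks)
        if not failures:
            return True
        # Hypothetical step: hand the error text to the model so it can edit files.
        apply_fix(failures)
    return not run_checks(checks)


if __name__ == "__main__":
    # Stand-in check that always passes; a real project would list its own
    # lint and test commands here.
    demo_checks = {"noop": [sys.executable, "-c", "pass"]}
    print(verify_loop(demo_checks, apply_fix=lambda failures: None))
```

Capping the number of rounds is a deliberate choice: without it, a model that keeps producing the same broken fix would loop forever.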

[00:02:18]
It's pretty impressive to be able to do that, so that's why I want to give it access. I talked about self-verification. And if you really think about it on an atomic level, with file system, terminal access, and the internet, you can extend that out infinitely. Anything that could possibly be done in a virtual sense, not a physical sense (you're not going to be able to make robots), you'll be able to do with those three tools, because we can as engineers. So an LLM, let's just say it's a really great LLM that gets things right all the time with these tools, should be able to do pretty much everything, because a terminal can do pretty much everything, right?

[00:03:09]
So this is why I wanted to give it that. We have two approaches for giving it that power. There's the terminal approach, where we just give it a tool that can execute terminal commands; we're going to do that. Another approach is to give it the ability to execute its own code, which is very similar but not quite the same, because I guess there's nothing stopping the LLM from writing bash commands, or whatever flavor of terminal you have, and executing those as well.
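
To make the second approach concrete, here is a minimal Python sketch of a code execution tool: it writes the model's code to a temporary file and runs it in a separate interpreter process. The name `execute_python`, the returned dictionary shape, and the 10-second timeout are assumptions for illustration. Note that this is not a sandbox; the code still runs with your user's permissions.

```python
import os
import subprocess
import sys
import tempfile


def execute_python(code: str, timeout_seconds: float = 10.0) -> dict:
    """Run model-written Python in a child process and capture the result."""
    with tempfile.TemporaryDirectory() as workdir:
        script_path = os.path.join(workdir, "snippet.py")
        with open(script_path, "w", encoding="utf-8") as handle:
            handle.write(code)
        try:
            completed = subprocess.run(
                [sys.executable, script_path],  # same interpreter, new process
                cwd=workdir,                    # relative paths land in the temp dir
                capture_output=True,
                text=True,
                timeout=timeout_seconds,        # stops runaway loops
            )
        except subprocess.TimeoutExpired:
            return {"exit_code": None, "stdout": "", "stderr": "timed out"}
    return {
        "exit_code": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }


if __name__ == "__main__":
    print(execute_python("print(sum(range(10)))"))
```

The two approaches overlap because the code tool is just a special case of running a command; the difference is mostly in what you ask the model to write.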

[00:03:40]
They're very similar. Or we can do both. So we're definitely going to do the shell command. If we get to it, we'll do the code execution one. I think it's a great exercise. And yeah, I think that would be really, really cool. Obviously, when you get into some of these powerful tools, just like the file system when it comes to deleting and even reading personal, sensitive things, giving an LLM access to your computer on a terminal level is insane.

[00:04:06]
It can do some really crazy stuff. Imagine all the crazy commands that you have at your disposal that are destructive. Like drop table, because for some reason you have the URL to your production database as an environment variable on your machine. I don't know why you'd do that, but you do, and it drops the table. Or rm -rf, and it just completely deletes the directory. Hopefully you can go recover it from the trash can or you have Git.

[00:04:34]
You know, it can get stuck in infinite loops and completely destroy your computer. It can take the sensitive information that might be in your bash profile, like environment variables and things like that, and send it up to whatever server you're running the LLM from. So those things get exposed somehow and might show up in someone's responses because the model is being trained on them.
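
One partial way to reduce the environment-variable risk described above, sketched here as an assumption rather than as what the course does, is to run commands with a small allowlisted environment and a timeout. `SAFE_ENV_KEYS`, `scrubbed_env`, and `run_scrubbed` are hypothetical names, and the allowlist is only an example. This does not stop a command from reading files such as your bash profile directly.

```python
import os
import subprocess
import sys
from typing import Mapping, Optional

# Example allowlist only; adjust it for your own machine and tools.
SAFE_ENV_KEYS = ("PATH", "HOME", "LANG", "TERM", "SYSTEMROOT")


def scrubbed_env(source: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy only allowlisted variables, dropping secrets like database URLs."""
    source = os.environ if source is None else source
    return {key: source[key] for key in SAFE_ENV_KEYS if key in source}


def run_scrubbed(argv: list[str], timeout_seconds: float = 15.0):
    """Run a command with the scrubbed environment.

    Raises subprocess.TimeoutExpired if the command runs too long.
    """
    return subprocess.run(
        argv,
        env=scrubbed_env(),
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
    )


if __name__ == "__main__":
    os.environ["DATABASE_URL"] = "postgres://example-secret"
    probe = "import os; print(os.environ.get('DATABASE_URL'))"
    print(run_scrubbed([sys.executable, "-c", probe]).stdout.strip())
```

An allowlist is used instead of a blocklist because you rarely know in advance which variable names hold secrets.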

[00:04:57]
Who knows? And then imagine that thing had access to sudo on a Mac, which is root access, administrative access on your computer, to run very sensitive commands and touch protected resources like hidden folders and things that modify paths that shouldn't be modified. It can get pretty terrifying. We'll talk about how we can solve those things the best way possible in the next lesson, but in this lesson, I do want to bring up the safety considerations that come with this power, right?
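
The kinds of commands mentioned above (rm -rf, drop table, sudo) can at least be flagged before they run. The following Python sketch is a naive pattern check, offered only as an illustration: `looks_destructive` and its pattern list are hypothetical, easy to bypass, and no substitute for the safeguards covered in the next lesson.

```python
import re

# Illustrative patterns only; a determined or confused model can get around them.
DESTRUCTIVE_PATTERNS = [
    (re.compile(r"\brm\s+-\w*[rRf]\w*"), "recursive or forced delete"),
    (re.compile(r"\bdrop\s+(table|database)\b", re.IGNORECASE), "SQL drop statement"),
    (re.compile(r"\bsudo\b"), "runs as root via sudo"),
]


def looks_destructive(command: str) -> list[str]:
    """Return a reason for every pattern the command matches (empty if none)."""
    return [reason for pattern, reason in DESTRUCTIVE_PATTERNS if pattern.search(command)]


if __name__ == "__main__":
    for candidate in ["ls -la", "rm -rf ./build", "sudo rm -rf /"]:
        print(candidate, "->", looks_destructive(candidate))
```

A check like this is best used to decide when to pause and ask a person, not as a guarantee that everything unflagged is safe.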

[00:05:27]
It's kind of like a self-driving car. If you've ever gotten in a self-driving car, you can't actually sit in the back seat and have it drive. I mean, there are Waymos and things like that, but with the ones that you can buy for yourself, you still have to be in the driver's seat and you still have to pay attention to the wheel. They'll nudge you, like, hey, are you looking?

[00:05:44]
You know, there are cameras in the cabin. It's like this: just because it can do it, that doesn't mean you should look away. You should probably be watching this thing to make sure it's not doing anything it shouldn't. We'll introduce different ways you can get past that, but for now, you probably need to watch it, even though it's doing the driving. You're not entirely out of the car: you're still a passenger, and you're still in control, you've just relinquished it for a little bit.
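
One simple way to stay "in the driver's seat" is to require a human yes before any command runs. This Python sketch assumes a terminal prompt via `input`; `ask_in_terminal` and `run_with_approval` are hypothetical names, and the approval function is passed in so a UI or a test can supply its own.

```python
import subprocess
from typing import Callable


def ask_in_terminal(command: str) -> bool:
    """Show the command to the user and wait for an explicit yes."""
    answer = input(f"Agent wants to run: {command}\nAllow? [y/N] ")
    return answer.strip().lower() in ("y", "yes")


def run_with_approval(
    command: str,
    approve: Callable[[str], bool] = ask_in_terminal,
) -> str:
    """Run the command only if the approval callback says yes."""
    if not approve(command):
        # Tell the model why nothing happened so it can try something else.
        return "Command was not run: the user declined."
    completed = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=30
    )
    return completed.stdout + completed.stderr


if __name__ == "__main__":
    print(run_with_approval("echo supervised"))
```

Defaulting to "no" on anything other than an explicit yes mirrors the point above: the agent drives, but you keep the ability to stop it.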
