AI Agents Fundamentals, v2

Use Cases for File System Tools

Scott Moss
Netflix

Lesson Description

The "Use Cases for File System Tools" Lesson is part of the full, AI Agents Fundamentals, v2 course featured in this preview video. Here's what you'd learn in this lesson:

Scott explores implementing file system tools, emphasizing responsible design, error handling, and their role in enabling agents to write code, manage data, and store state. He highlights use cases like agent memory, context loading, communication, and tool output storage to enhance agent capabilities.


Transcript from the "Use Cases for File System Tools" Lesson

[00:00:00]
>> Scott Moss: All right, welcome back. So we did our multi-turn evals. The next thing we want to do is actually implement those tools that we've been mocking out. So we have our file system tools, we have a web search that we want to implement. Well, we won't implement that, but I'll show you how we can get a tool from another source, which also includes some context management, and then we have our shell tool.

[00:00:22]
OK, so files and tools. This has very little to do with agents, but somewhat, like tool design is mostly not unique to agents. They're just functions that do things. You can think of them as units that do a thing, but you do need to think about, you have to be responsible in your thought about what you return and how long something might take and the different error capability, the error handling, because whatever you return, that's what the agent is going to see.

[00:00:57]
So you have to be responsible with that influence. You're influencing the agent, so you want to make sure you properly influence it with the appropriate things. So that's the difference between, you know, just writing a regular function versus writing a tool that an agent is going to use. So, you know, whatever the result of this is, it's meant for the agent to see. It's not meant for some other code to take as an input that you then do something else.

[00:01:22]
It's meant for an agent, some type of intelligence to parse and understand. So that's the only other caveat. Other than that, it's mostly the same as writing any other function. There's really nothing too unique here. There's some strategies and things that you can do to make it easier for you, but it's mostly the same. So, a lot of the stuff we do with these tools, so the file system tool stuff, it's mostly just what you would have made yourself outside of the context of agents, so it's not really too difficult.

[00:01:51]
But I do want to talk about the importance of why files matter for agents. In my opinion, giving an agent access to a file system is one of the most powerful things you can do to an agent. The most obviously powerful one is that it can help you write code, right? Like if a coding agent could not interact with the file system, it couldn't help you write code. That is the number one tool that Cursor or Claude Code or pick your favorite coding agent uses.

[00:02:18]
It's the file system, right? That is the number one tool that if you took away every single tool and it only had that, it could still help you write code. It could read your whole repo, it could edit your files, it could delete files, it could create files, right? If you took away everything else, it could still do that very well. So in my opinion, not even just coding agents, but I think all agents, if given access to a file system, whether that's a real file system, a virtual file system, they can do some pretty remarkable stuff.

[00:02:47]
And, you know, not all agents are computer-based agents that have access to a file system. Some of them sit on the server, you know, interact with the data, and that's it, and there's nothing wrong with that. But in my opinion, what I'm starting to learn is that agents themselves, if you give them a computer, which includes something like a file system, a terminal, internet, they can accomplish anything that an engineer would accomplish, and I'm not talking about like writing code.

[00:03:12]
We write code to solve a problem, we write scripts to automate things. OK, that's solving a problem. You're not just writing code to write code. So we are using agents right now to help us write code, that's great, but what if you let an agent write its own code to solve a problem, right? Then it becomes as powerful as you because you can solve a lot of problems with the code that you write. Whether an agent helped you do it or not.

[00:03:34]
At the end of the day, we're solving problems. So it's kind of deep to think about, but I truly believe the best thing you can do for an agent is give it a file system. So that's why I wanted to do that. So, some of the tools, you know, some of the use cases, you know, like reading source code, writing files. Configuration, writing outputs, managing data. Data is a big one. You want to do data analysis?

[00:03:59]
Take a CSV. Have this thing generate a, you know, give it a tool where you can generate images with, I don't know, Nano Banana from Gemini or something else, and it'll generate you a chart that you can put in a presentation or on a YouTube video for some B-roll, like whatever you want to do. It's quite useful. You can also use files as agent memory. You don't need a database to store what we call a scratch pad.

[00:04:23]
This would be like short-term memory, right? If you just want the agent to remember something as it's going through this process where it can write things down like, ah, I tried this, it didn't work, or here was the result of this, I think I might call it again. It's just something that it can look at as it's going through, like we've built the agent loop. Imagine if we added something else in there that it can write things down, it can write down its thoughts, it can write down something that would be helpful that we don't want to include in the system prompt or the messages array because it might blow up the context window as in we might run out of tokens or it'll slow things down or it'll cost too much money, we can put it in a file somewhere and also because it is in a file, whenever we do another run later, we can reference that file and pick up from that same memory, so it doesn't have to think about those things again.

[00:05:11]
It's already been thought of, right? That's just a file. You don't need a SQL database for that. You could just have this thing write some markdown to a file that it can reference later. That's what a CLAUDE.md file is or a Cursor rule is. They're just markdown files that an agent can reference later, no matter how many times you run it. It's perfect for that, right? So, agent memory is a great place for that, you know, storing state is also a good thing, you know, putting things like JSON and stuff like that, it's just like, you know, a cheap database essentially.
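The scratch pad idea above can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the function names `append_note` and `read_notes` and the timestamped-bullet format are assumptions made for the example.

```python
from datetime import datetime, timezone
from pathlib import Path


def append_note(scratchpad: Path, note: str) -> str:
    """Append one timestamped Markdown bullet to the scratch pad file."""
    # Hypothetical helper: create the folder on first use so the agent
    # never has to think about directories.
    scratchpad.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    with scratchpad.open("a", encoding="utf-8") as f:
        f.write(f"- [{stamp}] {note}\n")
    # The return value is written for the agent to read, not for other code.
    return f"Saved note to {scratchpad.name}."


def read_notes(scratchpad: Path) -> str:
    """Return every saved note, or a readable hint if there are none yet."""
    if not scratchpad.exists():
        return "The scratchpad is empty. No notes have been saved yet."
    return scratchpad.read_text(encoding="utf-8")
```

Because the notes live in a file rather than in the messages array, a later run can call `read_notes` on the same path and pick up where the previous run left off.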

[00:05:44]
Scratch pad, you know, to-do lists, things like that is really good. Context loading is obviously one of the biggest ones when you have Cursor or Claude Code, whatever you're using. Read your code, that's context loading. It's like grabbing and reading all those files and understanding it, putting it in the context. The contents of those files are being returned from the read file tool. We know what that looks like now, it adds it to the array.

[00:06:11]
That's what context loading is, right? And we want to be efficient about that. That's just a read file. That one simple tool allows the agent to understand everything about your codebase, just read file. Inter-agent communication, this is great. Imagine if you had, I don't know, let's say you have three agents working on your repo in the case of writing code. How would they know that another agent is working on something else and not to step on their toes, right?

[00:06:40]
You can use a file as a communication channel between them. Hey, every time you do a step, you know, put what you're doing in this file on every single step, and then every time you're about to do a step, read that file to make sure someone else isn't doing it. So if all the agents are writing to the file like this is what I'm working on, this is what I did work on, and another agent is reading that and also doing the same thing, it helps them collaborate.

[00:07:00]
It's like a live Google Doc that they can work in and collaborate together to not step on each other's toes. I actually use something like this a lot when I'm like, all right, I need to parallelize, and I'm going to open up six tabs, so I'm going to do a bunch of sub-agents in Claude Code and have them just talk to this file and then I'll just sit there and look at the file and it's just like messages popping in like, oh I'm doing this, don't touch this, and you know, it's kind of crazy but it works really well.
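One way to picture the shared file described above is an append-only log that every agent writes to and reads before each step. The sketch below assumes a JSON Lines file and hypothetical helpers `post_status`, `active_claims`, and `try_claim`; it does no file locking, so two agents claiming the same task at the same instant could still collide.

```python
import json
from pathlib import Path


def post_status(board: Path, agent: str, task: str, status: str) -> None:
    """Append one JSON line describing what an agent is doing."""
    entry = {"agent": agent, "task": task, "status": status}
    with board.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def active_claims(board: Path) -> dict[str, str]:
    """Replay the log and map each in-progress task to its agent."""
    claims: dict[str, str] = {}
    if not board.exists():
        return claims
    for line in board.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        entry = json.loads(line)
        if entry["status"] == "working":
            claims[entry["task"]] = entry["agent"]
        else:
            # Any other status (for example "done") releases the task.
            claims.pop(entry["task"], None)
    return claims


def try_claim(board: Path, agent: str, task: str) -> str:
    """Check the board before a step, then record the claim if it is free."""
    owner = active_claims(board).get(task)
    if owner and owner != agent:
        return f"Task '{task}' is already being worked on by {owner}. Pick another task."
    post_status(board, agent, task, "working")
    return f"You ({agent}) now own task '{task}'."
```

A person can watch the same file to see messages arrive, since every entry is a plain line of text.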

[00:07:27]
Audit trails is another good one, tool output storage. This is great if you ran some tool that outputted a bunch of stuff like a web search tool, and you don't want that to just sit in your context window and be expensive and slow. Just write it to a file and then tell the agents like, hey, whenever you want to look at the results of that web search, you can just go look at that file. Versus having to keep it in the context window the whole time.
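As a rough sketch of tool output storage, a tool can save its full result to disk and hand the agent only a pointer plus a short preview. The helper name `store_tool_output`, the hash-based file name, and the preview length are all assumptions chosen for illustration.

```python
import hashlib
from pathlib import Path


def store_tool_output(output_dir: Path, tool_name: str, output: str,
                      preview_chars: int = 200) -> str:
    """Save a large tool result to a file and return a short pointer message."""
    output_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: a content hash keeps file names unique.
    digest = hashlib.sha256(output.encode("utf-8")).hexdigest()[:12]
    path = output_dir / f"{tool_name}-{digest}.txt"
    path.write_text(output, encoding="utf-8")
    preview = output[:preview_chars]
    # Only this short string enters the context window.
    return (
        f"Full {tool_name} output ({len(output)} characters) saved to {path}. "
        f"Read that file if you need the details. Preview:\n{preview}"
    )
```

The agent can later pass the returned path to a read file tool when it actually needs the details.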

[00:07:47]
You can just look at it right now quickly and then that's where it'll be. And then yeah, files, that's configuration. Claude Code does a lot of this. To configure Claude Code, everything is just a file. It's usually just like either a JSON file or like if you want to add MCP to Claude Code, it's just a JSON file somewhere. You can tell Claude Code to go modify its own JSON to add the MCP for you. So if you want to install an MCP instead of figuring out what is the install command of this thing, just copy the name of it, maybe the GitHub URL.

[00:08:15]
I'm like, hey, Claude, install this on yourself. And it'll do it because it can edit its own file. It's great. You don't even have to open the file, right? You could do the same thing with something like Cursor, I believe, because the files themselves are just the configuration in which the agent exists. You can do the same thing with skills, like when I make, you know, custom agents in Claude Code, I just say, hey, I want to make a sub-agent, by the way, here's a link to Anthropic's documentation on how to make sub-agents if you forgot, but here's a sub-agent I want to make.

[00:08:47]
Can you go make it? Ask questions if you need help. And it'll just make it and it'll put it in its file where it's supposed to go and I don't have to do anything. And that's even with the fact that they have a /agents command that will help you walk through and do it. No, I'll just have you do it. You do it. I don't want to do it. So, files as a configuration is very meta. There's tons of ways and methodologies of interacting with file systems.

[00:09:10]
We're not going to get into the most efficient ones I think you should do because that's just like a course on coming up with a good file API. I'm not going to sit here and do that. So we're just going to do these four easy ones. We're going to read a file, write a file, list the files, which is important so you can see where you're going, and then delete a file. That's it, we're going to implement these.
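The four tools named above could look roughly like this. It is a sketch under stated assumptions rather than the lesson's code: the function names, the message wording, and the choice to create parent folders on write are all illustrative. The one idea taken from the lesson is that each function returns a string meant for the agent to read, including when something goes wrong.

```python
from pathlib import Path


def read_file(path: str) -> str:
    """Return the file's text, or an error message the agent can act on."""
    try:
        return Path(path).read_text(encoding="utf-8")
    except FileNotFoundError:
        return f"Error: file not found: {path}. Use list_files to see what exists."
    except (OSError, UnicodeDecodeError) as err:
        return f"Error reading {path}: {err}"


def write_file(path: str, content: str) -> str:
    """Write text to a file, creating parent folders (an assumed policy)."""
    try:
        target = Path(path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        return f"Wrote {len(content)} characters to {path}."
    except OSError as err:
        return f"Error writing {path}: {err}"


def list_files(directory: str = ".") -> str:
    """List a directory, one entry per line, marking folders and files."""
    try:
        entries = sorted(Path(directory).iterdir(), key=lambda p: p.name)
    except FileNotFoundError:
        return f"Error: directory not found: {directory}"
    except OSError as err:
        return f"Error listing {directory}: {err}"
    if not entries:
        return f"Directory {directory} is empty."
    return "\n".join(
        f"[dir] {p.name}" if p.is_dir() else f"[file] {p.name}" for p in entries
    )


def delete_file(path: str) -> str:
    """Delete one file and confirm, or explain why nothing was deleted."""
    try:
        Path(path).unlink()
        return f"Deleted {path}."
    except FileNotFoundError:
        return f"Error: file not found: {path}. Nothing was deleted."
    except OSError as err:
        return f"Error deleting {path}: {err}"
```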

[00:09:39]
These are going to be very simple. Basic stuff, like any coding agent could one shot these in like two seconds. Implementation considerations, you know, it's mostly the inputs. How might an agent input something like a path? Is it relative? Is it an absolute path? Is it, you know, traversal, where it might have like, you know, going up a directory? We have to come up with that policy because when we write our code, we either A, have to enforce that you can only do it this way through prompt engineering or input validation or whatever we want to do, or B, we want it to be able to handle all this, so our code just makes sure that we can handle that.

[00:10:20]
You've got to figure out what your policy is, but this is where I was saying earlier, these tools, you have to think about how to hint to the agent what to do. Like let's say for instance, we don't want to allow our agent to pass in paths to any one of these files that have like, you know, going up a directory. We don't want that. So what we'll do is we'll put that in a description somewhere. Maybe we'll put that in the path input description and then we'll also do an input validation where when we, before we run the tool, the very first line of the execute function, we'll check to see if the path has two dots and a slash or something like this, because we can check that with code, with the regex or just type, just, you know, duck checking, duck typing.

[00:11:05]
And then if it does, we'll immediately return back to the agent, hey, you cannot pass in these types of paths. That's the string that we will return back from that tool. So when the agent sees that they're like, ah, my mistake, and then hopefully it'll call that tool again but with the correct path this time because the hint that we returned said, hey, stop doing that, we don't accept those types of paths, right?
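The policy described above has two halves: say the rule in the input description, then enforce it at the top of the tool and return a hint instead of raising. A minimal sketch follows, in which `PATH_DESCRIPTION`, `validate_path`, and the exact wording of the messages are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Hypothetical text for the tool's path input description, so the agent
# sees the rule before it ever calls the tool.
PATH_DESCRIPTION = (
    "Path relative to the project root, for example 'src/app.py'. "
    "Absolute paths and '..' segments are not accepted."
)


def validate_path(path: str) -> Optional[str]:
    """Return a hint for the agent if the path breaks the policy, else None."""
    parts = path.replace("\\", "/").split("/")
    if ".." in parts:
        return (
            "Error: invalid path. Paths containing '..' are not accepted. "
            "Call the tool again with a path relative to the project root."
        )
    if Path(path).is_absolute() or path.startswith(("/", "\\")):
        return (
            "Error: invalid path. Absolute paths are not accepted. "
            "Call the tool again with a path relative to the project root."
        )
    return None


def read_file(path: str, root: Path) -> str:
    """Read a file under root, validating the path before doing anything else."""
    problem = validate_path(path)  # first thing the tool does
    if problem is not None:
        return problem  # a string for the agent to read, not an exception
    try:
        return (root / path).read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError) as err:
        return f"Error reading {path}: {err}"
```

Checking path segments rather than searching for the substring `../` means a file name such as `a..b.txt` is still allowed.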

[00:11:25]
So you've got to think about it like that. Whereas if you were just writing this function, not for an LLM, you wouldn't have done that. You'd have just thrown an invariant error like, oh, input invalidated, we don't accept that, or maybe you would have just broken silently, who knows, but you definitely wouldn't have returned a string for some person to read, like that wouldn't make sense. Error handling, we kind of talked about that.

[00:11:49]
There's so many issues around error handling. Directory creation, I don't think we're going to make a tool for that. Yeah, we're not going to make a tool for that, but obviously that's a tool you might want to do. Yeah, what happens when an agent tries to read a 10 megabyte log file? You know, like how do you handle different things like that, you know, do you just say, hey, this file is too long, this file is too big.

[00:12:15]
You should, you know, ask the user something else or like, we don't allow this, pick another file or tell the user this isn't possible, or maybe you do handle it and you stream it in some way and you try to output that to the agent in a streaming way, but then you need to change your architecture and how you handle tools, or just truncate to just the first few lines and things like that. Or try to summarize it somehow.
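Of the strategies listed above, truncation is the simplest to sketch. The example below assumes a hypothetical `read_file_capped` tool with made-up default limits. It never reads more than the character cap from disk, and it tells the agent that the result was cut off.

```python
import os


def read_file_capped(path: str, max_chars: int = 20_000, max_lines: int = 200) -> str:
    """Return at most max_chars characters and max_lines lines of a file."""
    try:
        size = os.path.getsize(path)
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            # Read one character past the cap so we can tell if more exists.
            chunk = f.read(max_chars + 1)
    except OSError as err:
        return f"Error reading {path}: {err}"

    truncated = len(chunk) > max_chars
    lines = chunk[:max_chars].splitlines(keepends=True)
    if len(lines) > max_lines:
        lines = lines[:max_lines]
        truncated = True

    text = "".join(lines)
    if truncated:
        shown = len(text)
        # The notice is a hint for the agent about what to do next.
        text += (
            f"\n[Truncated: showing the first {shown} characters of a "
            f"{size}-byte file. Tell the user the file is too large to read "
            "in full, or pick a smaller file.]"
        )
    return text
```

Streaming or summarizing with a second model would need changes to how the agent loop handles tool results, which is why this sketch stays with truncation.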

[00:12:41]
You know, upload this file to another LLM over here as an attachment, return a summary, and then take that summary and feed it back to my local LLM. Who knows? You'll have to figure out what the strategy is for yourself and how you want to do that. Binary files and stuff, not going to handle that right now, but typically, you would do, you would use some multimodality-based model that can handle attachments, that can read these natively.

[00:13:05]
You can also just try to parse them yourself with some type of local parser, like maybe you want to try to parse some of these things out, you know, Base64 encoding an image or something like that, but in this case, you most likely want to try to just attach this to the agent itself natively unless you don't want some hosted provider getting access to your files and doing stuff with it, but it's probably what you want to do.

[00:00:00]
So when we're talking about file tools, I'm specifically talking about the agent being able to interact with local files on your machine and not supporting uploading files to the LLM as an attachment, that's a different thing. That's not what I'm talking about.
