Course Description
Create a CLI agent from scratch! Learn the foundations of agent development like tool calling, agent loops, and evals. Add human-in-the-loop approvals for higher-stakes operations. Monitor token usage and implement advanced strategies for managing the context window.
This course and others like it are available as part of our Frontend Masters video subscription.
Course Details
Published: January 20, 2026
Learn Straight from the Experts Who Shape the Modern Web
Your Path to Senior Developer and Beyond
- 250+ In-depth courses
- 24 Learning Paths
- Industry Leading Experts
- Live Interactive Workshops
Table of Contents
Introduction
Section Duration: 13 minutes
Scott Moss introduces the course by showcasing AI agents through a simple browser-controlling agent built from natural language commands, emphasizing how an SDK handles the heavy lifting while the focus remains on tools and frameworks for building reliable agents.
Scott outlines the course's approach to building agents from scratch, covering the tool loop, decision frameworks, and iterative improvements, then demonstrates a personal agent that can make API calls, search the web, write files, and use approval mechanisms.
Agent Basics
Section Duration: 14 minutes
Scott explains agents as LLMs that use reasoning frameworks and tools to adapt at runtime, contrasting them with rigid workflows and noting key limitations where agents may not perform well.
Scott walks through creating a simple "Hello World" LLM call, covering environment setup, basic imports, and a function to interact with the model. He demonstrates running prompts in the terminal to show how easily an LLM can generate text.
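For illustration, a minimal sketch of what that first call can look like, assuming the Vercel AI SDK and an OpenAI API key in the environment (file and model names are illustrative, not the course's exact code):

```ts
// hello.ts — minimal sketch, assuming the `ai` and `@ai-sdk/openai` packages
// and an OPENAI_API_KEY set in the environment.
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Send a single prompt to the model and return the generated text.
async function ask(prompt: string) {
  const result = await generateText({
    model: openai('gpt-4o-mini'), // any chat model works here
    prompt,
  });
  return result.text;
}

// Run from the terminal: `npx tsx hello.ts "Tell me a joke"`
ask(process.argv[2] ?? 'Say hello!').then(console.log);
```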
Tool Calling
Section Duration: 20 minutes
Scott explains tool calling, showing how LLMs can interact with the outside world through custom functions that the application defines and executes. He demonstrates defining a tool and emphasizes its importance for giving agents context and enabling tasks beyond basic text generation.
Scott shows how to create a new tool using a helper function and Zod for schema validation, including a description, input schema, and an execute function that returns the current date and time.
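A tool along those lines might be sketched like this, using the AI SDK's `tool` helper and Zod (the exact schema field name varies by SDK version, e.g. `inputSchema` vs. `parameters`):

```ts
import { tool } from 'ai';
import { z } from 'zod';

// A minimal tool: a description the model reads, an input schema,
// and an execute function that returns the current date and time.
export const getDateTime = tool({
  description: 'Returns the current date and time as an ISO string.',
  // Newer AI SDK versions use `inputSchema`; older versions use `parameters`.
  inputSchema: z.object({}),
  execute: async () => {
    return { now: new Date().toISOString() };
  },
});
```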
Evals
Section Duration: 1 hour, 49 minutes
Scott explains single-turn evals, which track metrics from one agent pass, highlighting their importance for testing non-deterministic AI. He also contrasts offline and online evals, emphasizing their role in guiding improvements and informed decisions.
Scott explains synthetic data and creating use cases to test agent performance, covering data collection and evaluation. He also demonstrates using OpenTelemetry with Laminar for improved observability and metrics.
Scott explains setting up observability for message generation, showing how to import components, initialize functions, and enable telemetry. He highlights the importance of offline evaluations and flushing telemetry events to ensure data is sent correctly.
Scott explains creating evals using data files with input-output pairs to test AI tool selection and improve tool descriptions. He walks through making mock tools and a single-turn executor that uses conversation history for dynamic evaluation.
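As a rough sketch of what such a data file and single-turn executor can look like (file names, fields, and the `mockTools` import are illustrative, not the course's exact code):

```ts
// data/tool-selection.ts — inputs paired with the tools we expect the agent to pick.
export const cases = [
  { input: 'What files are in the current project?', expectedTools: ['list_files'] },
  { input: 'Save these notes to notes.md', expectedTools: ['write_file'] },
  { input: 'What time is it right now?', expectedTools: ['get_datetime'] },
];

// single-turn.ts — one pass through the model with mock tools, returning the tools it chose.
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { mockTools } from './mock-tools'; // hypothetical: tools whose execute returns canned data

export async function singleTurnExecutor(input: string) {
  const result = await generateText({
    model: openai('gpt-4o-mini'),
    messages: [{ role: 'user', content: input }],
    tools: mockTools,
  });
  return result.toolCalls.map((call) => call.toolName);
}
```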
Scott discusses evaluators, which score tool outputs against expected results, noting that deterministic JSON is easier to quantify than text. He demonstrates a tool selection score evaluator that compares expected and chosen tools to calculate precision.
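A scorer in that spirit can be as simple as comparing the two sets (plain TypeScript; names are illustrative):

```ts
// Score how precisely the agent's chosen tools match the expected ones.
// Returns 1 when every chosen tool was expected, 0 when none were.
export function toolSelectionScore(expected: string[], chosen: string[]): number {
  if (chosen.length === 0) return expected.length === 0 ? 1 : 0;
  const expectedSet = new Set(expected);
  const hits = chosen.filter((t) => expectedSet.has(t)).length;
  return hits / chosen.length; // precision: correct picks / total picks
}
```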
Scott walks through writing an evaluation, covering scores, mocked data, and executors. He demonstrates creating an evaluation file for file tools, setting up an executor with single-turn mocks, and using evaluators to convert outputs into quantitative scores.
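Pulling those pieces together, an evaluation file is essentially a small harness: run the executor over each case, score the output, and report an average. A framework-agnostic sketch of the idea (paths and names are illustrative):

```ts
import { cases } from './data/tool-selection';
import { singleTurnExecutor } from './single-turn';
import { toolSelectionScore } from './scores';

// Run every case through the executor, score it, and report the average.
async function runEval() {
  const scores: number[] = [];
  for (const c of cases) {
    const chosen = await singleTurnExecutor(c.input);
    const score = toolSelectionScore(c.expectedTools, chosen);
    scores.push(score);
    console.log(`${c.input} -> [${chosen.join(', ')}] score=${score.toFixed(2)}`);
  }
  const avg = scores.reduce((a, b) => a + b, 0) / scores.length;
  console.log(`average score: ${avg.toFixed(2)}`);
}

runEval();
```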
Scott covers running and interpreting evaluations, including average scores and analyzing individual runs to understand agent behavior. He discusses naming strategies for experiments, examining successes and failures, and forming hypotheses for improvement, and he emphasizes the essential role of human expertise in the iterative evaluation process.
Agent Loop
Section Duration: 46 minutes
Scott explains the agent loop, showing how it manages tasks with uncertain steps and adapts to changing requirements. He demonstrates creating the loop, handling LLM responses, stop conditions, and streaming tokens for smoother interaction.
Scott demonstrates filtering messages to keep the LLM focused, setting up chat history, and streaming text generation. He shows how to handle tool calls and append responses to maintain the conversation flow.
Scott demonstrates executing tool calls sequentially, updating the UI to show progress, and pushing results into the messages array to maintain conversation flow. He also highlights presenting results in a user-friendly way for non-technical users.
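Put together, the loop from this section boils down to: call the model, run any requested tools, append the resulting messages, and stop when the model answers without asking for tools. A condensed sketch, assuming the AI SDK and tools that each define an execute function (the course executes tool calls by hand; here the SDK runs them to keep the example short):

```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { tools } from './tools'; // illustrative: tool definitions with execute functions

export async function runAgent(userInput: string) {
  const messages: any[] = [{ role: 'user', content: userInput }]; // loosely typed for the sketch
  const MAX_TURNS = 10; // guard against runaway loops

  for (let turn = 0; turn < MAX_TURNS; turn++) {
    const result = await generateText({
      model: openai('gpt-4o-mini'),
      messages,
      tools,
    });

    // Keep the conversation flowing: the assistant's text, its tool calls, and the
    // tool results come back as messages that get pushed onto the history.
    messages.push(...result.response.messages);

    // Simple progress UI for the terminal.
    for (const call of result.toolCalls) {
      console.log(`→ running tool: ${call.toolName}`);
    }

    // Stop condition: the model produced a final answer with no tool calls.
    if (result.toolCalls.length === 0) {
      console.log(result.text);
      return result.text;
    }
  }
  throw new Error('Agent did not finish within the turn limit');
}
```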
Multi-Turn Evals
Section Duration: 55 minutes
Scott explains multi-turn evaluation, where the agent runs with its full message history and tools and the resulting outputs are judged. He highlights its role in assessing complex tasks and user experience, and in using language models to evaluate unstructured outputs.
Scott demonstrates creating an evaluator by defining a schema for the judge, including output structure, score constraints, and reasoning. He shows using the AI SDK’s generate function to produce structured outputs, similar to setting up tool call inputs.
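An LLM-as-judge evaluator in that spirit, sketched with the AI SDK's `generateObject` and a Zod schema (the prompt and field names are illustrative):

```ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// The judge must return structured output: a bounded score plus its reasoning.
const judgeSchema = z.object({
  score: z.number().min(0).max(1).describe('How well the agent completed the task'),
  reasoning: z.string().describe('Why this score was given'),
});

export async function judgeTranscript(task: string, transcript: string) {
  const { object } = await generateObject({
    model: openai('gpt-4o-mini'),
    schema: judgeSchema,
    prompt:
      `You are grading an AI agent.\nTask: ${task}\n` +
      `Transcript of the agent run:\n${transcript}\n` +
      `Score how well the agent completed the task.`,
  });
  return object; // { score, reasoning }
}
```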
Scott walks through building a multi-turn executor with mocks, structuring messages, and collecting tool calls and results for evaluation. He emphasizes experimentation and fine-tuning to optimize prompts and model performance.
Scott demonstrates creating a multi-turn agent evaluation, including importing functions, setting up a mock executor, and considering various scenarios. He emphasizes using mock data early and running the evaluation to assess agent performance.
File System Tools
Section Duration: 50 minutes
Scott explores implementing file system tools, emphasizing responsible design, error handling, and their role in enabling agents to write code, manage data, and store state. He highlights use cases like agent memory, context loading, communication, and tool output storage to enhance agent capabilities.
Scott demonstrates creating file system tools, covering organization, read and write tool implementation, input schemas, and execution steps. He emphasizes error handling, detailed messages, and guiding the AI for accurate task completion.
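In the spirit of that lesson, read and write tools might be sketched like this, with errors returned to the model rather than thrown so the agent can recover (exact schema field names depend on your SDK version):

```ts
import { tool } from 'ai';
import { z } from 'zod';
import fs from 'node:fs/promises';

export const readFileTool = tool({
  description: 'Read a UTF-8 text file and return its contents.',
  inputSchema: z.object({
    path: z.string().describe('Path of the file to read'),
  }),
  execute: async ({ path }) => {
    try {
      return { content: await fs.readFile(path, 'utf8') };
    } catch (err) {
      // Return a detailed error instead of throwing so the agent can adjust.
      return { error: `Could not read ${path}: ${(err as Error).message}` };
    }
  },
});

export const writeFileTool = tool({
  description: 'Write text content to a file, creating it if it does not exist.',
  inputSchema: z.object({
    path: z.string().describe('Path of the file to write'),
    content: z.string().describe('Full text content to write'),
  }),
  execute: async ({ path, content }) => {
    try {
      await fs.writeFile(path, content, 'utf8');
      return { success: true, path };
    } catch (err) {
      return { error: `Could not write ${path}: ${(err as Error).message}` };
    }
  },
});
```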
Scott demonstrates listing files and directories with optional default paths, formatting results for clarity, and safely deleting files using methods like fs.unlink.
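The list and delete tools follow the same pattern (sketch, with the same SDK-version caveat as above):

```ts
import { tool } from 'ai';
import { z } from 'zod';
import fs from 'node:fs/promises';

export const listFilesTool = tool({
  description: 'List files and directories at a path (defaults to the current directory).',
  inputSchema: z.object({
    path: z.string().optional().describe('Directory to list; defaults to "."'),
  }),
  execute: async ({ path = '.' }) => {
    const entries = await fs.readdir(path, { withFileTypes: true });
    // Format results so both the model and the user can read them easily.
    return entries.map((e) => (e.isDirectory() ? `${e.name}/` : e.name));
  },
});

export const deleteFileTool = tool({
  description: 'Delete a single file (not a directory).',
  inputSchema: z.object({
    path: z.string().describe('Path of the file to delete'),
  }),
  execute: async ({ path }) => {
    await fs.unlink(path); // fails loudly if the path is a directory or missing
    return { deleted: path };
  },
});
```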
Scott demonstrates setting up a safe testing environment, creating a new directory, and configuring environment variables. He also addresses potential CLI errors and shows how to troubleshoot them.
Web Search & Context Management
Section Duration: 55 minutes
Scott covers web search for agents, showing how LLMs can access online information while managing context and grounding outputs in truth. He discusses using native tools, handling costs and limits, and balancing efficiency with model context constraints.
Scott explores strategies for managing context, including summarization, eviction, sliding windows, sub-agents, and RAG. He explains how RAG uses vector search to dynamically add relevant information for efficient LLM use.
Scott demonstrates adding a web search tool, showing how to integrate it with existing tools and trigger compaction when token usage is high. He tests the tool by asking questions and verifying search result accuracy.
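A custom web search tool can wrap whatever search API you have access to; the shape is the same as any other tool. In this sketch, `searchApi` is a hypothetical placeholder, not a specific product:

```ts
import { tool } from 'ai';
import { z } from 'zod';

// Hypothetical search client: replace with a real provider (Tavily, Exa, Brave, ...).
async function searchApi(query: string, limit: number) {
  // Placeholder so the sketch type-checks; a real implementation calls a search API here.
  return [] as Array<{ title: string; url: string; snippet: string }>;
}

export const webSearchTool = tool({
  description: 'Search the web and return the top results with titles, URLs, and snippets.',
  inputSchema: z.object({
    query: z.string().describe('What to search for'),
    limit: z.number().int().min(1).max(10).default(5),
  }),
  execute: async ({ query, limit }) => {
    const results = await searchApi(query, limit);
    // Keep results compact: search output can eat the context window quickly.
    return results.map((r) => ({ title: r.title, url: r.url, snippet: r.snippet.slice(0, 300) }));
  },
});
```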
Scott explains building a custom compaction system, covering token counting, usage limits, and context window management. He discusses recursive compaction, potential data loss, and strategies for balancing performance and detail.
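The core of the counting logic is simple: track how many tokens the last model call reported and compare against a fraction of the model's context window (sketch; the numbers are illustrative):

```ts
// Rough context-window bookkeeping. Token counts come back on each model call
// (for the AI SDK, result.usage.totalTokens); the limits here are illustrative.
const CONTEXT_WINDOW = 128_000;   // the model's maximum tokens (check your model's docs)
const COMPACTION_THRESHOLD = 0.8; // compact once the conversation is 80% full

export function shouldCompact(totalTokensUsed: number): boolean {
  return totalTokensUsed >= CONTEXT_WINDOW * COMPACTION_THRESHOLD;
}
```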
Scott demonstrates setting up a compaction system using an LLM, including summarization prompts to create concise conversation summaries. He shows filtering messages, converting them to text, and generating summaries to maintain seamless agent interactions.
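A compaction pass along those lines: convert the older messages to text, ask the model for a concise summary, and replace them with a single summary message (sketch, assuming the AI SDK; the prompt wording is illustrative):

```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Replace everything except the last few messages with one summary message.
export async function compact(messages: any[], keepRecent = 4) {
  const toSummarize = messages.slice(0, -keepRecent);
  const recent = messages.slice(-keepRecent);
  if (toSummarize.length === 0) return messages;

  // Convert messages to plain text for the summarization prompt.
  const transcript = toSummarize
    .map((m) => `${m.role}: ${typeof m.content === 'string' ? m.content : JSON.stringify(m.content)}`)
    .join('\n');

  const { text: summary } = await generateText({
    model: openai('gpt-4o-mini'),
    prompt:
      'Summarize this conversation concisely. Preserve decisions, file paths, ' +
      'and any facts the assistant will need to keep working:\n\n' + transcript,
  });

  // The summary stands in for the evicted history; recent messages stay verbatim.
  return [{ role: 'user', content: `Summary of earlier conversation:\n${summary}` }, ...recent];
}
```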
Scott shows how to update the run loop for context window management, including importing the compaction pieces and checking model token limits. He demonstrates compacting conversations when thresholds are exceeded and reporting token usage throughout the code.
Shell Tool
Section Duration: 25 minutes
Scott explains giving an AI agent terminal access, allowing it to run commands, install packages, and perform tasks efficiently. He also highlights safety considerations, emphasizing supervision to prevent unintended actions.
Scott explains sandboxing to run code safely, covering methods like VMs, Docker, and services like Daytona. He demonstrates creating a shell command tool in JavaScript and tests it by running commands through the AI.
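A basic shell tool can lean on Node's child_process; the important parts are returning stdout/stderr to the model and keeping a human in the loop for anything destructive (sketch, with the same tool-shape caveats as earlier):

```ts
import { tool } from 'ai';
import { z } from 'zod';
import { exec } from 'node:child_process';
import { promisify } from 'node:util';

const run = promisify(exec);

export const shellTool = tool({
  description: 'Run a shell command and return its stdout and stderr.',
  inputSchema: z.object({
    command: z.string().describe('The shell command to execute'),
  }),
  execute: async ({ command }) => {
    try {
      // A timeout keeps a hung command from stalling the whole agent loop.
      const { stdout, stderr } = await run(command, { timeout: 30_000 });
      return { stdout, stderr };
    } catch (err: any) {
      // Non-zero exit codes land here; give the model enough detail to recover.
      return { error: err.message, stdout: err.stdout ?? '', stderr: err.stderr ?? '' };
    }
  },
});
```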
Scott discusses code execution as a tool similar to shell commands, but avoids implementing it due to safety and complexity concerns. He emphasizes building higher-level tools to streamline workflows and improve agent functionality.
Human Guidance & Approvals
Section Duration: 22 minutes
Scott explains reinforcement learning with human feedback, highlighting the role of deterministic approvals in runtime actions. He emphasizes the human-in-the-loop as key for trustworthy AI and maximizing productivity gains.
Scott explains synchronous and asynchronous approval flows, showing how asynchronous flows allow tasks to be paused and resumed efficiently. He covers various approval methods, highlighting the flexibility and resource savings of asynchronous approaches.
Scott demonstrates adding an approval flow for tool calls, showing how to seek approval, handle rejections, and execute tools only when approved. He emphasizes passing arguments for prompts and managing user rejections gracefully.
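A synchronous approval gate can be as small as a yes/no prompt in the terminal before a tool runs; how it is wired into the loop depends on your setup (sketch using node:readline/promises):

```ts
import readline from 'node:readline/promises';
import { stdin as input, stdout as output } from 'node:process';

// Ask the user to approve a tool call before it runs, showing the arguments
// so they can judge what the agent is about to do.
export async function requestApproval(toolName: string, args: unknown): Promise<boolean> {
  const rl = readline.createInterface({ input, output });
  const answer = await rl.question(
    `Agent wants to run ${toolName} with ${JSON.stringify(args)}. Approve? (y/n) `
  );
  rl.close();
  return answer.trim().toLowerCase().startsWith('y');
}

// In the loop: if approval is denied, report the rejection back to the model
// instead of executing, so it can adjust gracefully, e.g.
//   const approved = await requestApproval(toolName, args);
//   if (!approved) { /* push a "user rejected this tool call" result */ }
```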
Wrapping Up
Section Duration: 15 minutes
Scott wraps up the course by reviewing agent frameworks like OpenAI Agents SDK, Mastra, and Voltagent, highlighting their tools for context, model, and guardrail management. He also recommends exploring Browserbase and Director to support agent development and deployment.