AI Agents Fundamentals, v2

Scott Moss
Netflix
7 hours, 10 minutes

Course Description

Create a CLI agent from scratch! Learn the foundations of agent development, including tool calling, agent loops, and evals. Add human-in-the-loop approvals for higher-stakes operations. Monitor token usage and implement advanced strategies for managing the context window.

This course and others like it are available as part of our Frontend Masters video subscription.

Course Details

Published: January 20, 2026

Rating: 4.9

Table of Contents

Introduction

Section Duration: 13 minutes
  • Introduction
    Scott Moss introduces the course by showcasing AI agents through a simple browser-controlling agent built from natural language commands, emphasizing how an SDK handles the heavy lifting while the focus remains on tools and frameworks for building reliable agents.
  • CLI Agent Demo
    Scott outlines the course's from-scratch approach to building agents, covering the tool loop, decision frameworks, and iterative improvements, then demonstrates a personal agent that can make API calls, search the web, write files, and use approval mechanisms.

Agent Basics

Section Duration: 14 minutes

Tool Calling

Section Duration: 20 minutes

Evals

Section Duration: 1 hour, 49 minutes
  • Understanding Evals
    Scott explains single-turn evals, which track metrics from one agent pass, highlighting their importance for testing non-deterministic AI. He also contrasts offline and online evals, emphasizing their role in guiding improvements and informed decisions.
  • Evals Telemetry with Laminar
    Scott explains synthetic data and creating use cases to test agent performance, covering data collection and evaluation. He also demonstrates using OpenTelemetry with Laminar for improved observability and metrics.
  • Adding Laminar Tracing
    Scott explains setting up observability for message generation, showing how to import components, initialize functions, and enable telemetry. He highlights the importance of offline evaluations and flushing telemetry events to ensure data is sent correctly.
  • Single-Turn Eval Executor
    Scott explains creating evals using data files with input-output pairs to test AI tool selection and improve tool descriptions. He walks through making mock tools and a single-turn executor that uses conversation history for dynamic evaluation.
  • Evaluators
    Scott discusses evaluators, which score tool outputs against expected results, noting that deterministic JSON is easier to quantify than text. He demonstrates a tool selection score evaluator that compares expected and chosen tools to calculate precision, as in the sketch at the end of this section.
  • Running Evaluations
    Scott walks through writing an evaluation, covering scores, mocked data, and executors. He demonstrates creating an evaluation file for file tools, setting up an executor with single-turn mocks, and using evaluators to convert outputs into quantitative scores.
  • Analyze Eval Results
    Scott covers running and interpreting evaluations, including average scores and analyzing individual runs to understand agent behavior. He discusses naming strategies for experiments, examining successes and failures, and forming hypotheses for improvement, emphasizing the essential role of human expertise in the iterative evaluation process.
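
To make the evaluator shape concrete, here is a minimal sketch of a tool selection score in TypeScript: it computes precision over the tools the agent actually called versus the tools expected for the test case. The names (toolSelectionScore, EvalResult) are illustrative, not the course's exact code.

```ts
// Minimal sketch of a tool-selection evaluator: precision over chosen tools.
// Names and shapes are illustrative, not the course's exact implementation.
type EvalResult = { score: number; reasoning: string };

export function toolSelectionScore(
  expectedTools: string[],
  calledTools: string[],
): EvalResult {
  if (calledTools.length === 0) {
    return { score: 0, reasoning: 'The agent called no tools.' };
  }
  const expected = new Set(expectedTools);
  const correct = calledTools.filter((name) => expected.has(name)).length;
  return {
    score: correct / calledTools.length, // precision: correct picks / total picks
    reasoning: `${correct} of ${calledTools.length} chosen tools were expected.`,
  };
}
```

Because the output is a deterministic number, runs can be averaged and compared across experiments, which is the property the lessons highlight for JSON-style outputs over free text.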

Agent Loop

Section Duration: 46 minutes
  • Agent Loop Overview
    Scott explains the agent loop, showing how it manages tasks with uncertain steps and adapts to changing requirements. He demonstrates creating the loop, handling LLM responses, stop conditions, and streaming tokens for smoother interaction.
  • Coding an Agent Loop
    Scott demonstrates filtering messages to keep the LLM focused, setting up chat history, and streaming text generation. He shows how to handle tool calls and append responses to maintain the conversation flow, as in the sketch at the end of this section.
  • Running an Agent Loop
    Scott demonstrates executing tool calls sequentially, updating the UI to show progress, and pushing results into the messages array to maintain conversation flow. He also highlights presenting results in a user-friendly way for non-technical users.
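
As a rough companion to this section, here is a minimal, non-streaming sketch of such a loop, assuming the Vercel AI SDK v4 generateText API and an OpenAI model (the course streams tokens instead); runAgent, MAX_TURNS, and the ./tools module are assumptions for illustration.

```ts
// Minimal agent loop sketch: call the model, let the SDK execute any tool
// calls, and stop when the model answers in plain text.
// Assumes Vercel AI SDK v4-style APIs; names here are illustrative.
import { generateText, type CoreMessage } from 'ai';
import { openai } from '@ai-sdk/openai';
import { tools } from './tools'; // hypothetical module with tool definitions

const MAX_TURNS = 10; // hard stop so the loop cannot run forever

export async function runAgent(history: CoreMessage[]): Promise<string> {
  const messages = [...history];

  for (let turn = 0; turn < MAX_TURNS; turn++) {
    const result = await generateText({
      model: openai('gpt-4o'),
      messages,
      tools,
    });

    // Append the assistant turn (and any tool results) to the history
    // so the next pass sees the full conversation.
    messages.push(...result.response.messages);

    // Stop condition: no tool calls means the model produced a final answer.
    if (result.toolCalls.length === 0) return result.text;
  }

  throw new Error('Agent exceeded the maximum number of turns');
}
```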

Multi-Turn Evals

Section Duration: 55 minutes
  • Multi-Turn Evals
    Scott explains multi-turn evaluation, where the agent runs with message history and tools to judge outputs. He highlights its role in assessing complex tasks, user experience, and using language models to evaluate unstructured outputs.
  • Coding a System Prompt
    Scott demonstrates creating an evaluator by defining a schema for the judge, including output structure, score constraints, and reasoning. He shows using the AI SDK’s generate function to produce structured outputs, similar to setting up tool call inputs; a minimal judge is sketched at the end of this section.
  • Coding a User Message
    Scott walks through building a multi-turn executor with mocks, structuring messages, and collecting tool calls and results for evaluation. He emphasizes experimentation and fine-tuning to optimize prompts and model performance.
  • Coding the Eval
    Scott demonstrates creating a multi-turn agent evaluation, including importing functions, setting up a mock executor, and considering various scenarios. He emphasizes using mock data early and running the evaluation to assess agent performance.
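
Putting these lessons together, here is a minimal sketch of an LLM-as-judge evaluator, assuming the AI SDK's generateObject with a zod schema (both named in the lessons above); the prompt wording, model choice, and the judgeTranscript name are illustrative assumptions.

```ts
// Minimal LLM-as-judge sketch: a schema constrains the judge's output to a
// bounded score plus reasoning. Assumes AI SDK generateObject + zod; the
// prompt and model here are illustrative, not the course's exact setup.
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const judgeSchema = z.object({
  score: z.number().min(0).max(1).describe('0 = failed, 1 = perfect'),
  reasoning: z.string().describe('Why the transcript earned this score'),
});

export async function judgeTranscript(transcript: string, criteria: string) {
  const { object } = await generateObject({
    model: openai('gpt-4o'),
    schema: judgeSchema,
    system: 'You are a strict judge of AI agent conversations.',
    prompt: `Criteria:\n${criteria}\n\nTranscript:\n${transcript}`,
  });
  return object; // { score, reasoning }
}
```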

File System Tools

Section Duration: 50 minutes
  • Use Cases for File System Tools
    Scott explores implementing file system tools, emphasizing responsible design, error handling, and their role in enabling agents to write code, manage data, and store state. He highlights use cases like agent memory, context loading, communication, and tool output storage to enhance agent capabilities.
  • Read & Write
    Scott demonstrates creating file system tools, covering organization, read and write tool implementation, input schemas, and execution steps. He emphasizes error handling, detailed messages, and guiding the AI toward accurate task completion; a sample read tool is sketched at the end of this section.
  • List & Delete
    Scott demonstrates listing files and directories with optional default paths, formatting results for clarity, and safely deleting files using methods like fs.unlink.
  • Testing
    Scott demonstrates setting up a safe testing environment, creating a new directory, and configuring environment variables. He also addresses potential CLI errors and shows how to troubleshoot them.
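
As a reference point for this section, here is a minimal sketch of a read tool, assuming the AI SDK v4 tool() helper with a zod parameters schema (newer SDK versions rename parameters to inputSchema); the AGENT_WORKSPACE variable and the return shape are illustrative assumptions.

```ts
// Minimal read-file tool sketch: errors come back as messages the model can
// act on instead of crashing the loop. Assumes AI SDK v4 tool() + zod;
// the workspace env var and return shape are illustrative.
import { tool } from 'ai';
import { z } from 'zod';
import fs from 'node:fs/promises';
import path from 'node:path';

// Confine reads to a sandbox directory, as in the safe-testing setup above.
const WORKSPACE = process.env.AGENT_WORKSPACE ?? './workspace';

export const readFile = tool({
  description: 'Read a UTF-8 text file from the agent workspace.',
  parameters: z.object({
    filePath: z.string().describe('Path relative to the workspace root'),
  }),
  execute: async ({ filePath }) => {
    try {
      const content = await fs.readFile(path.resolve(WORKSPACE, filePath), 'utf-8');
      return { ok: true, content };
    } catch (err) {
      // A detailed error message guides the model's next attempt.
      return { ok: false, error: `Could not read ${filePath}: ${String(err)}` };
    }
  },
});
```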

Web Search & Context Management

Section Duration: 55 minutes

Shell Tool

Section Duration: 25 minutes

Human Guidance & Approvals

Section Duration: 22 minutes
  • Human in the Loop
    Scott explains reinforcement learning with human feedback, highlighting the role of deterministic approvals in runtime actions. He emphasizes human-in-the-loop design as key to trustworthy AI and to maximizing productivity gains.
  • Approval Flow Architectures
    Scott explains synchronous and asynchronous approval flows, showing how asynchronous flows allow tasks to be paused and resumed efficiently. He covers various approval methods, highlighting the flexibility and resource savings of asynchronous approaches.
  • Adding Approvals
    Scott demonstrates adding an approval flow for tool calls, showing how to seek approval, handle rejections, and execute tools only when approved. He emphasizes passing tool arguments into the approval prompt and handling user rejections gracefully; a minimal gate is sketched below.
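
A synchronous version of this gate can be sketched with nothing more than Node's readline; the helper names and prompt wording below are illustrative assumptions, not the course's exact code.

```ts
// Minimal synchronous approval gate: ask the user in the terminal before a
// tool call runs, and surface a rejection as a result the model can see.
// Uses only Node's readline/promises; names are illustrative.
import readline from 'node:readline/promises';

async function askApproval(toolName: string, args: unknown): Promise<boolean> {
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });
  const answer = await rl.question(
    `Allow ${toolName} with ${JSON.stringify(args)}? (y/n) `,
  );
  rl.close();
  return answer.trim().toLowerCase().startsWith('y');
}

export async function executeWithApproval<T>(
  toolName: string,
  args: unknown,
  run: () => Promise<T>,
) {
  if (!(await askApproval(toolName, args))) {
    // Graceful rejection: feed this back to the model as the tool result.
    return { ok: false as const, error: 'User rejected this tool call.' };
  }
  return run();
}
```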

Wrapping Up

Section Duration: 15 minutes
  • Wrapping Up
    Scott wraps up the course by reviewing agent frameworks like OpenAI Agents SDK, Mastra, and Voltagent, highlighting their tools for context, model, and guardrail management. He also recommends exploring Browserbase and Director to support agent development and deployment.
