Netflix
Course Description
Wrap an LLM in a production-grade harness, adding durable execution, secure sandboxing, memory systems, and multi-agent orchestration. Go beyond brittle agent demos and ship a harness that's recoverable, trustworthy, and safe to deploy.
Prerequisite: Comfort with TypeScript and Node.js. Experience building basic agents or AI-powered applications is also recommended.
Course Details
Published: July 1, 2026
Table of Contents
Introduction
Section Duration: 12 minutes
Scott Moss, a senior software engineer at Netflix, begins the course by explaining the need for agent harnesses. They are systems designed to create a reliable, durable environment where AI agents can operate effectively. The focus of the course is on the harness rather than the agent itself. The course repo is provided and contains solutions to each lesson along with the notes for following along.
Harness Engineering
Section Duration: 1 hour, 7 minutes
Scott explains the core features of a harness. A harness is the infrastructure that transforms an LLM into a fully functional agent. It can support memory, tool integration, durable execution, and self-healing capabilities. Without a harness, you only have transactional inference calls, not a true agent.
Scott walks through the JavaScript-based AI agent harness project, which uses Vercel's AI SDK. He covers the project structure, model flexibility, runtime design, and WebSocket communication for distributed agent execution. The starter project is located on the lesson-1 branch.
Scott builds out a basic LLM agent loop. The model decides whether to use a tool, gets the result, and repeats until it can answer. Simple loops are helpful for understanding agent anatomy, but they lack failure handling and approval layers.
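The loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the model is a stub rather than a real LLM call, and the names `ModelTurn`, `fakeModel`, `lookupOrder`, and `runLoop` are hypothetical and not taken from the course repo.

```typescript
// Hypothetical types for illustration; the course's actual types may differ.
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ModelTurn =
  | { kind: "tool_call"; tool: string; input: string }
  | { kind: "answer"; text: string };
type Tool = (input: string) => Promise<string>;

// Stand-in for an LLM: asks for a tool once, then answers from the tool result.
async function fakeModel(messages: Message[]): Promise<ModelTurn> {
  const toolResult = messages.find((m) => m.role === "tool");
  if (!toolResult) {
    return { kind: "tool_call", tool: "lookupOrder", input: "order-123" };
  }
  return { kind: "answer", text: `Order status: ${toolResult.content}` };
}

// Hypothetical tool registry with a single canned tool.
const tools: Record<string, Tool> = {
  lookupOrder: async (id) => `${id} shipped`,
};

// The loop: call the model, run any requested tool, feed the result back, repeat.
async function runLoop(userInput: string, maxTurns = 5): Promise<string> {
  const messages: Message[] = [{ role: "user", content: userInput }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const result = await fakeModel(messages);
    if (result.kind === "answer") return result.text;
    const tool = tools[result.tool];
    if (!tool) throw new Error(`Unknown tool: ${result.tool}`);
    messages.push({ role: "tool", content: await tool(result.input) });
  }
  throw new Error("Agent did not finish within maxTurns");
}
```

Note what the sketch leaves out: there is no retry, no persistence, and no approval step before a tool runs, which is the gap the rest of the harness fills.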
Scott continues building the basic agent loop. He adds tools for classifying user requests and drafting and sending replies to customers. A default system prompt is also added to the harness.
Scott implements the runAgent method, which interacts with the LLM using streaming responses, tool calls, and event emissions to update the UI in real time. This version relies on an in-memory message array and a hand-rolled loop. If the model calls a tool, that turn ends, and the system decides what happens next.
Scott highlights how streaming allows partial results to be sent to the UI as they arrive, improving responsiveness. Without streaming, users wait for the full response, which is slower and less practical. An OpenAI API key is added to the project and the basic agent harness is tested.
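As a rough illustration of why streaming helps, the sketch below forwards each chunk to a UI callback as soon as it arrives instead of waiting for the full response. The token stream is a stub, and `fakeTokenStream` and `streamToUi` are hypothetical names, not the AI SDK's API.

```typescript
// Stand-in for a streaming model response; a real SDK would yield chunks from the network.
async function* fakeTokenStream(text: string): AsyncGenerator<string> {
  for (const word of text.split(" ")) {
    yield word + " ";
  }
}

// Forward each chunk to the UI as it arrives, and return the full text at the end.
async function streamToUi(
  stream: AsyncIterable<string>,
  emit: (chunk: string) => void,
): Promise<string> {
  let full = "";
  for await (const chunk of stream) {
    emit(chunk); // the UI can render this partial result immediately
    full += chunk;
  }
  return full.trim();
}
```

In a harness, the `emit` callback would typically send the chunk over the WebSocket connection; here it is left as a plain function so the sketch stays self-contained.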
Durable Execution
Section Duration: 1 hour, 19 minutes
Scott discusses why harnesses need durable execution. Sessions should resume seamlessly after interruptions. This harness uses DBOS, which is backed by a PostgreSQL database hosted on Neon.
Scott creates a basic, durable event bus system using a Postgres database and the Drizzle ORM. He codes a database client, defines an event log table, and implements event persistence and replay functionality.
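The persistence-and-replay idea can be sketched without a database. The example below uses an in-memory array as a stand-in for the Postgres table, so it is not durable itself; the `EventLog` class and its field names are assumptions for illustration, not the course's Drizzle schema.

```typescript
// In-memory stand-in for the Postgres event log table; field names are assumptions.
type AgentEvent = { seq: number; sessionId: string; type: string; payload: string };

class EventLog {
  private rows: AgentEvent[] = [];
  private nextSeq = 1;

  // Persist an event with a monotonically increasing sequence number.
  append(sessionId: string, type: string, payload: string): AgentEvent {
    const event: AgentEvent = { seq: this.nextSeq++, sessionId, type, payload };
    this.rows.push(event);
    return event;
  }

  // Replay: return a session's events after a given sequence number, in order.
  replay(sessionId: string, afterSeq = 0): AgentEvent[] {
    return this.rows
      .filter((e) => e.sessionId === sessionId && e.seq > afterSeq)
      .sort((a, b) => a.seq - b.seq);
  }
}
```

A client that reconnects can pass the last sequence number it saw to `replay` and receive only the events it missed.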
Scott refactors the tool calls to centralize control of tool execution, making it more durable and manageable within a workflow system using DBOS. Removing automatic execution by the AI SDK and wrapping tool calls and model interactions builds a robust, persistent harness for AI tool workflows.
Scott adds durability to the agent loop. Every model turn and tool call is wrapped in a DBOS.runStep(), which serves as a checkpoint for each step's result. Should a recovery be necessary, the cached results are returned instead of rerunning each step.
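To make the checkpointing behavior concrete, here is a small imitation of the idea. It is not DBOS and does not use DBOS's API: `CheckpointStore`, its `runStep` method, and the `modelTurn` stub are hypothetical, and results are held in memory rather than in Postgres.

```typescript
// Not DBOS itself: an in-memory imitation of step checkpointing for illustration.
class CheckpointStore {
  private results = new Map<string, string>();

  // Run `fn` once per (workflowId, stepName); later calls return the saved result.
  // Assumes step results are JSON-serializable.
  async runStep<T>(workflowId: string, stepName: string, fn: () => Promise<T>): Promise<T> {
    const key = `${workflowId}:${stepName}`;
    const saved = this.results.get(key);
    if (saved !== undefined) return JSON.parse(saved) as T;
    const result = await fn();
    this.results.set(key, JSON.stringify(result));
    return result;
  }
}

let modelCalls = 0;

// Hypothetical model turn; counts invocations so that reruns are visible.
async function modelTurn(): Promise<{ text: string }> {
  modelCalls++;
  return { text: "Refund approved" };
}

// A workflow whose only step is checkpointed under a stable name.
async function workflow(store: CheckpointStore, workflowId: string): Promise<string> {
  const turn = await store.runStep(workflowId, "model-turn-1", modelTurn);
  return turn.text;
}
```

Running `workflow` twice with the same workflow ID calls `modelTurn` only once; the second run reads the saved result, which mirrors what happens on recovery.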
Scott refactors the Express server to use DBOS along with the WebSocket connection. The entire server application is wrapped in a "main" function to start the application. After debugging some issues, Scott tests the durability of the harness.
Scott reviews the refactoring of actions such as tool calls, event emissions, and text streaming, which are now wrapped into single "steps" that persist results in a database. He also discusses how the server handles disconnects and reconnects, and how it determines when the LLM should respond.
Sandboxed Tools & Memory
Section Duration: 1 hour, 6 minutes
Scott introduces some advanced agent capabilities, such as sandboxing and code mode. Sandboxing protects the host environment by isolating code execution.
Scott creates a secure sandbox environment so the harness can execute code using the runCode tool. This will enable workflows to handle arithmetic and data operations (e.g., calculating charges) securely in the sandbox rather than relying on language model computations.
Scott updates the system prompt to reflect the harness's new code-mode capabilities. He tests the system with a duplicate-order scenario that requires the refund to be calculated. The harness successfully generates a script to find the duplicate order and produce a refund amount.
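For a sense of what such a generated script might look like, here is a sketch of a deterministic duplicate-order check. The `Order` shape, the sample data, and the rule for what counts as a duplicate are all assumptions made for illustration; the script the model produces in the course may look quite different.

```typescript
// Hypothetical order shape and data; amounts are integer cents to avoid float drift.
type Order = { id: string; customerId: string; sku: string; amountCents: number };

// Treat orders with the same customer, SKU, and amount as duplicates of the first one seen.
function findDuplicateRefund(orders: Order[]): { duplicateIds: string[]; refundCents: number } {
  const seen = new Set<string>();
  const duplicateIds: string[] = [];
  let refundCents = 0;
  for (const order of orders) {
    const key = `${order.customerId}|${order.sku}|${order.amountCents}`;
    if (seen.has(key)) {
      duplicateIds.push(order.id);
      refundCents += order.amountCents;
    } else {
      seen.add(key);
    }
  }
  return { duplicateIds, refundCents };
}

const orders: Order[] = [
  { id: "A1", customerId: "c1", sku: "plan-pro", amountCents: 4999 },
  { id: "A2", customerId: "c1", sku: "plan-pro", amountCents: 4999 },
  { id: "A3", customerId: "c1", sku: "addon", amountCents: 1000 },
];
const refund = findDuplicateRefund(orders);
console.log(refund.duplicateIds, (refund.refundCents / 100).toFixed(2));
```

Doing the arithmetic in code rather than in the model's output keeps the refund amount exact and repeatable.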
Scott discusses strategies for managing memory and context, specifically addressing the challenges of growing context windows and efficiently compacting and hydrating context for better performance.
Scott implements summarization and compaction to more efficiently manage memory within the conversation. The harness maintains an array of conversation turns and a running summary string. It will summarize the conversation history before each model turn to prevent hitting token limits.
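The turns-plus-running-summary structure can be sketched as follows. The `summarize` function here simply concatenates text and stands in for an LLM summarization call, and `ConversationMemory` with its `keepRecent` parameter is a hypothetical design, not the course's implementation.

```typescript
type Turn = { role: "user" | "assistant"; content: string };

// Stand-in for an LLM summarization call; a real harness would call the model here.
function summarize(previousSummary: string, turns: Turn[]): string {
  const digest = turns.map((t) => `${t.role}: ${t.content}`).join("; ");
  return previousSummary ? `${previousSummary}; ${digest}` : digest;
}

class ConversationMemory {
  private turns: Turn[] = [];
  private summary = "";
  private keepRecent: number;

  constructor(keepRecent = 2) {
    this.keepRecent = keepRecent;
  }

  add(turn: Turn): void {
    this.turns.push(turn);
  }

  // Fold older turns into the running summary, keeping only the most recent ones verbatim.
  compact(): void {
    if (this.turns.length <= this.keepRecent) return;
    const cutoff = this.turns.length - this.keepRecent;
    this.summary = summarize(this.summary, this.turns.slice(0, cutoff));
    this.turns = this.turns.slice(cutoff);
  }

  // Build the context for the next model turn: summary first, then recent turns.
  buildContext(): string[] {
    this.compact();
    const lines = this.turns.map((t) => `${t.role}: ${t.content}`);
    return this.summary ? [`summary: ${this.summary}`, ...lines] : lines;
  }

  // Drop both the recent turns and the running summary.
  clear(): void {
    this.turns = [];
    this.summary = "";
  }
}
```

Because `buildContext` compacts before every model turn, the context stays bounded by the summary plus `keepRecent` turns, at the cost of losing detail from older messages.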
Scott adds the ability to clear the conversation. This not only clears the UI but also empties the conversation history stored in the durable event log.
Orchestration & Supervision
Section Duration: 1 hour, 9 minutes
Scott explains the benefits and use cases for routing and handoffs in multi-agent systems, including when and why to use multiple agents versus a single agent. He highlights the difference between agent handoffs and sub-agents and outlines the initial steps for implementing agent primitives and handoff tools in a software system.
Scott creates a billing agent and a handoff tool for delegating work to the sub-agent. The model turn function is updated to accept agent-specific tools and system prompts.
Scott adds a triage agent to act as the initial point of contact, handling requests or delegating to specialized agents. This multi-agent handoff system focuses on managing agent transitions, tool calls, and message handling to ensure smooth task delegation.
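One way to picture the handoff mechanics is as a registry of agents plus a piece of session state that records which agent is active. The sketch below assumes that design; the `agents` registry, the `HandoffCall` shape, the `lookupInvoice` tool name, and the function names are hypothetical.

```typescript
// Hypothetical agent definitions: each agent has its own system prompt and tool names.
type AgentName = "triage" | "billing";
type AgentConfig = { systemPrompt: string; tools: string[] };

const agents: Record<AgentName, AgentConfig> = {
  triage: { systemPrompt: "Route the customer to the right specialist.", tools: ["handoff"] },
  billing: { systemPrompt: "Resolve billing questions.", tools: ["lookupInvoice", "handoff"] },
};

type HandoffCall = { tool: "handoff"; to: AgentName; reason: string };
type SessionState = { activeAgent: AgentName; history: string[] };

// Applying a handoff switches the active agent and records the transition;
// the existing history carries over to the new agent.
function applyHandoff(state: SessionState, call: HandoffCall): SessionState {
  if (!(call.to in agents)) throw new Error(`Unknown agent: ${call.to}`);
  return {
    activeAgent: call.to,
    history: [...state.history, `handoff ${state.activeAgent} -> ${call.to}: ${call.reason}`],
  };
}

// The model turn reads the active agent's prompt and tools from the registry.
function turnConfig(state: SessionState): AgentConfig {
  return agents[state.activeAgent];
}
```

Keeping the handoff as an ordinary tool call means the harness, not the model, performs the actual switch, so the transition can be logged and checkpointed like any other step.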
Scott introduces the concept of "supervision" and adds a "plan mode" where a plan can be created and executed by sub-agents under supervision. Sub-agents are lightweight, read-only "investigators" with limited tools and no direct database write access. Investigators are specialized by domain (billing, technical, sales) with tailored system prompts and tools.
Scott implements the supervisor workflow, which leverages structured outputs and DBOS. The JSON plan schema is generated via the LLM and includes synthesized investigator findings along with workflow steps, including event emissions to track state.
Scott makes the supervisor workflow dispatch multiple sub-agents in parallel to investigate different objectives simultaneously. This demonstrates how the harness manages asynchronous execution, handles errors, and synthesizes findings into a final response for the user.
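Parallel dispatch with per-agent error handling might look roughly like the following. The investigators are stubs, one of which fails on purpose, and the names `investigators`, `runOne`, `investigate`, and `synthesize` are assumptions made for this sketch.

```typescript
type Finding = { objective: string; summary: string };
type Investigator = (objective: string) => Promise<Finding>;
type Outcome = { domain: string; finding?: Finding; error?: string };

// Hypothetical investigators; the "technical" one fails to show error handling.
const investigators: Record<string, Investigator> = {
  billing: async (objective) => ({ objective, summary: "Customer was charged twice." }),
  technical: async () => {
    throw new Error("log service unavailable");
  },
};

// Run one investigator and convert any failure into a tagged outcome.
async function runOne(domain: string, objective: string): Promise<Outcome> {
  const run = investigators[domain];
  if (!run) return { domain, error: "no investigator registered" };
  try {
    return { domain, finding: await run(objective) };
  } catch (err) {
    return { domain, error: err instanceof Error ? err.message : String(err) };
  }
}

// Dispatch every investigator at once; one failure does not cancel the others.
async function investigate(objectives: Record<string, string>): Promise<Outcome[]> {
  return Promise.all(
    Object.entries(objectives).map(([domain, objective]) => runOne(domain, objective)),
  );
}

// Combine successful findings and note failed domains for the final response.
function synthesize(outcomes: Outcome[]): string {
  return outcomes
    .map((o) => (o.finding ? `${o.domain}: ${o.finding.summary}` : `${o.domain}: failed (${o.error})`))
    .join("\n");
}
```

Catching errors inside `runOne` means `Promise.all` never rejects here, so the supervisor always receives a complete list of outcomes to synthesize.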
Scott spends a few minutes discussing how a human-in-the-loop workflow could be added to the harness, giving certain tasks, like issuing a refund, an approval mechanism. The lesson 7 notes have the full implementation details for this feature.
Wrapping Up
Section Duration: 8 minutes
Scott wraps up the course by emphasizing how bleeding-edge this work is and the advantage these skills give engineers who have the curiosity and motivation to dive deeper.
Earn a Completion Certificate
After completing this course, you'll receive a certificate of completion that serves as proof of your achievement and showcases your expertise and commitment to professional development. You can easily share this certificate on your LinkedIn profile to highlight your new skills and demonstrate continuous learning to potential employers and professional connections.
