Harness Engineering & Agent Orchestration

Scott Moss
Netflix
5 hours, 5 minutes CC
Course Description

Wrap an LLM in a production-grade harness, adding durable execution, secure sandboxing, memory systems, and multi-agent orchestration. Go beyond brittle agent demos and ship a harness that's recoverable, trustworthy, and safe to deploy.

Prerequisite: Comfort with TypeScript and Node.js. Experience building basic agents or AI-powered applications is also recommended.

Course Details

Published: July 1, 2026

Table of Contents

Introduction

Section Duration: 12 minutes
  • Introduction
    Scott Moss, a senior software engineer at Netflix, begins the course by explaining the need for agent harnesses. They are systems designed to create a reliable, durable environment where AI agents can operate effectively. The focus of the course is on the harness rather than the agent itself. The course repo is provided and contains solutions to each lesson along with the notes for following along.

Harness Engineering

Section Duration: 1 hour, 7 minutes
  • What is an Agent Harness
    Scott explains the core features of a harness. A harness is the infrastructure that transforms an LLM into a fully functional agent. It can support memory, tool integration, durable execution, and self-healing capabilities. Without a harness, you only have transactional inference calls, not a true agent.
  • Course Project Setup
    Scott walks through the JavaScript-based AI agent harness project, which uses Vercel's AI SDK. He covers the project structure, model flexibility, runtime design, and WebSocket communication for distributed agent execution. The starter project is located on the lesson-1 branch.
  • Agent Tools Setup
    Scott builds out a basic LLM agent loop. The model decides whether to use a tool, gets the result, and repeats until it can answer. Simple loops are helpful for understanding agent anatomy, but they lack failure handling and approval layers.
  • Classify & Reply Tools
    Scott continues building the basic agent loop. He adds tools for classifying user requests and drafting and sending replies to customers. A default system prompt is also added to the harness.
  • Harness Runtime
    Scott implements the runAgent method, which interacts with the LLM using streaming responses, tool calls, and event emissions to update the UI in real time. The implementation relies on an in-memory message array and a hand-rolled loop. If the model calls a tool, that turn ends, and the system decides what happens next.
  • Stream Messages to the UI
    Scott highlights how streaming allows partial results to be sent to the UI as they arrive, improving responsiveness. Without streaming, users wait for the full response, which is slower and less practical. An OpenAI API key is added to the project and the basic agent harness is tested.
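
The loop described in the lessons above (the model decides whether to call a tool, receives the result, and repeats until it can answer) can be sketched roughly as follows. This is a minimal illustration, not the course's code: the `Model`, `Tool`, and `runAgentLoop` names are hypothetical, and a stubbed model stands in for Vercel's AI SDK so the sketch has no dependencies.

```typescript
// Hypothetical types for illustration; the course project uses Vercel's AI SDK instead.
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelTurn = { text?: string; toolCall?: ToolCall };
type Model = (messages: Message[]) => Promise<ModelTurn>;
type Tool = (args: Record<string, unknown>) => Promise<string>;

// Hand-rolled loop over an in-memory message array.
async function runAgentLoop(
  model: Model,
  tools: Record<string, Tool>,
  userInput: string,
  maxTurns = 5, // assumed safety cap so a confused model cannot loop forever
): Promise<{ answer: string; messages: Message[] }> {
  const messages: Message[] = [{ role: "user", content: userInput }];

  for (let turn = 0; turn < maxTurns; turn++) {
    const result = await model(messages);

    // No tool call means the model is ready to answer, so the loop ends.
    if (!result.toolCall) {
      const answer = result.text ?? "";
      messages.push({ role: "assistant", content: answer });
      return { answer, messages };
    }

    // A tool call ends the model's turn; the harness runs the tool itself
    // and feeds the output back in for the next turn.
    const { name, args } = result.toolCall;
    const tool = tools[name];
    const output = tool ? await tool(args) : `Unknown tool: ${name}`;
    messages.push({ role: "assistant", content: `call ${name}` });
    messages.push({ role: "tool", content: output });
  }

  throw new Error(`No final answer after ${maxTurns} turns`);
}

// Stub model: asks for the classify tool once, then answers.
const stubModel: Model = async (messages) =>
  messages.some((m) => m.role === "tool")
    ? { text: "This looks like a billing request." }
    : { toolCall: { name: "classify", args: { text: messages[0]?.content ?? "" } } };

// Stub tool standing in for the classification tool mentioned in the lessons.
const stubTools: Record<string, Tool> = {
  classify: async () => "billing",
};
```

Calling `runAgentLoop(stubModel, stubTools, "I was charged twice")` runs one tool turn followed by one answering turn. As the lesson summary notes, a loop like this has no failure handling or approval layer, which is the gap the later sections address.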

Durable Execution

Section Duration: 1 hour, 19 minutes
  • Durable Execution Setup with NeonDB
    Scott discusses the necessity of harnesses having durable execution. Sessions should resume seamlessly after interruptions. This harness uses DBOS, which is backed by a PostgreSQL database hosted on Neon.
  • Implementing Durable Execution
    Scott creates a basic, durable event bus system using a Postgres database and the Drizzle ORM. He codes a database client, defines an event log table, and implements event persistence and replay functionality.
  • Durable Tools
    Scott refactors the tool calls to centralize control of tool execution, making it more durable and manageable within a workflow system using DBOS. Removing automatic execution by the AI SDK and wrapping tool calls and model interactions builds a robust, persistent harness for AI tool workflows.
  • Durable Agent Loop
    Scott adds durability to the agent loop. Every model turn and tool call is wrapped in a DBOS.runStep(), which serves as a checkpoint for each step's result. Should a recovery be necessary, the cached results are returned instead of rerunning each step.
  • Connecting DBOS to the Server
    Scott refactors the Express server to use DBOS along with the WebSocket connection. The entire server application is wrapped in a "main" function to start the application. After debugging some issues, Scott tests the durability of the harness.
  • Durable Execution Recap
    Scott reviews the refactoring of actions such as tool calls, event emissions, and text streaming, which are now wrapped into single "steps" that persist results in a database. He also discusses how the server handles disconnects and reconnects, and how it determines when the LLM should respond.
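
The checkpointing idea behind these lessons (each step's result is stored, and a recovery returns the stored result instead of rerunning the step) can be illustrated with a small sketch. This is not the DBOS API: the `StepLog` map and the `runStep` helper below are hypothetical in-memory stand-ins, whereas DBOS persists step results in Postgres.

```typescript
// In-memory stand-in for a durable step log (assumption for illustration only).
type StepLog = Map<string, unknown>;

// Run a named step once; on later runs, return the checkpointed result.
async function runStep<T>(log: StepLog, name: string, fn: () => Promise<T>): Promise<T> {
  if (log.has(name)) {
    return log.get(name) as T; // recovery path: skip the work, reuse the result
  }
  const result = await fn();
  log.set(name, result); // checkpoint before moving on to the next step
  return result;
}

// Counts how often the (pretend) model is actually invoked.
let modelCalls = 0;

// A two-step workflow that can be told to crash between its steps.
async function workflow(log: StepLog, crashAfterModel: boolean): Promise<string> {
  const draft = await runStep(log, "model-turn-1", async () => {
    modelCalls++;
    return "draft reply";
  });

  if (crashAfterModel) {
    throw new Error("simulated crash");
  }

  return runStep(log, "send-reply", async () => `sent: ${draft}`);
}
```

Running `workflow(log, true)` and then `workflow(log, false)` against the same log leaves `modelCalls` at 1: the second run reuses the checkpointed model turn and only executes the step that never completed.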

Sandboxed Tools & Memory

Section Duration: 1 hour, 6 minutes

Orchestration & Supervision

Section Duration: 1 hour, 9 minutes
  • Agent Handoffs
    Scott explains the benefits and use cases for routing and handoffs in multi-agent systems, including when and why to use multiple agents versus a single agent. He highlights the difference between agent handoffs and sub-agents and outlines the initial steps for implementing agent primitives and handoff tools in a software system.
  • Billing Agent & Handoff Tool
    Scott creates a billing agent and a handoff tool for delegating work to the sub-agent. The model turn function is updated to accept agent-specific tools and system prompts.
  • Agent Triage & Handoff
    Scott adds a triage agent to act as the initial point of contact, handling requests or delegating to specialized agents. This multi-agent handoff system focuses on managing agent transitions, tool calls, and message handling to ensure smooth task delegation.
  • Supervision with Subagents
    Scott introduces the concept of "supervision" and adds a "plan mode" where a plan can be created and executed by sub-agents under supervision. Sub-agents are lightweight, read-only "investigators" with limited tools and no direct database write access. Investigators are specialized by domain (billing, technical, sales) with tailored system prompts and tools.
  • Supervisor Workflow
    Scott implements the supervisor workflow, which leverages structured outputs and DBOS. The JSON plan schema is generated via the LLM and includes synthesized investigator findings along with workflow steps, including event emissions to track state.
  • Dispatching Subagents
    Scott makes the supervisor workflow dispatch multiple sub-agents in parallel to investigate different objectives simultaneously. This demonstrates how the harness manages asynchronous execution, handles errors, and synthesizes findings into a final response for the user.
  • Human in the Loop
    Scott spends a few minutes discussing how a human-in-the-loop workflow could be added to the harness, giving certain tasks, like issuing a refund, an approval mechanism. The lesson 7 notes have the full implementation details for this feature.
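
The parallel dispatch described in the supervisor lessons above might look roughly like the sketch below. It is a simplified illustration under stated assumptions: the `Investigator`, `Finding`, `dispatchInvestigators`, and `synthesize` names are hypothetical, the investigators are plain async functions rather than LLM-backed sub-agents, and synthesis is a string join rather than a model call.

```typescript
// Hypothetical shapes for illustration; not taken from the course repo.
type Investigator = (objective: string) => Promise<string>;
type PlanItem = { domain: string; objective: string };
type Finding = { domain: string; ok: boolean; detail: string };

// Run every investigator in the plan concurrently. Each failure is captured
// as a Finding, so one failing sub-agent cannot reject the whole batch.
async function dispatchInvestigators(
  plan: PlanItem[],
  investigators: Record<string, Investigator>,
): Promise<Finding[]> {
  return Promise.all(
    plan.map(async ({ domain, objective }): Promise<Finding> => {
      try {
        const investigate = investigators[domain];
        if (!investigate) throw new Error(`no investigator for ${domain}`);
        return { domain, ok: true, detail: await investigate(objective) };
      } catch (err) {
        const detail = err instanceof Error ? err.message : String(err);
        return { domain, ok: false, detail };
      }
    }),
  );
}

// Combine findings into one summary; a real supervisor would likely ask the
// model to do this step.
function synthesize(findings: Finding[]): string {
  return findings
    .map((f) => `${f.domain}: ${f.ok ? f.detail : "investigation failed"}`)
    .join("; ");
}

// Stub investigators: one succeeds, one fails.
const stubInvestigators: Record<string, Investigator> = {
  billing: async () => "duplicate charge found",
  technical: async () => {
    throw new Error("log service unavailable");
  },
};
```

Catching errors inside each mapped task keeps the results in plan order and lets the supervisor report partial findings instead of discarding the successful investigations.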

Wrapping Up

Section Duration: 8 minutes

Earn a Completion Certificate

After completing this course, you'll receive a certificate of completion that serves as proof of your achievement, showcasing your expertise and commitment to professional development. You can easily share this certificate on your LinkedIn profile to highlight your new skills and demonstrate continuous learning to potential employers and professional connections.