Harness Engineering & Agent Orchestration

Netflix

5 hours, 5 minutes CC

Course Description

Wrap an LLM in a production-grade harness, adding durable execution, secure sandboxing, memory systems, and multi-agent orchestration. Go beyond brittle agent demos and ship a harness that's recoverable, trustworthy, and safe to deploy.

Prerequisite: Comfort with TypeScript and Node.js. Experience building basic agents or AI-powered applications is also recommended.

Course Details

Published: July 1, 2026

Rating

4.2

Learning Paths

Coding with AI

Table of Contents

Introduction

Section Duration: 12 minutes

Introduction
00:00:00 - 00:12:54
Scott Moss, a senior software engineer at Netflix, begins the course by explaining the need for agent harnesses: systems designed to create a reliable, durable environment where AI agents can operate effectively. The course focuses on the harness rather than the agent itself. The course repo contains solutions to each lesson along with notes for following along.

Harness Engineering

Section Duration: 1 hour, 7 minutes

What is an Agent Harness
00:12:55 - 00:19:55
Scott explains the core features of a harness. A harness is the infrastructure that transforms an LLM into a fully functional agent. It can support memory, tool integration, durable execution, and self-healing capabilities. Without a harness, you only have transactional inference calls, not a true agent.
Course Project Setup
00:19:56 - 00:29:50
Scott walks through the JavaScript-based AI agent harness project, which uses Vercel's AI SDK. He covers the project structure, model flexibility, runtime design, and WebSocket communication for distributed agent execution. The starter project is located on the lesson-1 branch.
Agent Tools Setup
00:29:51 - 00:42:55
Scott builds out a basic LLM agent loop. The model decides whether to use a tool, gets the result, and repeats until it can answer. Simple loops are helpful for understanding agent anatomy, but they lack failure handling and approval layers.
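The loop described in this lesson can be sketched roughly as follows. This is a minimal illustration, not the course's code: the `Message`, `ModelTurn`, `Model`, and `Tool` types, the `runLoop` function, and the scripted `fakeModel` are all hypothetical stand-ins, and no real AI SDK calls are made.

```typescript
// Hypothetical types for illustration; not the course's or the AI SDK's actual API.
type Message = { role: "user" | "assistant" | "tool"; content: string };
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelTurn = { text: string; toolCall?: ToolCall };
type Model = (messages: Message[]) => Promise<ModelTurn>;
type Tool = (args: Record<string, unknown>) => Promise<string>;

async function runLoop(
  model: Model,
  tools: Map<string, Tool>,
  userInput: string,
  maxTurns = 5,
): Promise<string> {
  const messages: Message[] = [{ role: "user", content: userInput }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await model(messages);
    if (!reply.toolCall) return reply.text; // no tool requested: this is the final answer
    const tool = tools.get(reply.toolCall.name);
    // An unknown tool is reported back to the model instead of being thrown.
    const result = tool
      ? await tool(reply.toolCall.args)
      : `unknown tool: ${reply.toolCall.name}`;
    messages.push({ role: "assistant", content: reply.text });
    messages.push({ role: "tool", content: result });
  }
  // The only failure handling here is a turn cap; there is no retry or approval layer.
  throw new Error(`no final answer after ${maxTurns} turns`);
}

// A scripted stand-in for an LLM: it asks for a tool once, then answers.
const fakeModel: Model = async (messages) => {
  const last = messages[messages.length - 1];
  return last?.role === "tool"
    ? { text: `Order status: ${last.content}` }
    : { text: "", toolCall: { name: "lookupOrder", args: { id: "A1" } } };
};

const tools = new Map<string, Tool>([
  ["lookupOrder", async (args) => `order ${String(args.id)} shipped`],
]);
```

Calling `runLoop(fakeModel, tools, "Where is order A1?")` takes two model turns: one that requests `lookupOrder` and one that answers from the tool result.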
Classify & Reply Tools
00:42:56 - 00:53:03
Scott continues building the basic agent loop. He adds tools for classifying user requests and drafting and sending replies to customers. A default system prompt is also added to the harness.
Harness Runtime
00:53:04 - 01:09:12
Scott implements the runAgent method, which interacts with the LLM using streaming responses, tool calls, and event emissions to update the UI in real time. The implementation uses an in-memory message array and a hand-rolled loop. If the model calls a tool, that turn ends, and the system decides what happens next.
Stream Messages to the UI
01:09:13 - 01:20:55
Scott highlights how streaming allows partial results to be sent to the UI as they arrive, improving responsiveness. Without streaming, users wait for the full response, which is slower and less practical. An OpenAI API key is added to the project and the basic agent harness is tested.

Durable Execution

Section Duration: 1 hour, 19 minutes

Durable Execution Setup with NeonDB
01:20:56 - 01:29:24
Scott discusses the necessity of harnesses having durable execution. Sessions should resume seamlessly after interruptions. This harness uses DBOS, which is backed by a PostgreSQL database hosted on Neon.
Implementing Durable Execution
01:29:25 - 01:44:38
Scott creates a basic, durable event bus system using a Postgres database and the Drizzle ORM. He codes a database client, defines an event log table, and implements event persistence and replay functionality.
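The persist-then-replay idea can be illustrated without a database. The sketch below is an assumption-laden stand-in: the `AgentEvent` shape, the `EventLog` class, and its in-memory array are hypothetical, whereas the course stores events in a Postgres table through Drizzle.

```typescript
// Hypothetical event shape; the course's actual table columns are not given in this listing.
type AgentEvent = { seq: number; sessionId: string; type: string; payload: string };

class EventLog {
  private rows: AgentEvent[] = []; // stand-in for a Postgres event log table
  private nextSeq = 1;

  // Persist the event first, then return the stored row so the caller can emit it.
  append(sessionId: string, type: string, payload: string): AgentEvent {
    const row: AgentEvent = { seq: this.nextSeq++, sessionId, type, payload };
    this.rows.push(row);
    return row;
  }

  // Replay everything a client has not seen yet for one session, in insertion order.
  replay(sessionId: string, afterSeq = 0): AgentEvent[] {
    return this.rows.filter((r) => r.sessionId === sessionId && r.seq > afterSeq);
  }
}

const log = new EventLog();
log.append("s1", "text", "Hello");
log.append("s2", "text", "other session");
log.append("s1", "tool_result", "refund: 20");

// A client for session s1 reconnects having already seen seq 1.
const missed = log.replay("s1", 1);
```

Because every event gets a monotonically increasing sequence number, a reconnecting client only needs to report the last number it saw to catch up.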
Durable Tools
01:44:39 - 02:07:15
Scott refactors the tool calls to centralize control of tool execution, making it more durable and manageable within a workflow system using DBOS. Removing automatic execution by the AI SDK and wrapping tool calls and model interactions builds a robust, persistent harness for AI tool workflows.
Durable Agent Loop
02:07:16 - 02:17:42
Scott adds durability to the agent loop. Every model turn and tool call is wrapped in a DBOS.runStep(), which serves as a checkpoint for each step's result. Should a recovery be necessary, the cached results are returned instead of rerunning each step.
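To make the checkpointing behavior concrete, here is a small sketch of the idea. It does not use DBOS: the `runStep` helper below is a hypothetical in-memory imitation, the `checkpoints` map stands in for the database, and the step keys are invented for illustration.

```typescript
// In-memory stand-in for the checkpoint storage a durable runtime would keep in Postgres.
const checkpoints = new Map<string, unknown>();
let modelCalls = 0;

// Hypothetical helper that mimics the checkpointing idea; it is not DBOS.runStep itself.
async function runStep<T>(key: string, fn: () => Promise<T>): Promise<T> {
  if (checkpoints.has(key)) return checkpoints.get(key) as T; // recovery: reuse the saved result
  const result = await fn();
  checkpoints.set(key, result); // record the result before moving to the next step
  return result;
}

async function workflow(workflowId: string): Promise<string> {
  const draft = await runStep(`${workflowId}:model-turn-1`, async () => {
    modelCalls++; // counts how often the expensive call really runs
    return "draft reply";
  });
  return runStep(`${workflowId}:send`, async () => `sent: ${draft}`);
}
```

Running `workflow("wf-1")` twice simulates a recovery: the second run returns the same result while `modelCalls` stays at 1, because both steps are served from the checkpoint map.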
Connecting DBOS to the Server
02:17:43 - 02:34:00
Scott refactors the Express server to use DBOS along with the WebSocket connection. The entire server application is wrapped in a "main" function to start the application. After debugging some issues, Scott tests the durability of the harness.
Durable Execution Recap
02:34:01 - 02:40:45
Scott reviews the refactoring of actions such as tool calls, event emissions, and text streaming, which are now wrapped into single "steps" that persist results in a database. He also discusses how the server handles disconnects and reconnects and when the LLM should respond.

Sandboxed Tools & Memory

Section Duration: 1 hour, 6 minutes

Code Mode & Security
02:40:46 - 02:46:01
Scott introduces some advanced agent capabilities, such as sandboxing and code mode. Sandboxing protects the host environment by isolating code execution.
Sandbox Boundary & runCode Tool
02:46:02 - 03:00:11
Scott creates a secure sandbox environment so the harness can execute code using the runCode tool. This will enable workflows to handle arithmetic and data operations (e.g., calculating charges) securely in the sandbox rather than relying on language model computations.
Updating the System Prompt
03:00:12 - 03:09:40
Scott updates the system prompt to reflect the harness's new code-mode capabilities. He tests the system with a duplicate-order scenario that requires the refund to be calculated. The harness successfully generates a script to find the duplicate order and produce a refund amount.
Memory & Context Hydration
03:09:41 - 03:24:44
Scott discusses strategies for managing memory and context, specifically addressing the challenges of growing context windows and efficiently compacting and hydrating context for better performance.
Summarization & Compaction
03:24:45 - 03:39:32
Scott implements summarization and compaction to more efficiently manage memory within the conversation. The harness maintains an array of conversation turns and a running summary string. It will summarize the conversation history before each model turn to prevent hitting token limits.
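One way to picture the turns array plus running summary is the sketch below. The `Memory` shape, the `compact` and `hydrate` functions, and the `keep` threshold are assumptions made for illustration, and `naiveSummarizer` is a synchronous stand-in for what would be an LLM summarization call in a real harness.

```typescript
type Turn = { role: "user" | "assistant"; content: string };
type Memory = { summary: string; turns: Turn[] };
// Stand-in for an LLM summarization call; a real one would be asynchronous.
type Summarizer = (previousSummary: string, dropped: Turn[]) => string;

// Fold all but the most recent `keep` turns into the running summary.
function compact(memory: Memory, keep: number, summarize: Summarizer): Memory {
  if (memory.turns.length <= keep) return memory;
  const cut = memory.turns.length - keep;
  const dropped = memory.turns.slice(0, cut);
  return {
    summary: summarize(memory.summary, dropped),
    turns: memory.turns.slice(cut),
  };
}

// Build the context for the next model turn: summary first, then the recent turns.
function hydrate(memory: Memory): string[] {
  const head = memory.summary ? [`Summary so far: ${memory.summary}`] : [];
  return [...head, ...memory.turns.map((t) => `${t.role}: ${t.content}`)];
}

// Joins old content with separators; it only imitates what a model would condense.
const naiveSummarizer: Summarizer = (prev, dropped) =>
  [prev, ...dropped.map((t) => t.content)].filter(Boolean).join(" | ");

const memory: Memory = {
  summary: "",
  turns: [
    { role: "user", content: "I was charged twice" },
    { role: "assistant", content: "Checking your orders" },
    { role: "user", content: "Order A1" },
  ],
};
const compacted = compact(memory, 1, naiveSummarizer);
const context = hydrate(compacted);
```

Here `compacted` keeps only the latest turn verbatim, and `context` starts with the summary line so the model still sees what the older turns were about.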
Clearing Durable Event Log
03:39:33 - 03:47:33
Scott adds the ability to clear the conversation. This not only clears the UI but also empties the conversation history stored in the durable event log.

Orchestration & Supervision

Section Duration: 1 hour, 9 minutes

Agent Handoffs
03:47:34 - 03:56:36
Scott explains the benefits and use cases for routing and handoffs in multi-agent systems, including when and why to use multiple agents versus a single agent. He highlights the difference between agent handoffs and sub-agents and outlines the initial steps for implementing agent primitives and handoff tools in a software system.
Billing Agent & Handoff Tool
03:56:37 - 04:09:58
Scott creates a billing agent and a handoff tool for delegating work to the sub-agent. The model turn function is updated to accept agent-specific tools and system prompts.
Agent Triage & Handoff
04:09:59 - 04:20:18
Scott adds a triage agent to act as the initial point of contact, handling requests or delegating to specialized agents. This multi-agent handoff system focuses on managing agent transitions, tool calls, and message handling to ensure smooth task delegation.
Supervision with Subagents
04:20:19 - 04:30:13
Scott introduces the concept of "supervision" and adds a "plan mode" where a plan can be created and executed by sub-agents under supervision. Sub-agents are lightweight, read-only "investigators" with limited tools and no direct database write access. Investigators are specialized by domain (billing, technical, sales) with tailored system prompts and tools.
Supervisor Workflow
04:30:14 - 04:40:27
Scott implements the supervisor workflow, which leverages structured outputs and DBOS. The JSON plan schema is generated via the LLM and includes synthesized investigator findings along with workflow steps, including event emissions to track state.
Dispatching Subagents
04:40:28 - 04:54:09
Scott makes the supervisor workflow dispatch multiple sub-agents in parallel to investigate different objectives simultaneously. This demonstrates how the harness manages asynchronous execution, handles errors, and synthesizes findings into a final response for the user.
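A rough sketch of parallel dispatch with error handling and synthesis might look like this. The `Finding` shape, the `dispatch` and `synthesize` functions, and `fakeInvestigator` are hypothetical names chosen for illustration; real sub-agents would each run their own model loop rather than return canned strings.

```typescript
type Finding = { objective: string; ok: boolean; detail: string };
// Hypothetical investigator signature; a real sub-agent would run its own model loop.
type Investigator = (objective: string) => Promise<string>;

async function dispatch(objectives: string[], investigate: Investigator): Promise<Finding[]> {
  // allSettled lets one failed investigator surface as a finding instead of rejecting the batch.
  const settled = await Promise.allSettled(objectives.map((o) => investigate(o)));
  return settled.map((result, i) => {
    const objective = objectives[i] ?? "unknown";
    return result.status === "fulfilled"
      ? { objective, ok: true, detail: result.value }
      : { objective, ok: false, detail: String(result.reason) };
  });
}

// Combine all findings, including failures, into one report for the final response.
function synthesize(findings: Finding[]): string {
  return findings
    .map((f) => `${f.ok ? "OK" : "FAILED"} ${f.objective}: ${f.detail}`)
    .join("\n");
}

// Scripted stand-in: the "technical" investigation fails, the others succeed.
const fakeInvestigator: Investigator = async (objective) => {
  if (objective === "technical") throw new Error("log service unavailable");
  return `no issues found for ${objective}`;
};
```

Using `Promise.allSettled` rather than `Promise.all` is a design choice in this sketch: the supervisor still receives the billing finding even when the technical investigator throws.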
Human in the Loop
04:54:10 - 04:57:07
Scott spends a few minutes discussing how a human-in-the-loop workflow could be added to the harness, giving certain tasks, like issuing a refund, an approval mechanism. The lesson 7 notes have the full implementation details for this feature.

Wrapping Up

Section Duration: 8 minutes

Wrapping Up
04:57:08 - 05:05:24
Scott wraps up the course by emphasizing how bleeding-edge this work is and the advantage these skills give engineers who have the curiosity and motivation to dive deeper.
