AI Agent: From Prototype to Production

Name: AI Agent: From Prototype to Production
Author: Scott Moss

Scott Moss

Netflix

3 hours, 50 minutes CC

Course Description

Build production-ready AI apps. Write evals to measure LLM and tool accuracy. Implement a Retrieval Augmented Generation (RAG) pipeline and explore how structured outputs provide a predictable schema for LLM responses. Responsibly manage costs and token limits with proper context memory management. Build better guardrails within the system with human-in-the-loop best practices.

This course and others like it are available as part of our Frontend Masters video subscription.

Join Now

Preview

Course Details

Published: December 11, 2024

Rating

4.8

Learning Paths

Coding with AI

Topics

Learn Straight from the Experts Who Shape the Modern Web

Your Path to Senior Developer and Beyond

250+ In-depth courses
18 Learning Paths
Industry Leading Experts
Live Interactive Workshops

Join Now

Introduction

Section Duration: 11 minutes

Introduction
00:00:00 - 00:08:56 View Transcript
Scott Moss begins the course with an overview of the prerequisites and tooling required to code, along with examples. This course builds on Frontend Masters' "Building an AI Agent from Scratch" course and will focus on preparing an agent for production.
LLM & Agents Review
00:08:57 - 00:11:29 View Transcript
Scott spends a few minutes reviewing LLMs and AI agents. LLMs can process text by analyzing relationships between the different parts or tokens of the input. Agents are a more sophisticated system that uses an LLM as its "brain" but enhances it with memory, tool use, decision-making abilities, and the capacity to take action in the real world.

Improving LLMs with Evals

Section Duration: 1 hour, 10 minutes

Evals
00:11:30 - 00:20:21 View Transcript
Scott introduces evals that test and score responses from LLMs and agents. Datasets are required to help an eval understand the quality of a response. Datasets can be curated or provided by users who continuously rank responses they receive from an agent. This measuring is an essential component of production agents to ensure high levels of quality.
What to Measure with Evals
00:20:22 - 00:30:22 View Transcript
Scott discusses what evals should measure. Quality metrics can help assess how well tasks are performed. Measuring also adds safety and reliability to the application by ensuring the system operates within acceptable bounds.
Setting Up an Eval Framework
00:30:23 - 00:43:00 View Transcript
Scott walks through the eval framework created for this course. A scorer is created to determine whether the system's output is what was expected given the input. A question about storing inputs, outputs, and expected results for later analysis is also addressed in this lesson.
Creating an Eval
00:43:01 - 00:52:13 View Transcript
Scott codes an eval for the Reddit tool. The eval's task calls the runLLM method with the input. Since the word "reddit" is contained in the input, eval is expected to call the Reddit tool.
Viewing Eval Results
00:52:14 - 00:56:45 View Transcript
Scott demonstrates the React dashboard application included in the project. The dashboard displays the results from each eval run. The data for the dashboard is pulled from the results.json file in the project's root.
Handling Evals on Subjective Inputs
00:56:46 - 01:10:22 View Transcript
Scott creates two more evals. One measures the dadJokes tool, and the other measures the generateImage tool. Since users may refer to an image as a "photo", the prompt is updated to help the system handle both scenarios. This demonstrates the interactive process of better accuracy.
Eval Multiple Tools
01:10:23 - 01:22:19 View Transcript
Scott creates an eval to test all the tools. The eval is populated with several data items and run. Depending on the score, modifications to the prompt may be required to achieve a 100% success rate. After creating the eval, some additional metric tools are discussed.

Retrieval Augmented Generation (RAG)

Section Duration: 1 hour, 7 minutes

RAG Overview
01:22:20 - 01:28:16 View Transcript
Scott introduces RAG, or Retrieval Augmented Generation, which is a shift in how to approach LLM knowledge enhancement. Rather than relying on an LLM's built-in knowledge, RAG dynamically injects relevant information into the conversation by retrieving it from an external knowledge base.
RAG Pipeline
01:28:17 - 01:42:19 View Transcript
Scott walks through the RAG pipeline to highlight the core components and how they work together. The pipeline includes document processing, embedding generation, and storage/indexing. Once the query is received, the retrieval process can begin.
Create an Upstash Vector Database
01:42:20 - 01:47:35 View Transcript
Scott introduces Upstash which is where the movie data is stored and indexed. Upstash is a vector database that allows for querying and filtering. Results are returned based on their proximity or correctness.
Ingesting Data into Vector DB
01:47:36 - 02:04:55 View Transcript
Scott codes a script to ingest the data in the Upstash. Once the script is complete, the data can be queried in Upstash with a movieSearch tool.
Create a Movies Query
02:04:56 - 02:13:09 View Transcript
Scott creates a query for searching the movies. For now, the query doesn't support filtering. The vector store will be sent the query and topK to return to the application.
Create a Movie Search Tool
02:13:10 - 02:29:51 View Transcript
Scott creates the movieSearch tool for executing the search of the movie vector database. Scott tests the system by asking for a specific movie and later asking for a movie poster. The system correctly uses the moveSearch tool to return results and the generateImage tool for the poster. The final code can be found on the `step/4` branch in the repo.

Structured Output

Section Duration: 16 minutes

Using Structured Outputs
02:29:52 - 02:42:18 View Transcript
Scott introduces structured outputs, which represent a significant advancement in handling AI responses. They ensure that LLM responses always adhere to a predefined JSON Schema, eliminating many common issues with free-form AI outputs.
Limitations of Structured Outputs
02:42:19 - 02:45:59 View Transcript
Scott shares some limitations with structured outputs. One limitation is constraints on the schema, like the maximum number of objects or nesting levels. Another is string limitations, such as property length, size of the description, or JSON needing to be fully loaded before it can be parsed.

Human in the Loop

Section Duration: 29 minutes

Using Human in the Loop
02:46:00 - 02:56:00 View Transcript
Scott explains why Human in the Loop (HITL) is a critical safety pattern in AI systems where specific actions require explicit human approval before execution. Approvals can be synchronous, asynchronous, or tiered. Several factors for designing the approval flow are also discussed.
Interpreting Approvals using LLMs
02:56:01 - 03:01:47 View Transcript
Scott adds a method to the LLM module to run the approval check. The method receives the model, temperature, schema for the response format, and an array of messages. The messages provide a system prompt instructing the LLM to determine if the user approved the request and follows that message with the user's response.
Adding Approvals to Agent
03:01:48 - 03:15:25 View Transcript
Scott updates the agent by adding the approval flow. A condition is added to check if the tool call is the generateImage tool. If so, the execution is stopped, and the user's approval is requested.

Memory Management

Section Duration: 28 minutes

History Management Strategies
03:15:26 - 03:23:28 View Transcript
Scott explores strategies for history and memory management. The challenge of managing chat history in LLM applications is balancing maintaining context and managing token limits. Every token sent increases the cost and consumes context window space. Losing important context can severely impact the quality of responses.
Summarizing Messages
03:23:29 - 03:36:11 View Transcript
Scott implements summaries and a way to shorten the context size. Whenever the system has more than 10 messages, it will remove the oldest five messages and replace them with a summary of those messages.
Advanced RAG & Fine-Tuning
03:36:12 - 03:44:08 View Transcript
Scott shares some resources for building a more robust evaluation pipeline and advanced RAG fine-tuning techniques. He also discusses some advances in generative UIs.

Wrapping Up

Section Duration: 6 minutes

Wrapping Up
03:44:09 - 03:51:02 View Transcript
Scott wraps up the course with some tips for further researching emerging AI techniques and discusses Vercel's AI SDK.

Learn Straight from the Experts Who Shape the Modern Web

In-depth Courses
Industry Leading Experts
Learning Paths
Live Interactive Workshops

Get Unlimited Access Now

Course Description

Course Details

Rating

Learning Paths

Topics

Learn Straight from the Experts Who Shape the Modern Web

Table of Contents

Introduction

Improving LLMs with Evals

Retrieval Augmented Generation (RAG)

Structured Output

Human in the Loop

Memory Management

Wrapping Up

Learn Straight from the Experts Who Shape the Modern Web