There are no evals without observability. To identify failure modes and improve agent quality, you need granular visibility into complex agentic trajectories (including model responses, retrieval steps, and tool calls), along with the ability to monitor production metrics like latency, cost, token usage, and evaluation scores. In this cookbook, we discuss why a robust observability process is critical to shipping reliable AI.

00:00 - Introduction
00:46 - Analyze logs and traces
04:48 - Search and filter through logs
07:00 - Online evals on logs
11:03 - Set up alerts on logs
12:11 - Curate datasets from logs