LLMs started as tools we called. Now they’re agents that make decisions: for our users, for our engineering teams, for the industry. As LLM systems gain autonomy, two concerns that used to be separate, evaluation and monitoring, converge into a single continuous loop. Drawing on production experience building AI-powered healthcare systems, this talk presents a practical framework for evaluating and monitoring LLM systems at every stage of autonomy. We’ll cover how to define what failure actually means for your users (it’s not what you think), how to build a root-cause taxonomy that tells you exactly where to invest, and how to turn a manual investigation into a self-improving monitoring pipeline. We’ll also explore what changes as the AI becomes more autonomous: agents evaluating agents, step-level quality signals, and the emerging pattern of self-healing systems.