In the third session of our Python + Agents series, we'll focus on two essential components of building reliable agents: **observability** and **evaluation**. We'll begin with observability, using OpenTelemetry to capture traces, metrics, and logs from agent actions. You'll learn how to instrument your agents and use a local Aspire dashboard to identify slowdowns and failures. From there, we'll explore how to evaluate agent behavior using the Azure AI Evaluation SDK. You'll see how to define evaluation criteria, run automated assessments over a set of tasks, and analyze the results to measure accuracy, helpfulness, and task success. By the end of the session, you'll have practical tools and workflows for monitoring, measuring, and improving your agents, so they're not just functional but dependable and verifiably effective.

Prerequisites: To follow along with the live examples, sign up for a free GitHub account. If you are brand new to generative AI with Python, start with [our 9-part Python + AI series](https://aka.ms/pythonai/rewatch), which covers LLMs, embedding models, RAG, tool calling, MCP, and more.

📌 This event is part of a series; learn more here: https://aka.ms/PythonAgents/YT

Microsoft Agent Framework: https://learn.microsoft.com/agent-framework/overview/agent-framework-overview/?wt.mc_id=youtube_26690_organicsocial_reactor

#microsoftreactor #learnconnectbuild [eventID:26690]
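
As a preview of the observability half of the session, here is a minimal tracing sketch, not the session's exact code. It assumes the Aspire dashboard is running locally with its OTLP gRPC endpoint mapped to localhost:4317, and `run_agent` is a placeholder for whatever agent invocation you want to trace.

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Register a tracer provider that exports spans to the local Aspire dashboard.
provider = TracerProvider(resource=Resource.create({"service.name": "weather-agent"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)


def run_agent(question: str) -> str:
    """Placeholder for your actual agent invocation."""
    return f"(answer to: {question})"


# Wrap each agent run in a span so slow or failing runs show up in the dashboard.
with tracer.start_as_current_span("agent.run") as span:
    span.set_attribute("agent.question", "What's the weather in Seattle?")
    answer = run_agent("What's the weather in Seattle?")
    span.set_attribute("agent.answer.length", len(answer))
```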
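
And as a preview of the evaluation half, here is a small sketch of scoring recorded agent runs with the Azure AI Evaluation SDK. It assumes the `azure-ai-evaluation` package is installed, an Azure OpenAI deployment is reachable via the environment variables shown, and `agent_results.jsonl` contains one `{"query": ..., "response": ...}` record per line; the file name and variable names are illustrative.

```python
import os

from azure.ai.evaluation import AzureOpenAIModelConfiguration, RelevanceEvaluator, evaluate

# Model used by the LLM-judged evaluator (endpoint/key/deployment are assumptions).
model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT"],
)

# Score each recorded agent response for relevance to the user's query.
relevance = RelevanceEvaluator(model_config=model_config)

results = evaluate(
    data="agent_results.jsonl",           # prior agent runs to assess
    evaluators={"relevance": relevance},
    output_path="evaluation_results.json",
)
print(results["metrics"])                 # aggregate scores across all rows
```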