Multi-agent systems promise specialized expertise and parallel processing, but they are hard to debug. When you build with CrewAI's multi-agent framework, comprehensive evaluations are essential to keep your agents consistently reliable in production. Watch our workshop with CrewAI to learn how to evaluate multi-agent systems that actually work in production.

You'll learn a practical, metric-driven approach to preventing failures by instrumenting agents to monitor action completion, tool selection, latency, and user satisfaction. We walk through a real-world CrewAI implementation and show how observability enables root-cause analysis and systematic fixes. You'll see exactly where agents lose context during handoffs, when tool selection breaks down, and how to streamline your architecture.

You'll learn:
- An AI eval playbook purpose-built for multi-agent challenges
- How to trace root causes across agent handoffs with session-, step-, and system-level metrics
- How to combine CrewAI's orchestration framework with Galileo's observability platform to create reliable multi-agent systems

0:00 - Introduction & Welcome
1:03 - Why 90% of AI Agents Fail to Reach Production
4:00 - From Prototype to Production: The Agent Operations Framework
7:00 - Strategic Thinking: Moving Beyond Individual Projects
9:00 - Agent Workflows: Crews vs. Flows
12:00 - CrewAI Studio Demo: Building Agents with Copilot
13:00 - What Is an Agent? Defining Roles, Tools, and Tasks
17:00 - Live Demo: Creating a Product Research Crew
19:00 - Introduction to Galileo Observability
22:00 - The Challenge: Making Multi-Agent Systems Production-Ready
25:00 - Galileo + CrewAI Integration: Two Lines of Code
27:00 - Live Observability Demo: Detecting Issues Before Production
29:00 - Agent Graphs & Automatic Issue Detection
31:00 - Why Multi-Agent Systems Outperform Single Agents
33:00 - Evaluation Metrics: Action Advancement, Tool Efficiency & More
36:00 - Q&A: Agent Discovery, MCP Integration & Tool Usage
42:00 - Q&A: CrewAI Tracing vs. Full Observability Platforms
45:00 - Q&A: Evaluation Metrics & Ground Truth Requirements
52:00 - Q&A: Deciding When to Use Multiple Agents
55:00 - Roadmap Preview & Closing Remarks

💬 CONNECT WITH US
► Follow us on X: https://x.com/rungalileo
► Connect on LinkedIn: https://www.linkedin.com/company/galileo-ai/
► Have questions? Reach out at info@galileo.ai

🚀 GET STARTED
► Try Galileo for Free: https://app.galileo.ai/sign-up

#evaluation #aievals #evals #aievaluation #crewai #agents #multiagentsystems #agentevaluations