Loading video player...
How do you actually trust an AI agent in production? In this talk from MLOps.WTF Meetup #8, Dmitry Leyko (AI Engineer at ThinkMoney) digs into the evaluations and observability practices that make agentic systems reliable ā especially in a regulated, compliance-heavy environment like financial services. Dmitry walks through ThinkMoney's real-world agentic AI project, covering how they built robust evaluation pipelines and observability across dev, QA, and production ā with a particular focus on ensuring their AI chatbot can never cross financial advice guardrails. What you'll learn: How to use DeepEval for evaluations and regression testing in an agentic pipeline Setting up Langsmith for LLM observability (including why AWS observability didn't cut it in the London region) Building financial guardrail evaluations to meet compliance requirements Running evaluations across CI pipelines, QA environments, and production observability Why evals need to span the full deployment lifecycle ā not just pre-launch If you're building or maintaining agentic AI in a regulated industry, or just trying to make your LLM-powered systems more trustworthy, this is essential viewing. šļø Speaker: Dmitry Leyko, ThinkMoney š MLOps.WTF Meetup #8 ā Manchester, March 2026 š Join the MLOps.WTF community: https://mlops.wtf #LLMOps #AgenticAI #AIEvaluations #LLMObservability #DeepEval #Langsmith #FinancialServicesAI #MLOps #MachineLearning #AICompliance #LLMTesting #AIInProduction #ResponsibleAI #GenAI