
In this comprehensive talk (adapted from my presentation at ODSC), I provide a practical, hands-on framework for evaluating your GenAI and LLM applications. You'll learn the core technical nuances that make GenAI evaluation tricky, get a step-by-step workflow for creating your own evaluation datasets and automated tests, and see how to apply these principles to even the most complex agentic systems. This isn't about theory; it's about giving you the tools and confidence to start building better AI today.

0:00 - Introduction
0:44 - The High Cost of GenAI Failures
2:46 - Why Evaluation is a Must-Have Skill
5:58 - The Core Problem: Why GenAI is Hard to Evaluate
6:41 - Input Sensitivity (Prompts & System Messages)
11:08 - Model Inconsistency (Drift & Non-Determinism)
18:17 - A Practical Workflow for Evaluation
21:53 - Building Better Tests (By Talking to Experts!)
25:07 - Error Analysis & Avoiding LLM Judge Bias
29:21 - Advanced Technique: Using Unit Tests
34:50 - How to Evaluate Complex Agentic Systems
39:25 - Final Thoughts & Your Call to Action

Links to:
Slides: https://github.com/rajshah4/LLM-Evaluation/blob/main/presentation_slides/Evaluation_ODSC_Oct_2025.pdf
References: https://github.com/rajshah4/LLM-Evaluation/blob/main/presentation_slides/references_Evaluation_ODSC_Oct2025.md

━━━━━━━━━━━━━━━━━━━━━━━━━
★ Rajistics Social Media »
● Home Page: http://www.rajivshah.com
● LinkedIn: https://www.linkedin.com/in/rajistics/
● Reddit: https://www.reddit.com/r/rajistics/
━━━━━━━━━━━━━━━━━━━━━━━━━