Get the book "Evals for AI Engineers" here: https://learning.oreilly.com/library/view/evals-for-ai/9798341660717/

If your AI system can't find the right information, nothing else matters. The model, the prompts, the agent logic: none of it compensates for bad retrieval. Doug Turnbull has been doing search since before RAG existed. He shipped search at Shopify, Reddit, and Wikipedia, and co-wrote AI-Powered Search. In this conversation he covers why evaluating retrieval in RAG is fundamentally different from evaluating traditional search, and the "seesaw" pattern he uses to keep retrieval improving without breaking what already works.

Timestamps:
00:00 Who is Doug Turnbull and why retrieval matters
01:26 Two types of evals most teams confuse
04:36 How search metrics like DCG and NDCG actually work
08:49 Is ranking even the right thing to measure for RAG?
12:27 Why RAG breaks traditional search evaluation
12:56 The complexity ladder from search results to autonomous agents
18:11 The first thing to do on any RAG team
19:09 Why vector databases waste your first six months
20:00 Defining a search tool's promise to the agent
23:14 The seesaw pattern for growing your eval set
28:59 Why blind side-by-side comparisons beat labeled results
34:30 Business outcomes as guardrails, not objectives
37:02 Inner loop vs outer loop of retrieval quality

Connect with Hamel:
Website ► https://hamel.dev
LinkedIn ► https://www.linkedin.com/in/hamelhusain/
Twitter/X ► https://x.com/hamelhusain
Instagram ► https://www.instagram.com/hamelsmu/
TikTok ► https://www.tiktok.com/@hamel_husain

Connect with Doug Turnbull:
LinkedIn ► https://www.linkedin.com/in/softwaredoug/
Twitter/X ► https://x.com/softwaredoug