How to Build and Evaluate AI systems in the Age of LLMs - Hugo Bowne-Anderson | DailyDevLists


How to Build and Evaluate AI systems in the Age of LLMs - Hugo Bowne-Anderson

DataTalksClub ⬛

26 days ago

1:01:32

AI Evaluation & Monitoring

Rank #3

Description

In this talk, Hugo Bowne-Anderson, an independent data and AI consultant, educator, and host of the podcasts Vanishing Gradients and High Signal, shares his journey from academic research and curriculum design at DataCamp to advising teams at Netflix, Meta, and the US Air Force. Together, we explore how to build reliable, production-ready AI systems, from prompt evaluation and dataset design to embedding agents into everyday workflows.

You’ll learn about:
- How to structure teams and incentives for successful AI adoption
- Practical prompting techniques for accurate timestamp and data generation
- Building and maintaining evaluation sets to avoid “prompt overfitting”
- Cost-effective methods for LLM evaluation and monitoring
- Tools and frameworks for debugging and observing AI behavior (Logfire, Braintrust, Arize Phoenix)
- The evolution of AI agents, from simple RAG systems to proactive, embedded assistants
- How to escape “proof of concept purgatory” and prioritize AI projects that drive business value
- Step-by-step guidance for building reliable, evaluable AI agents

LINKS
- Escaping POC Purgatory: Evaluation-Driven Development for AI Systems - https://www.oreilly.com/radar/escaping-poc-purgatory-evaluation-driven-development-for-ai-systems/
- Stop Building AI Agents - https://www.decodingai.com/p/stop-building-ai-agents
- How to Evaluate LLM Apps Before You Launch - https://www.youtube.com/watch?si=90fXJJQThSwGCaYv&v=TTr7zPLoTJI&feature=youtu.be
- My Vanishing Gradients Substack - https://hugobowne.substack.com/
- Building LLM Applications for Data Scientists and Software Engineers - https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=datatalksclub

TIMECODES:
00:00:00 Episode Introduction & Guest Bio
00:01:12 Podcasts Overview: Vanishing Gradients and High Signal
00:02:04 Career Journey: Academia to Data Science and DevRel
00:03:57 Freelance Consulting, Advising, and Teaching Focus
00:07:11 Consulting vs Advisory: Hands-on Work and Organizational Advice
00:08:24 Incentivizing AI Adoption: Loss Aversion and Dedicated Experimentation Time
00:11:11 Practical Prompting Use Cases: Summaries, CSVs, and Role-Based Prompts
00:12:22 Timestamp Generation Tools & Workflows (Gemini, Descript)
00:13:56 Quality Control Pattern: Evaluator–Optimizer for Generated Outputs
00:17:38 Scaling Transcript Work: Automation and GitHub Actions
00:23:00 Gold Test Sets: Size, Cost, and Representativeness for Evaluation
00:26:43 Failure Analysis: Categorizing Errors and Prioritizing Retrieval Fixes
00:27:38 Monitoring & Vibe Coding: Logging, Traces, and Debuggable MVPs
00:33:14 Embedded Agents and IDE Integrations: Cursor, Copilot, and Slack Workflows
00:40:12 Agentic Value Beyond Chat: Actions, Documents, and Automation
00:44:26 Prioritizing RAG for Business Impact: Quick Wins over Moonshots
00:49:21 Chunking Strategies and Context Rot: Sliding Windows and Summarizers
00:50:19 Adding Tooling vs. Simplicity: When to Move from RAG to Agents
00:53:09 Practical Project Example: Building an Email Assistant
00:56:21 Four-Step Framework for Building Agents (Problem, Start Small, Data, Evaluation)
00:57:41 Memory Design: Multi-Turn Conversation Memory vs. Retrieval-Based Memory
01:00:55 Episode Wrap-Up and Next Steps

Connect with Hugo
- Twitter - https://x.com/hugobowne
- LinkedIn - https://www.linkedin.com/in/hugo-bowne-anderson-045939a5/
- GitHub - https://github.com/hugobowne
- Website - https://hugobowne.github.io/

Connect with Alexey
- Twitter - https://twitter.com/Al_Grigor
- LinkedIn - https://www.linkedin.com/in/agrigorev/

Check our free online courses:
- ML Engineering course - http://mlzoomcamp.com
- Data Engineering course - https://github.com/DataTalksClub/data-engineering-zoomcamp
- MLOps course - https://github.com/DataTalksClub/mlops-zoomcamp
- LLM course - https://github.com/DataTalksClub/llm-zoomcamp
- Open-source LLM course - https://github.com/DataTalksClub/open-source-llm-zoomcamp
- AI Dev Tools course - https://github.com/DataTalksClub/ai-dev-tools-zoomcamp

👉🏼 Read about all our courses in one place - https://datatalks.club/blog/guide-to-free-online-courses-at-datatalks-club.html

👋🏼 Support/inquiries
If you want to support our community, use this link - https://github.com/sponsors/alexeygrigorev
If you’re a company, reach us at alexey@datatalks.club

#AI #ArtificialIntelligence #MachineLearning #DataScience #AIEngineering #PromptEngineering #RAG #AIAgents #LLM #GenerativeAI #AgenticAI #AIDevelopment #AIEvaluation #AIMonitoring #AIConsulting #MLOps #AIEthics #TechTalk #HugoBowneAnderson #AITools


Video Details

Category

AI Evaluation & Monitoring

Featured Date

November 14, 2025

Quality Rank

#3
