00:00 - Introduction: Why AI Testing is Hard
01:30 - The "Seasoned Engineer" Paradigm: Deterministic vs. Probabilistic
05:30 - The Hallucination Problem in Company Policies
07:15 - Why Traditional Unit Tests Fail for LLMs
10:00 - Groundedness & Relevance
12:20 - Real-World Scenario: Financial Chatbot Evaluation
14:10 - Designing Quality Gates for Production
16:03 - Anatomy of a Test Data Set (Ground Truth)
17:40 - The Ragas Framework: Automated AI Evaluation
19:08 - Metrics Deep Dive: Precision, Recall, & Faithfulness
21:50 - Integrating Evaluation into CI/CD Pipelines
23:55 - Compliance Testing: Hallucination & Refusal Checks
25:10 - Diagnosing Retrieval Failures
27:10 - Optimization Strategies: Hybrid Search, Chunking & Metadata
28:45 - Building a Full CI/CD Evaluation Report
29:25 - Conclusion & Ragas Framework Homework