Production RAG V2: Advanced Patterns for Scale

A comprehensive 70-minute technical deep-dive covering advanced RAG architectures that go beyond basic retrieval. This video builds on the V1 foundation to address real-world failure modes with adaptive, agentic systems.

================
What You Will Learn:
================

Advanced Document Chunking
- Semantic chunking with embedding similarity detection
- Hierarchical (parent-child) chunking for context preservation
- Contextual chunking with LLM-generated preambles
- Vision-based extraction for complex document formats

Query Understanding & Transformation
- Query rewriting for vocabulary alignment
- Query decomposition for multi-part questions
- HyDE (Hypothetical Document Embeddings)
- Building adaptive transformation pipelines

Agentic RAG Architectures
- Router pattern for query classification
- Tool-use and function calling for dynamic retrieval
- Self-reflection and adaptive retrieval loops
- Multi-hop reasoning across documents
- Multi-agent orchestration patterns

Knowledge Graphs for RAG
- Entity and relationship extraction
- Hybrid graph + vector retrieval
- When graph retrieval adds value vs. overhead

Multi-Modal RAG
- Vision-language models for document understanding
- Cross-modal embedding and retrieval (CLIP)
- Complete multi-modal architecture design

Evaluation Deep-Dive
- Component-level metrics (Recall@K, MRR, NDCG)
- LLM-as-Judge patterns and limitations
- Synthetic test set generation
- Continuous online evaluation
- Hallucination detection and faithfulness verification

Fine-Tuning for RAG
- Embedding model fine-tuning with hard negatives
- Reranker fine-tuning for domain precision
- Generator fine-tuning for faithfulness

Production Hardening
- Access control and multi-tenancy
- Incremental indexing and cache invalidation
- Guardrails and safety
- Latency and cost optimization
- Quality-focused monitoring

===========
Timestamps:
===========

00:00 - Introduction: Production RAG V2 Overview
01:19 - V1 Architecture Recap
02:50 - Where V1 Breaks: Five Failure Categories
04:46 - V2 Architecture Overview
06:26 - Section 2: Advanced Chunking Strategies
06:30 - Recursive Chunking Limitations
08:28 - Semantic Chunking
10:23 - Hierarchical (Parent-Child) Chunking
12:19 - Contextual Chunking
14:04 - Handling Complex Document Formats
15:53 - Chunking Strategy Decision Framework
17:46 - Section 3: Query Understanding & Transformation
17:51 - The Raw Query Problem
19:50 - Query Rewriting
21:33 - Query Decomposition
23:00 - HyDE: Hypothetical Document Embeddings
24:52 - Query Transformation Pipeline
26:34 - Section 4: Agentic RAG
26:39 - From Static Pipelines to Adaptive Agents
28:11 - The Router Pattern
29:31 - Tool Use for Retrieval
30:56 - Self-Reflection and Adaptive Retrieval
32:15 - Multi-Hop Retrieval
33:41 - Multi-Agent Architectures
34:49 - Agentic RAG Decision Framework
36:12 - Section 5: Knowledge Graphs
36:14 - What Vector Search Cannot Do
37:34 - Building a Knowledge Graph
39:09 - Hybrid Graph + Vector Retrieval
40:17 - Knowledge Graphs at Scale
41:31 - Section 6: Multi-Modal RAG
41:33 - The Multi-Modal Challenge
42:56 - Vision Models for Document Understanding
44:15 - Multi-Modal Embedding and Retrieval
45:32 - Multi-Modal Architecture
46:47 - Section 7: Evaluation Deep-Dive
46:51 - Why Evaluation is Hard
48:06 - Retrieval Evaluation Metrics
49:28 - Generation Evaluation
50:35 - LLM-as-Judge
51:54 - Synthetic Test Set Generation
53:12 - Continuous Online Evaluation
54:30 - Hallucination Detection
55:41 - Section 8: Fine-Tuning for RAG
55:43 - Why Fine-Tuning Matters
56:41 - Fine-Tuning Embeddings
57:48 - Fine-Tuning Rerankers
58:32 - Fine-Tuning Generators
59:18 - Section 9: Production Hardening
59:21 - Access Control and Multi-Tenancy
01:01:20 - Guardrails and Safety
01:02:15 - Latency Optimization
01:03:14 - Cost Optimization
01:04:08 - Quality-Focused Monitoring
01:04:54 - Deployment Checklist
01:05:47 - Complete V2 Architecture Revisited
01:06:51 - Maturity Model: Where to Start

=========
About Me:
=========

I'm Mukul Raina, a Senior Software Engineer and Tech Lead at Microsoft, with a Master's in Computer Science from the University of Oxford. On this channel, I create technical deep-dives on System Design and ML/AI architectures.

#RAG #AgenticRAG #SystemDesign #AIEngineering #MachineLearning #LLM #VectorDatabase #KnowledgeGraphs #ProductionML #MLOps #RetrievalAugmentedGeneration