How do you know if your LLM system is actually working before deployment? In this video, I show my two-level evaluation approach: offline testing with RAGAS and online monitoring in production.

What You'll Learn:
- Why you need both offline and online evaluation
- RAGAS framework for RAG evaluation
- Retrieval metrics: Precision@K, Recall@K, MRR
- Generation metrics: Accuracy, Faithfulness, Answer Relevance
- Human evaluation for catching edge cases
- Online monitoring: latency, error rate, user feedback
- A/B testing for data-driven decisions
- Two-layer debugging: retrieval vs generation

My Targets:
- Precision@K: 80%+
- Recall@K: 90%+
- p95 Latency: under 1 second
- Error rate: under 1%

ML End-to-End Series:
- Video 1: End-to-End AI Development
- Video 2: LLM Evaluation & Testing (this video)
- Video 3: Production Deployment
- Video 4: Monitoring & Iteration

#LLM #RAGAS #MachineLearning #MLInterview #AIEngineer #Evaluation #MLOps
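
For reference, here is a minimal sketch of the three retrieval metrics named above (Precision@K, Recall@K, MRR). It assumes binary relevance labels and string document IDs. The function names are illustrative only; they are not the RAGAS API and not necessarily the code used in the video.

```python
from typing import Sequence, Set


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    # Divide by k even if fewer than k documents came back.
    return hits / k


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0  # assumption: no labeled relevant docs counts as 0
    # Use a set so duplicate IDs in the ranking are not counted twice.
    found = set(retrieved[:k]) & relevant
    return len(found) / len(relevant)


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[str]], relevants: Sequence[Set[str]]
) -> float:
    """Mean over queries of 1 / rank of the first relevant document."""
    if not rankings:
        return 0.0
    total = 0.0
    for retrieved, relevant in zip(rankings, relevants):
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)


# Hypothetical example data, made up for illustration.
retrieved = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4"}

p_at_5 = precision_at_k(retrieved, relevant, k=5)
r_at_5 = recall_at_k(retrieved, relevant, k=5)
mrr = mean_reciprocal_rank([retrieved, ["a", "b"]], [relevant, {"a"}])
```

In an offline test suite, scores like these would be averaged over a labeled query set and compared against the targets listed above (for example, Precision@K of 80% or higher).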