Your RAG system is giving confident answers. But are they actually grounded in your documents? Most teams ship RAG to production without measuring a single metric. Today we fix that. In this video I break down the three metrics that actually matter (faithfulness, context recall, and answer relevance) and show you how to measure all three in 10 lines of code using RAGAS.

📓 Full evaluation notebook (run it on your own pipeline) → https://github.com/simplifyaimm/rag-evaluation-demo

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📌 TOPICS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
• The silent failure that shipped to production
• Why "it seems fine" is not evaluation
• The 3 ways RAG fails without you knowing
• Faithfulness: your hallucination detector
• Context Recall: your retrieval quality score
• Answer Relevance: when answers miss the point
• Live demo: RAGAS catches a hallucination in 10 lines
• Per-question breakdown: why aggregates lie
• India builder context: free eval with Ollama + AIKosh
• Recap + what to measure first

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📚 RESOURCES MENTIONED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
▶ RAGAS (free, open source) → https://docs.ragas.io
▶ Full demo notebook → https://github.com/simplifyaimm/rag-evaluation-demo
▶ Ollama (run LLMs locally, free) → https://ollama.ai
▶ AIKosh GPU Portal (₹65/hr T4) → https://aikosh.in
▶ DeepEval (CI/CD evaluation) → https://docs.confident-ai.com
▶ TruLens (production monitoring) → https://www.trulens.org

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔗 RAG SYSTEMS PLAYLIST
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Watch the full series in order:
📚 RAG Systems: From Zero to Production → https://www.youtube.com/playlist?list=PLT9Lk6Efplu5QFjZoWeSb_Ndh6X-08lYL
◀ Previous: Reranking Explained → https://youtu.be/zUOGotiUFvg?si=xx0JnfHM2bXjO65V
▶ Next: Agentic RAG (coming next week)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚡ WHAT YOU'LL LEARN
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
• Why RAG systems fail silently without evaluation
• The difference between faithfulness, context recall, and answer relevance
• How RAGAS scores each metric, and what to do when one is low
• Why aggregate scores hide your worst failures: always use per-question breakdowns
• How to run evaluation for free using Ollama locally
• Indian GPU context: a full eval suite costs under ₹10 on an AIKosh T4

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🇮🇳 FOR INDIAN AI BUILDERS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
RAGAS uses OpenAI by default, but you don't have to pay for evaluation. Run your eval LLM locally with Ollama (Llama 3.1 or Mistral 7B): completely free, and it works natively with RAGAS. Or use a T4 GPU on AIKosh at ₹65/hr; one full eval suite costs under ₹10. The full setup guide is in the notebook linked above.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔔 SUBSCRIBE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
One deep-dive AI engineering video every week. No hype. No beginner fluff. Just the systems that actually work in production.
Subscribe → https://www.youtube.com/channel/UCcvtM6qWvrhWBMLmczsOy8Q

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🏷️ TAGS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
#RAGEvaluation #RAGAS #Tutorial #FaithfulnessScore #ContextRecall #AnswerRelevance #RAGPipelineTesting #LLMEvaluationFramework #HallucinationDetectionRAG #DeepEvalVsRAGAS #TruLensMonitoring #AIEngineeringIndia #RAGSystemIndia #OllamaEvaluation #AIKoshGPU #RetrievalAugmentedGenerationEvaluation #simplifyAIwithMM
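A taste of one point from the video, "per-question breakdown: why aggregates lie", as a tiny library-free Python sketch. The question names and scores below are made up for illustration (they are not real RAGAS output): a passable-looking average faithfulness conceals one fully hallucinated answer.

```python
# Toy per-question faithfulness scores (hypothetical, not from RAGAS).
# Question 3 is a complete hallucination, yet the average still looks "okay".
scores = {
    "q1_refund_policy": 0.95,
    "q2_shipping_time": 0.90,
    "q3_warranty_terms": 0.00,  # fully hallucinated answer
    "q4_return_window": 0.92,
}

# The aggregate reads as "mediocre", not "one answer is invented wholesale".
aggregate = sum(scores.values()) / len(scores)
print(f"aggregate faithfulness: {aggregate:.2f}")

# A per-question breakdown surfaces the real failure immediately.
THRESHOLD = 0.5  # illustrative cutoff; pick one that fits your pipeline
failures = [q for q, s in scores.items() if s < THRESHOLD]
print("failing questions:", failures)
```

With real RAGAS results you get the same kind of per-question table from the evaluation result object, which is exactly why the video recommends never stopping at the aggregate score.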