Your RAG system is giving confident answers. But are they actually grounded in your documents? Most teams ship RAG to production without measuring a single metric. Today we fix that. In this video I break down the three metrics that actually matter (faithfulness, context recall, and answer relevance) and show you how to measure all three in 10 lines of code using RAGAS.

📓 Full evaluation notebook (run it on your own pipeline) → https://github.com/simplifyaimm/rag-evaluation-demo

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📌 TOPICS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
• The silent failure that shipped to production
• Why "it seems fine" is not evaluation
• The 3 ways RAG fails without you knowing
• Faithfulness: your hallucination detector
• Context Recall: your retrieval quality score
• Answer Relevance: when answers miss the point
• Live demo: RAGAS catches a hallucination in 10 lines
• Per-question breakdown: why aggregates lie
• India builder context: free eval with Ollama + AIKosh
• Recap + what to measure first

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📚 RESOURCES MENTIONED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
▶ RAGAS (free, open source) → https://docs.ragas.io
▶ Full demo notebook → https://github.com/simplifyaimm/rag-evaluation-demo
▶ Ollama (run LLMs locally, free) → https://ollama.ai
▶ AIKosh GPU Portal (₹65/hr T4) → https://aikosh.in
▶ DeepEval (CI/CD evaluation) → https://docs.confident-ai.com
▶ TruLens (production monitoring) → https://www.trulens.org

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔗 RAG SYSTEMS PLAYLIST
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Watch the full series in order:
📚 RAG Systems: From Zero to Production → https://www.youtube.com/playlist?list=PLT9Lk6Efplu5QFjZoWeSb_Ndh6X-08lYL
◀ Previous: Reranking Explained → https://youtu.be/zUOGotiUFvg?si=xx0JnfHM2bXjO65V
▶ Next: Agentic RAG (coming next week)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚡ WHAT YOU'LL LEARN
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
• Why RAG systems fail silently without evaluation
• The difference between faithfulness, context recall, and answer relevance
• How RAGAS scores each metric, and what to do when one is low
• Why aggregate scores hide your worst failures: always use per-question breakdowns
• How to run evaluation for free using Ollama locally
• Indian GPU context: a full eval suite costs under ₹10 on an AIKosh T4

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🇮🇳 FOR INDIAN AI BUILDERS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
RAGAS uses OpenAI by default, but you don't have to pay for evaluation. Run your eval LLM locally with Ollama (Llama 3.1 or Mistral 7B): completely free, and it works natively with RAGAS. Or use a T4 GPU on AIKosh at ₹65/hr; one full eval suite costs under ₹10. The full setup guide is in the notebook linked above.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔔 SUBSCRIBE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
One deep-dive AI engineering video every week. No hype. No beginner fluff. Just the systems that actually work in production.
Subscribe → https://www.youtube.com/channel/UCcvtM6qWvrhWBMLmczsOy8Q

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🏷️ TAGS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
#RAGEvaluation #RAGAS #Tutorial #FaithfulnessScore #ContextRecall #AnswerRelevance #RAGPipelineTesting #LLMEvaluationFramework #HallucinationDetectionRAG #DeepEvalVsRAGAS #TruLensMonitoring #AIEngineeringIndia #RAGSystemIndia #OllamaEvaluation #AIKoshGPU #RetrievalAugmentedGenerationEvaluation #simplifyAIwithMM
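A taste of one point from the video, "per-question breakdown: why aggregates lie", as a tiny library-free Python sketch. The question names and scores below are made up for illustration (they are not real RAGAS output): a passable-looking average faithfulness conceals one fully hallucinated answer.

```python
# Toy per-question faithfulness scores (hypothetical, not from RAGAS).
# Question 3 is a complete hallucination, yet the average still looks "okay".
scores = {
    "q1_refund_policy": 0.95,
    "q2_shipping_time": 0.90,
    "q3_warranty_terms": 0.00,  # fully hallucinated answer
    "q4_return_window": 0.92,
}

# The aggregate reads as "mediocre", not "one answer is invented wholesale".
aggregate = sum(scores.values()) / len(scores)
print(f"aggregate faithfulness: {aggregate:.2f}")

# A per-question breakdown surfaces the real failure immediately.
THRESHOLD = 0.5  # illustrative cutoff; pick one that fits your pipeline
failures = [q for q, s in scores.items() if s < THRESHOLD]
print("failing questions:", failures)
```

With real RAGAS results you get the same kind of per-question table from the evaluation result object, which is exactly why the video recommends never stopping at the aggregate score.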