This short podcast-style discussion explains how LLM-as-judge can make weak RAG systems look strong and good systems look broken, especially when judges rely on plausibility instead of evidence. It covers why this happens, how it can hide retrieval failures, and why teams may end up optimizing the wrong part of the stack.