RAG Evaluation covers how to measure and improve RAG system quality. It outlines key evaluation challenges (lack of ground truth, difficulty of assessing faithfulness, and the absence of standardized benchmarks), then proposes a five-step framework: set up the system, create an evaluation dataset, select metrics, compute scores, and optimize. Metrics span both retrieval (context relevance, context recall, context precision) and generation (faithfulness, answer relevance, semantic similarity). Amazon Bedrock provides a fully managed evaluation tool with visual reports and LLM-as-a-judge scoring.
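To make the retrieval-side metrics concrete, here is a minimal sketch of context precision and context recall computed against a hand-labeled evaluation dataset. The function names, chunk IDs, and labeling scheme are illustrative assumptions, not part of any specific library or of the Bedrock tool mentioned above:

```python
def context_precision(retrieved_ids, relevant_ids):
    """Fraction of retrieved chunks that are actually relevant."""
    if not retrieved_ids:
        return 0.0
    relevant = set(relevant_ids)
    hits = sum(1 for cid in retrieved_ids if cid in relevant)
    return hits / len(retrieved_ids)

def context_recall(retrieved_ids, relevant_ids):
    """Fraction of relevant chunks that were retrieved."""
    if not relevant_ids:
        return 0.0
    retrieved = set(retrieved_ids)
    hits = sum(1 for cid in relevant_ids if cid in retrieved)
    return hits / len(relevant_ids)

# Hypothetical evaluation example: the retriever returned chunks
# c1..c4, but only c1 and c3 answer the question; c5 was relevant
# but missed by the retriever.
retrieved = ["c1", "c2", "c3", "c4"]
relevant = ["c1", "c3", "c5"]

print(context_precision(retrieved, relevant))  # 2 hits / 4 retrieved = 0.5
print(context_recall(retrieved, relevant))     # 2 hits / 3 relevant ~ 0.667
```

Generation-side metrics such as faithfulness typically require an LLM-as-a-judge pass rather than set arithmetic, which is why managed tooling like Bedrock's evaluation reports is useful there.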