For more information about Stanford’s graduate programs, visit: https://online.stanford.edu/graduate-education

November 21, 2025

This lecture covers:
• LLM-as-a-judge overview
• Best practices and benefits
• Biases and pitfalls

To follow along with the course schedule and syllabus, visit: https://cme295.stanford.edu/syllabus/

Chapters:
00:00:00 Introduction
00:07:08 Inter-rater agreement metrics
00:18:24 Rule-based metrics
00:21:00 METEOR, BLEU, ROUGE
00:28:00 LLM-as-a-judge
00:33:44 Structured outputs
00:36:48 Variants
00:38:47 Position, verbosity, self-enhancement bias
00:47:22 Best practices
00:54:06 Factuality
01:00:15 Agent evaluation
01:23:50 Benchmarks
01:25:12 Knowledge with MMLU
01:29:34 Reasoning with AIME, PIQA
01:33:57 Coding with SWE-bench
01:36:15 Safety with HarmBench
01:40:51 Agents with Tau-Bench

Afshine Amidi is an Adjunct Lecturer at Stanford University.
Shervine Amidi is an Adjunct Lecturer at Stanford University.

View the course playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rOCXd21gf0CF4xr35yINeOy