Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 8 - LLM Evaluation

Stanford Online

90 days ago

1:49:25

AI Evaluation & Monitoring

Rank #1

Description

For more information about Stanford’s graduate programs, visit: https://online.stanford.edu/graduate-education

November 21, 2025

This lecture covers:
• LLM-as-a-judge overview
• Best practices and benefits
• Biases and pitfalls

To follow along with the course schedule and syllabus, visit: https://cme295.stanford.edu/syllabus/

Chapters:
00:00:00 Introduction
00:07:08 Inter-rater agreement metrics
00:18:24 Rule-based metrics
00:21:00 METEOR, BLEU, ROUGE
00:28:00 LLM-as-a-judge
00:33:44 Structured outputs
00:36:48 Variants
00:38:47 Position, verbosity, self-enhancement bias
00:47:22 Best practices
00:54:06 Factuality
01:00:15 Agent evaluation
01:23:50 Benchmarks
01:25:12 Knowledge with MMLU
01:29:34 Reasoning AIME, PIQA
01:33:57 Coding with SWE-bench
01:36:15 Safety with HarmBench
01:40:51 Agents with Tau-Bench

Afshine Amidi is an Adjunct Lecturer at Stanford University.
Shervine Amidi is an Adjunct Lecturer at Stanford University.

View the course playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rOCXd21gf0CF4xr35yINeOy

Video Details

Category: AI Evaluation & Monitoring
Featured Date: December 4, 2025
Quality Rank: #1
AI Recommended