Remember when being an ML engineer meant building the entire system to get a model into production? Versioning datasets. Setting up experiment tracking. Building deployment pipelines. Monitoring drift. Governing the full lifecycle from training to serving. The whole MLOps stack was built around this loop.

Now? You call an API. The model lives on someone else's servers. Your most important artifact is a prompt - a string - and when it breaks production on a Thursday night, you have zero observability into why. The game changed. The tooling didn't. Until now.

MLflow just shipped features built for the LLM era:
🔹 AI Gateway - a single endpoint across any model provider
🔹 Tracing - see inside every LLM call and every agent handoff
🔹 Prompt Registry - version and manage prompts like code
🔹 Evaluation Datasets - structured data to measure your agent's quality
🔹 Built-in & Custom Judges - automated scoring, out of the box or tailored to your domain
🔹 Single & Multi-turn Evaluation - test your agent across full conversations, not just single responses

In this video, I build a complete multi-agent school system from scratch using LangGraph + MLflow, adding one feature at a time. Every line of code runs.

Timestamps:
00:00 - The Shift to Agentic Systems
02:05 - MLflow UI Overview
05:41 - MLflow AI Gateway
08:11 - MLflow Autologging and Tracing
11:24 - Prompt Registry
13:12 - Multi-Agent System Development
16:45 - Evaluation Datasets
18:00 - Built-in Judges
19:00 - Custom Judges
20:10 - Multi-Turn Simulations
24:50 - Conclusion and Key Takeaways

Documentation: https://mlflow.org/
Repo: https://github.com/iRahulPandey/multi-agent-skool-system.git

#MLflow #LLMOps #MLOps #AI #LangGraph