Evaluating AI Models with Microsoft Foundry | MVP Unplugged | DailyDevLists


Evaluating AI Models with Microsoft Foundry | MVP Unplugged

Microsoft Developer

73 days ago

39:04

AI Evaluation & Monitoring

Rank #1

Description

Welcome to the next MVP Unplugged, where Microsoft MVPs share real-world projects and insights from the field! In this episode, host Justin Garrett sits down with Microsoft MVP Veronika Kolesnikova to explore how she picks the right AI model for common developer tasks using Evaluations in Microsoft Foundry, covering query-context-result datasets, Python SDK workflows, and evaluation metrics. We'll explore some fun datasets around hiking, acrobatics, and common developer terminology too!

Veronika walks through her full process, from generating datasets with GitHub Copilot, to running multi-model evaluations, to analyzing outputs in the Microsoft Foundry portal. Whether you’re building agents, experimenting with models, or ensuring AI reliability at scale, this episode breaks down a repeatable and practical approach you can use today.

In this episode, you’ll learn
✅ How to build custom evaluation datasets using AI
✅ How to compare outputs across models like GPT‑5, Grok-4, and Claude Sonnet 4.5
✅ How to run Evaluations programmatically using the new Microsoft Foundry SDK
✅ How to measure AI performance using F1, METEOR, similarity scores, and thresholds
✅ Tips for choosing the right model for your AI agent
✅ Practical debugging and iteration strategies for model quality
✅ How to store and version evaluation datasets in Microsoft Foundry

🔗 Chapter Markers
00:00 – Intro to MVP Unplugged
00:20 – Meet Microsoft MVP Veronika Kolesnikova
02:20 – Introducing Microsoft Foundry Evaluations
03:03 – Circus, Hiking & Engineering: Creating AI Data Sets
04:55 – Inside the Jupyter Notebooks & Evaluation Setup
06:11 – Connecting to Azure AI Projects
07:55 – Dataset Structure: Query, Context, Ground Truth
09:14 – Why Evaluations Matter for Real AI Projects
10:42 – Exploring the Foundry UI (Classic + New Portal)
12:04 – Uploading and Versioning Data Sets in Foundry
13:37 – Evaluation Results: GPT‑5, Claude, Grok
18:29 – Thresholds, System Prompts & Model Behavior
21:07 – Deep Dive: Quad vs GPT‑5 Performance
23:17 – Short Answers vs Long Answers & Scoring
24:20 – Circus Data Set Analysis
25:25 – Software Engineering Data Set Results
27:51 – Documentation & Learning Resources
29:04 – Running Evaluations with the New Foundry SDK
31:12 – Differences Between Old & New SDK
32:17 – How Veronika Chooses the Best Model
35:22 – GitHub Copilot for Model Testing
36:28 – Microsoft Learn Resources
37:26 – What Veronika Wants AI To Do Next
38:32 – Final Advice for Developers
39:03 – Closing

👉 Subscribe for more MVP insights and AI-powered development tips!

#microsoftdeveloper #MVPUnplugged #MicrosoftFoundry #AIEvaluations #AzureAI #GitHubCopilot #AIModels #MachineLearning #Claude #Grok #GPT5 #DeveloperCommunity #AIEngineering #JupyterNotebooks #pythondevelopers

🔗 Resources & Links
🎁 Free Microsoft Foundry Trial https://aka.ms/devrelft
📚 Microsoft Foundry Observability https://learn.microsoft.com/azure/ai-foundry/concepts/observability
🧪 Foundry Model Leaderboard https://ai.azure.com/explore/models/leaderboard
📘 Evaluating AI Models https://learn.microsoft.com/azure/ai-services/foundry/evaluations
💻 Veronika’s GitHub Repo (Evaluation Project) https://github.com/Veroni4ka/RAI_notebooks/
🚀 Try GitHub Copilot https://github.com/features/copilot
MVP Unplugged Playlist https://youtube.com/playlist?list=PLlrxD0HtieHhclud3yVB88znZPKCZYX_8&si=4HoycKJyUcl1qwV-

About Veronika
Veronika Kolesnikova is a Microsoft MVP in AI and a Principal AI Engineer at Liberty Mutual in Boston, MA. Veronika started her career as a QA engineer, then moved to software engineering and, more recently, to AI engineering. She's an international public speaker, a co-organizer of the Boston Azure AI user group, and a tech mentor. Follow Veronika on LinkedIn.

About Justin
Justin Garrett is the host of MVP Unplugged and a Principal PM in Developer Relations, which is part of Microsoft Cloud + AI. Justin’s career at Microsoft spans 20 years across Windows, Bing, Edge, Web Platform, Students/University Relations, Cloud Advocacy, and, most recently, a leadership role in the MVP Program at Microsoft. Follow Justin on LinkedIn.

About MVP Unplugged
AI is reshaping how we work and live. For developers and technologists alike, the pace of innovation (new tools, new models, patterns and practices, and even culture itself) is changing even faster. It can be difficult to know what to learn, what to prioritize, and what truly lives up to the promise of unlocking creativity and boosting productivity. Join Justin Garrett, Principal PM in DevRel and a leader in the Microsoft MVP Program, as he speaks with MVPs to share what they’re learning using a real-world project in this conversational series. In each episode, they’ll experiment, code, and share honest insights that can make a real difference for the audience. Justin and his guests share stories of navigating technological change and look ahead to what’s next in tech. Come discover with us how to thrive in this era of AI!
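
The description mentions datasets structured as query, context, and ground truth, scored with metrics such as F1 against a threshold. As a rough illustration of that idea only, here is a minimal Python sketch that uses the standard library and no Foundry SDK calls. The sample rows, the `response` field, the whitespace tokenizer, and the 0.5 threshold are all assumptions made for this example; they are not taken from the episode or from Veronika's repo, and the Foundry evaluators may compute F1 differently.

```python
import json
from collections import Counter

# Hypothetical rows. The query/context/ground_truth names mirror the chapter
# title "Dataset Structure: Query, Context, Ground Truth"; "response" stands
# in for a model's answer and is an assumption of this sketch.
ROWS = [
    {
        "query": "What is a carabiner used for?",
        "context": "Basic hiking and climbing gear.",
        "ground_truth": "a metal clip used to connect ropes and gear",
        "response": "a metal clip that connects ropes and gear",
    },
    {
        "query": "What is a trapeze?",
        "context": "Circus and acrobatics equipment.",
        "ground_truth": "a horizontal bar hung from two ropes",
        "response": "a kind of juggling prop",
    },
]


def tokenize(text):
    """Lowercase and split on whitespace (punctuation is not stripped)."""
    return text.lower().split()


def f1_score(response, ground_truth):
    """Token-overlap F1: harmonic mean of token precision and recall."""
    resp = Counter(tokenize(response))
    truth = Counter(tokenize(ground_truth))
    overlap = sum((resp & truth).values())  # tokens shared by both texts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(resp.values())
    recall = overlap / sum(truth.values())
    return 2 * precision * recall / (precision + recall)


def evaluate(rows, threshold=0.5):
    """Score each row and mark it passed when F1 meets the threshold."""
    results = []
    for row in rows:
        score = f1_score(row["response"], row["ground_truth"])
        results.append(
            {"query": row["query"], "f1": score, "passed": score >= threshold}
        )
    return results


def to_jsonl(rows):
    """Serialize rows as JSON Lines, one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows)


if __name__ == "__main__":
    for result in evaluate(ROWS):
        print(result)
```

Because token-overlap F1 rewards shared words rather than meaning, short and long answers can score quite differently against the same ground truth, so any threshold is best chosen per dataset.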


Video Details

Category

AI Evaluation & Monitoring

Featured Date

January 27, 2026

Quality Rank

#1

AI Recommended