Scaling LLMs to production introduces critical challenges: How do you orchestrate multi-node execution, optimize GPU scheduling, and achieve low-latency data transfers between nodes? This session shows how NVIDIA Dynamo and Azure Kubernetes Service (AKS) address these challenges together. Through a real-world e-commerce recommendation scenario, discover how open-source frameworks and managed Kubernetes unlock enterprise-grade inference performance. Learn to harness advanced hardware such as the GB200 NVL72, optimize distributed workloads on AKS, and deliver cost-efficient AI at scale, turning infrastructure complexity into competitive advantage. #microsoftreactor #learnconnectbuild [eventID:26859]