In this video, I walk through how I monitored important LLM runtime metrics using a custom GPU dashboard: token throughput, total tokens in/out, processing speed, latency, and GPU behaviour under load.

📌 What you'll learn:
• How to expose LLM metrics (a minimal exporter sketch follows below)
• How to build a monitoring dashboard (Grafana)
• How to read token-level performance signals
• Tips for understanding LLM serving efficiency

Perfect for SREs, MLOps engineers, and anyone running LLMs on GPUs.

👍 Like + Subscribe for more AI Infra & SRE content!
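
The video doesn't include the exporter code itself, but here is a minimal sketch of what the "expose LLM metrics" step might look like, assuming a Prometheus + Grafana stack. The metric names (`llm_tokens_in_total`, etc.) and the `record_request` helper are illustrative assumptions, not the exact setup from the video:

```python
import time

from prometheus_client import Counter, Histogram, start_http_server

# Hypothetical metric names; adapt to your serving stack.
TOKENS_IN = Counter("llm_tokens_in_total", "Prompt tokens processed")
TOKENS_OUT = Counter("llm_tokens_out_total", "Completion tokens generated")
LATENCY = Histogram(
    "llm_request_latency_seconds",
    "End-to-end request latency",
    buckets=(0.1, 0.25, 0.5, 1, 2, 5, 10, 30),
)

def record_request(prompt_tokens: int, completion_tokens: int, started_at: float) -> None:
    """Update token counters and latency after each completed generation."""
    TOKENS_IN.inc(prompt_tokens)
    TOKENS_OUT.inc(completion_tokens)
    LATENCY.observe(time.time() - started_at)

if __name__ == "__main__":
    start_http_server(8000)  # metrics served at http://localhost:8000/metrics
    # Simulated serving loop so the exporter has data to scrape.
    while True:
        t0 = time.time()
        time.sleep(0.2)  # stand-in for actual model inference
        record_request(prompt_tokens=128, completion_tokens=64, started_at=t0)
```

From series like these, a Grafana panel can plot generation throughput with `rate(llm_tokens_out_total[1m])` (tokens/s) and mean latency with `rate(llm_request_latency_seconds_sum[5m]) / rate(llm_request_latency_seconds_count[5m])`. GPU-side signals (utilization, memory, power) usually come from a separate exporter such as NVIDIA's DCGM exporter rather than from application code.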