Grafana Assistant turned hours of dashboard work into minutes, but scaling an AI agent from an impressive demo to a reliable production system means solving harder problems than just making it smarter. In this GrafanaCON session, Grafana engineers Ivana and Yasir walk through the three challenges they overcame as Assistant grew from a team of nine to 90+ contributors and thousands of active users.

Context engineering comes first: Assistant now auto-discovers your services weekly via Prometheus, Loki, and Tempo (Memories), captures team runbooks as reusable agentic workflows (Skills, with MCP support), and uses hooks and one-click actions to understand exactly what you're working on, without requiring you to re-explain your environment every session.

The second challenge is reliable iteration. When the team tested consistency (not just whether Assistant passes once in three tries, but every single time), the numbers dropped sharply, and simply adding more instructions didn't fix it. So they built a self-improvement loop in which coding agents like Claude Code run benchmarks, analyze thousands of transcripts, and propose targeted prompt changes, while humans review and merge.

To close the gap between benchmark performance and real-world behavior, Grafana built AI Observability (now in public preview): a platform that runs online evaluations on 20% of production Grafana Assistant conversations, scoring for groundedness, prompt injection attempts, and response quality, and alerting when pass rates fall below 80%.
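The online-evaluation flow described above can be sketched in a few lines: sample a fraction of production conversations, score each one, and alert when the pass rate drops below a threshold. This is an illustrative sketch only; the sample rate and alert threshold come from the talk, but the function names and the trivial string-based scorers are hypothetical stand-ins for the LLM-based and deterministic evaluators Grafana actually runs.

```python
import random

SAMPLE_RATE = 0.20       # evaluate 20% of production conversations
PASS_RATE_ALERT = 0.80   # alert when the pass rate falls below 80%

def score_conversation(conversation: str) -> dict:
    """Placeholder scorers. Real online evaluations would use LLM judges
    and deterministic checks; these string tests are purely illustrative."""
    text = conversation.lower()
    return {
        # "groundedness": did the answer cite its sources?
        "groundedness": "according to" in text,
        # "prompt_injection": does the input contain an obvious injection attempt?
        "prompt_injection": "ignore previous instructions" not in text,
    }

def evaluate(conversations: list[str], rng: random.Random) -> tuple[float, bool]:
    """Sample conversations, score them, and return (pass_rate, should_alert)."""
    sampled = [c for c in conversations if rng.random() < SAMPLE_RATE]
    if not sampled:
        return 1.0, False
    passed = sum(all(score_conversation(c).values()) for c in sampled)
    pass_rate = passed / len(sampled)
    return pass_rate, pass_rate < PASS_RATE_ALERT
```

In a real deployment the pass rate would be emitted as a metric and the alert handled by a Grafana alert rule rather than an in-process boolean.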
0:00 Introduction: Grafana Assistant Grows Up
1:02 A Year of Growth: 4K PRs, 90+ Contributors
1:46 Three Scaling Challenges
2:10 Challenge 1: Context Engineering
3:44 Memories: Auto-Discovering Your Environment
4:28 Skills: Team Knowledge as Repeatable Workflows
6:00 Hooks, Actions & Image Attachments
7:23 Managing the Context Window
9:00 Challenge 2: The Consistency Problem
10:46 Using AI to Improve AI
11:10 The Self-Improvement Loop Explained
12:15 Evaluating Agents: Deterministic, LLM & Fact-Based
14:28 o11y-bench: Open Observability Benchmark
15:02 Reflection & Change: Closing the Loop
17:36 Challenge 3: AI Observability in Production
18:50 Live Demo: Grafana AI Observability
19:44 Online Evaluations: Groundedness & Security
21:38 Alerts & Drilling Into Failures
24:31 Closing: From Demo to Production-Ready

Links/resources:
Learn about Grafana Assistant: https://grafana.com/products/cloud/ai-assistant/?src=yt
Read the announcement around AI Observability: https://grafana.com/blog/ai-observability-for-agents-in-grafana-cloud/?src=yt
Get started with the Grafana Cloud forever-free tier: https://grafana.com/g/cloud
Have a question? Ask Grot, your AI helper: https://grafana.com/grot/
Reach out in our community forums: https://gra.fan/communityyf

---

Thanks for watching! Was this video helpful? Like and subscribe to our channel for more videos.

Connect with Grafana Labs:
X: https://www.twitter.com/grafana
LinkedIn: https://www.linkedin.com/company/grafana-labs/
Facebook: https://www.facebook.com/grafana

#Grafana #Observability #AI #LLM #GrafanaAssistant