Agents break the old rules of observability. Latency, throughput, and error rates still matter, but once software starts making decisions and taking actions on someone else’s behalf, the real question becomes: is it doing the right thing, and is it doing it for the right reasons?

In this episode of #F5's Pop Goes the Stack, Lori MacVittie and Joel “OpenClaw” Moses are joined by observability expert Chris Hain to unpack what changes when systems become agentic. Instead of a single prompt-response interaction, you get decision chains that branch, loop, call tools, and evolve over time. A system can “succeed” operationally while still being wrong, expensive, or misaligned with intent.

Chris argues you don’t have to throw away what already works. Distributed tracing still applies, but now each agent step becomes a span, decorated with richer metadata like model identity, tool calls, token usage, prompts, and cost (see the first sketch below). The discussion also dives into why standardization matters, including OpenTelemetry and its emerging semantic conventions for generative and #agentic AI, and why auto-instrumentation approaches like eBPF become critical when agents generate code that has no built-in telemetry.

Joel adds a new set of metrics that feel uncomfortably necessary: decision loops per task, drift in tool-call chains, human override frequency, and the cost and token patterns that signal something has changed (see the second sketch below). The group also tackles the awkward feedback loop of using agents to make observability actionable, while acknowledging the risk of agents optimizing the dashboard instead of the system.

If you’re building agentic workflows, this episode is a practical guide to why “failed successfully” is now a real production state, and why instrumenting for correctness and intent alignment is the next observability frontier.

Chapters:
00:00 Welcome to Pop Goes the Stack
00:25 How do agents change observability and what we measure?
02:04 Agentic vs #genAI: Decision chains, not single prompts
03:33 Use tracing for agents: Spans as steps + metadata = rich replay
04:58 Problem: AI-generated code has zero instrumentation
05:57 OpenTelemetry to the rescue: Semantic conventions for AI
08:29 New agent metrics: Loops, cost drift, overrides, “regrets/sec”
10:47 The data explosion: Multimodal logs, privacy, and governance
12:08 Real-time vs fleet-wide views: Dashboards vs trend analysis
13:49 Auto-instrumentation: eBPF for the unknown unknowns
15:13 Agents using observability and making telemetry actionable (risk!)
18:06 Key takeaways: Early instrumentation and standardization, correctness, and awareness

Learn how you can stay ahead of the curve and keep your stack whole with additional insights on app security, multicloud, AI, and emerging tech: https://go.f5.net/81bim4gl

More about F5: https://go.f5.net/tg7gaarr
Read our blog: https://go.f5.net/nijag141
Follow us on LinkedIn: https://go.f5.net/2cn62bhe
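
Bonus: to make Chris's span-per-step idea concrete, here is a minimal sketch using the OpenTelemetry Python SDK. The gen_ai.* attribute names follow OTel's incubating GenAI semantic conventions (still evolving, so check the current spec), and call_llm is a hypothetical stand-in for a real model client, not part of any library.

```python
# Minimal sketch: one parent span per agent task, one child span per step,
# decorated with GenAI metadata (model identity, token usage).
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("agent-demo")

def call_llm(model, prompt):
    # Hypothetical stub standing in for a real model client.
    return {"text": f"echo: {prompt}",
            "input_tokens": len(prompt.split()),
            "output_tokens": 5}

def agent_step(name, model, prompt):
    # Each step in the decision chain becomes a child span.
    with tracer.start_as_current_span(name) as span:
        span.set_attribute("gen_ai.request.model", model)
        span.set_attribute("gen_ai.operation.name", "chat")
        result = call_llm(model, prompt)
        span.set_attribute("gen_ai.usage.input_tokens", result["input_tokens"])
        span.set_attribute("gen_ai.usage.output_tokens", result["output_tokens"])
        return result

# The parent span ties the whole chain together for rich replay.
with tracer.start_as_current_span("agent.task") as task:
    task.set_attribute("agent.task.goal", "summarize ticket")  # custom attribute
    plan = agent_step("agent.step.plan", "gpt-4o", "Plan the steps")
    agent_step("agent.step.execute", "gpt-4o", plan["text"])
```

Because every step is an ordinary span, existing trace backends can already store, query, and visualize the decision chain; only the attributes are new.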
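Joel's agent metrics aren't standard instruments yet, so here is one hedged way you might record a few of them with the OpenTelemetry metrics API. Instrument names like agent.decision_loops_per_task are invented for illustration, not an established convention.

```python
# Sketch of agent-level metrics: decision loops, human overrides, token drift.
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (ConsoleMetricExporter,
                                              PeriodicExportingMetricReader)

reader = PeriodicExportingMetricReader(ConsoleMetricExporter(),
                                       export_interval_millis=5000)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
meter = metrics.get_meter("agent-metrics")

# Hypothetical instrument names for the signals discussed in the episode.
decision_loops = meter.create_histogram(
    "agent.decision_loops_per_task", unit="{loops}",
    description="Plan/execute iterations before a task completes")
overrides = meter.create_counter(
    "agent.human_overrides", unit="{overrides}",
    description="Times a human corrected or halted the agent")
tokens = meter.create_histogram(
    "agent.tokens_per_task", unit="{tokens}",
    description="Token spend per task; drift here signals behavior change")

# Record once per completed task:
decision_loops.record(4, {"agent": "triage-bot"})
tokens.record(1823, {"agent": "triage-bot", "model": "gpt-4o"})
overrides.add(1, {"agent": "triage-bot", "reason": "wrong_tool"})
```

Watching these as distributions over the fleet, rather than per request, is what surfaces the slow drift the episode warns about.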