AI agents don't fail the way traditional software does: they hallucinate, drift, and cascade silently, producing subtly wrong answers with no error codes and no alerts to catch them. Legacy observability platforms built for microservices and deterministic APIs were never designed for this, and the scale of telemetry generated by agentic workloads is about to dwarf everything that came before.

In this exclusive interview with Swapnil Bhartiya at TFiR, Jeremy Burton, General Manager of the Observability Unit at Snowflake, explains how Snowflake's unified data-plus-observability architecture is purpose-built to handle the economics, scale, and explainability demands of AI-driven production environments, and why he believes the walled-garden observability era of tools like Datadog and Dynatrace is about to be blown apart.

Key Topics Covered:
- Why AI agents introduce a new class of observability problem: not failure detection, but behavioral drift and response explainability across probabilistic LLM systems
- How Snowflake's scan-based, index-free query engine and elastic compute/storage separation enable petabyte-scale observability economics that traditional platforms cannot match
- Why Observe (now part of Snowflake) processes ~300 million queries/day and ingests multiple petabytes of data daily, and what that architecture unlocks for agentic workloads
- The shift to headless observability: developers querying telemetry data via MCP servers, CLI, and coding agents like Claude Code and Cursor instead of curated UIs
- How OpenTelemetry plus Apache Iceberg on S3 is dismantling proprietary observability lock-in, and why the largest enterprises are building their own incident management workflows on top of open data formats

Read the full story & transcript at www.tfir.io

#Observability #AIAgents #Snowflake #OpenTelemetry #Iceberg #AIOps #LLMOps #Datadog #SRE #CloudNative #Telemetry #AIInfrastructure #DevOps #PlatformEngineering #MLOps