In this exclusive VMblog interview, we sit down with Kolton Andrus, CEO and founder of Gremlin, to discuss the evolution of chaos engineering and how companies are transforming reliability testing from isolated projects into organization-wide standards.

What We Cover:
Kolton shares insights from his pioneering work at Amazon and Netflix that led to the creation of Gremlin, which has evolved from a chaos engineering platform into a comprehensive reliability management suite. The conversation explores how the discipline has matured, moving beyond "that's a cool idea" to become essential infrastructure for modern software development.

Key Highlights:
New Dynatrace Partnership: Learn how Gremlin's integration with Dynatrace streamlines reliability testing by automatically discovering relevant metrics and alerts, enabling safer chaos engineering experiments with built-in health monitoring and automatic test halting when issues arise.
Kubernetes Focus: Why containerized systems have become the most common pattern Gremlin customers use, and what makes Kubernetes essential for agile development and rapid incident response.
AI and Chaos Engineering: How chaos engineering serves as "acceptance testing" for AI-generated code, providing crucial safety nets as development velocity accelerates and ensuring AI agents account for critical concerns like circuit breakers, exponential backoff, and load shedding.
Reliability Intelligence: Gremlin's new AI-powered feature that not only tells you when tests fail but provides specific remediation guidance, from high-level architecture recommendations to specific bugs that need addressing.

Kolton emphasizes that effective chaos engineering requires more than just technology: it demands organizational change, measurable results, and leadership visibility to demonstrate ROI. The goal: find and fix issues before your cloud provider has an outage.
Visit gremlin.com to try their free trial and explore how chaos engineering can help your organization build more reliable systems. #ChaosEngineering #Kubernetes #KubeCon #Reliability #SRE #DevOps #Gremlin #Dynatrace #AI #CloudNative