Only 26% of organizations actively use SLOs after a decade of Google's SRE principles being gospel. Here's why adoption is so low and how to adapt SRE for 2025...

👇 JUMP TO YOUR SECTION

⏱️ TIMESTAMPS:
00:00 - Introduction: The 26% Adoption Problem
02:00 - What Google Got Right: Timeless Principles
05:00 - Why Adoption Failed: Implementation Reality
08:00 - Platform Engineering vs SRE
11:00 - AI/ML Systems Adaptation
13:00 - Practical Next Steps

📊 KEY STATS:
• 26% of organizations actively use SLOs (Grafana 2024)
• 49% say SLOs are more relevant than ever (Grafana 2024)
• 85% have adopted OpenTelemetry (process matters more than tooling)
• $115K for Platform Engineers vs $127K for SREs (Glassdoor)
• 99.999% SLO with a 0.0002% problem = 20% of the quarterly error budget spent

💡 WHAT YOU'LL LEARN:
✓ Why only 26% of organizations use SLOs despite 85% adopting OpenTelemetry: process transformation is harder than tooling, and unrealistic targets undermine entire systems
✓ Error budget fundamentals that remain timeless: turning reliability from political arguments into data-driven release decisions with mathematical precision
✓ How Platform Engineering ($115K) and SRE ($127K) are complementary, not competitive: Platform teams build systems, SREs ensure reliability, and both use error budget thinking
✓ Why AI/ML systems need adapted SRE principles: data freshness SLOs, model drift detection, training pipeline reliability, and different error budget math (one LLM training failure = tens of thousands in compute)
✓ A starting-from-zero playbook: pick 3-5 critical services, set one SLO per service (99.9% = 43 min/month of allowed downtime), automate with OpenTelemetry, get cross-functional buy-in, and plan for a 12-month timeline

🔗 RESOURCES:
📝 Full transcript & notes: https://platformengineeringplaybook.com/podcasts/00025-sre-reliability-principles
💻 GitHub: https://github.com/vibesre/Platform-Engineering-Playbook

🎯 FOR SENIOR PLATFORM ENGINEERS, SREs, AND DEVOPS ENGINEERS WITH 5+ YEARS OF EXPERIENCE
Practical, data-driven insights on production reliability engineering and SRE implementation.

#SRE #PlatformEngineering #DevOps #Reliability #ErrorBudgets

Tags: SRE, reliability, error budgets, SLOs, SLIs, platform engineering, observability, OpenTelemetry, Google SRE, site reliability engineering, ML systems, AI reliability, chaos engineering, production engineering
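
📐 ERROR BUDGET MATH (WORKED EXAMPLE):
The two error budget figures quoted in this description (a 0.0002% problem spending 20% of the budget under a 99.999% SLO, and 99.9% allowing roughly 43 minutes per month) follow from simple arithmetic. Here is a minimal Python sketch of that arithmetic. It assumes a 30-day month and treats the incident's failure rate as measured over the same window as the SLO; the function names are illustrative and do not come from the episode.

```python
def error_budget_fraction(slo_percent: float) -> float:
    """Fraction of requests (or time) that may fail while still meeting the SLO."""
    return (100.0 - slo_percent) / 100.0


def budget_consumed(slo_percent: float, failure_percent: float) -> float:
    """Share of the error budget used by an incident that failed
    `failure_percent` of requests over the SLO window (assumed same window)."""
    return (failure_percent / 100.0) / error_budget_fraction(slo_percent)


def allowed_downtime_minutes(slo_percent: float, window_days: float = 30) -> float:
    """Minutes of full downtime the SLO tolerates over the window (assumed 30 days)."""
    return error_budget_fraction(slo_percent) * window_days * 24 * 60


if __name__ == "__main__":
    # 99.999% SLO, 0.0002% of requests failed: share of budget spent, in percent
    print(round(budget_consumed(99.999, 0.0002) * 100, 1))  # 20.0
    # 99.9% SLO over a 30-day month: allowed downtime in minutes
    print(round(allowed_downtime_minutes(99.9), 1))  # 43.2
```

The same helpers show why unrealistic targets hurt: each extra nine shrinks the budget tenfold, so `allowed_downtime_minutes(99.99)` is about 4.3 minutes per month.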