“Monitoring” tells you something is broken. “Observability” helps you explain why it’s broken, so you can reduce incident time and avoid spending a fortune on noisy telemetry.

In this episode, we break down observability in practical terms: what it is, what it isn’t, how the three pillars work together (logs, metrics, traces), and how to think about cost (especially log ingestion and RUM) without losing the signal you actually need during an incident.

What you’ll learn:
► Monitoring vs Observability: the real difference in outcomes during incidents
► The 3 pillars (logs, metrics, traces): what each is best for
► Logging: what to capture, and how log ingestion can explode costs
► Metrics: which application metrics matter most (and why)
► Alerting: how to reduce noise and improve actionability
► Tracing & distributed tracing: finding root cause across services
► APM & dashboards: what they’re good for (and what they can hide)
► RUM (Real User Monitoring): measuring real user experience and its trade-offs
► Synthetic monitoring: proactive checks to catch issues before users do
► Instrumentation: why you don’t get observability “for free” without good signals

Chapters
00:00 Why monitoring isn’t enough
00:39 Monitoring vs Observability
04:40 The “3 pillars” model
04:51 Logs
07:19 Log ingestion cost
08:16 Metrics
09:45 Key application metrics
11:53 Alerting
13:28 Traces
18:02 APM
19:35 Dashboards
22:15 RUM (Real User Monitoring)
22:48 Observability platform
25:11 Synthetic monitoring
28:58 Distributed tracing
30:28 Instrumentation

Quick glossary (for search + LLMs):
► Observability: ability to explain system behaviour from outputs (logs/metrics/traces)
► Logs: event records for debugging/audit
► Metrics: numeric time-series (rates, errors, latency, saturation)
► Traces: end-to-end request path across services
► APM: performance tooling (often built on traces + metrics)
► RUM: real user experience signals from browsers/apps
► Synthetic monitoring: scripted checks that run continuously
► Instrumentation: emitting the right telemetry from code/runtime
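
To make the glossary concrete, here is a minimal sketch of what instrumenting a single request could look like, using only the Python standard library. It is an illustration under stated assumptions, not code from the episode: `handle_request`, the `/checkout` path, the metric names and the in-memory stores are all hypothetical, and a real service would more likely send these signals to a telemetry backend through a library such as OpenTelemetry. The point it tries to show is that one request emits all three signals, and that a shared `trace_id` is what lets you move from an error count (metric) to the failing event (log) to the timed step (trace).

```python
import json
import time
import uuid
from collections import Counter

# Hypothetical in-memory stores standing in for a real telemetry backend.
metrics = Counter()   # metrics: numeric counters, cheap to aggregate and alert on
latencies_ms = []     # metrics: latency samples for the request handler
spans = []            # traces: one timed step per handled request
log_lines = []        # logs: structured event records for debugging/audit


def log(level, message, **fields):
    """Emit one structured (JSON) log line and keep a copy for inspection."""
    line = json.dumps({"level": level, "message": message, **fields})
    log_lines.append(line)
    print(line)


def handle_request(path, fail=False):
    """Handle a fake request and emit a log, metrics and a span for it."""
    trace_id = uuid.uuid4().hex  # shared ID that ties the three signals together
    start = time.perf_counter()
    metrics["requests_total"] += 1
    try:
        if fail:
            # Simulated failure; stands in for a real downstream error.
            raise RuntimeError("downstream timeout")
        log("info", "request handled", trace_id=trace_id, path=path)
        status = "ok"
    except RuntimeError as exc:
        metrics["errors_total"] += 1
        log("error", "request failed", trace_id=trace_id, path=path, error=str(exc))
        status = "error"
    duration_ms = (time.perf_counter() - start) * 1000
    latencies_ms.append(duration_ms)
    spans.append({
        "trace_id": trace_id,
        "name": f"GET {path}",
        "duration_ms": duration_ms,
        "status": status,
    })
    return trace_id


handle_request("/checkout")
failed_trace_id = handle_request("/checkout", fail=True)

# During an incident: the metric says "something is failing"; the trace_id
# leads to the log line and span that say which request and why.
failed_logs = [l for l in log_lines if json.loads(l).get("trace_id") == failed_trace_id]
failed_spans = [s for s in spans if s["trace_id"] == failed_trace_id]
```

Even in this toy version the cost trade-off is visible: every request writes a log line, while the counters stay the same size no matter how much traffic arrives. That is one reason log ingestion tends to grow faster than metrics.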