7 Real-World Kubernetes Failures on AWS EKS (Argo CD + Helm): Diagnose & Fix Like a DevOps Engineer

The video demonstrates diagnosing and fixing seven common real-world Kubernetes failures across nine microservices running on AWS EKS and deployed via Argo CD with Helm charts managed through a GitOps GitHub repo. The presenter uses kubectl and Argo CD events/logs to troubleshoot issues such as ImagePullBackOff (wrong image tag and incorrect ECR repository name), CrashLoopBackOff due to OOMKilled (exit code 137 from low memory limits), Pending pods from insufficient cluster resources (overstated memory requests and high replica count), and services stuck at 0/1 because of failing readiness or liveness probes (wrong ports/paths and overly aggressive probe timings). A Helm template/indentation error causing Argo CD sync failure prevents pod creation for one service until corrected. The episode emphasizes starting with pod describe/events before logs and fixing problems by updating Helm values in GitOps.

00:00 Kubernetes Failures Overview
00:47 Architecture and Argo CD Setup
01:42 Common Error Cheat Sheet
03:28 Fix ImagePullBackOff Auth Service
06:44 GitOps Sync and Autosync Settings
09:16 Fix Catalog Image and Memory
14:49 Inventory Pod Pending Diagnosis
15:29 Fix Memory Limits
16:37 Adjust Replica Count
17:06 Sync Changes ArgoCD
17:43 Verify Inventory Pod
18:30 Manufacturing Probe Port
22:05 QC Liveness Path
24:03 Tune QC Probe Timings
25:47 Notification Readiness Fix
27:28 Supplier Helm Syntax
29:49 Wrap Up Troubleshooting
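
The symptom-to-cause pairs named in the description can be collected into a small lookup. The following is a minimal sketch in Python, not the presenter's cheat sheet: the `triage` helper and the `NotReady` and `SyncFailed` labels are hypothetical names introduced here for illustration, and the causes are limited to those the description mentions.

```python
from typing import Optional

# Fallback advice, following the description's "events before logs" order.
DEFAULT_ADVICE = "inspect pod describe/events first, then container logs"

# Symptom -> likely cause, restricted to the cases named in the description.
# "NotReady" and "SyncFailed" are hypothetical labels, not kubectl statuses.
CHEAT_SHEET = {
    "ImagePullBackOff": "wrong image tag or incorrect ECR repository name",
    "OOMKilled": "memory limit too low (container exits with code 137)",
    "Pending": "insufficient cluster resources: overstated memory requests "
               "or too many replicas",
    "NotReady": "failing readiness or liveness probe: wrong port/path "
                "or overly aggressive probe timings",
    "SyncFailed": "Helm template or indentation error blocking the Argo CD sync",
}


def triage(status: str, exit_code: Optional[int] = None) -> str:
    """Return a likely cause for an observed status (illustrative only)."""
    # CrashLoopBackOff is only a symptom; exit code 137 points to OOMKilled.
    if status == "CrashLoopBackOff":
        if exit_code == 137:
            return CHEAT_SHEET["OOMKilled"]
        return DEFAULT_ADVICE
    return CHEAT_SHEET.get(status, DEFAULT_ADVICE)


if __name__ == "__main__":
    print(triage("CrashLoopBackOff", exit_code=137))
    print(triage("ImagePullBackOff"))
```

In keeping with the GitOps workflow the episode describes, the lookup only suggests where to look; the fix itself would still be made by updating the Helm values in the repository and letting Argo CD sync.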