Run Apache Spark jobs on Kubernetes with ZERO permanent infrastructure! In this comprehensive tutorial, you'll learn how to deploy a production-ready Spark setup that creates pods ONLY when jobs run and automatically cleans up when they finish. Say goodbye to costly always-on Spark clusters.

🎯 WHAT YOU'LL LEARN
✅ Deploy Spark on Kubernetes WITHOUT permanent master/worker nodes
✅ Build custom Spark images with embedded PySpark jobs
✅ Submit jobs that auto-scale executors and clean up after themselves
✅ Run real-world analytics: customer segmentation, cohort analysis, revenue trends
✅ Set up the Spark History Server for job monitoring
✅ Implement proper RBAC security for production
✅ Debug and monitor jobs using kubectl and the Spark UI

TIMESTAMPS
0:00 Introduction
1:37 System Architecture
5:48 Setting up K8s
8:10 Setting up the Project
10:00 K8s Namespaces
11:45 K8s Service Accounts and RBAC
17:27 Creating Spark Jobs for K8s
26:40 K8s Spark History Server
34:24 Spark Control Dashboard
42:42 K8s API Layer
49:52 Spark Dashboard, Job Submissions and Review
56:52 Outro

📚 RESOURCES & LINKS
FULL SOURCE CODE - https://buymeacoffee.com/yusuf.ganiyu/source-code-spark-k8s
• Apache Spark K8s Docs: https://spark.apache.org/docs/latest/running-on-kubernetes.html
• Kubernetes Documentation: https://kubernetes.io/docs/
• PySpark API Reference: https://spark.apache.org/docs/latest/api/python/

Like this video? Support us: https://www.youtube.com/@CodeWithYu/join

#ApacheSpark #Kubernetes #DataEngineering #BigData #PySpark #DevOps #CloudNative #K8s #DataPipelines #ETL #Tutorial
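As a taste of the workflow covered in the video, here is a minimal sketch of the RBAC setup and an ephemeral job submission. This is a hedged outline, not the exact commands from the video: the namespace (`spark-jobs`), service account name (`spark`), container image tag, API server address, and job path are placeholders you would replace with your own values.

```shell
# Create a dedicated namespace and a service account the Spark driver will use
# (names are illustrative placeholders)
kubectl create namespace spark-jobs
kubectl create serviceaccount spark -n spark-jobs

# Grant the driver permission to create/delete executor pods in its namespace
kubectl create rolebinding spark-role \
  --clusterrole=edit \
  --serviceaccount=spark-jobs:spark \
  -n spark-jobs

# Submit a PySpark job in cluster mode; Spark spins up driver and executor
# pods on demand and they are cleaned up when the job finishes
spark-submit \
  --master k8s://https://<your-api-server>:6443 \
  --deploy-mode cluster \
  --name customer-segmentation \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.namespace=spark-jobs \
  --conf spark.kubernetes.container.image=<your-registry>/spark-job:latest \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  local:///opt/spark/jobs/segmentation.py
```

These commands target a live cluster, so treat them as a deployment/config sketch; the official Spark-on-Kubernetes docs linked above cover the full set of `spark.kubernetes.*` options.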