Build the transformation layer of an Open Data Stack on your Kubernetes home lab. Install Trino as a distributed SQL engine and Querybook for collaborative data exploration, use dbt to transform raw Iceberg tables into an analytics-ready star schema, and run advanced queries and notebooks with real-time collaboration.

📚 What You'll Learn:
- Install Trino and Querybook on Kubernetes
- Build a custom Querybook image with Trino support
- Transform raw data with dbt into staging views and mart tables
- Create a star schema (dimensions, facts, bridge tables) in Apache Iceberg
- Run incremental transformations with dbt (see the model sketch at the end of this description)
- Write data quality tests in dbt
- Execute ad hoc queries and build DataDocs in Querybook
- Analyze product affinity and customer segments (RFM)

⚙️ Stack Components:
- k3s or another Kubernetes cluster
- Trino (distributed SQL engine)
- Querybook (collaborative query IDE)
- dbt (data transformation framework)
- Lakekeeper (Apache Iceberg REST catalog)
- MinIO (object storage)
- Keycloak (OIDC)
- Cloudflare Tunnel
- Just task runner

📋 Prerequisites:
- Working Kubernetes environment with MinIO and Lakekeeper (see previous video)
- Basic kubectl and Helm knowledge
- If exposing publicly: domain + DNS (e.g., Cloudflare)
- Keycloak and Vault set up from earlier episodes

🔗 Resources:
GitHub Repositories:
- buun-stack → https://github.com/buun-ch/buun-stack
- payload-ecommerce-lakehouse-demo → https://github.com/buun-ch/payload-ecommerce-lakehouse-demo

Previous videos:
- Data Ingestion on Kubernetes – Open Data Stack with dlt, Airflow, Dagster, and Lakekeeper → https://youtu.be/0mYK-haDKtg

⏱️ Timestamps:
00:00 | Introduction & demo result preview
01:18 | Agenda
01:56 | dbt overview
02:40 | Trino overview
03:12 | Querybook overview
03:52 | Install Trino and Querybook
08:03 | Demo app introduction (Payload CMS ecommerce)
09:56 | Data seeding
11:11 | Data ingestion with dlt
14:21 | Data transformation with dbt
19:20 | Data analysis in Querybook
22:30 | Product Affinity Analysis
23:28 | RFM Analysis
25:06 | Wrap-up & next episode

💡 Key Benefits:
✅ End-to-end data pipeline from ingestion to analysis
✅ Real-world ecommerce data ingestion from a Payload CMS app with dlt
✅ Transform raw data into an analytics-ready star schema with dbt
✅ Run distributed SQL queries across Iceberg tables with Trino
✅ Ad hoc queries and DataDocs with Querybook
✅ Self-hosted stack you can later port to the cloud

🎯 Who This Is For:
- Developers/data engineers building an open data stack on Kubernetes
- Teams evaluating dbt, Trino, and Querybook for data transformation and analysis
- Data analysts who want to build a star schema for BI and analytics with an open data stack

🔔 Subscribe for more Kubernetes and data engineering tutorials!

✉️ Contact:
- buun@buun.channel

#k8s #kubernetes #dbt #trino #querybook #apacheiceberg #lakekeeper #selfhosted #dataengineering #datastack #payloadcms
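
For reference, a minimal sketch of the kind of incremental dbt model built in the video. The model, source, and column names below are hypothetical, not taken from the project:

-- fct_orders.sql (hypothetical name): incremental dbt model materialized via Trino into Iceberg
{{ config(materialized='incremental', unique_key='order_id') }}

select
    o.id as order_id,
    o.customer_id,
    o.total_amount,
    o.updated_at
from {{ source('raw', 'orders') }} as o
{% if is_incremental() %}
-- on incremental runs, only pick up rows newer than what is already loaded
where o.updated_at > (select max(updated_at) from {{ this }})
{% endif %}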