Build the transformation layer of an Open Data Stack on your Kubernetes home lab. Install Trino as a distributed SQL engine and Querybook for collaborative data exploration, use dbt to transform raw Iceberg tables into an analytics-ready star schema, and run advanced queries and notebooks with real-time collaboration.

📚 What You'll Learn:
- Install Trino and Querybook on Kubernetes
- Build a custom Querybook image with Trino support
- Transform raw data with dbt into staging views and mart tables
- Create a star schema (dimensions, facts, bridge tables) in Apache Iceberg
- Run incremental transformations with dbt (see the model sketch at the end of this description)
- Write data quality tests in dbt
- Execute ad hoc queries and build DataDocs in Querybook
- Analyze product affinity and customer segments (RFM)

⚙️ Stack Components:
- k3s or another Kubernetes cluster
- Trino (distributed SQL engine)
- Querybook (collaborative query IDE)
- dbt (data transformation framework)
- Lakekeeper (Apache Iceberg REST catalog)
- MinIO (object storage)
- Keycloak (OIDC)
- Cloudflare Tunnel
- Just task runner

📋 Prerequisites:
- Working Kubernetes environment with MinIO and Lakekeeper (see previous video)
- Basic kubectl and Helm knowledge
- If exposing publicly: domain + DNS (e.g., Cloudflare)
- Keycloak and Vault set up from earlier episodes

🔗 Resources:
GitHub Repositories:
- buun-stack → https://github.com/buun-ch/buun-stack
- payload-ecommerce-lakehouse-demo → https://github.com/buun-ch/payload-ecommerce-lakehouse-demo

Previous videos:
- Data Ingestion on Kubernetes – Open Data Stack with dlt, Airflow, Dagster, and Lakekeeper → https://youtu.be/0mYK-haDKtg

⏱️ Timestamps:
00:00 | Introduction & demo result preview
01:18 | Agenda
01:56 | dbt overview
02:40 | Trino overview
03:12 | Querybook overview
03:52 | Install Trino and Querybook
08:03 | Demo app introduction (Payload CMS ecommerce)
09:56 | Data seeding
11:11 | Data ingestion with dlt
14:21 | Data transformation with dbt
19:20 | Data analysis in Querybook
22:30 | Product Affinity Analysis
23:28 | RFM Analysis
25:06 | Wrap-up & next episode

💡 Key Benefits:
✅ End-to-end data pipeline from ingestion to analysis
✅ Real-world ecommerce data ingestion from a Payload CMS app with dlt
✅ Transform raw data into an analytics-ready star schema with dbt
✅ Run distributed SQL queries across Iceberg tables with Trino
✅ Ad hoc queries and DataDocs with Querybook
✅ Self-hosted stack you can later port to the cloud

🎯 Who This Is For:
- Developers/data engineers building an open data stack on Kubernetes
- Teams evaluating dbt, Trino, and Querybook for data transformation and analysis
- Data analysts who want to build a star schema for BI and analytics with an open data stack

🔔 Subscribe for more Kubernetes and data engineering tutorials!

✉️ Contact:
- buun@buun.channel

#k8s #kubernetes #dbt #trino #querybook #apacheiceberg #lakekeeper #selfhosted #dataengineering #datastack #payloadcms
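
For reference, a minimal sketch of the kind of incremental dbt model built in the video. The model, source, and column names below are hypothetical, not taken from the project:

-- fct_orders.sql (hypothetical name): incremental dbt model materialized via Trino into Iceberg
{{ config(materialized='incremental', unique_key='order_id') }}

select
    o.id as order_id,
    o.customer_id,
    o.total_amount,
    o.updated_at
from {{ source('raw', 'orders') }} as o
{% if is_incremental() %}
-- on incremental runs, only pick up rows newer than what is already loaded
where o.updated_at > (select max(updated_at) from {{ this }})
{% endif %}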