Building an Incremental RAG Pipeline with Spark, Delta Lake & Ollama | CS441 Project HW2

Pranay Dhopate

1 day ago

16:27

RAG & Vector Search

Rank #15

Description

This video presents our CS441 project: an incremental Retrieval-Augmented Generation (RAG) indexing pipeline built with Apache Spark, Delta Lake, and Ollama. The system processes a corpus of academic PDFs, extracts and normalizes their text, generates embeddings locally with Ollama, and stores the results in Delta Lake tables. Unlike a traditional batch indexer that reprocesses everything on each run, the pipeline detects new, modified, and deleted documents, then re-chunks and re-embeds only the affected data, which keeps RAG ingestion incremental and idempotent.

The video walks through the full architecture, the configuration, local execution in a Linux environment, and the log output that verifies the incremental behavior. We also highlight how the design supports real-world deployment, including compatibility with AWS EMR and S3. By the end of the demonstration, viewers should understand how to build a scalable, production-style embedding index that stays up to date as source documents change.
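The description does not say how the pipeline decides which documents are new, modified, or deleted. One common approach is to compare a content hash of each current document against the hash recorded at the last indexing run. The following is a minimal sketch of that idea in plain Python, with no Spark, Delta Lake, or Ollama dependency. The names `ChangeSet`, `content_hash`, `detect_changes`, and `apply_changes` are hypothetical and are not taken from the project.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ChangeSet:
    """Document IDs grouped by what happened since the last indexing run."""
    new: list = field(default_factory=list)
    modified: list = field(default_factory=list)
    deleted: list = field(default_factory=list)
    unchanged: list = field(default_factory=list)


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's normalized text (assumed: SHA-256)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_changes(indexed: dict, current_docs: dict) -> ChangeSet:
    """Compare stored hashes (doc_id -> hash) with current texts (doc_id -> text)."""
    changes = ChangeSet()
    for doc_id, text in sorted(current_docs.items()):
        digest = content_hash(text)
        if doc_id not in indexed:
            changes.new.append(doc_id)        # never indexed before
        elif indexed[doc_id] != digest:
            changes.modified.append(doc_id)   # content differs from last run
        else:
            changes.unchanged.append(doc_id)  # skip re-chunking and re-embedding
    # Anything indexed earlier but missing from the corpus now was deleted.
    changes.deleted = sorted(set(indexed) - set(current_docs))
    return changes


def apply_changes(indexed: dict, current_docs: dict, changes: ChangeSet) -> dict:
    """Return the updated hash table after processing one run."""
    updated = {k: v for k, v in indexed.items() if k not in changes.deleted}
    for doc_id in changes.new + changes.modified:
        # A real pipeline would re-chunk and re-embed the document here.
        updated[doc_id] = content_hash(current_docs[doc_id])
    return updated


# State left behind by a previous run.
indexed = {
    "a.pdf": content_hash("alpha"),
    "b.pdf": content_hash("beta"),
    "c.pdf": content_hash("gamma"),
}
# Corpus as it looks now: b.pdf was edited, c.pdf removed, d.pdf added.
current = {"a.pdf": "alpha", "b.pdf": "beta v2", "d.pdf": "delta"}

changes = detect_changes(indexed, current)
updated = apply_changes(indexed, current, changes)
second_run = detect_changes(updated, current)
```

Running detection a second time against the updated state reports nothing to do, which is the idempotent behavior the description refers to. In a Spark and Delta Lake setting, the same comparison would more likely be expressed as a join between a table of stored hashes and the freshly scanned corpus, but that mapping is an assumption here.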
