
RAG Pipeline: 7 Iterations Explained!
Cyril Imhof
RAG that actually reads charts and pulls the numbers, not just the gist. In this episode, Nisaar and Rohan live-debug their document pipeline, fix a zero-based indexing bug, rewrite prompts to extract numeric values and units from graphs, and compare GPT-4o and GPT-4o mini for accuracy and cost. We also walk through parallel page processing, caching choices (CSV versus Redis), and choosing a CPU-friendly LLM in the 7-10B range for local RAG.

What you will learn
- A robust PDF-to-Vision-to-RAG pipeline that extracts values from charts and tables
- Generalized prompts for per-point numeric extraction with units and conditions
- Practical trade-offs between GPT-4o mini (speed and cost) and full GPT-4o (accuracy)
- Local RAG on CPU, including quantization, 7-10B models, and Ollama
- Why CSV is not for production, and when to use Redis for replayable chunks

Timestamps
00:00 Intro: indexing and today's pipeline goals
00:50 Pipeline and indexing confusion (pages 20-22, inflation notes)
01:12 Tesla page: vehicle deliveries, cash flow, EBITDA, net income (TTM)
02:07 Our RAG journey: from early document intelligence to iterative improvements
03:24 Rewriting the prompt: dive inside the graph and extract every value
04:57 Problem found: the model gets the trend but misses the numbers
06:29 Audience philosophy: deep tech over hype, quality over vanity metrics
08:02 CPU LLMs for RAG: quantized 7-10B models, Ollama, Qwen and others, 4-bit and 1-bit
10:10 Hit Run All: the end-to-end processing
10:48 Parallel extraction: five pages at a time, PNGs on the left, logs on the right
11:28 Metadata and storage: CSV in the demo, Redis in production
12:46 Verifying extraction against the Tesla charts: did we get the numbers?
13:32 Financial metrics readout: deliveries trend, cash flow, net income
14:22 Root cause: GPT-4o mini accuracy gaps; try full GPT-4o for vision

Tags
RAG, Retrieval-Augmented Generation, RAG tutorial, multimodal RAG, PDF data extraction, chart extraction, graph OCR, numeric extraction, GPT-4o, GPT-4o mini, OpenAI Vision, document intelligence, LLM pipelines, local LLM, CPU LLM, quantization, 4-bit quantization, Ollama, Qwen, Llama, Mistral, FAISS, Pinecone, vector database, Redis cache, CSV vs Redis, parallel processing, Python pipeline, LangChain, LlamaIndex, Tesla analysis, financial charts, EBITDA, cash flow, zero-based indexing, Hindi tech, AIBROS Podcast, Nisaar, Rohan, AI podcast India, machine learning, generative AI, AI engineering, prompt engineering, production AI

Hashtags
#RAG #AIBROS #AI #GPT4o #OpenSource #Ollama #LLM #MachineLearning #DataExtraction #PDF #Charts #Redis #Python #IndiaTech
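The zero-based indexing fix from the episode boils down to an off-by-one correction between viewer page labels and library page indices. A minimal sketch (the function name is mine, not from the episode):

```python
def viewer_page_to_index(viewer_page: int) -> int:
    """Convert a 1-based viewer page number to a 0-based library index.

    PDF libraries such as PyMuPDF index pages from 0, while PDF viewers
    label them from 1, so "page 21" in the viewer is index 20.
    """
    if viewer_page < 1:
        raise ValueError("viewer pages start at 1")
    return viewer_page - 1
```

Applying this consistently at the boundary between user-facing page numbers and the extraction code avoids the page 20-vs-22 confusion discussed around 00:50.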
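The "five pages at a time" parallel extraction can be sketched with a thread pool. This is an assumption about the implementation, not the episode's actual code; `extract_page` is a stand-in for the real vision-model call on a page PNG:

```python
from concurrent.futures import ThreadPoolExecutor


def extract_page(page_number: int) -> str:
    # Placeholder for the real per-page vision call (e.g. sending a
    # rendered PNG to GPT-4o). Returns a label so the batching is testable.
    return f"page-{page_number}-done"


def process_pages(page_numbers, batch_size=5):
    """Process pages in parallel batches of `batch_size` (five in the demo)."""
    results = []
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        for start in range(0, len(page_numbers), batch_size):
            batch = page_numbers[start:start + batch_size]
            # pool.map preserves input order within each batch.
            results.extend(pool.map(extract_page, batch))
    return results
```

Threads are appropriate here because the work is I/O-bound (waiting on API responses), so the GIL is not a bottleneck.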
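The CSV-in-demo, Redis-in-production point can be illustrated with a small keyed cache for replayable chunks. This is a sketch under my own assumptions (class and key scheme are hypothetical); a plain dict stands in for Redis so the logic runs without a server, with the equivalent redis-py calls noted in comments:

```python
import json


class ChunkCache:
    """Replayable per-page chunk cache, keyed by document and page."""

    def __init__(self, backend=None):
        # In production, `backend` would be a redis.Redis client;
        # a dict stands in here so the sketch is self-contained.
        self.backend = backend if backend is not None else {}

    def key(self, doc_id: str, page: int) -> str:
        return f"chunks:{doc_id}:{page}"

    def put(self, doc_id: str, page: int, chunks) -> None:
        # With redis-py: client.set(self.key(doc_id, page), json.dumps(chunks))
        self.backend[self.key(doc_id, page)] = json.dumps(chunks)

    def get(self, doc_id: str, page: int):
        # With redis-py: raw = client.get(self.key(doc_id, page))
        raw = self.backend.get(self.key(doc_id, page))
        return json.loads(raw) if raw is not None else None
```

Unlike a shared CSV file, a keyed store like this handles concurrent writers, lets you re-run (replay) a single page's extraction by overwriting one key, and avoids re-parsing the whole file on every lookup.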
