
RAG Pipeline: 7 Iterations Explained!
Cyril Imhof
VIEW ORIGINAL SLIDES: https://docs.google.com/presentation/d/1WVBxlwvCHc-lt4yyt9F_a_eKkVX2NOp-ChGWiPFvkwM/edit?slide=id.g3821790bc06_0_82#slide=id.g3821790bc06_0_82
_____
Embeddings power RAG, search, agents, and recommendations, but production reality is a different story. This talk distills patterns from companies running embedding inference at scale. We’ll map where latency and throughput degrade and discuss architectural fixes, as well as model selection trade-offs, dimensionality, and quantization considerations. Finally, we’ll share open-source tools that can boost any embedding API, along with deployment tips for compound AI systems where multiple models and tools coordinate. You’ll leave able to diagnose bottlenecks, design resilient pipelines, and ship faster systems without overspending.
_____
This video is part of a conference series from Qdrant's #VectorSpaceDay 2025. Read the full event recap here: https://qdrant.tech/blog/vector-space-day-2025-recap/

Check out all other speaker presentations in this playlist: https://www.youtube.com/playlist?list=PL9IXkWSmb36-peUPGzdzjAZ0dDmaSnzUw