Today I'm going to show you how to
implement a hybrid search RAG agent.
This is one of my favorite RAG
strategies. It's really powerful and I'm
going to break it down for you very
simply right now. So the idea here is
that we give our agent the ability to
search our documents and data both
semantically understanding the
relationship between concepts but also
with keywords so we can very accurately
pick out super specific information from
our documents. And so we're going to get
the best of both worlds here because
we're going to use both strategies for
every single search, but it's still
going to be fast. And we have a really
simple tech stack here. And honestly, it
just works. And that's the thing: as I've been evolving my own RAG strategies, I've also been simplifying things, just
putting more and more focus into the
strategies like hybrid search that just
works so well no matter the use case. So
I'm really excited to get into this with
you. And I also built a complete AI
agent for this that demonstrates hybrid
search. So, I'll use this to explain all
the concepts and then this is also a
template that you can feel free to use
for yourself because by the end of this
video, you'll probably be convinced that
hybrid search is the way to go and so
you can use this as a starting point.
And we'll do some live demonstrations in
this video to talk about the different
kinds of queries that we can ask our
agent based on our knowledge. And I have
this Excalidraw diagram to explain our tech stack. And we'll also get into how
hybrid search works specifically with
MongoDB because we're going to use it as
our database and it will essentially
serve as our vector database as well.
And I know this architecture might seem
like a lot, but it is really
fascinating. And don't worry, I'll break
it down nice and simple for you. It's
worth understanding how we've
architected things so our agent can
handle so much data. So let's get into
the tech stack first. Then I'll cover
how hybrid search works and our agent.
Now the first big decision we have to
make for any RAG agent is: what is our
database going to be? Where are we going
to store our documents for search? And
MongoDB is a platform that I have never
covered on my channel, but I've used it
a lot in the past and there are some
components built into it that will
specifically help us with hybrid search.
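To make that concrete, an Atlas Vector Search index on the chunks collection is defined with a JSON document roughly like the sketch below. Note the index name, the `embedding` field name, and the 1536 dimensions are my assumptions for illustration, not necessarily what the template uses.

```python
# Hypothetical Atlas Vector Search index definition for the chunks
# collection -- field name ("embedding") and dimension count are
# illustrative assumptions, not taken from the video's repo.
vector_index = {
    "name": "chunks_vector_index",
    "type": "vectorSearch",
    "definition": {
        "fields": [
            {
                "type": "vector",
                "path": "embedding",    # field holding each chunk's embedding
                "numDimensions": 1536,  # e.g. OpenAI text-embedding-3-small
                "similarity": "cosine",
            }
        ]
    },
}
```

You'd create this once on the chunks collection, and every semantic lookup afterwards runs against it.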
And so we can use it as our vector
database. And so it is a NoSQL database.
So we can store our document records and
then also our chunks where we have all
of the embeddings for RAG and we can
connect things together. We can perform
our text searches and our semantic
searches very very quickly. And so this
is just a really efficient and easy to
work with option for us. They also have
some RAG guides that I'll link to in the description. I used this one, as well as this cookbook right here, as a lot of inspiration for the agent that I built for this video. So it's a great option for our database. The kinds of things you can do
here with hybrid search is not the sort
of thing available for a lot of other
vector databases. Now, for our AI agent framework, we're going to be using Pydantic AI. This has been my favorite framework for the entire year, and it still is. And man, Pydantic AI keeps pumping out incredible updates. They're already on version 1.27, and they had their official version 1 release only a couple of months ago. Their documentation just keeps getting better and better, they're adding more and more integrations, and it's getting easier and easier to use the framework while still giving us the flexibility that makes me really love Pydantic AI. This will be the core for our agent, and then all the tools we give to our Pydantic AI agent will be leveraging MongoDB for RAG. And
last but not least, for file processing, we have Docling, because we need a powerful library in our RAG pipeline to get data into our vector database in the right format. And Docling makes it really easy to extract text from PDFs, Word documents, markdown documents, even audio files. And our agent is going to handle all of those. Plus, we are also going to be using Docling for our chunking strategy, because we need some way to split up our larger documents to store them neatly in our vector database. And we're going to be using hybrid chunking for this. I have a video on my channel where I covered Docling and hybrid chunking in more detail; I'll link to that right here. This is going to power our RAG pipeline.
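To give a feel for what hybrid chunking is doing, here's a heavily simplified sketch of the general idea, split on document structure first, then pack pieces together under a token budget. This is NOT Docling's actual implementation (its real `HybridChunker` is tokenizer-aware and structure-aware); words stand in for tokens here.

```python
# Simplified sketch of the *idea* behind hybrid chunking -- not Docling's
# implementation. We respect paragraph boundaries (structure), while also
# keeping each chunk under a size budget (approximated with word counts).
def hybrid_chunk(paragraphs, max_tokens=100):
    chunks, current, current_len = [], [], 0
    for para in paragraphs:
        n = len(para.split())  # crude token estimate
        if current and current_len + n > max_tokens:
            chunks.append("\n".join(current))  # close chunk at a clean boundary
            current, current_len = [], 0
        current.append(para)
        current_len += n
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Because splits only ever happen between paragraphs, every chunk gets the "really nice starts and ends" you see in the demo.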
And out of all of the chunking strategies, hybrid is by far my favorite. So we're going to have this for all of our chunks that we put in MongoDB. And you can see here that we have really nice starts and ends to all of our chunks thanks to our hybrid chunking strategy. Chunking is not an easy problem to solve, but we're able to do this very easily just using what Docling gives us out of the box. So that is our tech stack in total. Now I
want to cover how the hybrid search
works and I'll do some live demos with
the agent as well. Now I'm going to be
focusing on how hybrid search works with
MongoDB, but you can apply these
concepts to other databases as well.
We're not limited to MongoDB, but they
have done some things with their
platform that makes it optimal for
hybrid search specifically, even to the
point where I've reached out to their
team to work with them on this video to
make sure that I'm presenting hybrid
search in the best way to you. And they
have features that are in preview right
now that help us with combining our
keyword and semantic search like Rank
Fusion. I will talk about this later,
but right now let's go ahead and focus
on the pros and cons of keyword and
semantic search. So, what I'm about to
explain here is going to help you
understand why we care about combining
both of these strategies together for
our agent. Now, with keyword search, the
big benefit here is pretty obvious.
We're able to find exact terms with very
high accuracy because if I search for a
certain term and that exact word or
phrase appears in my knowledge base, I
am guaranteed to find that chunk. And
you are not actually guaranteed that with semantic search, because when we do the more traditional RAG search (that's what semantic search is, where we work with the embedding model), it's more of a conceptual search. And so we're able to find concepts and related ideas, but we aren't guaranteed to find exact terms.
And so keyword search might miss some
concepts that semantic search hits on,
and it will fail on synonyms, for
example. But when you look up a specific character in a movie or a legal statute, if that exists in the knowledge base, you are guaranteed to find it. And so that's
why the hybrid search is the solution
here because we're going to be able to
find concepts and exact terms. And so
our search is going to pull in chunks
from both of those strategies and then
we just have to find the best of both.
We need some kind of strategy to merge
things. And that is specifically what we
can do with MongoDB. That's what I'll
dive into in a little bit here. And so
with that, I want to show you hybrid
search in action with the agent that I
have for you as a template. So we'll ask
it a few questions leveraging the
documents that I already have ingested
in MongoDB. So I'm not going to be
showing you things from scratch with the
setup and everything in this video. But
if you want to leverage this template,
which I very much encourage you to do,
just go to the link in the description.
And then I made the setup very straightforward in the readme here: getting all your dependencies installed, setting up MongoDB, getting the documents ingested. The documents that I'm working with in this video I have in the repository as well, if you want to use the exact same data set that I am using. And so, going back here,
let's go ahead and ask our first
question. And so this is coming from a
PDF document that I'll show you in a
little bit. What is the revenue
breakdown by service line? And you can
see that it uses the search knowledge base tool. The agent defines a
query and then for the search type it is
specifying hybrid because we are
combining keyword and semantic search
for this single tool call. And there we
go. Q4 revenue and then it lists out the
different service lines. And if we go to
our PDF document, which I already have
this pulled up here, the table that you
saw in the response it gave maps exactly
to what we have here. Now I don't
necessarily know did the keyword search
or did the semantic search give us the
right chunk here or maybe both of them
did. This is a simpler example, but
you can imagine certain situations where
maybe when we ask for service line it
conceptually understands that like these
are service lines and so it's the
semantic search that pulls that out or
maybe the keyword "revenue" finds this specific chunk, because we have some kind of preface before we list out this table here. Right? If we go to the document here: revenue, right? Maybe the keyword search found that and then it loaded the chunk that has this entire table. And so I can't really tell you
right now exactly what strategy found
this chunk, but you can really see here
how based on these kinds of questions
that either one of them could be the
savior here that finds the chunks when
the other misses it. That's why it's so
powerful for us to include both
strategies. Now, one kind of question
that semantic search often messes up on
is when you ask it for a value from a
specific year. So, classic RAG will
often fail with this because if you have
a lot of different years worth of data
in your knowledge base, it might find
2023 revenue instead of 2025 because
it's more of that conceptual search
versus an exact keyword search. And so,
the agent gives the query of Neuroflow
revenue 2025. And so, it gets the right
answer here. I know this from one of the
markdown documents in my knowledge base,
but if I had many different years worth
of data, I would trust the keyword
search more because it's going to pull
out 2025 and revenue. That's going to be
somewhere in a document. And it doesn't
have to be exact because we're using
what's called fuzzy search here as well.
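For context, Atlas Search's `text` operator exposes fuzzy matching through `maxEdits` and `prefixLength` options, so a keyword-search stage might look roughly like this. The index name and field path are my guesses, not necessarily what the template uses.

```python
# Rough sketch of an Atlas Search stage with fuzzy matching enabled.
# The index name ("chunks_text_index") and field ("content") are
# hypothetical; maxEdits / prefixLength are real Atlas Search options.
def build_fuzzy_search_stage(query: str) -> dict:
    return {
        "$search": {
            "index": "chunks_text_index",
            "text": {
                "query": query,
                "path": "content",
                "fuzzy": {
                    "maxEdits": 2,      # allow up to 2 character edits per term
                    "prefixLength": 3,  # first 3 characters must match exactly
                },
            },
        }
    }
```

The `prefixLength` keeps lookups fast by anchoring the start of each term, while `maxEdits` absorbs the typos.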
But yeah, just giving one example of
when keyword search is probably going to
be better. Now, of course, I'll also
give you an example where semantic
search really shines. The kind of thing
that I don't think that a keyword search
would find for us here. So I'm asking for the timeline for the Converse Pro launch prep. This information is coming from a meeting note, one of the Word documents that I have in the knowledge base. Now the thing is, I don't
mention timeline explicitly in this
document. So a keyword search is
probably going to fail. The semantic search has to conceptually understand that the launch plan is what correlates to a timeline for Converse Pro. And so, for example,
early access program February 1st
through 28th, launch the webinar on
March 20th. Let's see if we get this
answer when we send in this query. So it
still will do hybrid. It's going to do
both, but it's probably going to be the
chunks from semantic search that gives
us the correct information. There we go.
Like for example, the early access
program from February 1st to 28th and
then we are launching the webinar on
March 20th. And so it was able to
understand that timeline is equivalent
to the launch preparation. And by the
way, the reason why the search type is a
parameter that the agent can set for the
tool is because it can decide to also
just do a semantic search or just do a
keyword search. Now, for the system
prompt that I've given the agent right
now for demo purposes, I'm telling it to
always do both, to do a hybrid search
type. But sometimes, for the sake of speed or fewer tokens, you might want to tell it to only do a keyword search or only a semantic search for certain kinds of questions that you know are going to be optimal for one or the other. And that is actually why I don't
include hybrid search explicitly in my YouTube video where I cover all RAG strategies: I would actually consider hybrid search a form of agentic RAG. I've covered agentic RAG a lot on my channel before. It's just generally the idea that you give your agent the ability to choose how it explores your knowledge base. In this case, it's able to do keyword search, semantic search, or both at the same time. And so, I've covered a lot of different kinds of examples of agentic RAG, like being able to do a RAG search or just read a full document. But hybrid search is just kind of another version of agentic RAG. Now, the question you're
probably asking is, how do we know how
to instruct the agent in the system
prompt whether to do keyword or semantic
search? Let me cover a couple of
examples here. I think it'll become
really obvious when one is better than
the other if it isn't already. And so,
when does vector search, or classic RAG search, do well? Well, it's when we want
to connect concepts together. Like, if
we search for king, we will find records
that mention queens as well. So maybe
we're more focused on royalty and we
want to stick to that entire domain and
not be limited to just finding
information on kings. Han Solo is going
to find Chewbacca. Berlin will find
Germany. Microservices will find
architecture. Cheap flights will find
affordable airfare. And for example,
searching for slow PC might find these
articles that are talking about how to
make your PC run very quickly. So we can
connect concepts even to the point where
we find opposites because keyword search
would only find things talking about
slow PCs. So it might miss an article in
your knowledge base that's talking about
how to make your PC really fast, for
example. But when a user is searching
for slow PC, they probably care about
that. And so how about keyword search?
When does it do well? Well, a lot of
these examples are very, very specific.
And that's because these are the kinds
of things that we might miss when we're
searching at a higher level more
conceptually. So if we're in code
searching for a 409 error, it'll find
the exact code in the docs that
references this error. Or searching for
a specific product that we have and
maybe an Excel file, for example. A lot
of times semantic search doesn't do well
with that. And if we search AAPL, we'll
find the stock and not the fruit. We can
search for specific legal statutes. King
will find King George instead of finding
queens. So like maybe you do really want
to limit to kings and you don't want to
pull chunks with queen as well. That is
also when keyword search is better. And
then Berlin will map to the capital of
Germany because if you have some kind of
like Wikipedia article that's talking
about the capital of Germany, you know
that Berlin is going to be right there
in the same sentence and so it will pull
that entire chunk. And one last really
important thing to mention here for
keyword search is that we have what is
called fuzzy matching. I mentioned this
very briefly earlier, but all it does is
it allows for a certain number of edits
and prefix length differences to make it
so that we can have typos and it will
still be able to find those keywords in
our knowledge base. This is really, really powerful, because if we don't let the agent come up with the queries and it's just our own typing, there might be a typo; or the agent might not know exactly how to spell something in our knowledge base; or there's a typo in our knowledge base itself. There are all these different ways that a strict keyword search could go wrong. That's why we want fuzzy matching. And so it still is
pretty flexible even though it can't
understand concepts like a semantic
search. All right, so you understand
keyword and semantic search. Now I want
to get into the pipeline. How does the
actual lookup work for both semantic and
keyword search? And then how do we
combine things together? So I'm going to
talk about the aggregation pipeline that
we have in MongoDB and then get into the
reciprocal rank fusion. This is the
algorithm for us to merge the results
together from both of our search
strategies. And so we're looking at a
bit more of a technical part of the
video. Now I'm going to get into the
code a little bit as well, but I still
want to stay pretty high level. I just
want to show you how we pull the data
from the database, how we transform it,
and how we combine it. So I'm going to
go over to the code now and then I'll go
back to the Excalidraw diagram after we take
a quick look here. So I have my agent defined with Pydantic AI. This is really standard for all the Pydantic AI agents I've created on my channel. And then we have our single tool here to search our knowledge base, with that parameter where the search type can be semantic only, text (keyword) only, or hybrid, combining both. And then we call one of these functions that we have in our tools.py right here, based on the specific search type. That's pretty much the logic that we have in the agent.
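Stripped of the Pydantic AI specifics, the tool's dispatch boils down to something like this framework-free sketch. The function names are illustrative stand-ins for the MongoDB-backed functions in tools.py, not the template's exact API.

```python
# Framework-free sketch of the search tool's dispatch logic.
# semantic_search / text_search are stubs standing in for the real
# MongoDB-backed functions; only the dispatch shape matters here.
def semantic_search(query: str, match_count: int = 10) -> list[dict]:
    return [{"text": f"semantic hit for {query!r}", "score": 0.85}]

def text_search(query: str, match_count: int = 10) -> list[dict]:
    return [{"text": f"keyword hit for {query!r}", "score": 15.0}]

def search_knowledge_base(query: str, search_type: str = "hybrid") -> list[dict]:
    if search_type == "semantic":
        return semantic_search(query)
    if search_type == "text":
        return text_search(query)
    # hybrid: run both strategies, then merge
    # (the real code merges with reciprocal rank fusion)
    return semantic_search(query) + text_search(query)
```

The agent fills in `query` and `search_type` on each tool call, which is exactly what you see in the demo's tool-call logs.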
Most of what we have for working with MongoDB is now within these functions that we have right here. So, for example, right here with semantic search, we have the context from our Pydantic AI agent. The agent generates a query for the lookup based on what we told it. And then we have the match count: how many chunks do we return? This defaults to 10. Now, the only thing that we're doing here is a two-step process: we define our pipeline for how we want to pull and transform data from MongoDB, and then we execute it. It gets a little fancy here; this is what I'm going to go back to the Excalidraw diagram to explain. But the real power of MongoDB comes out, because not only can we pull specific data and search for it in MongoDB, but we can also transform it into the exact structure that is optimal for our AI agent, even including other metadata so our agent can cite its sources, for example. And so we define the pipeline and then we simply execute it on our database. And then it's the exact same thing for the text search.
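That define-then-execute pattern looks roughly like this. The `$vectorSearch` stage is real Atlas aggregation syntax, but the index name, field path, and collection name are my assumptions for illustration.

```python
# Sketch of the define-then-execute pattern for the semantic pipeline.
# "$vectorSearch" is the real Atlas entry stage; the names are hypothetical.
def build_semantic_pipeline(query_vector: list[float], match_count: int = 10) -> list[dict]:
    return [
        {
            "$vectorSearch": {
                "index": "chunks_vector_index",
                "path": "embedding",
                "queryVector": query_vector,
                "numCandidates": match_count * 10,  # oversample, keep top N
                "limit": match_count,
            }
        },
        # ...followed by the $lookup, $unwind, and score-projection stages
    ]

# Executing it is then a single call on the chunks collection, e.g.:
# results = list(db.chunks.aggregate(build_semantic_pipeline(embedding)))
```

The text-search pipeline is the same shape, just with a `$search` stage (with fuzzy options) at the front instead of `$vectorSearch`.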
It's just the pipeline looks a little
bit different because we're doing that
fuzzy search instead of our semantic
search. And then for the reciprocal rank
fusion, I'll get into this more in a
little bit, but we have a specific
algorithm where we're going to take all
the chunks we got from our semantic
search, all the chunks from keyword
search. Each one of them comes with a
score that was assigned. And so we're
going to use that score to figure out
which chunks we want to take from each
strategy. And then that is finally what
we return to the agent when we are
merging things and doing the hybrid
search. So going back to the diagram, I
want to dive a little bit deeper into
the pipeline with you. So we're going to
cover the pipeline specifically for the
semantic search and then I won't cover
the keyword search pipeline as well
because it is very similar, but you can
definitely go to the code we were just
looking at and analyze it more yourself
if you really want to dive into things,
which of course I would encourage you to
do. And so we've got really four stages
in total with our pipeline. And so we
send in a request and then what we get
at the end of the pipeline is all of the
chunks that we retrieved from our
lookup. So zooming in a little bit more,
let's start with the entry point to our
pipeline. This is where we do our first
lookup. So we create that vector
representation of our query to find the
most relevant chunks. And so what we get
out of this is the top 10 chunks by
similarity. But now at this point we
don't have a score yet. We also don't have extra metadata. We need to do a lot more of the pipeline to really enrich
the information that we have here before
we merge it with what we get from the
keyword search as well. And so next we
do a lookup. We are joining with the
documents collection. So in the first
stage we are searching through the
chunks that we have in MongoDB. Now we
want to take the top 10 chunks that we
find or whatever that match count is and
we want to associate them with the
original documents that those bite-sized
pieces of information came from. And the
reason we want to do this is because now
we have all the metadata where this
chunk came from, the file, how long the
file is, when the ingestion date was.
All this information can be really
relevant to the agent. For example,
going back to our terminal here, our
last question, what is the timeline for
the Converse Pro launch prep? I can now
say, where did you get this info? And so
based off the metadata that it got from
calling this tool, it now knows that it
got it from the internal meeting notes
dated January 8th, 2025. And going back
to that Google Doc that we have here,
sure enough, that is exactly where it
got this information from. So it has
that thanks to being able to pull the
document record along with the chunks.
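The join described here, plus the cleanup that follows, might look something like these pipeline stages. The `$lookup`/`$unwind`/`$project` stages and `$meta` score are real aggregation syntax, but the collection and field names are my guesses rather than the template's actual schema.

```python
# Hypothetical enrichment stages: join each chunk to its parent document,
# flatten the joined array, and surface the similarity score plus metadata.
enrichment_stages = [
    {
        "$lookup": {
            "from": "documents",          # parent documents collection
            "localField": "document_id",  # chunk -> document reference
            "foreignField": "_id",
            "as": "document",
        }
    },
    {"$unwind": "$document"},             # array of one -> plain object
    {
        "$project": {
            "content": 1,
            "score": {"$meta": "vectorSearchScore"},   # similarity score
            "source": "$document.filename",            # for citing sources
            "ingested_at": "$document.ingestion_date",
        }
    },
]
```

These stages would simply be appended after the `$vectorSearch` entry stage before the pipeline is executed.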
And we're doing that through the join
here. And then next up, we are doing an
unwind. And basically all we're doing
with this is a bit of a data
transformation. So this is an array. We
want to turn it into an object just so
that we're making this more neat to feed
it into the merge and then the agent
after that. Now the last thing that we
need to do is we need to extract our
similarity score. Now there are a couple
of other things that we're doing here.
Basically just making this a really nice
flat object. But the similarity score is
the main thing because this is how we're
going to merge things with the results
that we get from the keyword search
pipeline as well. And so speaking of
that, I didn't cover that in detail in
this diagram, but you can look at the
code here like I mentioned earlier. We
do our simple fuzzy search. We do a
lookup. We also have a limit that we're
including as well. Then we do the same
unwind and the same getting that
similarity score. And so at the end of
both of these pipelines running, that's
when we have to go into our reciprocal
rank fusion algorithm to merge things
together. Try saying reciprocal rank
fusion 10 times in a row. It's
definitely a tongue twister. Now going
back to our diagram here, the reason
that we need an algorithm in the first
place is because the similarity scores
from our two pipelines have a completely
different scale. This is really standard for traditional RAG. Your similarity score for a vector search is going to be between zero and one. It's always a decimal value, something like 0.85. But
for a text search or our keyword search,
it's going to be something different
like 15 or 13 or 11. And so the big
question here is like how do we know if
a score of 15 for text search is more
relevant than a 0.85 for vector search?
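Reciprocal rank fusion sidesteps that scale mismatch by ignoring the raw scores entirely and scoring each chunk from its rank position in each result list: score = sum of 1/(k + rank), with k conventionally set to 60. A minimal sketch:

```python
# Minimal reciprocal rank fusion: ranked result lists in, fused order out.
# k=60 is the conventional constant; it damps the gap between top ranks.
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, chunk_id in enumerate(results, start=1):
            # a chunk appearing in both lists accumulates both contributions
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)  # best fused score first
```

A chunk that shows up in both the semantic and the keyword results gets two contributions, so agreement between the strategies naturally pushes it to the top.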
And yes, this algorithm does get kind of
technical here, but I want to at least
cover it at a high level because we have
this formula where we're going to use
rank positions instead of raw scores.
And if you're really curious how this works, there are a lot of resources online to learn about RRF. And also,
this is something that's in preview
right now, but MongoDB is working on
building directly into the platform. So,
we can include Rank Fusion in our
pipelines instead of having to create
the code for it ourselves like I did in
this demo. Now, this is in preview, so
it doesn't work for the free tier of
MongoDB. I want to make it very easy for
you to get started, which is why I'm not
using this and I'm coding it myself. But
just know that this is coming from
MongoDB. It's another reason why MongoDB
is great for hybrid search specifically
because this merging that we do at the
end is very very important right like we
have our semantic search pipeline and
our keyword search pipeline then we have
to combine things and they have a lot of
different parameters we can set to make
this really robust as well. So very very
important and so we have our final
rankings at the end where we have a
third score using this formula so that
we are kind of normalizing these
different values from the different
pipelines so that we truly know like
from the 20 chunks that we got from both
the semantic and keyword search here are
the five or here are the 10 that we
finally want to send to our agent so it
can enhance its context to give us the
final response. That's the goal of rag
overall. And so that leads us to our
complete hybrid search flow. We have the
user query. The agent is going to send
some query that it defines based on this
into both of the pipelines. And then we
use RRF to combine by rank and then send
those to our agent to give us the final
response. And like you saw with our demo earlier, even though there's a lot going on under the hood here (we have to run both pipelines and we have to merge things), the total latency is still really, really low. And so, even just doing another example here: if I exit, close out of this, open it up again, and I just say something like "what is the revenue from 2025?" (the question doesn't really matter here, I just want to show you again), this is really fast overall. It takes more time to decide on the tool call than it does to even finish the tool call, and then we get our final response streamed out. This entire thing just takes a couple of seconds, and the query itself, including the merging, is less than a second. So that is a wrap for our
hybrid search agent, and please use this as a template to get started, or just take the concepts here if you want to apply them to a different tech stack. But I really do like what we're working with here: MongoDB, Pydantic AI, and Docling. You're going to see more
content on these soon as well. And a
special thanks to MongoDB for working
with me on this video. I always love
working with the teams behind products
that I genuinely use as a part of my
tech stack. And so with that, if you
appreciated this video and you're
looking forward to more things on
building AI agents and leveraging AI
coding assistants, I'd really appreciate
a like and a subscribe and I will see
you in the next video.
Most actually useful AI agents leverage some form of RAG - it's how our agents can search through our documents and data in real time. In this video, I'll show you from the ground up how to build a hybrid RAG agent in Python with a simple and VERY effective tech stack - Pydantic AI + MongoDB + Docling. This agent can ingest all common file formats - PDFs, Word docs, markdown, etc. and immediately search through it all to answer any question we have. It uses both keyword and semantic search so it can handle a wide variety of questions with high accuracy. This is the kind of AI agent you can also use as the foundation for ANY RAG agent you're looking to build, so please feel free to use this as a template as well - link below! ~~~~~~~~~~~~~~~~~~~~~~~~~~ If you want to get started building RAG Agents with a simple to use and fast database, check out MongoDB: https://fandf.co/3XKF8jG Thanks again to them for working with me on this video! It's always a pleasure working with the teams behind products I genuinely care about using. ~~~~~~~~~~~~~~~~~~~~~~~~~~ - The Dynamous Agentic Coding Course is now FULLY released - learn how to build reliable and repeatable systems for AI coding: https://dynamous.ai/agentic-coding-course - Pydantic AI: https://ai.pydantic.dev/ - Docling: https://docling-project.github.io/docling/ - GitHub repo for the MongoDB Agent: https://github.com/coleam00/MongoDB-RAG-Agent - MongoDB guide to building RAG AI agents: https://fandf.co/48t6MrB - MongoDB $rankFusion: https://fandf.co/48IAoAd ~~~~~~~~~~~~~~~~~~~~~~~~~~ 00:00 - Introducing Hybrid RAG 00:52 - The Complete Agent Template for the Video 01:44 - Our Tech Stack - MongoDB + Pydantic AI + Docling 04:45 - Pros and Cons of Semantic and Keyword Search 06:41 - Live Demo of Our Hybrid RAG AI Agent 10:43 - Hybrid RAG is a Form of Agentic RAG 11:51 - When to Use Semantic vs. 
Keyword Search 14:42 - Deep Dive: How Hybrid RAG Works with MongoDB 20:40 - Understanding Reciprocal Rank Fusion 22:49 - Final Overview of the RAG Flow (it's Fast) 23:44 - Outro ~~~~~~~~~~~~~~~~~~~~~~~~~~ Join me as I push the limits of what is possible with AI. I'll be uploading videos weekly - at least every Wednesday at 7:00 PM CDT!