Today I'm going to show you how to
implement a hybrid search RAG agent.
This is one of my favorite RAG
strategies. It's really powerful and I'm
going to break it down for you very
simply right now. So the idea here is
that we give our agent the ability to
search our documents and data both
semantically understanding the
relationship between concepts but also
with keywords so we can very accurately
pick out super specific information from
our documents. And so we're going to get
the best of both worlds here because
we're going to use both strategies for
every single search, but it's still
going to be fast. And we have a really
simple tech stack here. And honestly, it
just works. And that's the thing: as I've been evolving my own RAG strategies, I've also been simplifying things, just
putting more and more focus into the
strategies like hybrid search that just
works so well no matter the use case. So
I'm really excited to get into this with
you. And I also built a complete AI
agent for this that demonstrates hybrid
search. So, I'll use this to explain all
the concepts and then this is also a
template that you can feel free to use
for yourself because by the end of this
video, you'll probably be convinced that
hybrid search is the way to go and so
you can use this as a starting point.
And we'll do some live demonstrations in
this video to talk about the different
kinds of queries that we can ask our
agent based on our knowledge. And I have
this Excalidraw diagram to explain our tech stack. And we'll also get into how
hybrid search works specifically with
MongoDB because we're going to use it as
our database and it will essentially
serve as our vector database as well.
And I know this architecture might seem
like a lot, but it is really
fascinating. And don't worry, I'll break
it down nice and simple for you. It's
worth understanding how we've
architected things so our agent can
handle so much data. So let's get into
the tech stack first. Then I'll cover
how hybrid search works and our agent.
Now the first big decision we have to
make for any RAG agent is: what is our
database going to be? Where are we going
to store our documents for search? And
MongoDB is a platform that I have never
covered on my channel, but I've used it
a lot in the past and there are some
components built into it that will
specifically help us with hybrid search.
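To make that concrete, an Atlas Vector Search index on the chunks collection is defined with a JSON document roughly like the sketch below. Note the index name, the `embedding` field name, and the 1536 dimensions are my assumptions for illustration, not necessarily what the template uses.

```python
# Hypothetical Atlas Vector Search index definition for the chunks
# collection -- field name ("embedding") and dimension count are
# illustrative assumptions, not taken from the video's repo.
vector_index = {
    "name": "chunks_vector_index",
    "type": "vectorSearch",
    "definition": {
        "fields": [
            {
                "type": "vector",
                "path": "embedding",    # field holding each chunk's embedding
                "numDimensions": 1536,  # e.g. OpenAI text-embedding-3-small
                "similarity": "cosine",
            }
        ]
    },
}
```

You'd create this once on the chunks collection, and every semantic lookup afterwards runs against it.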
And so we can use it as our vector
database. And so it is a NoSQL database.
So we can store our document records and
then also our chunks where we have all
of the embeddings for RAG and we can
connect things together. We can perform
our text searches and our semantic
searches very very quickly. And so this
is just a really efficient and easy to
work with option for us. They also have
some RAG guides that I'll link to in the description. I used this one, as well as this cookbook right here, as a lot of inspiration for the agent that I built for this video. So it's a great option for our database. The kinds of things you can do
here with hybrid search is not the sort
of thing available for a lot of other
vector databases. Now, for our AI agent framework, we're going to be using Pydantic AI. This has been my favorite framework for the entire year, and it still is. And man, Pydantic AI keeps pumping out incredible updates. They're already on version 1.27, and they had their official version 1 release only a couple of months ago. Their documentation just keeps getting better and better, they're adding more and more integrations, and it's getting easier and easier to use the framework while still giving us the flexibility that makes me really love Pydantic AI. This will be the core for our agent, and then all the tools we give to our Pydantic AI agent will be leveraging MongoDB for RAG. And
last but not least, for file processing, we have Docling, because we need a powerful library in our RAG pipeline to get data into our vector database in the right format. And Docling makes it really easy to extract text from PDFs, Word documents, markdown documents, even audio files. And our agent is going to handle all of those. Plus, we are also going to be using Docling for our chunking strategy, because we need some way to split up our larger documents to store them neatly in our vector database. And we're going to be using hybrid chunking for this. I have a video on my channel where I covered Docling and hybrid chunking in more detail; I'll link to that right here. This is going to power our RAG pipeline.
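To give a feel for what hybrid chunking is doing, here's a heavily simplified sketch of the general idea, split on document structure first, then pack pieces together under a token budget. This is NOT Docling's actual implementation (its real `HybridChunker` is tokenizer-aware and structure-aware); words stand in for tokens here.

```python
# Simplified sketch of the *idea* behind hybrid chunking -- not Docling's
# implementation. We respect paragraph boundaries (structure), while also
# keeping each chunk under a size budget (approximated with word counts).
def hybrid_chunk(paragraphs, max_tokens=100):
    chunks, current, current_len = [], [], 0
    for para in paragraphs:
        n = len(para.split())  # crude token estimate
        if current and current_len + n > max_tokens:
            chunks.append("\n".join(current))  # close chunk at a clean boundary
            current, current_len = [], 0
        current.append(para)
        current_len += n
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Because splits only ever happen between paragraphs, every chunk gets the "really nice starts and ends" you see in the demo.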
And out of all of the chunking strategies, hybrid is by far my favorite. So we're going to have this for all of our chunks that we put in MongoDB. And you can see here that we have really nice starts and ends to all of our chunks thanks to our hybrid chunking strategy. Chunking is not an easy problem to solve, but we're able to do this very easily just using what Docling gives us out of the box. So that is our tech stack in total. Now I
want to cover how the hybrid search
works and I'll do some live demos with
the agent as well. Now I'm going to be
focusing on how hybrid search works with
MongoDB, but you can apply these
concepts to other databases as well.
We're not limited to MongoDB, but they
have done some things with their
platform that makes it optimal for
hybrid search specifically, even to the
point where I've reached out to their
team to work with them on this video to
make sure that I'm presenting hybrid
search in the best way to you. And they
have features that are in preview right
now that help us with combining our
keyword and semantic search like Rank
Fusion. I will talk about this later,
but right now let's go ahead and focus
on the pros and cons of keyword and
semantic search. So, what I'm about to
explain here is going to help you
understand why we care about combining
both of these strategies together for
our agent. Now, with keyword search, the
big benefit here is pretty obvious.
We're able to find exact terms with very
high accuracy because if I search for a
certain term and that exact word or
phrase appears in my knowledge base, I
am guaranteed to find that chunk. And
you are not actually guaranteed that with semantic search, because when we do the more traditional RAG search (that's what semantic search is, where we work with the embedding model), it's more of a conceptual search. And so we're able to find concepts and related ideas, but we aren't guaranteed to find exact terms.
And so keyword search might miss some
concepts that semantic search hits on,
and it will fail on synonyms, for
example. But when you look up a specific character in a movie or a legal statute, if that exists in the knowledge base, you are guaranteed to find it. And so that's
why the hybrid search is the solution
here because we're going to be able to
find concepts and exact terms. And so
our search is going to pull in chunks
from both of those strategies and then
we just have to find the best of both.
We need some kind of strategy to merge
things. And that is specifically what we
can do with MongoDB. That's what I'll
dive into in a little bit here. And so
with that, I want to show you hybrid
search in action with the agent that I
have for you as a template. So we'll ask
it a few questions leveraging the
documents that I already have ingested
in MongoDB. So I'm not going to be
showing you things from scratch with the
setup and everything in this video. But
if you want to leverage this template,
which I very much encourage you to do,
just go to the link in the description.
And then I made the setup very straightforward in the readme here: getting all your dependencies installed, setting up MongoDB, getting the documents ingested. The documents that I'm working with in this video I have in the repository as well, if you want to use the exact same data set that I am using. And so, going back here,
let's go ahead and ask our first
question. And so this is coming from a
PDF document that I'll show you in a
little bit. What is the revenue
breakdown by service line? And you can
see that it uses the search knowledge base tool. The agent defines a
query and then for the search type it is
specifying hybrid because we are
combining keyword and semantic search
for this single tool call. And there we
go. Q4 revenue and then it lists out the
different service lines. And if we go to
our PDF document, which I already have
this pulled up here, the table that you
saw in the response it gave maps exactly
to what we have here. Now I don't
necessarily know did the keyword search
or did the semantic search give us the
right chunk here or maybe both of them
did. This is a simpler example, but
you can imagine certain situations where
maybe when we ask for service line it
conceptually understands that like these
are service lines and so it's the
semantic search that pulls that out or
maybe the keyword "revenue" finds this specific chunk, because we have some kind of preface before we list out this table here. Right? If we go to the document here: revenue, right? Maybe the keyword search found that and then it loaded the chunk that has this entire table. And so I can't really tell you
right now exactly what strategy found
this chunk, but you can really see here
how based on these kinds of questions
that either one of them could be the
savior here that finds the chunks when
the other misses it. That's why it's so
powerful for us to include both
strategies. Now, one kind of question
that semantic search often messes up on
is when you ask it for a value from a
specific year. So, classic RAG will
often fail with this because if you have
a lot of different years worth of data
in your knowledge base, it might find
2023 revenue instead of 2025 because
it's more of that conceptual search
versus an exact keyword search. And so,
the agent gives the query of Neuroflow
revenue 2025. And so, it gets the right
answer here. I know this from one of the
markdown documents in my knowledge base,
but if I had many different years worth
of data, I would trust the keyword
search more because it's going to pull
out 2025 and revenue. That's going to be
somewhere in a document. And it doesn't
have to be exact because we're using
what's called fuzzy search here as well.
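For context, Atlas Search's `text` operator exposes fuzzy matching through `maxEdits` and `prefixLength` options, so a keyword-search stage might look roughly like this. The index name and field path are my guesses, not necessarily what the template uses.

```python
# Rough sketch of an Atlas Search stage with fuzzy matching enabled.
# The index name ("chunks_text_index") and field ("content") are
# hypothetical; maxEdits / prefixLength are real Atlas Search options.
def build_fuzzy_search_stage(query: str) -> dict:
    return {
        "$search": {
            "index": "chunks_text_index",
            "text": {
                "query": query,
                "path": "content",
                "fuzzy": {
                    "maxEdits": 2,      # allow up to 2 character edits per term
                    "prefixLength": 3,  # first 3 characters must match exactly
                },
            },
        }
    }
```

The `prefixLength` keeps lookups fast by anchoring the start of each term, while `maxEdits` absorbs the typos.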
But yeah, just giving one example of
when keyword search is probably going to
be better. Now, of course, I'll also
give you an example where semantic
search really shines. The kind of thing
that I don't think that a keyword search
would find for us here. So I'm asking for the timeline for the Converse Pro launch prep. This information is coming from a meeting note, one of the Word documents that I have in the knowledge base. Now the thing is, I don't
mention timeline explicitly in this
document. So a keyword search is
probably going to fail. The semantic search has to conceptually understand that the launch plan is what correlates to a timeline for Converse Pro. And so, for example,
early access program February 1st
through 28th, launch the webinar on
March 20th. Let's see if we get this
answer when we send in this query. So it
still will do hybrid. It's going to do
both, but it's probably going to be the
chunks from semantic search that gives
us the correct information. There we go.
Like for example, the early access
program from February 1st to 28th and
then we are launching the webinar on
March 20th. And so it was able to
understand that timeline is equivalent
to the launch preparation. And by the
way, the reason why the search type is a
parameter that the agent can set for the
tool is because it can decide to also
just do a semantic search or just do a
keyword search. Now, for the system
prompt that I've given the agent right
now for demo purposes, I'm telling it to
always do both, to do a hybrid search
type. But sometimes, for the sake of speed or fewer tokens, you might want to tell it to only do a keyword search or only a semantic search for certain kinds of questions that you know are going to be optimal for one or the other. And that is actually why I don't
include hybrid search explicitly in my YouTube video where I cover all RAG strategies: I would actually consider hybrid search a form of agentic RAG. I've covered agentic RAG a lot on my channel before. It's just generally the idea that you give your agent the ability to choose how it explores your knowledge base. In this case, it's able to do keyword search, semantic search, or both at the same time. And so, I've covered a lot of different kinds of examples of agentic RAG, like being able to do a RAG search or just read a full document. But hybrid search is just kind of another version of agentic RAG. Now, the question you're
probably asking is, how do we know how
to instruct the agent in the system
prompt whether to do keyword or semantic
search? Let me cover a couple of
examples here. I think it'll become
really obvious when one is better than
the other if it isn't already. And so,
when does vector search, or classic RAG search, do well? Well, it's when we want
to connect concepts together. Like, if
we search for king, we will find records
that mention queens as well. So maybe
we're more focused on royalty and we
want to stick to that entire domain and
not be limited to just finding
information on kings. Han Solo is going
to find Chewbacca. Berlin will find
Germany. Microservices will find
architecture. Cheap flights will find
affordable airfare. And for example,
searching for slow PC might find these
articles that are talking about how to
make your PC run very quickly. So we can
connect concepts even to the point where
we find opposites because keyword search
would only find things talking about
slow PCs. So it might miss an article in
your knowledge base that's talking about
how to make your PC really fast, for
example. But when a user is searching
for slow PC, they probably care about
that. And so how about keyword search?
When does it do well? Well, a lot of
these examples are very, very specific.
And that's because these are the kinds
of things that we might miss when we're
searching at a higher level more
conceptually. So if we're in code
searching for a 409 error, it'll find
the exact code in the docs that
references this error. Or searching for
a specific product that we have and
maybe an Excel file, for example. A lot
of times semantic search doesn't do well
with that. And if we search AAPL, we'll
find the stock and not the fruit. We can
search for specific legal statutes. King
will find King George instead of finding
queens. So like maybe you do really want
to limit to kings and you don't want to
pull chunks with queen as well. That is
also when keyword search is better. And
then Berlin will map to the capital of
Germany because if you have some kind of
like Wikipedia article that's talking
about the capital of Germany, you know
that Berlin is going to be right there
in the same sentence and so it will pull
that entire chunk. And one last really
important thing to mention here for
keyword search is that we have what is
called fuzzy matching. I mentioned this
very briefly earlier, but all it does is
it allows for a certain number of edits
and prefix length differences to make it
so that we can have typos and it will
still be able to find those keywords in
our knowledge base. This is really, really powerful, because if we don't let the agent come up with the queries and it's just our own typing, there might be a typo; or the agent might not know exactly how to spell something in our knowledge base; or there's a typo in our knowledge base itself. There are all these different ways that a strict keyword search could go wrong. That's why we want fuzzy matching. And so it still is
pretty flexible even though it can't
understand concepts like a semantic
search. All right, so you understand
keyword and semantic search. Now I want
to get into the pipeline. How does the
actual lookup work for both semantic and
keyword search? And then how do we
combine things together? So I'm going to
talk about the aggregation pipeline that
we have in MongoDB and then get into the
reciprocal rank fusion. This is the
algorithm for us to merge the results
together from both of our search
strategies. And so we're looking at a
bit more of a technical part of the
video. Now I'm going to get into the
code a little bit as well, but I still
want to stay pretty high level. I just
want to show you how we pull the data
from the database, how we transform it,
and how we combine it. So I'm going to
go over to the code now and then I'll go
back to the Excalidraw diagram after we take
a quick look here. So I have my agent defined with Pydantic AI. This is really standard for all the Pydantic AI agents I've created on my channel. And then we have our single tool here to search our knowledge base, with that parameter where the search type can be semantic only, text (keyword) only, or hybrid, combining both. And then we call one of these functions that we have in our tools.py right here, based on the specific search type. That's pretty much the logic that we have in the agent.
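Stripped of the Pydantic AI specifics, the tool's dispatch boils down to something like this framework-free sketch. The function names are illustrative stand-ins for the MongoDB-backed functions in tools.py, not the template's exact API.

```python
# Framework-free sketch of the search tool's dispatch logic.
# semantic_search / text_search are stubs standing in for the real
# MongoDB-backed functions; only the dispatch shape matters here.
def semantic_search(query: str, match_count: int = 10) -> list[dict]:
    return [{"text": f"semantic hit for {query!r}", "score": 0.85}]

def text_search(query: str, match_count: int = 10) -> list[dict]:
    return [{"text": f"keyword hit for {query!r}", "score": 15.0}]

def search_knowledge_base(query: str, search_type: str = "hybrid") -> list[dict]:
    if search_type == "semantic":
        return semantic_search(query)
    if search_type == "text":
        return text_search(query)
    # hybrid: run both strategies, then merge
    # (the real code merges with reciprocal rank fusion)
    return semantic_search(query) + text_search(query)
```

The agent fills in `query` and `search_type` on each tool call, which is exactly what you see in the demo's tool-call logs.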
Most of what we have for working with MongoDB is now within these functions that we have right here. So, for example, right here with semantic search, we have the context from our Pydantic AI agent. The agent generates a query for the lookup based on what we told it. And then we have the match count: how many chunks do we return? This defaults to 10. Now, the only thing that we're doing here is a two-step process: we define our pipeline for how we want to pull and transform data from MongoDB, and then we execute it. It gets a little fancy here; this is what I'm going to go back to the Excalidraw diagram to explain. But the real power of MongoDB comes out, because not only can we pull specific data and search for it in MongoDB, but we can also transform it into the exact structure that is optimal for our AI agent, even including other metadata so our agent can cite its sources, for example. And so we define the pipeline and then we simply execute it on our database. And then it's the exact same thing for the text search.
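That define-then-execute pattern looks roughly like this. The `$vectorSearch` stage is real Atlas aggregation syntax, but the index name, field path, and collection name are my assumptions for illustration.

```python
# Sketch of the define-then-execute pattern for the semantic pipeline.
# "$vectorSearch" is the real Atlas entry stage; the names are hypothetical.
def build_semantic_pipeline(query_vector: list[float], match_count: int = 10) -> list[dict]:
    return [
        {
            "$vectorSearch": {
                "index": "chunks_vector_index",
                "path": "embedding",
                "queryVector": query_vector,
                "numCandidates": match_count * 10,  # oversample, keep top N
                "limit": match_count,
            }
        },
        # ...followed by the $lookup, $unwind, and score-projection stages
    ]

# Executing it is then a single call on the chunks collection, e.g.:
# results = list(db.chunks.aggregate(build_semantic_pipeline(embedding)))
```

The text-search pipeline is the same shape, just with a `$search` stage (with fuzzy options) at the front instead of `$vectorSearch`.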
It's just the pipeline looks a little
bit different because we're doing that
fuzzy search instead of our semantic
search. And then for the reciprocal rank
fusion, I'll get into this more in a
little bit, but we have a specific
algorithm where we're going to take all
the chunks we got from our semantic
search, all the chunks from keyword
search. Each one of them comes with a
score that was assigned. And so we're
going to use that score to figure out
which chunks we want to take from each
strategy. And then that is finally what
we return to the agent when we are
merging things and doing the hybrid
search. So going back to the diagram, I
want to dive a little bit deeper into
the pipeline with you. So we're going to
cover the pipeline specifically for the
semantic search and then I won't cover
the keyword search pipeline as well
because it is very similar, but you can
definitely go to the code we were just
looking at and analyze it more yourself
if you really want to dive into things,
which of course I would encourage you to
do. And so we've got really four stages
in total with our pipeline. And so we
send in a request and then what we get
at the end of the pipeline is all of the
chunks that we retrieved from our
lookup. So zooming in a little bit more,
let's start with the entry point to our
pipeline. This is where we do our first
lookup. So we create that vector
representation of our query to find the
most relevant chunks. And so what we get
out of this is the top 10 chunks by
similarity. But now at this point we
don't have a score yet. We also don't have extra metadata. We need to do a lot more of the pipeline to really enrich
the information that we have here before
we merge it with what we get from the
keyword search as well. And so next we
do a lookup. We are joining with the
documents collection. So in the first
stage we are searching through the
chunks that we have in MongoDB. Now we
want to take the top 10 chunks that we
find or whatever that match count is and
we want to associate them with the
original documents that those bite-sized
pieces of information came from. And the
reason we want to do this is because now
we have all the metadata where this
chunk came from, the file, how long the
file is, when the ingestion date was.
All this information can be really
relevant to the agent. For example,
going back to our terminal here, our
last question, what is the timeline for
the Converse Pro launch prep? I can now
say, where did you get this info? And so
based off the metadata that it got from
calling this tool, it now knows that it
got it from the internal meeting notes
dated January 8th, 2025. And going back
to that Google Doc that we have here,
sure enough, that is exactly where it
got this information from. So it has
that thanks to being able to pull the
document record along with the chunks.
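The join described here, plus the cleanup that follows, might look something like these pipeline stages. The `$lookup`/`$unwind`/`$project` stages and `$meta` score are real aggregation syntax, but the collection and field names are my guesses rather than the template's actual schema.

```python
# Hypothetical enrichment stages: join each chunk to its parent document,
# flatten the joined array, and surface the similarity score plus metadata.
enrichment_stages = [
    {
        "$lookup": {
            "from": "documents",          # parent documents collection
            "localField": "document_id",  # chunk -> document reference
            "foreignField": "_id",
            "as": "document",
        }
    },
    {"$unwind": "$document"},             # array of one -> plain object
    {
        "$project": {
            "content": 1,
            "score": {"$meta": "vectorSearchScore"},   # similarity score
            "source": "$document.filename",            # for citing sources
            "ingested_at": "$document.ingestion_date",
        }
    },
]
```

These stages would simply be appended after the `$vectorSearch` entry stage before the pipeline is executed.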
And we're doing that through the join
here. And then next up, we are doing an
unwind. And basically all we're doing
with this is a bit of a data
transformation. So this is an array. We
want to turn it into an object just so
that we're making this more neat to feed
it into the merge and then the agent
after that. Now the last thing that we
need to do is we need to extract our
similarity score. Now there are a couple
of other things that we're doing here.
Basically just making this a really nice
flat object. But the similarity score is
the main thing because this is how we're
going to merge things with the results
that we get from the keyword search
pipeline as well. And so speaking of
that, I didn't cover that in detail in
this diagram, but you can look at the
code here like I mentioned earlier. We
do our simple fuzzy search. We do a
lookup. We also have a limit that we're
including as well. Then we do the same
unwind and the same getting that
similarity score. And so at the end of
both of these pipelines running, that's
when we have to go into our reciprocal
rank fusion algorithm to merge things
together. Try saying reciprocal rank
fusion 10 times in a row. It's
definitely a tongue twister. Now going
back to our diagram here, the reason
that we need an algorithm in the first
place is because the similarity scores
from our two pipelines have a completely
different scale. This is really standard for traditional RAG. Your similarity score for a vector search is going to be between zero and one. It's always a decimal value, something like 0.85. But
for a text search or our keyword search,
it's going to be something different
like 15 or 13 or 11. And so the big
question here is like how do we know if
a score of 15 for text search is more
relevant than a 0.85 for vector search?
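Reciprocal rank fusion sidesteps that scale mismatch by ignoring the raw scores entirely and scoring each chunk from its rank position in each result list: score = sum of 1/(k + rank), with k conventionally set to 60. A minimal sketch:

```python
# Minimal reciprocal rank fusion: ranked result lists in, fused order out.
# k=60 is the conventional constant; it damps the gap between top ranks.
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, chunk_id in enumerate(results, start=1):
            # a chunk appearing in both lists accumulates both contributions
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)  # best fused score first
```

A chunk that shows up in both the semantic and the keyword results gets two contributions, so agreement between the strategies naturally pushes it to the top.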
And yes, this algorithm does get kind of
technical here, but I want to at least
cover it at a high level because we have
this formula where we're going to use
rank positions instead of raw scores.
And if you're really curious how this works, there are a lot of resources online to learn about RRF. And also,
this is something that's in preview
right now, but MongoDB is working on
building directly into the platform. So,
we can include Rank Fusion in our
pipelines instead of having to create
the code for it ourselves like I did in
this demo. Now, this is in preview, so
it doesn't work for the free tier of
MongoDB. I want to make it very easy for
you to get started, which is why I'm not
using this and I'm coding it myself. But
just know that this is coming from
MongoDB. It's another reason why MongoDB
is great for hybrid search specifically
because this merging that we do at the
end is very very important right like we
have our semantic search pipeline and
our keyword search pipeline then we have
to combine things and they have a lot of
different parameters we can set to make
this really robust as well. So very very
important and so we have our final
rankings at the end where we have a
third score using this formula so that
we are kind of normalizing these
different values from the different
pipelines so that we truly know like
from the 20 chunks that we got from both
the semantic and keyword search here are
the five or here are the 10 that we
finally want to send to our agent so it
can enhance its context to give us the
final response. That's the goal of rag
overall. And so that leads us to our
complete hybrid search flow. We have the
user query. The agent is going to send
some query that it defines based on this
into both of the pipelines. And then we
use RRF to combine by rank and then send
those to our agent to give us the final
response. And like you saw with our demo earlier, even though there's a lot going on under the hood here (we have to run both pipelines and we have to merge things), the total latency is still really, really low. And so, even just doing another example here: if I exit, close out of this, open it up again, and I just say something like "what is the revenue from 2025?" (the question doesn't really matter here, I just want to show you again), this is really fast overall. It takes more time to decide on the tool call than it does to even finish the tool call, and then we get our final response streamed out. This entire thing just takes a couple of seconds, and the query itself, including the merging, is less than a second. So that is a wrap for our
hybrid search agent, and please use this as a template to get started, or just take the concepts here if you want to apply them to a different tech stack. But I really do like what we're working with here: MongoDB, Pydantic AI, and Docling. You're going to see more
content on these soon as well. And a
special thanks to MongoDB for working
with me on this video. I always love
working with the teams behind products
that I genuinely use as a part of my
tech stack. And so with that, if you
appreciated this video and you're
looking forward to more things on
building AI agents and leveraging AI
coding assistants, I'd really appreciate
a like and a subscribe and I will see
you in the next video.
Most actually useful AI agents leverage some form of RAG - it's how our agents can search through our documents and data in real time. In this video, I'll show you from the ground up how to build a hybrid RAG agent in Python with a simple and VERY effective tech stack - Pydantic AI + MongoDB + Docling. This agent can ingest all common file formats - PDFs, Word docs, markdown, etc. and immediately search through it all to answer any question we have. It uses both keyword and semantic search so it can handle a wide variety of questions with high accuracy. This is the kind of AI agent you can also use as the foundation for ANY RAG agent you're looking to build, so please feel free to use this as a template as well - link below! ~~~~~~~~~~~~~~~~~~~~~~~~~~ If you want to get started building RAG Agents with a simple to use and fast database, check out MongoDB: https://fandf.co/3XKF8jG Thanks again to them for working with me on this video! It's always a pleasure working with the teams behind products I genuinely care about using. ~~~~~~~~~~~~~~~~~~~~~~~~~~ - The Dynamous Agentic Coding Course is now FULLY released - learn how to build reliable and repeatable systems for AI coding: https://dynamous.ai/agentic-coding-course - Pydantic AI: https://ai.pydantic.dev/ - Docling: https://docling-project.github.io/docling/ - GitHub repo for the MongoDB Agent: https://github.com/coleam00/MongoDB-RAG-Agent - MongoDB guide to building RAG AI agents: https://fandf.co/48t6MrB - MongoDB $rankFusion: https://fandf.co/48IAoAd ~~~~~~~~~~~~~~~~~~~~~~~~~~ 00:00 - Introducing Hybrid RAG 00:52 - The Complete Agent Template for the Video 01:44 - Our Tech Stack - MongoDB + Pydantic AI + Docling 04:45 - Pros and Cons of Semantic and Keyword Search 06:41 - Live Demo of Our Hybrid RAG AI Agent 10:43 - Hybrid RAG is a Form of Agentic RAG 11:51 - When to Use Semantic vs. 
Keyword Search 14:42 - Deep Dive: How Hybrid RAG Works with MongoDB 20:40 - Understanding Reciprocal Rank Fusion 22:49 - Final Overview of the RAG Flow (it's Fast) 23:44 - Outro ~~~~~~~~~~~~~~~~~~~~~~~~~~ Join me as I push the limits of what is possible with AI. I'll be uploading videos weekly - at least every Wednesday at 7:00 PM CDT!