There's a myth in the AI world that
vector search is the silver bullet to
ground your AI agents in your private
company knowledge. But the reality is
quite different. If you're building AI
agents that rely solely on semantic
search, you're leaving massive gaps in
your retrieval. Gaps that can lead to
hallucinations, incomplete answers, and
unreliable results. That's not to say
vector search is bad. It's brilliant for
conceptual, semantic-style queries. But
there's an entire subset of other
queries that require different retrieval
strategies to output the right answer.
And in essence, this is retrieval
engineering, designing your retrieval
strategies based on the specialized
scope and capabilities of your system.
We've worked with hundreds of community members who are building production-grade RAG agents, and we are seeing the same patterns over and over again. So in this video, I'll show you nine real-world examples of types of questions where vector search fails, and I'll demonstrate the retrieval strategies that actually work in these examples. As I said, vector search isn't bad. It's just one tool in the toolbox. There's a lot more to RAG, so let's get into it.
Building AI agents in n8n is incredibly easy. It's a simple canvas: you add an AI agent node, connect up a model, give it a system prompt and a couple of tools, and away you go. And that's great
for a quick proof of concept, but in
reality, building accurate and reliable
AI agents in any platform is hard. And
the reason for this is their natural
language interface. You can literally
ask an agent any possible question in
any number of different ways. And it
needs to figure out what you need. And
some questions can be answered directly
from the model's training data, while
other questions require diving into a
knowledge base. "What was our Q3 revenue?", for example, clearly requires diving deep into a financial system. And this
is where things can get tricky when
you're trying to create an accurate and
reliable system because questions asked
of the agent can get quite complex. Some questions require synthesizing information across multiple documents, comparing information from different data sources, interpreting and analyzing data, extracting, summarizing, inferring, evaluating, and the list is endless. It's infinite. And the critical
thing is each type of question might
require a completely different approach
to retrieve the right information to
generate an accurate answer. And on the
one hand, this is what makes AI agents
so powerful. But on the other, this is
why creating accurate and reliable
agents can be quite difficult. And in
essence, this retrieval strategy will
ultimately determine whether your
project succeeds or fails. And if you
strip an AI agent back to its core
there's essentially a simple decision
loop that's at play. So when a user's
question comes in, it hits an AI agent
powered by an LLM in the context of a
conversation, so there's memory. It needs to reason and decide: does it have all of the information it needs in context to answer the question, or does it need to retrieve information or carry out an action first? And it's a loop, so this can happen multiple times. Now, of course, this is a little bit simplified, but you get the idea. And
when we think about retrieval, retrieval within the context of an AI agent is simply a tool call, no different to an API call to create a calendar entry or draft an email, for example. So querying a vector store is just another tool call.
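To make that concrete, here's a minimal sketch of that decision loop in Python, with retrieval as just another tool call. The call_llm() and tool functions are hypothetical placeholders, not any particular framework's API.

```python
# Minimal sketch of the agent decision loop described above.
# call_llm() and the tool functions are hypothetical placeholders,
# not any particular framework's API.

def search_vector_store(query: str) -> str:
    return "...chunks retrieved from the vector store..."

def create_calendar_entry(details: str) -> str:
    return "...calendar entry created..."

TOOLS = {
    "vector_search": search_vector_store,
    "create_calendar_entry": create_calendar_entry,
}

def call_llm(messages: list[dict]) -> dict:
    # Placeholder: a real implementation calls your LLM here and returns
    # either {"answer": ...} or {"tool": ..., "input": ...}.
    return {"answer": "stubbed answer"}

def run_agent(question: str, memory: list[dict], max_steps: int = 5) -> str:
    messages = memory + [{"role": "user", "content": question}]
    for _ in range(max_steps):
        decision = call_llm(messages)
        if "answer" in decision:            # enough context already: answer directly
            return decision["answer"]
        tool_fn = TOOLS[decision["tool"]]   # otherwise retrieval/action is just a tool call
        observation = tool_fn(decision["input"])
        messages.append({"role": "tool", "content": observation})
    return "Could not answer within the step limit."
```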
But it is important to state that there's more to RAG than just vector search. And don't get me wrong, I would
love if vector search was the silver
bullet for chatting to your data. But
here's the major problem. Vector search
operates using a similarity algorithm.
When you query a vector store, you are
getting the most similar results back.
But similarity is not the same as
relevance. Relevance is highly
subjective depending on the question
that was asked. If you queried a vector
store, for example, looking for
information on error code 221, the
vector store would happily send you back
information on error 220, 221, and 222
because they're all relatively similar.
Clearly, the most relevant result you're looking for is the one for error 221. This lack of exactness around vector search is both its major strength, because you can find the right information even when you don't use the exact right words in the query, and its major weakness when you need exactness, like in this case.
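To illustrate the difference, here's a toy sketch (not a real vector store) of how an exact keyword filter narrows similarity results down to the one relevant chunk; the chunk texts are invented for the example.

```python
# Toy illustration of similarity vs. exactness; chunk texts are invented.
# Assume these chunks came back from a similarity search for "error code 221":
similar_chunks = [
    "Error 220: printer offline, check the network cable.",
    "Error 221: toner cartridge not detected, reseat the cartridge.",
    "Error 222: paper jam in tray 2, clear the feed rollers.",
]

# A keyword / exact-match filter keeps only the truly relevant chunk:
exact_matches = [c for c in similar_chunks if "error 221" in c.lower()]
print(exact_matches)
# ['Error 221: toner cartridge not detected, reseat the cartridge.']
```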
And this is why there's an array of retrieval methods that you can leverage for different types of questions. Vector search we've spoken about, but if you need exactness, keyword search and pattern matching are a great approach; SQL queries for structured data searches; graph databases for relationships and concepts; API calls if you need to retrieve information from other software systems; file system scans if you need to grab information from disk. And this
is why I would define all of these, including API calls, as RAG: if, through these methods, the retrieved information is fed into context and used by the model to synthesize the answer, then that is retrieval augmented generation. This entire video was
inspired by an article I read from Amit
Verma, who's the head of engineering at Neuron7, where he described retrieval engineering as a distinct discipline that's going to emerge over the coming years. And just as machine learning ops has matured, so too will practices
around hybrid ranking, graph
construction, and more. And all of this
leads to the nine common question types
where vector search may fail, and you
need other retrieval strategies to
answer those questions. And it's worth
calling out two sources I used when
researching this video. One is IBM's "Know Your RAG" research, as well as the Comprehensive RAG Benchmark (CRAG). I'll leave
links for these in the description
below. First up are summary questions.
And this is the most common question
type that trips up our members when
building out systems. In this example
we have a database of meeting
transcripts. And the question is, what
decisions were made in the leadership
meeting? The key thing about this
question is to answer it accurately, you
need to analyze the full transcript of
the specific meeting. That way you can
extract out the various decisions that
were made. So it has multiple units of
information. And if we look at an
example knowledge base here, you can
have different documents that represent
different transcripts. You'll have
different chunks because meetings can go
on quite long. They may be chunked and
decisions are sprinkled across the
transcript. And the other thing is
they're not specifically called out as a
decision in a lot of cases. So with standard vector search without metadata filtering, for example, you would just search for the word "decision", and any time it was mentioned in any of the transcripts, it would be pulled back. So at the very least, you would need to use metadata filtering to narrow in on the specific meeting that the user is talking about.
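As a rough sketch of that idea, here's what metadata filtering looks like in principle. The Chunk structure and field names are assumptions for illustration; real vector stores like Pinecone or Supabase/pgvector expose an equivalent filter on their query calls.

```python
# Sketch of metadata filtering layered on top of similarity search.
# The Chunk structure and metadata fields are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    metadata: dict
    score: float  # similarity score already computed against the query

def filtered_search(chunks: list[Chunk], meta_filter: dict, top_k: int = 5) -> list[Chunk]:
    # 1. Keep only chunks whose metadata matches exactly (the filter step).
    candidates = [c for c in chunks
                  if all(c.metadata.get(k) == v for k, v in meta_filter.items())]
    # 2. Rank the survivors by similarity (the vector step).
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:top_k]

chunks = [
    Chunk("We agreed to pause the EU rollout.", {"meeting": "leadership", "date": "2025-03-12"}, 0.82),
    Chunk("Decision: hire two more support engineers.", {"meeting": "support sync", "date": "2025-03-10"}, 0.90),
]
print(filtered_search(chunks, {"meeting": "leadership", "date": "2025-03-12"}))
```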
But even then, searching for decisions across that meeting transcript isn't really going to yield a complete answer. Now, there is a case
where it will: maybe at the end of the meeting, someone taking notes summarized all of the decisions, and in that case vector search will return the right result. And this is the greyness of these questions.
Sometimes vector search will actually
output the right answer depending on the
content that's in the vector store or
depending on the iterative retrieval
that an agent will undertake to actually
research the topic. So we have a number
of possible retrieval strategies here.
Agentic RAG, as I mentioned, is iterative retrieval, so the agent can search across the vector store multiple times; but again, it doesn't really know what to search for, because it doesn't know what decision it's looking for if it's not called out as a decision, though it can apply metadata filtering to at least narrow the scope. Query expansion works for a more traditional RAG system that's not agent-based, but again, you have the same problem here because you don't know what you're looking for. Context expansion is a technique I went into in a previous video, and it allows you to look at the structure of a document and load up the parent section, for example. But this
isn't really a document that has a
formal structure. But in reality, to get
the most comprehensive answer to this
question, you need to load the full
transcript. And if the transcript is too
large, you need a way of batch
summarizing the transcript to extract
out decisions. And this idea of loading
a full document is covered in that
context expansion video that you see
here. And something else to call out
here is if someone is looking for
decisions that were made at a leadership
meeting, they should have the right
level of access to actually retrieve
that information. I have a full video on
zero-trust RAG, where I go deep into how
you secure AI agents to make sure people
have the right access privileges when
requesting the information. And that's
our first example of a summary question
where in reality you need to process the
entire document or the entire transcript
to output a comprehensive answer. If
you're enjoying the video, make sure to
give it a like below and subscribe to
our channel for more deep AI and n8n content. It really helps us out. Another
example is, let's say, a knowledge base of documentation for a cloud storage service, and the question might be, "what are the main features of the service?" The answer could be embedded in
features scattered across the
documentation. So again, multiple units
of information spread out and not
specifically called out as features. You
could have version control, encryption, device syncing, and organization. And if you
carry out a vector search, you might
pull some of these, but it's unlikely
that you're going to pull everything.
And that can yield a partially correct
answer, not necessarily a hallucination
but just not comprehensive. And vector
search can work here if for example
there's a page in the documentation that
lists the features and that's similar to
the example of decisions made at the
leadership meeting. If someone goes to
the effort of summarizing the decisions
then it's just raw retrieval to fetch
the list, and it's the same here. So the heart of the issue is that with vector search, you're talking about raw retrieval of information that's already there in a knowledge base, whereas these types of questions require document processing. So if someone has already listed the features of the cloud storage service, that processing has already taken place, and it's there in the vector store to be retrieved. But if it hasn't, you're going to need to do that processing yourself: load the full documentation, or iteratively process the documentation to extract the features.
And this leads us to our third example
of a summary question, which is, let's
say, a database of internal reports. And
someone is asking for a comprehensive
summary of a specific report. And to
fetch this, you need to synthesize
information from every single section of
that document. And if you miss any
section, it's potentially misleading or
incomplete. So it's the same problem
again. Vector search will retrieve a
limited number of chunks of the
document. Agentic retrieval will fetch more chunks, but you are only
essentially retrieving segments or
pieces of the document. And the document
might have an executive summary at the
front. So that might be the first chunk
that's pulled back. And that's how
vector search can produce an answer, but
just not a comprehensive answer.
Approaches to create a comprehensive summary of a report include loading the full document into context, or instead having a document-processing sub-agent that you delegate the task to; it loads the full document into its own context, and that way you don't pollute the context window of your main agent answering the question. But if you are dealing with
documents that are just too big for an LLM's context window, you need to look at techniques like map-reduce summarization or hierarchical summarization. These are top-down and bottom-up approaches to summarizing large documents. If you'd like me to do a dedicated video on this, then just drop me a note in the comments below.
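For reference, here's a rough sketch of the map-reduce variant; summarize() is a hypothetical wrapper around a single LLM call, not any library's API, and the chunking is deliberately naive.

```python
# Rough sketch of map-reduce summarization. summarize() is a hypothetical
# stand-in for a single LLM call; chunking here is naive and character-based.

def summarize(text: str, instruction: str) -> str:
    # Placeholder: call your LLM of choice with `instruction` + `text`.
    raise NotImplementedError

def chunk(text: str, size: int = 8000) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def map_reduce_summary(document: str) -> str:
    # Map: summarize each chunk independently (these calls could run in parallel).
    partial = [summarize(c, "Summarize this section, keeping every decision.")
               for c in chunk(document)]
    # Reduce: merge the partial summaries into one comprehensive summary.
    return summarize("\n\n".join(partial),
                     "Merge these section summaries into one report summary.")
```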
We have other videos on our channel where we go through some of these retrieval strategies. In our RAG design patterns
master class, I go deep into that idea of sub-agents and delegating tasks to a sub-agent, while my CAG versus RAG video goes into the idea of cache augmented generation, where you load full documents into context and use prompt caching as a result. We have n8n
workflows for the vast majority of
retrieval strategies that you see in
today's video. If you'd like to get
access to those, then check out the link
in the description to our community, the
AI Automators, where you can join
hundreds of fellow builders, all
creating production-grade RAG agents.
So, summary questions are notoriously
difficult for vector search to actually
answer comprehensively. But even simple
questions can trip up vector search. So
with the example of a company knowledge
base, for a question like when was our
company founded, vector search should
perform pretty well here. This answer
has one unit of information. It cannot
be partially correct. It has to be fully
correct. And here, within the example documents, vector search finds, within an "About the company" page, that the company was founded in 1972. So here, the answer appears verbatim in the documents. The query embedding for "when was our company founded" largely matches the embedding of the chunk which includes "founded in 1972". So that should perform pretty
well. But if you have queries that
include rare terms that aren't actually
in the training data of the embedding
model, you can run into problems. So for this example query, "who created the blue sheet system?", "blue sheet" is a company term. It's a system that the company created, and they just dreamt up a name. And the blue sheet project might be contained within the knowledge base. But the problem here is that vector embeddings are going to struggle on this domain-specific term, because it was totally absent or is under-represented in the embedding model's training data. And this
is where approaches like lexical search
and hybrid search make a lot of sense
because you can get an exact match on a
term like blue sheet which has very
little meaning in a semantic sense. For those types of terms, you could also have a company glossary and a more structured data lookup, for example.
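As a rough sketch of how hybrid search can rescue a term like "blue sheet", here's reciprocal rank fusion combining a lexical ranking with a semantic ranking; the document IDs and rankings are invented for illustration.

```python
# Minimal sketch of hybrid search via reciprocal rank fusion (RRF).
# The two input rankings are invented; in practice they'd come from a
# BM25/full-text index and a vector store respectively.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["doc_blue_sheet_spec", "doc_blue_sheet_faq", "doc_forms_overview"]  # exact match on "blue sheet"
semantic = ["doc_colors", "doc_blue_sheet_spec", "doc_forms_overview"]         # embedding similarity only
print(rrf([lexical, semantic]))
# The exact lexical match pushes doc_blue_sheet_spec to the top,
# even though embeddings alone ranked it below doc_colors.
```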
Another simple question example is this type of query: "explain", and then you just have a random code that's specific to that company. So, "explain 15 CFR 744.21". This could be a company regulation or a policy or something like that, and it'll be contained in those documents.
It has one unit of information. It
cannot be partially correct. But the
embedding model has very little chance
of representing this identifier as it
was totally absent from its training
data. It means nothing really. So here
you would need to use the likes of
pattern matching to actually find this
code because even hybrid search can
actually fail with this. And the reason is that hybrid search tokenizes the text, so the spaces in "15 CFR 744.21" will result in the code being split. There is then no exact match possible using lexical search, because this is now several tokens instead of one. So hybrid search or lexical search can fail here, and that's why you might need pattern matching, which means wildcards or regex, or again that idea of a structured glossary or structured data lookup.
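Here's a small sketch of that pattern-matching idea with a regex; the chunk texts are invented, and the point is that the citation is matched as one literal string rather than being tokenized apart.

```python
# Sketch of pattern matching for exact identifiers like "15 CFR 744.21".
# The chunk texts are invented for illustration.
import re

chunks = [
    "Export restrictions are covered under 15 CFR 744.21 for military end uses.",
    "See also 15 CFR 746.8, which addresses sanctioned destinations.",
    "General licensing policy is described in the compliance handbook.",
]

# re.escape keeps the dots as literal characters, and the \b word boundaries
# anchor the whole citation, so only the exact code matches.
pattern = re.compile(rf"\b{re.escape('15 CFR 744.21')}\b")

print([c for c in chunks if pattern.search(c)])  # only the first chunk is returned
```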
I've gone deep into these topics on this channel. I have
a hybrid search video where I show you
how to set up hybrid search on Supabase and Pinecone. And I also have a lexical and pattern-matching search video called High Precision RAG. So as you can see,
even simple questions can trip up vector
search. But not all simple questions are
simple. They can have conditions. Here
we have what looks like a simple
question. Who is the CEO of our company?
So we should be getting an answer with
one unit of information. It can't be
partially correct. But there's a catch, which is that it's recency-dependent. This
company might have had 10 CEOs over the
last 20 or 30 years. Standard vector
search is going to search for the word
CEO or chief executive officer and it'll
pull out lots of different names from
lots of different documents over the
last 20 years. So, how does it know who
is the current CEO? Because that's the
implication here. Who is the CEO of our
company? The person's most likely
looking for the current CEO. Now, that
could be clarified by the agent, but I
think that's the implication here. So a
possible issue would be that for
whatever reason the current CEO Sarah
Patel might not actually be returned at
all by vector search. Maybe the vector
search will return previous CEOs from
older documents because generally
speaking vector search doesn't favor
newer documents over older ones. It's
all about similarity. But a good approach would be to at least have, let's say, the document title within the metadata of the chunks. So, if "2025 merger" was in the metadata that was fed to the agent and Sarah Patel was returned, then the LLM will more likely describe Sarah as the current CEO, James as the previous one, and Michael back in 2015. And that's
where metadata can be used to influence
the AI when generating the response, but
it could also be used for filtering.
That's where you could tag and filter
documents, maybe based off publication
date. So you could check all documents
in 2025 and if nothing is returned then
try 2024. Or you could filter documents
by type. So maybe only look at employee
records or org charts. Hybrid search
might work here because CEO is an
abbreviation not full natural language.
Although I think CEO is pretty well
represented in the training data of
these embedding models. A structured
data lookup could work of an org chart
that could be in a knowledge graph or in
a database table. And re-ranking is an approach here as well. There are new
re-ranking models that are actually
promptable. So the AI agent could prompt
a re-ranker to say, "Rank these chunks
based off recency." Now, you just need
to make sure that the actual date of the
document is dropped in there as well. If you'd like me to do a video on that type of reranker, again, let me know in the comments below.
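As a quick sketch of that filter-by-date idea, here's the "try the newest year first, then fall back" pattern; search() is a hypothetical wrapper around a metadata-filtered vector or hybrid query.

```python
# Sketch of recency-first retrieval with a year-by-year fallback.
# search() is a hypothetical wrapper around a metadata-filtered query.
from datetime import date

def search(query: str, year: int) -> list[str]:
    # Placeholder: run a vector/hybrid query restricted to documents
    # published in the given year.
    raise NotImplementedError

def recency_first_search(query: str, max_years_back: int = 5) -> list[str]:
    current_year = date.today().year
    for year in range(current_year, current_year - max_years_back, -1):
        results = search(query, year)
        if results:          # stop at the newest year that returns anything
            return results
    return []
```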
So, you can see that even simple questions with conditions can actually cause problems for vector search. And here's another one: what was our revenue in Q2 2024? It is a
simple question. However, it's tabular
by nature and that's where vector search
falls down. So there could be PDF reports in the knowledge base that talk about the financial returns for the company's different quarters, and they might each have different revenue figures. But
again, it's a little bit of a gamble on
what actually will be returned by the
vector store. And this ties into the
need to build accurate and reliable
agents because an AI agent might
actually get this right. It might
actually pull the correct report based
off the data queried, but then it might
not. It might pull a different report.
So, it's that lack of reliability that
then ties into the trust that people
have with the system and whether they
can actually take the information at
face value. So here, the answer appears embedded in tables within documents, so markdown OCR makes a lot of sense: if we are only relying on PDF financial reports, at least get them into a format that is LLM-friendly. The likes of Mistral or Docling will work well here. But I think structured data makes a lot of sense too. A direct lookup of a database table that actually contains these results would be a lot more reliable.
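Here's a minimal sketch of that structured lookup; an in-memory SQLite table with a made-up figure stands in for the real financial database, and the table and column names are assumptions.

```python
# Sketch of the structured-data lookup: the quarterly figures live in a table,
# so the agent's tool runs SQL instead of a similarity search.
# The table, columns, and revenue figure are made-up sample data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quarterly_results (year INT, quarter TEXT, revenue REAL)")
conn.execute("INSERT INTO quarterly_results VALUES (2024, 'Q2', 4200000)")

row = conn.execute(
    "SELECT revenue FROM quarterly_results WHERE quarter = ? AND year = ?",
    ("Q2", 2024),
).fetchone()
print(row[0])  # an exact figure, not a "similar" chunk
```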
Again, the reranking approach could work
here. Metadata filtering could work or
even an API call to a financial system
that actually contains this information.
Again, we have lots of videos on these
strategies on our channel. Allan has
gone deep on database agents and
spreadsheet agents. So, there's a few
videos there. We have a metadata
filtering video, how to extract markdown
tables using the likes of Docling, LlamaParse, and Mistral. And I also have a re-ranking video which explains what it actually does. Another type of
question that can trip up vector search
are aggregation questions. So here for
example, if someone asks how many
customer support tickets were closed
last month, the answer requires
computing or counting across lots of
different documents. So there is a
quantitative output and it's not
necessarily embedded in the text of the
documents. So for example, each support
ticket might be embedded in a vector
store so that it can be searched across.
But if you do need to know the number of
tickets closed in a specific month, this
naturally aligns to SQL queries. So
structured data is the best approach
here by far. An API call or an MCP call
could also work if the support ticket software has an API endpoint that actually answers this question.
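And here's a similarly hedged sketch of the aggregation itself as a SQL COUNT; again, an in-memory table with sample rows stands in for the real ticketing database.

```python
# Sketch of aggregation as a SQL COUNT rather than a similarity search.
# Sample rows are made up for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INT, status TEXT, closed_at TEXT)")
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?)",
    [(1, "closed", "2025-05-03"), (2, "closed", "2025-05-21"), (3, "open", None)],
)

(count,) = conn.execute(
    "SELECT COUNT(*) FROM tickets "
    "WHERE status = 'closed' AND closed_at BETWEEN '2025-05-01' AND '2025-05-31'"
).fetchone()
print(count)  # 2 -- computed, not retrieved from any document's text
```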
There's a category of questions that I would describe as global questions, which
essentially span the entire knowledge
base. So for example, what are the
recurring operational challenges
mentioned across all team
retrospectives? So here we need to
identify patterns and themes across
massive document collections. So there's
no single document that contains the
answer. Within our example documents, we
just have all of our various team retros
where various team issues are talked
about. Vector search would essentially
return a random subset of
retrospectives. So it would be able to
provide a partial answer that reflects
some of the retrospectives, but it can't
really speak to the recurring
operational challenges because there's
just too much documentation to actually
work through. So for possible retrieval strategies here, GraphRAG is probably the best approach, because GraphRAG extracts out these global concepts as entities, and it can interlink everything and create community summaries. So the likes of deployment issues or communication gaps, for example, might be interlinked multiple times. If you didn't want to go down a knowledge graph approach, you could use the map-reduce summarization method, where you would process documents in batches, extract themes, and then aggregate everything. But that would be a long-running job to actually undertake, whereas with GraphRAG, that's all precomputed up front. I
have a GraphRAG video on our channel where I talk through setting up LightRAG, which extracts out entities and relationships. There's an entity resolution process, LLM summarization and merging, and you can query LightRAG to inject the most relevant entities and relationships into context to generate an answer. With global questions, we're
talking about aggregating knowledge
across lots of different documents in a
corpus. With multi-hop questions, on the other hand, we're talking about chaining information across documents. So in this
example question, what projects will be
affected if Sarah goes on maternity
leave? We need to chain information to
generate the answer. So we need to
figure out Sarah's current role, what
projects she's on, what's the status of
the various dependencies in that
project. So in our example knowledge
base, we could have an employee
directory. Maybe we have projects and
team structures. To come up with this
answer, we need to be able to traverse
through this chain to figure out that
Sarah is on the API team, which is a key
part of project Phoenix, which has a
certain project status and is to go live
at a certain date. So, if she went on
maternity leave, there would be a major
impact. And simple vector search can
return these isolated chunks, but it
isn't able to return the connections
between these entities. And in fairness, agentic RAG can actually get there. It can perform multi-step reasoning: it could look for Sarah's projects, try to figure out the timelines, and try to figure out the dependencies to come up with the answer. But it lacks the reliability that the likes of a knowledge graph would have, where the data is actually modeled correctly; that way, you're able to traverse the graph to come up with the right answer. So this is one of those cases where smart reasoning models using vector search can actually get there, but there may be a trust issue, because you don't know whether you can actually trust the answer you're getting, whereas with a knowledge graph it's a lot easier to actually stand over the data.
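To make the chaining idea concrete, here's a toy traversal over a handful of invented triples; a real system would build the graph with something like LightRAG or a graph database.

```python
# Toy knowledge-graph traversal for the multi-hop example above.
# The triples are invented for illustration.

triples = [
    ("Sarah", "member_of", "API Team"),
    ("API Team", "works_on", "Project Phoenix"),
    ("Project Phoenix", "status", "go-live scheduled for Q3"),
]

def neighbors(entity: str) -> list[tuple[str, str]]:
    return [(rel, obj) for subj, rel, obj in triples if subj == entity]

def traverse(start: str, depth: int = 3) -> list[str]:
    """Follow the chain of relationships outward from an entity."""
    facts, frontier = [], [start]
    for _ in range(depth):
        next_frontier = []
        for entity in frontier:
            for rel, obj in neighbors(entity):
                facts.append(f"{entity} --{rel}--> {obj}")
                next_frontier.append(obj)
        frontier = next_frontier
    return facts

print("\n".join(traverse("Sarah")))
# Sarah --member_of--> API Team
# API Team --works_on--> Project Phoenix
# Project Phoenix --status--> go-live scheduled for Q3
```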
A lot of our members need images to be returned inline in the responses of the AI agent. So with this example question, how
do I replace the toner cartridge in the
third floor printer? Show me the
diagram. The answer requires retrieving
and returning visual information. And in
the example documents here, there might
be equipment photos as well as an
inventory of the office equipment and
printer manuals, for example. So, this
is multi-stage retrieval and it is
actually suited to vector search if the
images are embedded in the chunks or
they're in the metadata so that they can
be injected into the chat. So that's
where multimodal RAG kicks in, where you
extract out images from source documents
and make them accessible within the chat
widget. Metadata filtering would also be
important here because you might need to
filter by the equipment model. That way
you're providing the right image. And agentic RAG would also be important for that iterative multi-stage retrieval, and maybe even for generating signed image URLs.
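Here's a rough sketch of that pattern: image URLs travel in the chunk metadata and get rendered into the chat response. The chunk structure and the sign_url() helper are assumptions for illustration.

```python
# Sketch of inline image return: each chunk carries image references in its
# metadata, and the agent surfaces them alongside the text answer.
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

def sign_url(url: str) -> str:
    # Placeholder: generate a short-lived signed URL for the stored image.
    return url + "?signature=..."

def build_answer(chunks: list[Chunk]) -> str:
    parts = []
    for chunk in chunks:
        parts.append(chunk.text)
        for image in chunk.metadata.get("images", []):
            # Markdown image tag that a chat widget can render inline.
            parts.append(f"![{image['caption']}]({sign_url(image['url'])})")
    return "\n\n".join(parts)

chunks = [Chunk(
    text="1. Open the front cover. 2. Release the toner lock lever...",
    metadata={"images": [{"caption": "Toner replacement diagram",
                          "url": "https://files.example.com/toner-diagram.png"}]},
)]
print(build_answer(chunks))
```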
Alan has a multimodal RAG video on our channel which is definitely worth a
watch. And I also have a video on how to
create a Slack agent because internal
staff in a company might be using an
instant messaging platform like Slack
when engaging with a system like this.
Some of the questions that we come
across require heavy post-processing to
actually come up with the answer. For
example, is our customer churn rate
trending up or down over the past 6
months? If this information has not been
calculated before, the answer would
require significant reasoning and
analysis to actually figure out. And for this, you need to load lots of raw data to calculate the trends and compare values. So ideally, you might have monthly reports and you'd be able to pull everything together, but even then, vector search might struggle to find those. In reality, you'd likely need some sort of SQL tool where you can pull structured data, and then a calculator tool so that the LLM could generate accurate calculations. So agentic RAG with reasoning is important here. But for a question like this, if you really want to trust the answer you're getting back, you're better off having precomputed the actual answer; that way, the agent is simply retrieving the answer, and it doesn't need to calculate anything. Or again, it might be an API call that you can make to a CRM or to a financial system to get the precomputed answer back.
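As a small sketch of the "pull structured data, then calculate" idea, here's a trend check over made-up monthly churn figures; in practice the figures would come from a SQL query or a CRM API call.

```python
# Sketch of the post-processing step: calculate the trend from structured data.
# The churn figures are made-up sample values.

monthly_churn = {  # month -> churn rate (%)
    "2025-01": 3.1, "2025-02": 3.0, "2025-03": 2.8,
    "2025-04": 2.9, "2025-05": 2.6, "2025-06": 2.4,
}

months = sorted(monthly_churn)
first_half = sum(monthly_churn[m] for m in months[:3]) / 3
second_half = sum(monthly_churn[m] for m in months[3:]) / 3

trend = "down" if second_half < first_half else "up"
print(f"Churn is trending {trend}: {first_half:.2f}% -> {second_half:.2f}% average")
```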
And finally, there's a category of questions where the actual starting premise is false.
And it's important that the LLM doesn't
go off track and stays honest to the
data. So for example, which VP led the
Berlin office before it closed? In this
case, the question contains a false
assumption. The company never had a
Berlin office. So the AI agent needs to
recognize and correct the premise rather
than hallucinating the answer to fit
into the user's question. So in the example documents here, we might have office locations, and Berlin is not mentioned. We might have a leadership history section where the VPs are mentioned, and there may have been expansion plans for Berlin, but no timeline was set. So
it all comes back to the chunks that are
returned from the vector store. It's
important that the agent stays true to
the information that is in these chunks.
Because if the retrieved documents
mentioned Berlin and VP along with
expansion plans, it could quite easily hallucinate the connection: "Sarah Chen led operations before the Berlin office
closed." So for strategies here, carrying out an exhaustive search is important. Context expansion, that video I mentioned earlier, could help here as well. And agentic RAG with verification could work here, because I went through the verify-answer pattern in our RAG design patterns video; this would allow a verification step to check the answer versus the retrieved context, to make sure it's staying true to what it actually learned.
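For reference, a minimal sketch of that verification step might look like this; call_llm() is a hypothetical placeholder, and the prompt wording is just one way to frame the check.

```python
# Sketch of the verify-answer step: a second LLM pass that checks the draft
# answer against the retrieved chunks before it is returned to the user.
# call_llm() is a hypothetical placeholder.

def call_llm(prompt: str) -> str:
    # Placeholder: call your LLM and return its text response.
    raise NotImplementedError

def verify_answer(question: str, draft_answer: str, retrieved_chunks: list[str]) -> bool:
    prompt = (
        "Answer strictly YES or NO. Is every claim in the draft answer "
        "supported by the context below?\n\n"
        f"Question: {question}\n"
        f"Draft answer: {draft_answer}\n"
        "Context:\n" + "\n---\n".join(retrieved_chunks)
    )
    return call_llm(prompt).strip().upper().startswith("YES")

# If verification fails, the agent should retrieve again or admit it doesn't
# know, rather than inventing a Berlin office that never existed.
```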
But it all comes back to eval and ground-truth testing. You
need to keep your specialized agents on
track. They need to stay true to the
data and not dive into their training
data or infer connections from
disconnected information. I have a full video on evaluating AI systems using the open-source library DeepEval, so that one is definitely worth a watch. And if
you'd like to learn more about RAG design patterns, including this verify-answer pattern that I just described, then click on this video here to learn all about it. Thanks for watching, and I'll see you in the next one.
👉 Get access to our Hybrid Retrieval n8n workflows in our community: https://www.theaiautomators.com/?utm_source=youtube&utm_medium=video&utm_campaign=tutorial&utm_content=retrieval-engineering

Vector Search isn't the silver bullet everyone thinks it is. After working with hundreds of community members who are building production-grade RAG agents, we keep seeing the same failures. Questions that should be simple get hallucinated answers. Queries that need exact matches return "similar" results. Systems that work in demos break in production. Vector search is brilliant for semantic queries, but there's an entire category of questions that need completely different retrieval strategies. In this deep-dive, I'll show you 9 real-world examples where Vector Search fails, and demonstrate the retrieval engineering strategies that actually work in production.

🎯 What You'll Learn:
✅ Why similarity ≠ relevance (and what to do about it)
✅ 9 types of queries where vector search breaks down
✅ Exact match strategies for error codes and IDs
✅ Structured data approaches for tabular queries
✅ Aggregation techniques for counting and computing
✅ GraphRAG for global knowledge patterns
✅ Multi-hop reasoning with knowledge graphs
✅ Multimodal RAG for image retrieval
✅ Handling false premise questions without hallucinations
✅ Complete evaluation strategies using DeepEval

🔗 Useful Links:
Amit Verma Article: https://venturebeat.com/ai/from-shiny-object-to-sober-reality-the-vector-database-story-two-years-later
IBM Know Your RAG Research: https://arxiv.org/html/2411.19710v1
CRAG Benchmark: https://arxiv.org/html/2406.04744v2

⏱️ Timestamps:
00:00 - The Vector Search Myth
05:20 - #1 Summary Questions
12:27 - #2 Simple Questions
15:25 - #3 Simple Questions with Conditions
19:55 - #4 Aggregation Questions
20:41 - #5 Global Questions
22:21 - #6 Multi-Hop Questions
23:59 - #7 Multi-Modal Questions
25:12 - #8 Post-Processing Questions
26:18 - #9 False Premise Questions

💬 Questions or Comments?
What retrieval challenges are you facing in your RAG systems? Which of these 9 problems have you encountered? Drop your thoughts below!