RAG agents are incredibly useful, but
only when you set them up correctly. Do
this wrong and you'll get mistakes,
hallucinations, and bad responses.
That's why I've put together this video,
which will show you the 80/20 of what
you need to know to use RAG agents the
right way inside n8n. Let's get
straight into it. So, the flaw is
actually in traditional RAG itself. It's
designed to make decisions with
incomplete information. So imagine
you're feeding your RAG pipeline a tech
manual like these assembly instructions.
And a standard RAG process might take
this sentence here: "Mark the points on
the wall where the holes are to be
drilled for fastening the base unit." And
actually when it's storing it, it might
just take that sentence and separate it
into multiple chunks that are stored
separately with no context. So the first
chunk might say mark the points on the
wall and then the rest of the sentence
might be segmented into a separate
chunk. So this process of chunking data
starts by shredding your documents that
you upload into these isolated
meaningless chunks of text. And then
when your vector search goes to actually
try and find it, all it's got is these
fragmented pieces to look for. But it
has no idea what the rest of the context
should be around it. So it might find
the chunk about marking the wall but
have zero context about why we're
marking the wall and where we're marking
the wall anyway. And this is really
difficult to spot right because this is
actually a silent failure. It seems to
work when you're trying it with simple
queries. So if you've tried RAG on
simple things, it often works. But then
when you're uploading complex documents
or significant amount of data, it just
completely breaks down and destroys
anyone's trust in the system. And the
result is ultimately fragmented
context, poor retrieval, and answers
that sound confident but are
completely incorrect, which is worse
than the system simply saying it can't
find the right context.
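To make that failure concrete, here's a minimal sketch (plain Python purely for illustration, not n8n) of what fixed-size character chunking does to the drilling instruction from the assembly manual:

```python
def chunk_by_characters(text: str, size: int) -> list[str]:
    # Naive fixed-size splitter: cut every `size` characters,
    # with no regard for sentences, sections, or meaning.
    return [text[i:i + size] for i in range(0, len(text), size)]

sentence = ("Mark the points on the wall where the holes are "
            "to be drilled for fastening the base unit.")

for chunk in chunk_by_characters(sentence, 30):
    print(repr(chunk))
# The first chunk stops mid-word, with no trace of why or where
# the marking happens -- exactly the silent failure described above.
```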
So this chain of failure starts right at
the beginning of your RAG pipeline, in
how you ingest your data, because if you
put rubbish into the system, you're
obviously going to get rubbish out as
well. So most workflows I've seen use
this basic text splitter, extract from
file. So, I'll show you an example of
the Netflix earnings report, which is
this 84-page document that has multiple
headings and lots of hierarchy of
information. We've got bullet points in
there. Later on in the document, we've
got tables here, graphs with context and
more formats. Basically, a hierarchy of
information. And if you put it through
this standard extraction technique or
the traditional ingestion, extract from
file, this is exactly what happens. So,
we're going to download that Netflix
earnings report from Google Drive, put
it through the extract from file, and
what we'll be able to see is one giant
blob of text that ignores all hierarchy.
So, normally if we were able to extract
it to something like markdown, we'd be
able to see in the formatting what is a
header, what's a subheader, but in this
giant blob of text, we actually lose all
of that context. And we also don't
always work with just PDFs, right? We're
working with images, we're working with
Excel docs, we basically want to work
with any format. So instead, we're going
to use this improved ingestion, which is
using a dedicated parsing tool like
LlamaParse, which is all about
transforming messy documents into
AI-ready data at scale. And that's really
important because what we're doing is
effectively taking unstructured data or
structured data in PDF or image format
and actually outputting the results in a
format that's ready to be ingested and
retains some context about the structure
it had in the first place. So
if we were for example to connect this
instead to our LlamaCloud account and
all we've got here are a few different
HTTP requests, one to the upload
document here. But the key thing here is
it's super simple because we don't even
have to specify the input file type.
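Behind those HTTP requests sits a simple pattern: upload the document, then poll the job until it's done, then fetch the export. In n8n that's Wait and If nodes; in plain Python the polling logic looks like this. The endpoint in the comment is my reading of LlamaCloud's parsing API, so verify it against their docs; `get_status` stands in for whatever makes the actual HTTP call:

```python
import time

def poll_until_done(get_status, job_id, interval=2.0, max_tries=30):
    # Poll the parsing job until it succeeds or fails.
    # `get_status(job_id)` should return the job JSON, e.g. from
    # GET https://api.cloud.llamaindex.ai/api/parsing/job/{job_id}
    # (path assumed -- check the LlamaCloud docs).
    for _ in range(max_tries):
        status = get_status(job_id)["status"]
        if status == "SUCCESS":
            return True       # safe to fetch the markdown export now
        if status == "ERROR":
            raise RuntimeError(f"parse job {job_id} failed")
        time.sleep(interval)  # wait before polling again
    raise TimeoutError(f"parse job {job_id} still pending")
```

Once this returns, one final request to the job's result endpoint pulls the markdown export with the document structure intact.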
LlamaParse, or LlamaCloud, is going to be
able to determine that for us. And this
is all available on the free plan. You
just need to enter your API key as a
header or in here. So what we're
basically doing is uploading the
document to this parsing endpoint, and
we're going to wait, then get the job
status and poll it until we can
actually get the export itself. So we're
polling the job ID in there. And then
once we get the export, you're able to
see all of the actual formatting
structure applied in here. So whenever
we've got any tables, we've actually got
this markdown table format, which might
still look like just a text blob to you
as well, but this is
now AI-ready for ingestion because
actually an LLM can read this structure
and understand what fields or what
columns and what data sits within each
column in a table, for example. And the
same with any images, it digests those
images and breaks out the structure and
text from the images. So we're able to
retain the structure of the document
including things like key section
headers and subsection headers and
bullet points etc as well which is
ultimately super important because it
retains all the context for when we
actually go to try and search that data
later. So now the data is ingested
correctly with its original context and
you can see some examples of what
LlamaParse works with: tech documentation,
insurance claims, papers, healthcare
forms, invoices, PDFs. And we didn't even
have to specify the document type in our
upload. It automatically
determined that and therefore used the
correct parser. So this is the simplest
way to do it. You can of course use
other parsers. However, this is the only
one I've seen where you don't have to
specify the document type. So, it really
fits the 80/20 here. And we'll run
through another example as well quickly
with some lawn mower instruction manual.
And you can see that this has a huge
content hierarchy where we've got
multiple pages, multiple headings and
diagrams, etc. as well. And you can see
in the outputs here that we've got the
different markdown hashes for the
different levels of header, and we've got
all of
the information that's been ingested
here including tables and different
steps there on instruction pages as
well. So that was actually very simple,
right? You now have clean structured
markdown that preserves that document
hierarchy, but you still have to split
that document into chunks to put it into
your database. And doing it wrong will
just reintroduce that same problem where
you have those fragmented vectors with
no actual context to go with them. So
when we're trying to retrieve them
later, it's still going to make no sense
and it will pull out incorrect
information. And standard chunking
methods actually just make this problem
worse to be honest because they don't at
all split documents by meaning. They
just arbitrarily split documents by
number of characters. So if we take
all this data here and we feed that into
our traditional chunking, what we've got
is basically we're defining the markdown
content that's being fed in which is our
instruction manuals for the lawn mower
and then we are using this Postgres
vector store connecting to a Supabase
vector database, and we're inserting
those documents into a table named
documents_pg_test. And the benefit of
using this Postgres node instead of the
Supabase node directly is that we don't
have to use it with Supabase. We can use
it with any Postgres-based vector
database, and it will
automatically create a table with this
table name. So we don't have to go and
set it up in Supabase, for example. But
we do need to connect to our
credentials. And what we're basically
saying is embed or separate that context
that's been retrieved in into our
individual chunks. So if we go into the
results of this, we've effectively got
all these different chunks. So, it's
chunked from lines 753 to 834
and it's the instruction manual. And
this one does end on the end of a
sentence. But if we keep reading down
these, this one's cut out some random
bullet points from the instruction
manual. This one's a really short
section: "While mowing, always wear
suitable footwear and long trousers." So,
if we're trying to retrieve this, if
we're asking a specific question, it's
going to pull out content that might not
be relevant at all. And it's also
missing all of the previous context. So
it basically means that all the core
ideas that are in that context are being
artificially separated just by some
arbitrary character count. Which leads
us to this kind of scenario where chunk
one could depend on chunk three or vice
versa. But actually they have no
context. So the better approach is
agentic chunking. Instead of counting
characters, it looks for logical breaks
like our paragraphs, our sections or
complete concepts. And it's designed to
keep those thoughts completely together.
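Here's a stripped-down sketch of that idea (not the exact code from the workflow): ask an LLM for the best split index inside each window, and fall back to a paragraph break if its answer is unusable. `ask_llm` is a placeholder for whatever model you wire in:

```python
def agentic_chunks(text: str, ask_llm, max_size: int = 1000,
                   min_size: int = 400) -> list[str]:
    # Split `text` into chunks of roughly min_size..max_size characters,
    # letting an LLM pick a logical break point inside each window.
    chunks = []
    while len(text) > max_size:
        window = text[:max_size]
        try:
            split_at = int(ask_llm(
                "Find the best transition point to split this into "
                f"meaningful sections. Reply with a character index:\n{window}"
            ))
        except (TypeError, ValueError):
            split_at = -1  # model answer wasn't a usable index
        if not (min_size <= split_at <= max_size):
            # Fallback: last paragraph break in the window, else a hard cut.
            split_at = window.rfind("\n\n")
            if split_at < min_size:
                split_at = max_size
        chunks.append(text[:split_at].strip())
        text = text[split_at:]
    if text.strip():
        chunks.append(text.strip())
    return chunks
```

Even this rough version keeps whole paragraphs together instead of cutting at an arbitrary character count.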
So we can have something that emulates
more this style on the right hand side
where we have some context that's
associated with a chunk, but also the
chunk itself is a section that is a
logical break in whatever document it's
receiving. And it's designed that way to
keep all of the thoughts together. So
let me show you how it actually does
that. And we're using this next section
which is the agentic chunking section.
And by the way, if you want to access
any of this information or more
resources on RAG, then I'll leave a link
to the community down in the description
where we've got all these templates. You
can just plug and play, mess around with
these here. So, if you've never used the
LangChain Code node before, it's
basically just a code node combined with
an LLM chain. So, we can request
specific outputs, but also input a
prompt. So, you'll see this looks fairly
complicated, right? We've got a bunch of
JavaScript code to be executed here all
around taking sections and actually
chunking them into certain sizes, but
also we've input a prompt text: "You
are analyzing a document to find the
best transition point to split into
meaningful sections." So this is the LLM
and this is why it's agentic chunking
because we can actually pass dynamically
a prompt with the context on maximum
chunk size as well as the original text
to analyze into this LLM or LangChain
Code node. And what it's going to
basically output for us is a chunk
that's actually logically broken up. And
it's not going to be perfect, but this
is significantly better than our general
recursive character splitting chunking
method because it recognizes that an
entire sentence or an entire paragraph
is one complete idea and aims to keep
that in a single chunk. And I just want
to take a moment here to shout out
Cole Medin, who I actually learned this
method from. So definitely check out his
channel as well. He's got a ton of great
resources on agentic RAG, building out
RAG agents, and knows far more than me
on this topic. So, we've just run it
through here, and the LangChain Code node
basically separates those into relevant
chunks and also stores context, which
we'll talk about afterwards and
metadata. And you can see this time
we've got chunks which are more
representative of holistic concepts. So,
for example, we've got the safety
instructions here, which keeps the whole
lot of safety instructions in one chunk,
and we're basically saying, don't exceed
a chunk size of 1,000 characters because
it's going to be hard to search through
that, but also don't put in one that's
less than 400. So, the LangChain Code
node just spits
out the chunks and then we're still
using this Postgres vector store to
actually store those chunks with all the
context as well. And the one additional
thing we're doing here as well inside
this default data loader is giving it
certain metadata so that we can come
back and identify information around a
specific document if we've got multiple
documents but also some context which
we'll talk about afterwards. But we're
pulling in here the doc ID which we're
actually just using the doc name from
the original Google Drive file here. And
that's in case when we pull it later,
when we search for it, we need to know
what document it came from or need to
search a specific document. So metadata
stored with the vector is absolutely
critical to improving your RAG pipeline.
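To make that concrete at query time, a doc ID in the metadata lets you scope retrieval to a single document. A tiny illustrative sketch (the result shape here is made up for the example, not n8n's actual output format):

```python
def filter_by_doc(results: list[dict], doc_id: str) -> list[dict]:
    # Keep only vector-search hits whose stored metadata says they
    # came from the document we care about.
    return [r for r in results if r["metadata"].get("doc_id") == doc_id]

results = [
    {"text": "While mowing, always wear suitable footwear and long trousers.",
     "metadata": {"doc_id": "lawnmower-manual", "chunk_number": 7}},
    {"text": "Revenue grew year over year...",
     "metadata": {"doc_id": "netflix-earnings", "chunk_number": 3}},
]
hits = filter_by_doc(results, "lawnmower-manual")
```

Without the stored doc_id, there'd be no way to tell which manual a retrieved chunk came from.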
And you can see some of the examples
that I've used in a real life use case
for internal linking between SEO content
blogs. We've stored things like record
ID, the article title, the blog ID, the
article URL. Once we know the terms that
are relevant to internally link to,
we're actually going to have to retrieve
things like the article URL to apply to
the anchor text. So, it's really
important to store that metadata as
well. If you're thinking about the 80/20,
the 20% that you need to do is splitting
documents by meaning, and that means not
using the inbuilt traditional chunking
method. It's using agentic chunking,
which is just using those LLMs to help
you split the content based on concepts,
which gets you further away from
those fragmented puzzle pieces that you
need to bring together. So your data is
ingested now correctly and it's chunked
by concept which is fantastic. This is a
really good start. But now you've hit
the massive blind spot. The problem
isn't just how you search your vector
database. It's what you're searching
with. You might enter the most perfect
query to retrieve a certain bit of
information, but a user on the front end
is never going to ask a perfect database
ready question. They're probably going
to ask something terrible like, "How do
I fix my broken workflow?" And it's a
terrible search query because it
actually has no specific keywords. And
the vector search has no idea what to do
with the words broken workflow. Even
though it's searching for semantically
relevant terms, it still might struggle
with this, and therefore struggle to
retrieve the actual answer. And the
actual answer might be stored for
example in the database as something
like troubleshooting instructions.
And with this search query, you might
never find it unless you get lucky. So
this is where query expansion comes in.
And we're using an AI agent to do this.
And it's a really well-known concept,
but not very well utilized. And it's
basically supercharging with an LLM your
classic query. So, say we input that
query. How do I fix my broken workflow?
What it's going to do is just think
about three hypothetical search queries
that are optimized for a vector database
search focused on keywords and
semantically relevant queries. And it
might come out with something like: n8n
workflow troubleshooting guide, how to
debug n8n workflow errors, common n8n
workflow execution failures. So, you can
see how those are much more optimized
versus that one query. Plus, we're
actually giving it three chances to go
and find the right context. So, think of
this as like a query rewriter that
actually outputs multiple times. So,
we'll open up the chat window and we'll
do safety instructions. And if we just
hide that window and see what exactly
comes out in this query expansion, it's
general safety instructions guide,
essential safety tips and precautions or
common safety measures and protocols.
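The expansion step itself is little more than one prompt. A minimal sketch, with `ask_llm` standing in for the model node and the prompt wording being illustrative rather than the exact workflow prompt:

```python
def expand_query(user_query: str, ask_llm, n: int = 3) -> list[str]:
    # Rewrite a vague user query into n keyword-rich search queries
    # optimized for a vector-database search.
    prompt = (
        f"Think of {n} hypothetical search queries, one per line, "
        "optimized for a vector database search. Focus on keywords "
        f"and semantically relevant phrasing. Query: {user_query}"
    )
    # Clean up list markers the model might prepend, drop blank lines.
    lines = [line.strip("-* ").strip() for line in ask_llm(prompt).splitlines()]
    return [line for line in lines if line][:n]
```

Each rewrite then gets its own vector search, giving the pipeline three chances to land on the right context.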
So, it's taken my rubbish query and
actually expanded that. So, this is
query expansion. So then query
expansion's fired into our AI agent
which is going to search the vector
database and that's going to come back
with a ton of results. So for example,
we've set in our Postgres vector store
here to return from the lawn mower table
which we're using as an example to
return 25 chunks which is actually a
significant amount. Now we're doing this
three times because we've got three
queries. So, we're returning 75 chunks.
And effectively what we're doing there
is putting the 75 chunks back into our
AI agent, which might sound great, but
actually it's stuffing them all into the
LLM prompt. And you'll know if this is a
problem if you've ever written a long
comprehensive prompt and the prompt
hasn't followed the instructions you
actually want it to. It's because it
often gets confused with all of the
context that it's got. It doesn't know
what to prioritize. And the same is
going to happen here. It's also a recipe
for high costs because we're putting so
many LLM tokens in every single query we
make. And it's not even going to be able
to find your question because your
question is going to be inserted right
at the very end underneath all the
chunks and the context from those
chunks. Which comes to the second fix
for this, which is called a reranker.
And I only found out about this
recently, but it's incredibly powerful.
And we're using a specialized
lightweight, cheap model from Cohere. And
you can go to cohere.com to read more
about their products, but you just need
an API key. You can get an account
completely for free to test this out.
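Mechanically, the reranker does one job: score every candidate chunk against the original query and keep only the top few. A sketch of that step, with `score_fn` standing in for the API call — with Cohere's SDK this would wrap something like `co.rerank(query=..., documents=..., top_n=...)`, but treat that call signature as an assumption to check against their docs:

```python
def keep_top_chunks(query: str, chunks: list[str], score_fn,
                    top_n: int = 4) -> list[str]:
    # Rerank retrieved chunks by relevance to the original query and
    # return only the top_n, so the LLM prompt stays small and clean.
    scores = score_fn(query, chunks)  # one relevance score per chunk
    ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1],
                    reverse=True)
    return [chunk for chunk, _ in ranked[:top_n]]
```

So 75 retrieved chunks can go in, and only the four most relevant ones reach the LLM's context.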
And the way you can imagine this is
imagine you're in a car parking lot and
you're comparing two cars that are next
to each other. Cosine similarity is
basically saying if those cars are
facing the same way, they're similar or
they are the same car nearly. And if
they're facing opposite ways, they're
not similar, which we know is not a very
accurate way to perceive things. But
reranking is like having a friend there
who listens to the kind of car that you
want, i.e. our original query, and then
looks at all of the cars and finds the
one that matches your words and needs
the best, not just one that's facing the
same direction. So, it basically does
one job extremely well. It takes your
large sets of results and reorders them.
So you pass in the 25 chunks or the 75
in this case and it returns only the top
n or top four in this case using its
lightweight model. So it's basically
saying out of the 25 chunks I've
received these are the ones that you
should pass into your LLM context and it
means that your prompt becomes much
cleaner because actually the context
that's being passed back in every time
are just four chunks and not 25. So it
can actually find your query under all
of that information. So just using a
reranker is one of the most powerful
things you can do to improve your RAG
responses, because it's giving the LLM
only the most signal-rich information to
give the right answer. And then there's
one more thing to be said about giving
your LLM the full story. And this is the
final and most important step that ties
everything together. After your
re-ranker identifies the single best
chunk or the four best chunks that are
most likely to give a good answer, the
biggest mistake you can make at that
point is actually just sending that
chunk to the LLM. Because if you
remember our diagram earlier, we might
have a great chunk because we've
actually put as much context into that
chunk as possible, but we still don't
have that chunk context that surrounds
it and comes with it, and therefore we
don't have the full story to make an
informed decision or our agent doesn't
have the full story because it doesn't
have all the context surrounding that.
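One simple way to hand the LLM that surrounding context is to stitch the retrieved chunk back together with the neighbor text and summary stored in its metadata at ingestion. A sketch, with the metadata field names (chunk_before, chunk_after, context_summary) being illustrative rather than a fixed schema:

```python
def expand_with_neighbors(hit: dict) -> str:
    # Rebuild the full story around one retrieved chunk using the
    # neighbor text and LLM summary stored in its metadata at ingestion.
    meta = hit["metadata"]
    parts = [
        f"Summary: {meta['context_summary']}"
        if meta.get("context_summary") else "",
        meta.get("chunk_before") or "",
        hit["text"],
        meta.get("chunk_after") or "",
    ]
    # Drop empty pieces (e.g. no chunk_before on the first chunk).
    return "\n\n".join(part for part in parts if part)
```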
So you can actually use a few different
strategies for this, right? And it's
called context expansion. So there are
multiple ways you can do this, right?
One's neighbor expansion, which is
pulling before and after. The other is
full document ingestion. Now with the
types of documents that we're using,
they are 30-plus-page documents. Often
we're not going to pull the full context
of the document inside there because
we're actually going to spend a lot more
money on the LLM token usage by doing
so. So, we're going to opt for the
neighbor expansion technique, which will
be really good in most use cases, which
is actually just giving it enough
context so that it can review the
information that just came before it,
the information that came after it. Now,
you can overengineer and make this
really specific to pull exact sections,
but what we're going to do is just try
and do the 20% that gives us the 80% of
the results, right? So, what we're
actually doing, and what we didn't show
you earlier, is inside this LangChain
Code node, we've extended the logic here to
actually output things like the chunk
size, the chunk number, the content that
comes after it, i.e. the chunk that
comes after it, and the chunk that comes
before it. But not only that, some
additional summary information around
that context. So that when we actually
output any chunk inside here, we can see
all of this information in the JSON. So
it's got no chunk before because this
was the first chunk we pulled, chunk
number one. But after it says may result
in electric shock, fire, and or serious
injury, which we can see was the chunk
from afterwards. And then in the
summary, it's also given us this
additional context. Read all safety
instructions and warnings before
assembling and using your petrol lawn
mower. So that is what this text chunk
is about. Basically, it's taken that
text chunk and it's used an LLM to say
summarize what this is about and apply
that as context. And what that means is
when we upload or ingest our data in the
earlier stages, we've actually uploaded
all this additional context in our
metadata: the chunk number, the context
before, the context after, and the
context summary. So that when we do
actually come to retrieve the
information, we not only have the
semantic search from the vectors that
pulls back the most relevant
information. Then we rank those chunks
using the re-ranker. We apply one final
layer which is actually bringing in all
the additional context from before and
after that chunk to give the LLM that
we're using the best chance to give us
the most precise answer with the
information it's got. And we can see
that in action when we go back to the
chat window. We run our query for safety
instructions through and actually it's
then able to apply all those three
layers to find the right information
from the document and actually return
that to us. And we can see here: "Here
are the consolidated safety instructions
extracted from the provided manual text.
Follow these every time you assemble,
operate, transport, maintain or store
the lawn mower." And it's got a bunch of
information that it's pulled exactly
from that query. And that's a
combination of everything we've done so
far in the ingestion phase, the query
expansion, the reranking, and then
actually consolidating all of that
context to give us the best chance of
finding the right context to give us an
accurate answer. So hopefully you can
see how powerful that is and you've
completed now the three-step blueprint
for our advanced RAG agent. So you've
seen all these distinct components and
now they need to be connected into that
robust pipeline for your RAG agent. And
as you know, this is exactly the kind of
challenge that workflow automation is
designed to solve. And that's why we're
using n8n. But before you can assemble
complex systems like this you need to be
fluent in the core building blocks that
you'll be using. And that's why in this
next video, we're going to focus on the
80/20, and I'm going to show you the 13
most essential nodes within n8n that
you'll need to master for building
advanced AI systems like the one we just
designed.
🚀 Grow your business with AI & Automation: https://skool.com/scrapes
💻 14-day free trial with n8n: https://n8n.partnerlinks.io/scrapesai

Stop struggling with hallucinations and "silent failures" in your n8n workflow. This n8n RAG tutorial reveals the 80/20 of building a professional RAG agent in n8n, moving beyond basic text splitters to master n8n AI agents for beginners. You'll learn how to use AI nodes in n8n for agentic chunking, implement Supabase n8n RAG for better data storage, and use query expansion to supercharge your n8n automation workflow. Whether you're exploring n8n nodes explained or looking for a complete n8n AI masterclass, this guide covers everything from LlamaParse ingestion to Cohere reranking to ensure your no-code workflows deliver precise, context-aware answers every time.

00:00 - The Hidden Flaws in Your RAG Agent
00:25 - 1: Ingest & Chunk Data Correctly
06:12 - 2: Find The Right Information
12:56 - 3: Give The LLM The Full Picture
18:42 - Bonus - Context Expansion

ABOUT THE CHANNEL
Hey there, welcome to the channel! I love helping business owners build AI agents and automation systems that actually work. Over 100,000 people have learned AI & Automation through my courses, and I keep things focused on what you can use today - no fluff, just practical implementation. Whether you're automating your own business or helping others do the same, glad you're here.

#n8nrag #ragmaster #n8ntutorial