Retrieval augmented generation (RAG) is the go-to way to give your AI agents access to your knowledge base. Over the past few months, I've been experimenting with every RAG strategy under the sun and combining the best of them into a single n8n agent template for you. I've been evolving this template for a long time now, starting with super basic RAG as an introduction all the way to what we have today: multiple RAG strategies combined together. I've made some big upgrades to this template that I want to show you, along with how you can use it yourself. Now, the reason I'm putting so much effort into these different strategies in the first place is that a basic RAG implementation is just not enough. If you aren't strategic about the RAG strategies you use, it's going to seem like RAG is fundamentally flawed. So what do I mean by that? Well, I have the very first version of my n8n RAG template here so I can explain the fundamental flaws of a very basic implementation.
Now, let me be clear: this template is a good introduction to RAG, but there's a reason I've been evolving it so much over the last year. Every RAG implementation has two components. First is the RAG pipeline, where we convert documents from our document store into the format we're going to store in our knowledge base; we chunk things up into bite-sized pieces for our LLM. Second, we give tools to our agent to search through that knowledge base. I'll give you an example here: I'll ask it a question where it has to search through documents that originate in my Google Drive, going through the knowledge base to find the information to answer my question. With both the RAG pipeline and the agent tools, there's a lot of risk of missing key context and related information. When we do these very direct searches, a lot of the time we're just not getting what we need from the knowledge base. If you're not strategic about how you chunk and curate the knowledge for your knowledge base, it doesn't matter how effective your agentic search strategies are; your agent will fall apart. And the same thing applies if we curate our knowledge really well but don't have effective search strategies. So we want to solve for all of this.
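Just to make that baseline concrete, here's roughly what basic RAG retrieval boils down to in code. This is a minimal Python sketch against a pgvector table; the table name, column names, and connection string are illustrative, not the template's exact schema:

```python
# Minimal sketch of basic RAG retrieval, assuming a pgvector-backed
# "documents" table with "content" and "embedding" columns. All names
# here are illustrative, not the template's exact schema.
import psycopg2
from openai import OpenAI

client = OpenAI()  # needs OPENAI_API_KEY in the environment

def search_knowledge_base(question: str, limit: int = 4) -> list[str]:
    # Embed the question with the same model used at ingestion time.
    response = client.embeddings.create(
        model="text-embedding-3-small", input=question
    )
    query_embedding = response.data[0].embedding

    # Cosine-distance search via pgvector's <=> operator.
    conn = psycopg2.connect("postgresql://user:pass@host/db")  # your DSN
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT content FROM documents "
            "ORDER BY embedding <=> %s::vector LIMIT %s",
            (str(query_embedding), limit),
        )
        return [row[0] for row in cur.fetchall()]
```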
And this is what I have for you in the latest version of the n8n RAG template. This is what it has evolved into now. It looks a lot more complex, but I'll break it down in this video: we have strategies for effective knowledge creation and effective knowledge base search. I also want to hear from you about which other strategies you want me to add as I continue to evolve this template. I could add knowledge graphs or contextual embeddings; there are so many ways to build on top of this, so let me know in the comments what you think. I just wanted to say that really quick, but let's get into it now. Really, these RAG strategies are going to be super helpful no matter what RAG agent you want to build, so you can use this template as a starting point; I'll speak to how you can adapt it to your use case throughout this video as well. Here are the three big strategies I've added to the template that we're covering in this video. First, we have agentic chunking. Then I want to dive into agentic RAG. And finally, we'll cover reranking. Now, agentic RAG is something I added in the last evolution of this template, so check out this video right here if you really want to dive deep into that. What I've added for the first time now is agentic chunking and reranking, and man, do these strategies make a big difference. These are the huge upgrades I'm talking about. Really excited to be covering this with you right now.
Now, the first RAG strategy I added that I want to dive into with you is agentic chunking. This one is really powerful because we're leveraging the intelligence of a large language model to help us determine how to chunk our documents. In a more traditional RAG implementation, like this template right here, we use a more deterministic approach where we split documents every 1,000 characters, or every 400 characters. The problem with this is that we split ideas in our document across different chunks that we'd actually want grouped together. A very basic implementation even goes so far as to split in the middle of words and sentences. We can get a little more elaborate and try to respect sentence and paragraph boundaries, but we're still going to split in the middle of ideas that belong together. This leads to a lot of context loss when we search through our knowledge base and retrieve these chunks: maybe the chunk we found needs part of the next chunk to really have the complete thought that answers the user's question. The power of agentic chunking is that we give the document to the large language model and tell it: based on our need to keep ideas together, how should we split up this document?
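To see the failure mode we're talking about, here's a tiny Python demo of naive fixed-size splitting, scaled down to 40 characters so it fits on screen; the exact same thing happens at 400 or 1,000 characters:

```python
# Quick demo of why fixed-size chunking fragments ideas: a naive
# 40-character splitter happily cuts mid-word and mid-sentence.
text = (
    "Agentic chunking keeps related ideas together. "
    "Fixed-size splitting does not respect those boundaries."
)

chunk_size = 40  # the same failure mode applies at 400 or 1,000 characters
chunks = [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]

for chunk in chunks:
    print(repr(chunk))
# 'Agentic chunking keeps related ideas tog'  <- split mid-word
# 'ether. Fixed-size splitting does not res'
# 'pect those boundaries.'
```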
Now, going into my database: this template uses Postgres, and I've actually evolved it so you can use any Postgres database, like Supabase or self-hosted Postgres. I'm using Neon here. With agentic chunking, if I click into any of the chunks we have here, we can see that we start at the beginning of a sentence and go all the way through all of these bullet points. We're keeping this full idea, this whole list of bullet points, together, and you'll see the same kind of thing as I go through the other chunks: we're not splitting in the middle of words or sentences that often. That's what we want to preserve here.
Now, we do have to chunk documents; we can't just store every document as a single record, because that's way too much to pull into the large language model, and it would make the embeddings really inaccurate. So we need these bite-sized pieces of information, but as much as we can, we want to keep concrete ideas together by splitting the document with the help of a large language model. That's what agentic chunking gives us. The rest of this RAG pipeline is really the same as the previous iteration of the template, so again, check out the agentic RAG video I linked earlier if you want to dive deeper into it; I'm keeping things concise here and focusing on the newer strategies.
But just for a quick recap: we have our document store, which in my case is Google Drive, though you can easily swap in something like Dropbox or SharePoint instead. We're watching for files that are created or updated, and if a file is updated, we also delete its old rows so we can insert just the new ones. Then we download the file from Google Drive, and based on the file type, we have different nodes in n8n that extract the text from that format. We even support tabular data with Excel and CSV files; I'll get into that a bit more when I talk about agentic RAG. Then we have our text-based files: Google Docs, text documents, Markdown documents, PDFs. We extract the text from those, and for these types of documents, we feed into our agentic chunking system. Because this is much more of a custom implementation, there's not really a way to do it without code.
But this is where the beauty of the LangChain code node comes in: we can attach a large language model and use it with the LangChain library right in n8n. It's a thing of beauty, and I actually used Claude Code to help me build this out. The important piece here is the prompt we feed into the large language model, describing its role and the instructions for how we want it to intelligently split documents to keep the core ideas together. You can see that some of the splits aren't ideal, but for the most part it's starting and ending chunks with a key idea kept intact. That's what we're trying to accomplish, and it fixes a lot of the problems we have with RAG, where so much context is fragmented across the different chunks of a document. So I send this prompt into the LLM, and the LLM outputs a word to split two chunks on. That's how we create these chunks over time.
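If you want the shape of that logic outside of n8n, here's a minimal Python sketch of the same idea. The prompt wording, model choice, and window size are illustrative assumptions on my part; the template's actual implementation lives in the LangChain code node (in JavaScript):

```python
# Standalone sketch of the agentic-chunking idea: ask an LLM where to cut.
# The prompt wording, model, and window size below are illustrative
# assumptions, not the template's exact code.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

SPLIT_PROMPT = """You are a document chunking assistant.
Below is a window of text. Reply with the exact word at which this window
should be split so that complete ideas stay together in one chunk.
Reply with that word only.

Text:
{window}"""

def agentic_chunks(text: str, window_size: int = 2000) -> list[str]:
    chunks = []
    while len(text) > window_size:
        window = text[:window_size]
        split_word = llm.invoke(SPLIT_PROMPT.format(window=window)).content.strip()
        # Cut at the last occurrence of the word the model chose;
        # fall back to the full window if the word isn't found.
        cut = window.rfind(split_word)
        cut = cut if cut > 0 else window_size
        chunks.append(text[:cut].strip())
        text = text[cut:]
    chunks.append(text.strip())
    return chunks
```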
And all of this gets stored in our vector database. You could very much swap this out for another vector database like Qdrant or Pinecone if you want; generally, I love using Postgres, so I also evolved this template to work not just with Supabase, like before, but with any Postgres database. Right now I'm using Neon, which I've been loving recently: blazing-fast, serverless Postgres that's free to get started with. I'll show you really quick: if you go into the dashboard and click the connect button, it gives you all the information in the connection string to set up your Postgres credentials in n8n. Super easy to get started.
And really, that's everything for agentic chunking. The last thing I want to say is that the other beautiful part of agentic chunking is how flexible it is: you can tweak the prompt based on your use case to get really specific about how you want the LLM to split your documents. I've tried a lot of other chunking strategies, like recursive character text splitting, working with Markdown documents and their different subsections, and semantic chunking, which uses embedding models instead of large language models to determine the boundaries. But man, agentic chunking is just the most flexible and the most powerful. I absolutely love it, and it's the big thing that improves our knowledge curation, our RAG pipeline, in this template.
The sponsor of today's video is Depot and their brand-new remote agent sandboxes for Claude Code. I'll get into those in a little bit, but what Depot has built is insanely fast, globally distributed cloud infrastructure with persistent caching for extremely fast application builds, things like remote container builds and GitHub Actions runners. Companies that have switched to Depot have seen up to a 55x performance increase for their builds, and there are a ton of integrations with things like GitHub Actions, CircleCI, Docker, and GitLab. Depot's cloud infrastructure has positioned them perfectly to build what I've always wanted for Claude Code: remote agent sandboxes. And now they're here. It is a beautiful thing. Basically, we can kick off a ton of remote Claude Code sessions in parallel, all working on different features and issues in our GitHub repositories, and none of it runs on our own machine; it's all running on Depot's infrastructure. Getting started is super easy, and I'll link to the quick start in the description. You just follow the steps to install the Depot CLI, connect your Anthropic credentials, and then you operate in the terminal just like you do with Claude Code. For example, I'm asking it to summarize the Archon repository, linking to my repo. This kicks off a session, and we can view all of our sessions in the dashboard. I can click into one and see the logs just like when working with Claude Code on my machine, but it's all running remotely. We could even kick this off from a mobile device if we wanted. The possibilities, and the power this gives us, are endless. So if you've always wanted to be a Claude Code mastermind, kicking off a ton of remote sessions in parallel from anywhere, definitely check out Depot; I'll have a link in the description.
Now, the next RAG strategy I want to hit on is agentic RAG. This one is a game-changer, and the implementation can differ a lot depending on your use case. But the general idea I convey with this template is that you want to give your agent the ability to explore your knowledge in different ways, depending on what works best for the specific document and user question. The way the agent determines where to look is all based on the system prompt: you give it the ability to search in different ways, and then in the system prompt you describe exactly what that looks like, tuned to your use case.
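To make that concrete, the routing guidance looks something like this. The tool names and wording below are purely illustrative, not the template's actual system prompt:

```python
# Illustrative excerpt of a tool-routing system prompt; the tool names
# and wording are hypothetical, not the template's actual prompt.
SYSTEM_PROMPT = """You have three ways to explore the knowledge base:
1. semantic_search - default for questions about document content.
2. list_documents + get_file_contents - use when the user asks about a
   whole document, or when a chunk looks useful but incomplete.
3. query_document_rows - write SQL for sums, averages, and other
   calculations over spreadsheet data. Never estimate numbers yourself.
"""
```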
So we want to be flexible here; that's kind of the theme for a lot of these strategies. Going back to our original template, there are two reasons why it's inflexible: we're only giving our agent a single tool to search the knowledge base, and we're handling all file types in our RAG pipeline the exact same way.
But it works a lot better if we treat each of the different file types with respect. Going back to the most up-to-date RAG pipeline, for example, we handle tabular data in a much different way, storing the records individually. Take a look at this: if I go back to my Postgres database in Neon, we have a special table called document_rows, specifically for storing the rows of the tables we ingest from Google Drive. I have some fake mock data generated for a revenue spreadsheet here, and each one of those records is stored as an individual row in the document_rows table. The really cool thing with this, and again, I dive into it a lot more in my agentic RAG video, is that we give our agent a tool specifically for generating SQL queries, so it can calculate things like sums, averages, and maximums over this tabular data. That's the kind of thing RAG normally falls completely on its face with: we can't search tabular data the way we search text data, and we can't ingest it the same way.
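Here's a sketch of what that SQL tool boils down to. The schema (a dataset_id column plus a JSONB row_data column) and the keys inside it are assumptions for illustration, not necessarily the template's exact layout:

```python
# Sketch of the SQL-over-tabular-data idea: spreadsheet rows live in their
# own table and the agent writes aggregate queries against them. The
# schema (dataset_id plus a JSONB row_data column) and the JSON keys are
# assumptions, not necessarily the template's exact layout.
import psycopg2

conn = psycopg2.connect("postgresql://user:pass@host/db")  # your Neon DSN

def average_revenue(file_id: str, month_prefix: str) -> float:
    # e.g. month_prefix = "2024-08" to cover August 2024
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT AVG((row_data->>'revenue')::numeric)
            FROM document_rows
            WHERE dataset_id = %s
              AND row_data->>'date' LIKE %s
            """,
            (file_id, month_prefix + "%"),
        )
        value = cur.fetchone()[0]
        return float(value) if value is not None else 0.0
```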
And this applies to a lot of other file types as well, so I hope you're starting to see the idea here, the flexibility we're adding with agentic RAG. Another really good example: sometimes, if your documents are short enough to fit comfortably into the LLM's context, you actually want to view the entire document instead of just a couple of chunks. That's another thing we're doing in the most up-to-date RAG pipeline: we insert what's called document metadata. Essentially, we have a separate table where, instead of storing all the chunks split up, we keep just one record per document. Then, if our agent decides, "Oh, this chunk is useful, let me view the entire document," it can actually do that. We give it one tool to list all the documents available in the knowledge base, which queries this document metadata table. And when it finds a document, like, "Oh, I should look in the research brief here," it can combine all of the chunks back together to grab the complete document and look at it with a much more holistic picture. So we have this other tool as well, specifically to get all the file content for a specific file, based on the file ID we have coming in from Google Drive.
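Under the hood, that tool is essentially a reassembly query. Here's a hedged sketch, assuming each chunk's metadata carries the Drive file ID and that chunk ids increase in document order; again, the names are illustrative:

```python
# Sketch of the "get full file contents" tool: stitch a file's chunks back
# together in order. Assumes each chunk's JSONB metadata stores the Google
# Drive file_id and that ids increase in document order; the table and
# column names are illustrative.
import psycopg2

conn = psycopg2.connect("postgresql://user:pass@host/db")  # your DSN

def get_file_contents(file_id: str) -> str:
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT string_agg(content, ' ' ORDER BY id)
            FROM documents
            WHERE metadata->>'file_id' = %s
            """,
            (file_id,),
        )
        row = cur.fetchone()
        return row[0] or ""
```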
So now I'll give you a couple of demos so we can see agentic RAG in action. I'll open up the chat and say, "What is the average revenue in August of 2024?" This is going to query the sheet we have right here. Going back, we can see that it first listed the documents available to us, querying the metadata, and saw the specific columns so it could write a SQL query against the rows in document_rows. Here's the query it created, and we got back the answer of 309.5, which is what it gave us. That's looking really good: if I go into the sheet and take an average over column B, sure enough, we get 309.5. This is looking great. Then I can ask another question. Let's pull up the marketing strategy meeting: I'll ask it to view the entire marketing strategy meeting doc and give me a summary. I'm being explicit here that I want it to use the agentic RAG tools to get the full contents of a file, which is exactly what it did for the marketing strategy meeting. Going back to our chat, yep, here's our summary based on that document. Sometimes you want to pull the entire document, when it's short enough like this, to do something like a full summary that we wouldn't be able to do if we just returned a couple of chunks with a classic semantic search RAG lookup. So that's everything for agentic RAG, and it's pretty neat: agentic chunking touched just the RAG pipeline, while agentic RAG hits both the RAG pipeline and the agent's search.
And now, last but not least, we have reranking, which is specific to the search side of our RAG system. So what even is a reranker? You can think of it as a special kind of model: not a large language model, but a model whose sole job is to take in a massive number of chunks from our knowledge base search. In this case, I have it set to 25, and it reranks and filters those, returning only the top four; you can adjust these numbers for your use case as well. The reason this is so powerful: if we returned 25 chunks straight to the large language model, it would completely overwhelm it. It would make our agent more expensive and a lot slower, and 25 chunks carries a very high risk of hallucination because there's just so much information coming in. But rerankers are designed to handle this much information, and they're a lot cheaper and faster. They don't have the general intelligence of a large language model, so you can't build agents around rerankers, but they're made to take in all these chunks and do that reranking and filtering. Going back to the very first version of our template: there, we only ever pick at most four chunks from our knowledge base, and that's pretty limiting. What if we want to consider dozens of chunks and then keep just the top ones? That's exactly what reranking allows us to do.
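In code, the pattern is simple: over-fetch, then rerank. Here's a minimal Python sketch using Cohere's rerank endpoint; the 25-in, top-4-out numbers mirror my template settings and are tunable, and search_knowledge_base is the hypothetical retrieval helper from the earlier sketch:

```python
# Sketch of the rerank step: over-fetch chunks, let Cohere's reranker pick
# the best few. The 25 -> 4 numbers mirror the template and are tunable.
import cohere

co = cohere.Client("YOUR_COHERE_API_KEY")  # free tier works for testing

def rerank_chunks(question: str, chunks: list[str], top_n: int = 4) -> list[str]:
    response = co.rerank(
        model="rerank-english-v3.0",
        query=question,
        documents=chunks,
        top_n=top_n,
    )
    # Results come back sorted by relevance, with indices into `chunks`.
    return [chunks[r.index] for r in response.results]

# candidates = search_knowledge_base(question, limit=25)  # over-fetch
# best = rerank_chunks(question, candidates)              # keep the top 4
```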
So I'll give you an example here. I'll open up the chat and say, "Give me an overview of NeuroVerse." That's just one of the fake companies I had Claude generate as mock data. We're using the semantic similarity search tool, the classic RAG search we've had throughout all the iterations of this template. If I click into it, we can see that we only return four chunks even though our limit is 25, and here's why: if I click into the reranker model and look at the input that came in from our knowledge base, there's a lot of information here. We're taking in 25 chunks; the list is zero-indexed, so the last index of 24 means 25 chunks in total. Then we spit out the top four after the reranker figures out which ones are most relevant to our question, "Give me an overview of NeuroVerse." Really, really nice. And for the large language model, if we take a look at the prompt that came in, the full prompt to the LLM, it's not that big overall: we have one, two, three, four chunks as part of the prompt. That's the added context for retrieval augmented generation. This is much better than if we had handed over all 25 chunks, yet we still got to sift through that many, using the reranker to deal with all that information before it reaches our LLM. So that's reranking. There's only one option for a reranker in n8n right now, the Cohere models, but Cohere is actually fantastic. You just go to cohere.com and create a free API key; I'm on the free tier right now, so I'm not paying anything for this, just like with Neon. And that's how you can add reranking to your n8n agents.
So, that is everything I have for this massively upgraded version of the n8n RAG agent template. I hope I've covered well how you can adapt it to your specific use case: picking and choosing some of the RAG strategies, adjusting certain parameters, even adding in your own strategies. There's a lot more I could cover as well, so definitely let me know in the comments if you want me to cover knowledge graphs or contextual embeddings; there are so many other strategies I could include. RAG is just so important for AI, so I'm always down to cover more of it in my content. If you appreciated this and you're looking forward to more on AI agents and RAG, I would appreciate a like and a subscribe. And with that, I will see you in the next one.
RAG is the go-to solution for giving AI agents access to your knowledge base, but traditional RAG has major limitations - it misses context by splitting key ideas/concepts, can't connect information across documents, and lacks dynamic analysis capabilities.

This ultimate n8n RAG template solves these issues using three key strategies:

- Agentic RAG - Gives the agent intelligence to reason about how to explore your knowledge base (view full documents, semantic search, or query structured data based on the specific question)
- Reranking - Expands initial search results then uses a specialized model to return only the most relevant information
- Agentic Chunking - Uses an LLM to intelligently determine document split boundaries, preserving important context that traditional chunking destroys

The result is a RAG agent that actually works the way you'd want it to - understanding your documents holistically, connecting related information, and giving you the comprehensive answers you're looking for.

~~~~~~~~~~~~~~~~~~~~~~~~~

- Try Depot today for blazing fast cloud infrastructure for application builds: https://fandf.co/4247Qys
- Give the remote agent sandboxes a try for free: https://depot.dev/docs/agents/claude-code/quickstart

~~~~~~~~~~~~~~~~~~~~~~~~~

- If you're looking to join a community for early AI adopters to master AI Agents & RAG and transform your career or business, check out Dynamous: https://dynamous.ai

- Neon (serverless Postgres I used in this video): https://get.neon.com/NsowhUM

- Newest n8n RAG AI Agent Template: https://github.com/coleam00/ottomator-agents/tree/main/ultimate-n8n-rag-agent

NOTE: The Langchain code node used in this template is only available when you self-host n8n. I found this out after posting this, unfortunately, since I do self-host my own n8n!

~~~~~~~~~~~~~~~~~~~~~~~~~

00:00 - Introducing the Ultimate n8n RAG Agent Template (V4!)
00:48 - The Flaws with Traditional (Basic) RAG
02:08 - The Evolution of Our RAG Agent Template
02:54 - Our Three RAG Strategies
03:28 - RAG Strategy #1 - Agentic Chunking
09:14 - Depot (Remote Agent Sandboxes with Claude Code)
10:56 - RAG Strategy #2 - Agentic RAG
14:05 - Agentic RAG in Action
15:39 - RAG Strategy #3 - Reranking
18:37 - Final Thoughts

~~~~~~~~~~~~~~~~~~~~~~~~~

Join me as I push the limits of what is possible with AI. I'll be uploading videos every week - Wednesdays at 7:00 PM CDT!