RAG agents are incredibly useful, but
only when you set them up correctly. Do
this wrong and you'll get mistakes,
hallucinations, and bad responses.
That's why I've put together this video,
which will show you the 80/20 of what
you need to know to use RAG agents the
right way inside n8n. Let's get
straight into it. So, the flaw is
actually in traditional RAG itself. It's
designed to make decisions with
incomplete information. So imagine
you're feeding your RAG pipeline a tech
manual like these assembly instructions.
And a standard RAG process might take
this sentence here: "Mark the points on
the wall where the holes are to be
drilled for fastening the base unit." And
actually when it's storing it, it might
just take that sentence and separate it
into multiple chunks that are stored
separately with no context. So the first
chunk might say mark the points on the
wall and then the rest of the sentence
might be segmented into a separate
chunk. So this process of chunking data
starts by shredding your documents that
you upload into these isolated
meaningless chunks of text. And then
when your vector search goes to actually
try and find it, all it's got is these
fragmented pieces to look for. But it
has no idea what the rest of the context
should be around it. So it might find
the chunk about marking the wall but
have zero context about why we're
marking the wall and where we're marking
the wall anyway. And this is really
difficult to spot right because this is
actually a silent failure. It seems to
work when you're trying it with simple
queries. So if you've tried RAG on
simple things, it often works. But then
when you're uploading complex documents
or significant amount of data, it just
completely breaks down and destroys
anyone's trust in the system. And the
result is ultimately fragmented
context, poor retrieval, and answers
that sound confident but are
completely incorrect, which is worse
than the system simply saying it can't
find the right context.
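To make that failure concrete, here's a minimal sketch (plain Python purely for illustration, not n8n) of what fixed-size character chunking does to the drilling instruction from the assembly manual:

```python
def chunk_by_characters(text: str, size: int) -> list[str]:
    # Naive fixed-size splitter: cut every `size` characters,
    # with no regard for sentences, sections, or meaning.
    return [text[i:i + size] for i in range(0, len(text), size)]

sentence = ("Mark the points on the wall where the holes are "
            "to be drilled for fastening the base unit.")

for chunk in chunk_by_characters(sentence, 30):
    print(repr(chunk))
# The first chunk stops mid-word, with no trace of why or where
# the marking happens -- exactly the silent failure described above.
```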
So this chain of failure starts right at
the beginning of your RAG pipeline, in
how you ingest your data, because if you
put rubbish into the system, you're
obviously going to get rubbish out as
well. So most workflows I've seen use
this basic text splitter, extract from
file. So, I'll show you an example of
the Netflix earnings report, which is
this 84-page document that has multiple
headings and lots of hierarchy of
information. We've got bullet points in
there. Later on in the document, we've
got tables here, graphs with context and
more formats. Basically, a hierarchy of
information. And if you put it through
this standard extraction technique or
the traditional ingestion, extract from
file, this is exactly what happens. So,
we're going to download that Netflix
earnings report from Google Drive, put
it through the extract from file, and
what we'll be able to see is one giant
blob of text that ignores all hierarchy.
So, normally if we were able to extract
it to something like markdown, we'd be
able to see in the formatting what is a
header, what's a subheader, but in this
giant blob of text, we actually lose all
of that context. And we also don't
always work with just PDFs, right? We're
working with images, we're working with
Excel docs, we basically want to work
with any format. So instead, we're going
to use this improved ingestion, which is
using a dedicated parsing tool like
LlamaParse, which is all about
transforming messy documents into
AI-ready data at scale. And that's really
important because what we're doing is
effectively taking unstructured data or
structured data in PDF or image format
and actually outputting the results in a
format that's ready to be ingested and
retains some context about the structure
it had in the first place. So
if we were for example to connect this
instead to our LlamaCloud account and
all we've got here are a few different
HTTP requests, one to the upload
document here. But the key thing here is
it's super simple because we don't even
have to specify the input file type.
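Behind those HTTP requests sits a simple pattern: upload the document, then poll the job until it's done, then fetch the export. In n8n that's Wait and If nodes; in plain Python the polling logic looks like this. The endpoint in the comment is my reading of LlamaCloud's parsing API, so verify it against their docs; `get_status` stands in for whatever makes the actual HTTP call:

```python
import time

def poll_until_done(get_status, job_id, interval=2.0, max_tries=30):
    # Poll the parsing job until it succeeds or fails.
    # `get_status(job_id)` should return the job JSON, e.g. from
    # GET https://api.cloud.llamaindex.ai/api/parsing/job/{job_id}
    # (path assumed -- check the LlamaCloud docs).
    for _ in range(max_tries):
        status = get_status(job_id)["status"]
        if status == "SUCCESS":
            return True       # safe to fetch the markdown export now
        if status == "ERROR":
            raise RuntimeError(f"parse job {job_id} failed")
        time.sleep(interval)  # wait before polling again
    raise TimeoutError(f"parse job {job_id} still pending")
```

Once this returns, one final request to the job's result endpoint pulls the markdown export with the document structure intact.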
LlamaParse, or LlamaCloud, is going to be
able to determine that for us. And this
is all available on the free plan. You
just need to enter your API key as a
header or in here. So what we're
basically doing is uploading the
document to this parsing endpoint, and
we're going to wait, then get the job
status and poll it until we can
actually get the export itself. So we're
polling the job ID in there. And then
once we get the export, you're able to
see all of the actual formatting
structure applied in here. So whenever
we've got any tables, we've actually got
this markdown table format, which might
still look like just a text blob to you
as well, but this is
now AI-ready for ingestion because
actually an LLM can read this structure
and understand what fields or what
columns and what data sits within each
column in a table, for example. And the
same with any images, it digests those
images and breaks out the structure and
text from the images. So we're able to
retain the structure of the document
including things like key section
headers and subsection headers and
bullet points etc as well which is
ultimately super important because it
retains all the context for when we
actually go to try and search that data
later. So now the data is ingested
correctly with its original context and
you can see some examples of what
LlamaParse works with: tech documentation,
insurance claims, papers, healthcare
forms, invoices, PDFs. And we didn't even
have to specify the document type in our
upload. It automatically
determined that and therefore used the
correct parser. So this is the simplest
way to do it. You can of course use
other parsers. However, this is the only
one I've seen where you don't have to
specify the document type. So, it really
fits the 80/20 here. And we'll run
through another example as well quickly
with some lawn mower instruction manual.
And you can see that this has a huge
content hierarchy where we've got
multiple pages, multiple headings and
diagrams, etc. as well. And you can see
in the outputs here that we've got the
different markdown hashes for the
different levels of header, and we've got
all of
the information that's been ingested
here including tables and different
steps there on instruction pages as
well. So that was actually very simple,
right? You now have clean structured
markdown that preserves that document
hierarchy, but you still have to split
that document into chunks to put it into
your database. And doing it wrong will
just reintroduce that same problem where
you have those fragmented vectors with
no actual context to go with them. So
when we're trying to retrieve them
later, it's still going to make no sense
and it will pull out incorrect
information. And standard chunking
methods actually just make this problem
worse to be honest because they don't at
all split documents by meaning. They
just arbitrarily split documents by
number of characters. So if we take
all this data here and we feed that into
our traditional chunking, what we've got
is basically we're defining the markdown
content that's being fed in which is our
instruction manuals for the lawn mower
and then we are using this Postgres
vector store connecting to a Supabase
vector database, and we're inserting
those documents into a table named
documents_pg_test. And the benefit of
using this Postgres node instead of the
Supabase node directly is that we don't
have to use it with Supabase. We can use
it with any Postgres-based vector
database, and it will
automatically create a table with this
table name. So we don't have to go and
set it up in Supabase, for example. But
we do need to connect to our
credentials. And what we're basically
saying is embed or separate that context
that's been retrieved in into our
individual chunks. So if we go into the
results of this, we've effectively got
all these different chunks. So, it's
chunked from lines 753 to 834
and it's the instruction manual. And
this one does end on the end of a
sentence. But if we keep reading down
these, this one's cut out some random
bullet points from the instruction
manual. This one's a really short
section: "While mowing, always wear
suitable footwear and long trousers." So,
if we're trying to retrieve this, if
we're asking a specific question, it's
going to pull out content that might not
be relevant at all. And it's also
missing all of the previous context. So
it basically means that all the core
ideas that are in that context are being
artificially separated just by some
arbitrary character count. Which leads
us to this kind of scenario where chunk
one could depend on chunk three or vice
versa. But actually they have no
context. So the better approach is
agentic chunking. Instead of counting
characters, it looks for logical breaks
like our paragraphs, our sections or
complete concepts. And it's designed to
keep those thoughts completely together.
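Here's a stripped-down sketch of that idea (not the exact code from the workflow): ask an LLM for the best split index inside each window, and fall back to a paragraph break if its answer is unusable. `ask_llm` is a placeholder for whatever model you wire in:

```python
def agentic_chunks(text: str, ask_llm, max_size: int = 1000,
                   min_size: int = 400) -> list[str]:
    # Split `text` into chunks of roughly min_size..max_size characters,
    # letting an LLM pick a logical break point inside each window.
    chunks = []
    while len(text) > max_size:
        window = text[:max_size]
        try:
            split_at = int(ask_llm(
                "Find the best transition point to split this into "
                f"meaningful sections. Reply with a character index:\n{window}"
            ))
        except (TypeError, ValueError):
            split_at = -1  # model answer wasn't a usable index
        if not (min_size <= split_at <= max_size):
            # Fallback: last paragraph break in the window, else a hard cut.
            split_at = window.rfind("\n\n")
            if split_at < min_size:
                split_at = max_size
        chunks.append(text[:split_at].strip())
        text = text[split_at:]
    if text.strip():
        chunks.append(text.strip())
    return chunks
```

Even this rough version keeps whole paragraphs together instead of cutting at an arbitrary character count.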
So we can have something that emulates
more this style on the right hand side
where we have some context that's
associated with a chunk, but also the
chunk itself is a section that is a
logical break in whatever document it's
receiving. And it's designed that way to
keep all of the thoughts together. So
let me show you how it actually does
that. And we're using this next section
which is the agentic chunking section.
And by the way, if you want to access
any of this information or more
resources on RAG, then I'll leave a link
to the community down in the description
where we've got all these templates. You
can just plug and play, mess around with
these here. So, if you've never used the
LangChain Code node before, it's
basically just a code node combined with
an LLM chain. So, we can request
specific outputs, but also input a
prompt. So, you'll see this looks fairly
complicated, right? We've got a bunch of
JavaScript code to be executed here all
around taking sections and actually
chunking them into certain sizes, but
also we've input a prompt text: "You
are analyzing a document to find the
best transition point to split into
meaningful sections." So this is the LLM
and this is why it's agentic chunking
because we can actually pass dynamically
a prompt with the context on maximum
chunk size as well as the original text
to analyze into this LLM or LangChain
Code node. And what it's going to
basically output for us is a chunk
that's actually logically broken up. And
it's not going to be perfect, but this
is significantly better than our general
recursive character splitting chunking
method because it recognizes that an
entire sentence or an entire paragraph
is one complete idea and aims to keep
that in a single chunk. And I just want
to take a moment here to shout out
Cole Medin, who I actually learned this
method from. So definitely check out his
channel as well. He's got a ton of great
resources on agentic RAG, building out
RAG agents, and knows far more than me
on this topic. So, we've just run it
through here, and the LangChain Code node
basically separates those into relevant
chunks and also stores context, which
we'll talk about afterwards and
metadata. And you can see this time
we've got chunks which are more
representative of holistic concepts. So,
for example, we've got the safety
instructions here, which keeps the whole
lot of safety instructions in one chunk,
and we're basically saying, don't exceed
a chunk size of 1,000 characters because
it's going to be hard to search through
that, but also don't put in one that's
less than 400. So, the LangChain Code
node just spits
out the chunks and then we're still
using this Postgres vector store to
actually store those chunks with all the
context as well. And the one additional
thing we're doing here as well inside
this default data loader is giving it
certain metadata so that we can come
back and identify information around a
specific document if we've got multiple
documents but also some context which
we'll talk about afterwards. But we're
pulling in here the doc ID which we're
actually just using the doc name from
the original Google Drive file here. And
that's in case when we pull it later,
when we search for it, we need to know
what document it came from or need to
search a specific document. So metadata
stored with the vector is absolutely
critical to improving your RAG pipeline.
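To make that concrete at query time, a doc ID in the metadata lets you scope retrieval to a single document. A tiny illustrative sketch (the result shape here is made up for the example, not n8n's actual output format):

```python
def filter_by_doc(results: list[dict], doc_id: str) -> list[dict]:
    # Keep only vector-search hits whose stored metadata says they
    # came from the document we care about.
    return [r for r in results if r["metadata"].get("doc_id") == doc_id]

results = [
    {"text": "While mowing, always wear suitable footwear and long trousers.",
     "metadata": {"doc_id": "lawnmower-manual", "chunk_number": 7}},
    {"text": "Revenue grew year over year...",
     "metadata": {"doc_id": "netflix-earnings", "chunk_number": 3}},
]
hits = filter_by_doc(results, "lawnmower-manual")
```

Without the stored doc_id, there'd be no way to tell which manual a retrieved chunk came from.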
And you can see some of the examples
that I've used in a real life use case
for internal linking between SEO content
blogs. We've stored things like record
ID, the article title, the blog ID, the
article URL. Once we know the terms that
are relevant to internally link to,
we're actually going to have to retrieve
things like the article URL to apply to
the anchor text. So, it's really
important to store that metadata as
well. If you're thinking about the 80/20,
the 20% that you need to do is splitting
documents by meaning, and that means not
using the inbuilt traditional chunking
method. It's using agentic chunking,
which is just using those LLMs to help
you split the content based on concepts,
which gets you further away from
those fragmented puzzle pieces that you
need to bring together. So your data is
ingested now correctly and it's chunked
by concept which is fantastic. This is a
really good start. But now you've hit
the massive blind spot. The problem
isn't just how you search your vector
database. It's what you're searching
with. You might enter the most perfect
query to retrieve a certain bit of
information, but a user on the front end
is never going to ask a perfect database
ready question. They're probably going
to ask something terrible like, "How do
I fix my broken workflow?" And it's a
terrible search query because it
actually has no specific keywords. And
the vector search has no idea what to do
with the words broken workflow. Even
though it's searching for semantically
relevant terms, it still might struggle
with this, and therefore struggle to
retrieve the actual answer. And the
actual answer might be stored for
example in the database as something
like troubleshooting instructions.
And with this search query, you might
never find it unless you get lucky. So
this is where query expansion comes in.
And we're using an AI agent to do this.
And it's a really well-known concept,
but not very well utilized. And it's
basically supercharging with an LLM your
classic query. So, say we input that
query. How do I fix my broken workflow?
What it's going to do is just think
about three hypothetical search queries
that are optimized for a vector database
search focused on keywords and
semantically relevant queries. And it
might come out with something like: n8n
workflow troubleshooting guide, how to
debug n8n workflow errors, common n8n
workflow execution failures. So, you can
see how those are much more optimized
versus that one query. Plus, we're
actually giving it three chances to go
and find the right context. So, think of
this as like a query rewriter that
actually outputs multiple times. So,
we'll open up the chat window and we'll
do safety instructions. And if we just
hide that window and see what exactly
comes out in this query expansion, it's
general safety instructions guide,
essential safety tips and precautions or
common safety measures and protocols.
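The expansion step itself is little more than one prompt. A minimal sketch, with `ask_llm` standing in for the model node and the prompt wording being illustrative rather than the exact workflow prompt:

```python
def expand_query(user_query: str, ask_llm, n: int = 3) -> list[str]:
    # Rewrite a vague user query into n keyword-rich search queries
    # optimized for a vector-database search.
    prompt = (
        f"Think of {n} hypothetical search queries, one per line, "
        "optimized for a vector database search. Focus on keywords "
        f"and semantically relevant phrasing. Query: {user_query}"
    )
    # Clean up list markers the model might prepend, drop blank lines.
    lines = [line.strip("-* ").strip() for line in ask_llm(prompt).splitlines()]
    return [line for line in lines if line][:n]
```

Each rewrite then gets its own vector search, giving the pipeline three chances to land on the right context.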
So, it's taken my rubbish query and
actually expanded that. So, this is
query expansion. So then query
expansion's fired into our AI agent
which is going to search the vector
database and that's going to come back
with a ton of results. So for example,
we've set in our Postgres vector store
here to return from the lawn mower table
which we're using as an example to
return 25 chunks which is actually a
significant amount. Now we're doing this
three times because we've got three
queries. So, we're returning 75 chunks.
And effectively what we're doing there
is putting the 75 chunks back into our
AI agent, which might sound great, but
actually it's stuffing them all into the
LLM prompt. And you'll know if this is a
problem if you've ever written a long
comprehensive prompt and the prompt
hasn't followed the instructions you
actually want it to. It's because it
often gets confused with all of the
context that it's got. It doesn't know
what to prioritize. And the same is
going to happen here. It's also a recipe
for high costs because we're putting so
many LLM tokens in every single query we
make. And it's not even going to be able
to find your question because your
question is going to be inserted right
at the very end underneath all the
chunks and the context from those
chunks. Which comes to the second fix
for this, which is called a reranker.
And I only found out about this
recently, but it's incredibly powerful.
And we're using a specialized
lightweight, cheap model from Cohere. And
you can go to cohere.com to read more
about their products, but you just need
an API key. You can get an account
completely for free to test this out.
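Mechanically, the reranker does one job: score every candidate chunk against the original query and keep only the top few. A sketch of that step, with `score_fn` standing in for the API call — with Cohere's SDK this would wrap something like `co.rerank(query=..., documents=..., top_n=...)`, but treat that call signature as an assumption to check against their docs:

```python
def keep_top_chunks(query: str, chunks: list[str], score_fn,
                    top_n: int = 4) -> list[str]:
    # Rerank retrieved chunks by relevance to the original query and
    # return only the top_n, so the LLM prompt stays small and clean.
    scores = score_fn(query, chunks)  # one relevance score per chunk
    ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1],
                    reverse=True)
    return [chunk for chunk, _ in ranked[:top_n]]
```

So 75 retrieved chunks can go in, and only the four most relevant ones reach the LLM's context.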
And the way you can imagine this is
imagine you're in a car parking lot and
you're comparing two cars that are next
to each other. Cosine similarity is
basically saying if those cars are
facing the same way, they're similar or
they are the same car nearly. And if
they're facing opposite ways, they're
not similar, which we know is not a very
accurate way to perceive things. But
reranking is like having a friend there
who listens to the kind of car that you
want, i.e. our original query, and then
looks at all of the cars and finds the
one that matches your words and needs
the best, not just one that's facing the
same direction. So, it basically does
one job extremely well. It takes your
large sets of results and reorders them.
So you pass in the 25 chunks or the 75
in this case and it returns only the top
n or top four in this case using its
lightweight model. So it's basically
saying out of the 25 chunks I've
received these are the ones that you
should pass into your LLM context and it
means that your prompt becomes much
cleaner because actually the context
that's being passed back in every time
are just four chunks and not 25. So it
can actually find your query under all
of that information. So just using a
reranker is one of the most powerful
things you can do to improve your RAG
responses, because it's giving the LLM
only the most signal-rich information to
give the right answer. And then there's
one more thing to be said about giving
your LLM the full story. And this is the
final and most important step that ties
everything together. After your
re-ranker identifies the single best
chunk or the four best chunks that are
most likely to give a good answer, the
biggest mistake you can make at that
point is actually just sending that
chunk to the LLM. Because if you
remember our diagram earlier, we might
have a great chunk because we've
actually put as much context into that
chunk as possible, but we still don't
have that chunk context that surrounds
it and comes with it, and therefore we
don't have the full story to make an
informed decision or our agent doesn't
have the full story because it doesn't
have all the context surrounding that.
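One simple way to hand the LLM that surrounding context is to stitch the retrieved chunk back together with the neighbor text and summary stored in its metadata at ingestion. A sketch, with the metadata field names (chunk_before, chunk_after, context_summary) being illustrative rather than a fixed schema:

```python
def expand_with_neighbors(hit: dict) -> str:
    # Rebuild the full story around one retrieved chunk using the
    # neighbor text and LLM summary stored in its metadata at ingestion.
    meta = hit["metadata"]
    parts = [
        f"Summary: {meta['context_summary']}"
        if meta.get("context_summary") else "",
        meta.get("chunk_before") or "",
        hit["text"],
        meta.get("chunk_after") or "",
    ]
    # Drop empty pieces (e.g. no chunk_before on the first chunk).
    return "\n\n".join(part for part in parts if part)
```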
So you can actually use a few different
strategies for this, right? And it's
called context expansion. So there are
multiple ways you can do this, right?
One's neighbor expansion, which is
pulling before and after. The other is
full document ingestion. Now with the
types of documents that we're using,
they are 30-plus-page documents. Often
we're not going to pull the full context
of the document inside there because
we're actually going to spend a lot more
money on the LLM token usage by doing
so. So, we're going to opt for the
neighbor expansion technique, which will
be really good in most use cases, which
is actually just giving it enough
context so that it can review the
information that just came before it,
the information that came after it. Now,
you can overengineer and make this
really specific to pull exact sections,
but what we're going to do is just try
and do the 20% that gives us the 80% of
the results, right? So, what we're
actually doing, and what we didn't show
you earlier, is inside this LangChain
Code node, we've extended the logic here to
actually output things like the chunk
size, the chunk number, the content that
comes after it, i.e. the chunk that
comes after it, and the chunk that comes
before it. But not only that, some
additional summary information around
that context. So that when we actually
output any chunk inside here, we can see
all of this information in the JSON. So
it's got no chunk before because this
was the first chunk we pulled, chunk
number one. But after it says may result
in electric shock, fire, and or serious
injury, which we can see was the chunk
from afterwards. And then in the
summary, it's also given us this
additional context. Read all safety
instructions and warnings before
assembling and using your petrol lawn
mower. So that is what this text chunk
is about. Basically, it's taken that
text chunk and it's used an LLM to say
summarize what this is about and apply
that as context. And what that means is
when we upload or ingest our data in the
earlier stages, we've actually uploaded
all this additional context in our
metadata: the chunk number, the context
before, the context after, and the
context summary. So that when we do
actually come to retrieve the
information, we not only have the
semantic search from the vectors that
pulls back the most relevant
information. Then we rank those chunks
using the re-ranker. We apply one final
layer which is actually bringing in all
the additional context from before and
after that chunk to give the LLM that
we're using the best chance to give us
the most precise answer with the
information it's got. And we can see
that in action when we go back to the
chat window. We run our query for safety
instructions through and actually it's
then able to apply all those three
layers to find the right information
from the document and actually return
that to us. And we can see here: "Here
are the consolidated safety instructions
extracted from the provided manual text.
Follow these every time you assemble,
operate, transport, maintain or store
the lawn mower." And it's got a bunch of
information that it's pulled exactly
from that query. And that's a
combination of everything we've done so
far in the ingestion phase, the query
expansion, the reranking, and then
actually consolidating all of that
context to give us the best chance of
finding the right context to give us an
accurate answer. So hopefully you can
see how powerful that is and you've
completed now the three-step blueprint
for our advanced RAG agent. So you've
seen all these distinct components and
now they need to be connected into that
robust pipeline for your RAG agent. And
as you know, this is exactly the kind of
challenge that workflow automation is
designed to solve. And that's why we're
using n8n. But before you can assemble
complex systems like this you need to be
fluent in the core building blocks that
you'll be using. And that's why in this
next video, we're going to focus on the
80/20, and I'm going to show you the 13
most essential nodes within n8n that
you'll need to master for building
advanced AI systems like the one we just
designed.
🚀 Grow your business with AI & Automation: https://skool.com/scrapes
💻 14-day free trial with n8n: https://n8n.partnerlinks.io/scrapesai

Stop struggling with hallucinations and "silent failures" in your n8n workflow. This n8n RAG tutorial reveals the 80/20 of building a professional RAG agent in n8n, moving beyond basic text splitters to master n8n AI agents for beginners. You'll learn how to use AI nodes in n8n for agentic chunking, implement Supabase n8n RAG for better data storage, and use query expansion to supercharge your n8n automation workflow. Whether you're exploring n8n nodes explained or looking for a complete n8n AI masterclass, this guide covers everything from LlamaParse ingestion to Cohere reranking to ensure your no-code workflows deliver precise, context-aware answers every time.

00:00 - The Hidden Flaws in Your RAG Agent
00:25 - 1: Ingest & Chunk Data Correctly
06:12 - 2: Find The Right Information
12:56 - 3: Give The LLM The Full Picture
18:42 - Bonus - Context Expansion

ABOUT THE CHANNEL
Hey there, welcome to the channel! I love helping business owners build AI agents and automation systems that actually work. Over 100,000 people have learned AI & Automation through my courses, and I keep things focused on what you can use today - no fluff, just practical implementation. Whether you're automating your own business or helping others do the same, glad you're here.

#n8nrag #ragmaster #n8ntutorial