A lot of people are calling Gemini File Search a game-changer that's going to kill RAG. Well, I spent two days implementing and testing it in n8n, and I uncovered five key aspects that most people are overlooking. And some of these could massively trip you up if you build on this functionality. So, no point waiting around. Let's get into it. And to set the scene, a quick overview of what Gemini File Search actually is. Essentially, it's a tool built into the Gemini API to ground the AI's responses in your data, making them more accurate, relevant, and verifiable. And that's essentially the purpose of RAG anyway. What sparked people's interest is the idea of a fully managed RAG system. There are a lot of moving parts to a RAG application, and Gemini has abstracted some of these away. So from an ingestion perspective, it'll handle taking in your file, chunking it into segments, embedding those chunks into vectors, and storing those in a vector database. And then when you ask a question of Gemini, it can convert that question or query into an embedding, carry out a semantic search in the vector store, and then generate a grounded response. So with the Gemini File Search tool, it's able to handle all of this behind the scenes, making life a lot easier for you.
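To make that concrete, here's a rough Python sketch of what a grounded query looks like if you hit the generateContent endpoint directly with the file search tool attached. Treat it as a sketch: the store name is a placeholder, and the exact tool field names are my paraphrase of the File Search docs, so verify them before relying on this.

```python
import os
import requests

API_KEY = os.environ["GEMINI_API_KEY"]
MODEL = "gemini-2.5-flash"
URL = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"

payload = {
    "contents": [{"parts": [{"text": "What are the rules on pit stops?"}]}],
    # Attach the File Search tool so Gemini retrieves chunks from your store
    # before answering. "fileSearchStores/my-demo-store" is a made-up name,
    # and these field names are paraphrased from the docs - double-check them.
    "tools": [{
        "file_search": {
            "file_search_store_names": ["fileSearchStores/my-demo-store"]
        }
    }],
}

resp = requests.post(URL, params={"key": API_KEY}, json=payload, timeout=60)
resp.raise_for_status()
data = resp.json()

# The grounded answer comes back as ordinary generated text.
print(data["candidates"][0]["content"]["parts"][0]["text"])
```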
And there are some real benefits to this approach. The main one is that it's a fully managed pipeline, so you don't need to provision or manage any vector database infrastructure like Qdrant, Pinecone, or Supabase. For the moment, it's incredibly cheap, with no storage costs. And because this is a fully managed RAG system, you can rapidly prototype RAG solutions on top of it, essentially just by wrapping the main APIs. That's something Google demonstrated very well in their press release, where they used the AI Studio app builder to quickly prototype an app. Another great aspect of the system is that it can ingest lots of different file formats, and it's not just scanning machine-readable text. It's carrying out some level of OCR, so it's able to read scanned PDFs and documents, for example. When it was released earlier in the week, people were describing it as a new game-changing tool that's going to kill RAG. But not only is this actually just RAG built into the Gemini API, it's not new. OpenAI have file search built into the Responses API, and they had something similar built into the Assistants API before that. And there's almost feature parity between OpenAI's file search and Gemini's file search. But I think what's actually captured the public's imagination is the pricing. With Gemini File Search, storage is free. You pay $0.15 per 1 million tokens when you're embedding or importing documents, and the inference cost just depends on what model you choose. And I think this was a really clever pricing strategy: free storage, but relatively expensive document embeddings. With OpenAI, for example, they charge for storage. The first gig is free, but after that it's $3 per gig per month, or 10 cents a day. And then they charge you $2.50 per 1,000 file search tool calls, which I think is a bit stingy, to be honest.
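Just to put rough numbers on that, here's a quick back-of-the-envelope comparison. The corpus size, token counts, and query volume are assumptions I'm making up purely for illustration; only the per-unit prices come from the discussion above, and neither side includes the LLM inference tokens you'd pay for at query time.

```python
# Hypothetical corpus: 2,000 documents averaging ~5,000 tokens each,
# occupying roughly 1.5 GB of vector storage.
docs, tokens_per_doc, storage_gb = 2_000, 5_000, 1.5

# Gemini File Search: $0.15 per 1M tokens to index, storage free.
gemini_indexing = (docs * tokens_per_doc / 1_000_000) * 0.15
gemini_monthly_storage = 0.0

# OpenAI file search: first 1 GB free, then ~$3 per GB per month,
# plus $2.50 per 1,000 file search tool calls at query time.
openai_monthly_storage = max(storage_gb - 1.0, 0) * 3.0
openai_tool_calls = (10_000 / 1_000) * 2.50  # assuming 10k searches per month

print(f"Gemini one-off indexing:   ${gemini_indexing:.2f}")
print(f"Gemini storage per month:  ${gemini_monthly_storage:.2f}")
print(f"OpenAI storage per month:  ${openai_monthly_storage:.2f}")
print(f"OpenAI tool calls/month:   ${openai_tool_calls:.2f}")
```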
So, I spent two days testing out the Gemini File Search API and integrating it into n8n, and here are the five key aspects that I realized people are totally overlooking in this conversation. The first one: you probably still need a data pipeline. As I mentioned, it was a very clever demo that Google used when demonstrating this functionality. The idea that you upload a document in a web browser and then chat to it is all well and good, but for production RAG systems you generally need to import thousands of documents and keep those documents up to date. You can't have duplicates, and the API doesn't carry out any uniqueness checks on the documents you're uploading, so you need to build this logic on top of the API. Here, for example, I uploaded the exact same document three times into the file store. Then, when I asked questions of the knowledge base, I was getting back duplicate chunks, and this resulted in poor responses: of the 10 chunks that came back, most were duplicates of each other, so Gemini didn't have the data to hand to generate the response. Handling duplicate records is a critical part of any RAG system, and this is the reason why you probably still need a data pipeline. Except you don't need a pipeline that takes a document, chunks it, creates embeddings, and upserts them to a vector store. You instead need a pipeline that takes a file and carries out a uniqueness check, to make sure that the file hasn't been uploaded before, or, if it has and it's changed, that it gets replaced. Within our RAG systems, we usually handle this with a record manager. Within n8n here, I have a data table called my Gemini record manager. For every document I upload to the Google file store, I keep track of it here: the document ID, the file name, and, importantly, a generated hash of the file, which acts like a unique fingerprint. These things are crucial, because if you upload this file again, it'll have the same hash and we can skip it. Or if you upload a new version of this file, it'll have the same doc ID but a different hash, and we can actually update it.
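Here's a minimal sketch of that uniqueness check in plain Python, outside of n8n, just to show the logic. The record-manager "table" is a simple dict here and the helper names are made up; in the workflow this is an n8n data table plus an If node.

```python
import hashlib
from pathlib import Path

# Stand-in for the n8n data table: doc_id -> {"hash": ..., "store_file_id": ...}
record_manager: dict[str, dict[str, str]] = {}

def file_fingerprint(path: Path) -> str:
    """SHA-256 of the file bytes - the 'unique fingerprint' kept in the record manager."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def decide_action(doc_id: str, path: Path) -> str:
    """Return 'skip', 'update', or 'insert' for an incoming file."""
    new_hash = file_fingerprint(path)
    existing = record_manager.get(doc_id)

    if existing and existing["hash"] == new_hash:
        return "skip"      # same doc, same content - already in the store
    if existing:
        return "update"    # same doc ID, new content - delete the old doc, re-upload
    if any(rec["hash"] == new_hash for rec in record_manager.values()):
        return "skip"      # a different file with identical contents
    return "insert"        # genuinely new - upload and record it
```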
This is what that type of pipeline looks like. I've just uploaded a file to the folder, so let's run it here. We search the files and folders to get a full list of files to process. You'd be running this type of pipeline on a schedule, maybe checking every minute for new files, so you need to set a lock flag to make sure you're not triggering a new import while an existing import is already in flight. Then, for each file being processed, we download the file and generate the hash, which again is that unique fingerprint. We then search our record manager, looking for documents that have already been uploaded with the same document ID, and it comes through to this If node: does a record with the same ID exist? In this case it doesn't. So, seeing as the document ID doesn't exist, we carry out another check to see whether the hash exists, just to make sure there isn't a different file with identical contents. With this search, we're checking whether any records or documents have the same hash value, and in this case there aren't any, so we can go ahead and import this file. If there were, we could simply do nothing, archive the file, and move on. This is a key design pattern in a production RAG system: you need to avoid duplicate content, because it will pollute your vector store and rapidly deteriorate the quality of responses, since you're just getting back lots of chunks that are duplicates of each other. So, back down to here: the doc doesn't exist in the record manager, and there are no other documents with the same hash in the record manager. From here, we can just reload the binary of the downloaded file. And this is a great n8n hack, because usually the binary of a downloaded file is only available to the node immediately after it, whereas with this Code node you can essentially reload the binary of a previously loaded file by using this expression. That one is definitely worth taking note of. So, with our binary in hand, before uploading to the Gemini file store we want to extract metadata so that we can associate it with the file in the vector store. This makes retrieval a lot more accurate, because you can search subsets of the vectors within the file store. I'll get back to this in a few minutes, because there are problems here that I want to discuss. After that, it's straight into uploading the file to the Gemini file store. It's a two-stage process: first we request an upload link, passing in various headers along with any custom metadata and custom chunking configuration we want to send. Then, once we have the upload link, I reload the binary again and upload it to the file store using the upload URL provided in the previous step.
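Here's roughly what that two-stage upload looks like if you call the API directly from Python rather than through n8n's HTTP Request nodes. The endpoint path, method name, header names, metadata keys, and chunking fields below are my paraphrase of the File Search docs rather than something to copy blindly, and the store name and file are placeholders; verify everything against the docs linked in the description.

```python
import os
import requests

API_KEY = os.environ["GEMINI_API_KEY"]
BASE = "https://generativelanguage.googleapis.com"
STORE = "fileSearchStores/my-demo-store"        # placeholder store name
FILE_PATH = "fia_sporting_regulations.pdf"      # placeholder file

file_bytes = open(FILE_PATH, "rb").read()

# Stage 1: request an upload URL, sending custom metadata and a chunking
# config along with it. Endpoint and field names are assumptions.
start = requests.post(
    f"{BASE}/upload/v1beta/{STORE}:uploadToFileSearchStore",
    params={"key": API_KEY},
    headers={
        "X-Goog-Upload-Protocol": "resumable",
        "X-Goog-Upload-Command": "start",
        "X-Goog-Upload-Header-Content-Length": str(len(file_bytes)),
        "X-Goog-Upload-Header-Content-Type": "application/pdf",
    },
    json={
        "file": {"display_name": "FIA Sporting Regulations"},
        "custom_metadata": [{"key": "sport", "string_value": "formula 1"}],
        "chunking_config": {
            "white_space_config": {"max_tokens_per_chunk": 300, "max_overlap_tokens": 50}
        },
    },
    timeout=60,
)
upload_url = start.headers["X-Goog-Upload-URL"]

# Stage 2: send the actual binary to the upload URL we were just given,
# then poll the returned operation until the document is processed.
finish = requests.post(
    upload_url,
    headers={"X-Goog-Upload-Command": "upload, finalize", "X-Goog-Upload-Offset": "0"},
    data=file_bytes,
    timeout=300,
)
operation = finish.json()
print(operation.get("name"), operation.get("done"))
```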
After that, we can check whether it's been processed and add the record to our record manager. This is crucial, because this is where we save our hash, the document ID, and the file ID from the Gemini file store. From there we can archive the file and move on. And just to demonstrate, let's try to reprocess that file. It's disappeared from here, so let's drop it back into the folder to be processed and re-trigger the ingestion flow. You'll see now that the document ID does exist in our record manager, and as a result it just comes up here to do nothing and archive the file. So, we've avoided duplicating the content in the Gemini file store. Let's do one more test and say there's a new version of this file. Let's drop it back in here, go to file information, manage versions, and upload a version two of the file. This is a great example of where you only want the latest version of a file in a knowledge base. Now, I don't actually have a newer version of this file, so I dropped in a dummy file, but we'll click close, and you can see that it has just changed. So now if I come in here and rerun the ingestion (and again, this might run every minute anyway once the schedule is up and running), you can see the doc does exist, but it has changed: the hash is different. So we remove the old version, which means deleting the old document within the Gemini file store, forcing the deletion of all its chunks, and deleting the record in our own data table in n8n. Then we go back through the process of uploading the new file.
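For reference, the delete step in that update path is a single API call. Again, the document resource name is a placeholder, and the `force` query parameter is my reading of how "force the deletion of all the chunks" maps onto the API, so check the docs before using it.

```python
import os
import requests

API_KEY = os.environ["GEMINI_API_KEY"]
BASE = "https://generativelanguage.googleapis.com/v1beta"

# Placeholder resource name - the real one is saved in the record manager
# when the original upload completes.
old_doc = "fileSearchStores/my-demo-store/documents/abc123"

# Delete the stale document; force=true is intended to also remove its chunks
# (assumption - verify the parameter name against the File Search docs).
resp = requests.delete(
    f"{BASE}/{old_doc}",
    params={"key": API_KEY, "force": "true"},
    timeout=30,
)
resp.raise_for_status()
print("Deleted", old_doc)

# After this, the workflow removes the matching row from the n8n data table
# and re-runs the normal upload path for the new version of the file.
```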
Now, if we come into our record manager, you can see we have a new record, ID 26, but it's missing a document headline. That's fine; I'll explain that in the next section. So, that in a nutshell is why you still need some level of file processing pipeline with a RAG system based on Gemini's file search. The only difference is that it's just not as complicated. You're not getting into extracting text from documents, chunking them, embedding them, and upserting them. Instead, you're just sending the file to Gemini to be processed. And back to the idea of not needing infrastructure to run RAG: you still need a database here. I'm using n8n's data tables, so that's all bundled within their system, but if you were running this in code you would need a Postgres database, you would need to store the files somewhere, and you need some way of keeping track of the documents that are in your vector store. If you'd like to get access to our Gemini File Search ingestion and inference flows, then check out the link in the description to our community, The AI Automators. And I think it's fair to say that Gemini File Search is a mid-range RAG system. It's better than naive RAG, but obviously it's missing the advanced RAG techniques that certain use cases absolutely require to get accurate results. And secondly, it's a black box. Everything is hidden away behind the API, and this is of course by design, but it just means that if things start going wrong, if you're not getting the right grounded answers from Gemini, it could be hard to figure out what the problem actually is. In my previous video on this channel, I talked about how there's no one-size-fits-all approach when it comes to RAG systems, and this still holds very true. This Gemini File Search system will be a very good fit for certain basic or mid-level use cases. However, once you hit a ceiling with the responses you're getting, there's no way to dive under the hood and start making changes. At that point, you'll have to completely re-platform onto something you have more control over. And Gemini File Search is lacking certain features like hybrid search, contextual embeddings, re-ranking, multimodal responses, or context expansion, which is a topic I've talked about on this channel.
And while it can ingest spreadsheets and CSV files, it is at its core a semantic search engine; it doesn't have the structured retrieval that certain user questions require. The third aspect ties into the way it reads documents and actually chunks them. Here, for example, I have a non-machine-readable document. As you can see, I can't select any text here. However, when I imported this, all of the text was OCR'd successfully, and it was pretty fast as well. So, the system does work quite well on the OCR front. However, you're not getting markdown headings with the OCR or with the text extraction. Here I have a heading one and a heading two, and if you look over here, it's just text separated by new lines. So you are losing the document hierarchy with this system; it just isn't carrying the structure through. And I've created multiple videos on the importance of markdown chunking when it comes to RAG systems, so that's a little bit of a pity. The other thing I was surprised by was that the chunking seems quite basic. I have read that they're going to be using intelligent and dynamic chunking, but from my experience of importing this file, this chunk that was returned, for example, starts in the middle of a sentence of the document. Now, this could be the overlap; it's most likely using recursive character text splitting, and this may be the start of the overlap. So, let's say the start is actually this, "if the compute resource", which is here. But at the same time it's starting mid-sentence, which is not ideal. And if you look at where it finishes, it finishes at "during the run based"; it's finishing mid-sentence. You can see down here, "during the run based", so it's missing "on one of the following", which is the end of that sentence. So this is pretty crude chunking as far as I'm concerned, and it's likely going to result in issues where it splits sentences and loses critical context between chunks. Now, this is a key problem of RAG systems anyway, and it's why I typically use markdown chunking, so that we have cleaner, more contained chunks and we're not missing critical context. So, that one was a little bit of a concern, but again, because this is a black box, because it's an API, they might upgrade this tomorrow and it might work fine. You just don't know.
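Back on the markdown chunking point, here's a tiny sketch of the kind of heading-aware splitting I mean. It's illustrative only and a long way short of a production chunker, but it shows how chunks can be kept aligned to the document's own structure instead of cutting mid-sentence.

```python
import re

def markdown_chunks(md_text: str, max_chars: int = 1500) -> list[str]:
    """Split markdown on headings so each chunk stays inside one section."""
    # Break the document at lines that start with one or more '#' characters.
    sections = re.split(r"(?m)^(?=#{1,6}\s)", md_text)
    chunks: list[str] = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        # Oversized section: fall back to splitting on paragraph boundaries,
        # never mid-sentence.
        current = ""
        for para in section.split("\n\n"):
            if current and len(current) + len(para) > max_chars:
                chunks.append(current.strip())
                current = ""
            current += para + "\n\n"
        if current.strip():
            chunks.append(current.strip())
    return chunks

doc = "# Heading 1\nIntro text.\n\n## Heading 2\nMore detail here."
print(markdown_chunks(doc))
```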
If you do have a use case that would benefit from leveraging the structure of a document through its headings, then check out the video on our channel called Next Level RAG, where I demonstrate a technique called context expansion. My fourth learning is that metadata extraction is quite challenging with Gemini File Search, and they're missing one key feature that would be a huge addition if they added it. To get back to my ingestion flow here, I have a metadata enrichment section. And just to take a step back, the way I would normally do this is: import a file, extract the text (and you need lots of ways of extracting text, because there are different file formats and different ways of going about it), and then, once you have the text from that file type, send it into an LLM, either the full file or the first few pages. From that I would extract a document summary, and maybe things like document dates and categories, anything that makes sense from a metadata perspective, so that we can filter the vectors at query time and get better responses from the knowledge base.
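As a rough illustration of that enrichment step, here's how you might ask Gemini for structured metadata from the first couple of pages of a document before uploading it. The prompt, the metadata keys, and the excerpt are assumptions for the sake of the sketch; the generateContent shape is standard, but verify the JSON-output config against the docs.

```python
import json
import os
import requests

API_KEY = os.environ["GEMINI_API_KEY"]
URL = ("https://generativelanguage.googleapis.com/v1beta/models/"
       "gemini-2.5-flash:generateContent")

# Assume we've already extracted the first few pages of text elsewhere -
# this is exactly the step File Search doesn't hand back to us.
first_pages = "FIA Formula 1 Sporting Regulations, Issue 4, 2024 season ..."

prompt = (
    "Extract metadata from this document excerpt and reply with JSON only, "
    "using the keys: summary, document_date, category, sport.\n\n" + first_pages
)

payload = {
    "contents": [{"parts": [{"text": prompt}]}],
    # Ask for a JSON response so the output can be attached as custom metadata.
    "generationConfig": {"responseMimeType": "application/json"},
}

resp = requests.post(URL, params={"key": API_KEY}, json=payload, timeout=60)
resp.raise_for_status()
metadata = json.loads(resp.json()["candidates"][0]["content"]["parts"][0]["text"])
print(metadata)  # e.g. {"summary": "...", "sport": "formula 1", ...}
```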
The issue with the Gemini File Search store is that when we upload a file (I can show you here: initially we get back an upload URL, then we send the binary, and we get back a task ID that we can check the progress on), even when the document has been successfully processed, as you can see here, we're not getting back the contents of the document to further enrich those chunks with rich metadata. And that's the key problem. Once you upload a file to the file search store, there is no way of retrieving all of the chunks of that file to recreate or rebuild it. At least, I haven't found a way to do it yet. So the difficulty then is that you essentially need a different way of processing the file to extract the text so that you can use an LLM to extract metadata. And that means you start recreating some of the abstracted features that Gemini created in the first place to make life easier. So this entire metadata enrichment section is a bit of a problem, really, because here I'm only handling PDF files, but I'd need to be able to handle the 100-plus file types that Gemini itself should be able to handle. So ideally they would add another endpoint where you can fetch all of the chunks related to a document, so that you could reprocess the document in a different way. They do have a metadata enrichment endpoint that you can use to update the chunks or the documents in the file store. The actual metadata filtering when querying Gemini works pretty well, actually. If we open up the payload here, we're passing in a metadata filter. I've just hardcoded the document sport: these are Formula 1 documents, and I've set the AI agent to clarify with the user which sport we're talking about. So if I ask, for example, what the rules on pit stops are, the agent comes back to me to specify which sport I'm referring to. That way we get the right metadata filter and a better result set from the Gemini grounding. We'll say Formula 1, and the agent then passes in Formula 1 as a metadata filter to the generateContent endpoint.
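Here's roughly what that request body looks like with the filter included. As before, the store name is a placeholder, and the filter syntax and field names are paraphrased from the docs, so treat it as a sketch rather than a copy-paste payload.

```python
# Same generateContent call as earlier, but now scoped to a subset of the
# store via a metadata filter on the custom "sport" key attached at upload.
payload = {
    "contents": [{"parts": [{"text": "What are the rules on pit stops?"}]}],
    "tools": [{
        "file_search": {
            "file_search_store_names": ["fileSearchStores/my-demo-store"],
            # Filter syntax paraphrased from the docs - verify before use.
            "metadata_filter": 'sport = "formula 1"',
        }
    }],
}
```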
So this is one of the ways you can use Gemini File Search within n8n: hooking it up as a tool call for an agent. The problem with this approach, though, is that you end up with multiple agents. This is the main agent, where I'm actually using Gemini, and it's calling another Gemini agent that has the file store attached, because n8n doesn't currently support Gemini file search stores natively. We get back our answer, which looks good, and if we look at the response from the tool call, you can see the text response we're getting back, along with the chunks that went into formulating it. Now, these chunks do look very large, and again, this is the nature of the black box: I can't exactly dive in to figure out why that's the case. But what's nice is that you do get these grounding supports. For the response that was generated, you can see which chunk indexes refer to which pieces of text within the response. That's a really nice feature.
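If you're pulling that apart in code, the grounding information comes back alongside the candidate. The field names below (groundingMetadata, groundingChunks, groundingSupports) are how grounding is exposed elsewhere in the Gemini API, so I'm assuming File Search follows the same shape; check the response you actually get back.

```python
# 'data' is the parsed JSON response from the generateContent call shown earlier.
candidate = data["candidates"][0]
grounding = candidate.get("groundingMetadata", {})

# The retrieved chunks that were fed into the answer.
for i, chunk in enumerate(grounding.get("groundingChunks", [])):
    print(f"chunk {i}: {str(chunk)[:80]}...")

# groundingSupports maps spans of the generated text back to chunk indexes,
# which is what lets you build per-sentence citations.
for support in grounding.get("groundingSupports", []):
    segment = support.get("segment", {})
    print(segment.get("text", ""), "->", support.get("groundingChunkIndices", []))
```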
From an n8n perspective, you're probably well used to the AI Agent node, where you can hook up different tools and different models and away you go. And while you can use Gemini File Search with this node, it's more as a separate expert or tool that you actually chat to, so it's more like a sub-agent. Whereas if you want to use the Gemini generateContent API endpoint properly, you're better off hitting it directly and passing in custom payloads depending on what you want. And this is what I mean when I say you're really tied to the Google infrastructure and the Gemini ecosystem. The Gemini API has huge capabilities around text processing, images, video, documents, file search now, and tool calling, so it's ideal if developers are looking to leverage a lot of this tech. And in a way, it's actually competing against n8n's own AI Agent node and its functionality around tool calling, memory, and model usage. So you definitely can use the two systems together; you just need to choose a design approach when it comes to actually building out the chat flows. Here I tie directly into the API; here I'm using an AI agent that has a tool call into the Gemini API; and there's one other option: n8n has brought out a dedicated Gemini node. Now, this doesn't yet support file search stores, but it's possible they'll bring out a new version soon where you can just pass the file search store name into the node, hook up different tools, and away you go. Lots of different ways you can approach it. So that's metadata filtering and extraction and the challenges around them, and I think an additional API endpoint to fetch all chunks of a document would be really, really welcome. And finally, the last thing: with this approach of using Gemini File Search, or likewise OpenAI's file search, you're totally wedded to those ecosystems. All of your data will be stored with those companies, so you definitely have to satisfy yourself with the privacy policies, the data retention policies, and the data security policies, and you also need to think about things like personally identifiable information and GDPR, because with these types of hosted solutions your corporate data is essentially sitting on someone else's property. That is the price you pay for the convenience of this type of service. And the other thing is vendor lock-in. You can't exactly mix Gemini File Search with OpenAI's inference or LLMs; with Gemini File Search, you have to use 2.5 Pro or 2.5 Flash.
And again, that'll be fine for some use cases, but for others you'll want the option and the flexibility. So, onto the verdict. Where does this fit in the RAG landscape today? Well, number one, it isn't new. RAG as a service has existed for a while; plenty of companies do it, as well as the main providers like OpenAI, AWS Bedrock, and Azure. So RAG as a service isn't new. The key thing here, as I mentioned, is the pricing. But even outside of that, the idea of a fully managed RAG pipeline where you can simply send in documents and then just chat to them is pretty compelling for a lot of users. And if your company allows for it, it's definitely not a bad first step into the world of RAG. That being said, you definitely lose the flexibility of configuring the infrastructure behind the scenes, and once you hit that ceiling with the technology, you will need to re-platform onto something else. But by and large, this will be a great solution for a lot of companies. If you'd like to get access to our Gemini File Search ingestion and inference flows for n8n, then check out the link in the description to our community, The AI Automators, where you can join hundreds of fellow builders all leveraging the latest in AI and RAG to further their businesses. I've spent hundreds of hours learning RAG and agentic design patterns, and I've distilled all of that down into nine different RAG designs. Check out my masterclass here for a full deep dive.
Get access to our Gemini File Search n8n workflows + advanced RAG blueprints in our community: https://www.theaiautomators.com/?utm_source=youtube&utm_medium=video&utm_campaign=tutorial&utm_content=gemini-file-search

Lots of people are calling Gemini File Search a "game-changer" that will "kill RAG." But after two days of production testing and n8n integration, I've uncovered 5 key issues that nobody's talking about. In this deep-dive, I'll show you exactly what works, what doesn't, and where Gemini File Search actually fits in the RAG landscape.

What You'll Learn:
- How Gemini File Search actually works (ingestion, chunking, embeddings, retrieval)
- The 5 critical limitations hitting production RAG systems
- Why you still need data pipelines (duplicate handling, record management)
- Metadata extraction challenges and workarounds
- Real pricing comparison: Gemini vs OpenAI file search
- Three different n8n integration approaches with pros/cons
- Vendor lock-in considerations and data privacy implications
- Complete production ingestion + inference workflows

Useful Links:
Context Expansion & Document Hierarchy: https://www.youtube.com/watch?v=y72TrpffdSk
Gemini File Search: https://blog.google/technology/developers/file-search-gemini-api/
Gemini File Search Docs: https://ai.google.dev/gemini-api/docs/file-search

Timestamps:
00:00 - What is Gemini File Search?
03:04 - #1 You Still Need Data Pipelines
10:07 - #2 Mid-Range Black Box RAG
11:25 - #3 No Markdown & Basic Chunking
13:38 - #4 Metadata Challenges
19:09 - #5 Vendor Lock-In & Data Privacy
20:02 - The Verdict

Questions or Comments? Are you considering Gemini File Search for your RAG systems? What's your biggest concern about managed RAG solutions? Drop your thoughts below!