There is way too much fluff in the AI space. Every single day there are five new technologies released, and the biggest challenge is figuring out what you should actually pay attention to. Cole Medin is one of the sharpest operators in applied AI today: the CTO of oTTomator, a founder, a builder, and a teacher with a gift for making advanced AI actually usable. Through his company and his private Dynamous AI Mastery community, he's helped thousands level up their workflows. And with a YouTube channel topping 179,000 subscribers and over 8 million views, he's become the go-to voice for anyone serious about AI agents, RAG systems, and scaling with automations. Whether he's architecting intelligent systems or teaching the next wave of builders, he's dedicated to one mission: helping people leverage AI with clarity, precision, and impact. >> How does that work for you? In the end, the bottom line is this is going to save you time or make you money. So what should this agent do to get you to that point? >> There's Cursor, there's Claude, there's Wispr Flow. What's your stack? >> You see, the difference between a mid-level developer and a senior is that mids are really, really fast. They're really good, right? They might be a mid, but you learn these painful lessons as time goes on that will then save you so much time on the back end. >> Basically, for every single piece of information or concept that we've turned into a numerical representation, we can attach additional values that describe that information. >> The problem was people didn't really treat their prompts with enough respect, or the whole context around their agent with enough respect. You just give a couple of sentences because you're, you know, kind of acting lazy. >> What it makes me think about is almost like prompting and praying: you just shove it off and pray that it works, versus context engineering. >> I just ask the questions of how are you going to actually use this agent? And sometimes they don't even know themselves, and that's when it gets trickier. >> How much time do you spend on this pre-planning phase of context engineering before you launch into the actual development of the project? >> Hey Cole, it's awesome having you on the podcast. One thing I want to say is that people's time is really valuable, so what is the value that people are going to get by listening to this podcast?
>> Yeah, good question. So we're going to dive into context engineering: what it looks like to build our agents in a way that actually scales. Then we're going to get into RAG, which is something that I really specialize in; I've done a lot with RAG and n8n and Supabase, and we'll get all into that. And then the last thing we're going to talk about is how we can take our workflows in n8n and really move them into a production environment. So literally, no matter what you're doing with building agents, you're going to care about these things, and we're going to hit on some high-level stuff that applies no matter what you're working on. >> Fantastic. So, looking at these three key areas: since you are the progenitor, the online inventor, of context engineering, can you talk to me a little bit about the genesis of context engineering? >> So there's a problem that we had towards the start of the year
that is definitely addressed at this point, thanks to context engineering. The problem was people didn't really treat their prompts with enough respect, or the whole context around their agent with enough respect. It really deserves to be its own engineered resource, just like the rest of our system, like our n8n workflows or, if you're working in code, your codebase. Even going so far as to version control your prompts: you keep track of how your prompts have evolved over time, you keep different versions of your prompts so you can go back to another version if your new one just started making your agent go off the rails. That kind of thing is really important, as is thinking about the context for your agent as a completely separate system from the workflow itself. So: the tools that you have for your agent, the descriptions for those tools, which are also part of the prompt to the LLM, and then of course your system prompt. And those really are the core components when you think about context engineering for the agents that you're building in n8n.
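As a minimal sketch of what version-controlled prompts can look like in practice, assuming prompts are stored as plain files tracked in Git; the directory layout and file names here are hypothetical:

```python
from pathlib import Path

# prompts/support_agent/v3.txt is tracked in Git, so every change is
# diffable and you can roll back if a new version sends the agent off
# the rails.
PROMPT_DIR = Path("prompts/support_agent")  # hypothetical layout

def load_prompt(version: str) -> str:
    """Load one specific, version-controlled system prompt."""
    return (PROMPT_DIR / f"{version}.txt").read_text()

system_prompt = load_prompt("v3")
# If v3 misbehaves in production, rolling back is one line:
# system_prompt = load_prompt("v2")
```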
>> And can you explain, for anybody that's listening, the difference between vibe coding and context engineering? >> Yeah, 100%. So vibe coding, in essence, doesn't treat the context with respect, like I was saying. When you vibe code, you just tell your AI coding assistant to go and create this codebase, or you tell the agent builder to make this agent, and you just give a couple of sentences because you're kind of acting lazy. That's what people do a lot: they expect the coding assistant to take a very simple request and do exactly what they want the assistant to do. But it's never that simple, because it's going to make a lot of assumptions and get things wrong, not because the large language model is doing a bad job, but because you weren't specific enough. And so another big component of context engineering is: how can we be specific enough? If I had to name the one primary goal of context engineering, it is simply to reduce the number of assumptions that your AI agent or coding assistant is making, and that is all about specificity. With vibe coding, we don't have specificity. It's all just sending in a prompt, hoping for the best, and going with the flow, which I think is where the word vibe comes from. So with context engineering, before we even have our agent or coding assistant do anything, we are going to sit down and really think through: what are my requirements? How am I going to translate that into a system prompt, or into the tools that I have for my agent, whatever that might be? >> What it makes me think about is almost like prompting and praying: you just shove it off and pray that it works. Versus context engineering, where it sounds like there's more of a methodology, a system, and an overall practice to make sure that you're giving it the right input so
you get the right output, because we both know that AI runs off of data. So the more data, the more details, the better the output's going to be. >> Yeah, that's exactly right. And system is a key word there, because you create your process. Like, you have a template for your system prompt that you always use, or you have the tools and capabilities you want to give your agent, and you have something that's more generic that you adapt to your specific use cases. Having that kind of thing in place is a really good way to avoid vibe coding, because then you always have that system, like you said. So that's good. >> Run me through, if somebody wants to switch or level up from vibe coding to context engineering: what is the framework? What is the methodology? What is the practice? If I'm just looking at a blank canvas or a ChatGPT window, what are the elements or components that I need, or even better, the SOPs, the step-by-step processes, in order to make
a solid context engineering game plan? >> Yeah, that's a really good question. Honestly, I would say start simple, and think about context engineering as a mindset shift more than something where you have to fundamentally rethink your entire approach, because you can go from vibe coding to context engineering by very simply spending more time on your planning up front. And something that you can do very practically to step into that right away is ask your coding assistant, or your agent, whatever large language model you're working with, to ask you questions so that you can get on the same page. Something that I do all the time when I'm working with AI coding assistants is I give a brain dump of here's what I want to build next: I want to build this application, it's going to have these kinds of users, and I want people to be able to do this and this. And then I'll specifically ask it: ask me any clarifying questions for any gaps that you have in your knowledge; let's make sure we get on the same page here. And that will start to teach you: oh, this is where I should be more specific. You see these patterns develop when you go through this process. It keeps asking you these questions, and you realize, oh, I'm never thinking about security, so maybe I should start being specific about that, just as a random example. And so you build that muscle over time: okay, this is how I want to create this system of getting more specific, and what that practically looks like, what those different categories are that you want to list out to the assistant. >> It's almost like metaprompting. Metaprompting was using AI to figure out what kind of prompt you should give it. So you prompt it to ask the questions, and then it helps you formulate the prompt that's necessary. It sounds like a part of this, though, is that you brain dump with some sort of metaprompt that says, "Here's what I'm thinking about doing. I want you to ask me as many clarifying questions as needed in order to come up with something that you have certainty about, so that we can move forward with a solid game plan." Is there anything else inside of there, beyond the brain dump and the metaprompting, that you think people should be aware of besides just "ask me questions"? Are there
certain types of questions, or "I want you to create me a checklist"? >> Yeah, so a lot of these things I cover on my channel quite extensively. Besides having the coding assistant help you come up with the areas that you need to include in your context engineering, I give suggestions myself about the different kinds of things to consider. So, at a high level, when you're building a new AI agent or a new application, you want to think about the success criteria. You want the coding assistant to understand: if I check off these 10 boxes, if I built these 10 things, or the agent can now do these 10 things, then it is complete. Even something really high level like that, people don't usually think about, but it's really valuable. Then, if you're a project manager, you've heard this phrase all the time: how is the end user actually going to use this agent or application? You define that clearly, so it's not just the what we're building, but the why and the how, and you have that defined as well. And then, when you have a larger list of things that you have to knock out, if you have clearly defined tasks so you can break up the problem, that is also really useful. I could go on and on about all the different core categories that I like to include in my plans and in what I'm giving as context, but I'll leave it there; those are the core three.
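As a minimal sketch of that brain-dump-plus-clarifying-questions pattern, assuming the OpenAI Python SDK; the model name and the wording of the meta-prompt are illustrative, not a prescribed template:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

brain_dump = """I want to build a customer-support agent in n8n.
Users: small e-commerce shops. It should answer order questions
and escalate refund requests to a human."""

meta_prompt = (
    "Here is a brain dump of what I want to build:\n\n"
    f"{brain_dump}\n\n"
    "Before building anything, ask me clarifying questions about any gaps "
    "in your knowledge (users, security, success criteria, edge cases) so "
    "we get on the same page and you make as few assumptions as possible."
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[{"role": "user", "content": meta_prompt}],
)
# The reply is the model's list of clarifying questions to answer
# before any real building starts.
print(response.choices[0].message.content)
```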
>> Sure. So let me move through some of these just to recap, and come back to an assessment so someone can walk away with practical, tactical advice and tips. Step one: you brain dump everything you have of what you think this thing should look like. You know that you're going to have it ask you questions, but you can also ask: can I create a checklist or a step-by-step process of how I want this type of thing executed? As well as: what does success look like, what does failure look like, and what is the user journey? You could ask: what is the path from somebody discovering the application, to being onboarded, to the habit loops and patterns, all the way to mastery of the application? And you can have it say: these are the different areas I want to tackle. Let's first work on the overall questions you have; then, from that, let's work on user journeys; then, from that, let's work on success versus failure states; and then, from that, we can say, okay, based on all this information, can we make an order-of-operations SOP for the different categories we want to tackle, so that we can go line by line down a checklist and walk away with something that doesn't involve context rot and the overall issue where it has a tendency to forget the last thing you said 200,000 tokens ago. I'm doing a synopsis of your channel in about a paragraph, which is again just going to open up the gates of what's possible. Somebody could clip out just that one section and go, "Okay, great. I have a framework to lead me down through a series of prompts that could then ultimately get me closer to being a context engineer." >> Yeah, that's good; you phrased it really well. And I think the one big takeaway for all of this is, when you're engineering your context, you also just want to be very specific about how you can break something up into many different granular implementations, right? When you build an application or an AI agent, you start with the basics; you don't try to build everything at once. And that's also part of context engineering: being specific with the agent. We have our phases here: phase one, let's start with something very simple. Because coding assistants, any kind of assistant that's going to help you build an n8n workflow, they always love to overengineer things. They will always do
more than you ask unless you're very specific about how focused you want them to be. >> I have two questions. My first one I want to get to, and then we'll probably shift gears over to RAG. Question one: how much time do you spend on this pre-planning phase of context engineering before you launch into the actual development of the project? I'm not counting the back-and-forth chatting with AI. How much time planning makes sense? >> Yeah, so it depends a lot on what I'm building. I would say a general rule of thumb is somewhere between a half hour and an hour. When I'm starting a brand new application or agent, whatever it is, I always start with an overarching document that I call a PRD, a product requirements document; you might have heard of that before. It's really just outlining the scope of work for finishing our application, that first version. Creating that and putting my thoughts into the architecture, the security, and the step-by-step process, that's what I'll put a half hour or an hour into before I'm ever even talking to the coding assistant or having it do something on my behalf. >> That's great. And it's good to have a time frame, because people don't always understand. Now you understand that this is the time window, and these are the elements that I should ideally have chunked down in that time window. I'm sure, as with any skill or ability, the better you get at it, the faster you can go through it, but there is still a methodology to follow. >> That's right. >> You talk about coding assistants, and there are all these different ones that you could use, right? There's Cursor, there's Claude, there's Wispr Flow. What's your stack? >> Yeah. So, Claude Code is my favorite AI coding assistant right now. But the entire process, the system that I built for context engineering, is agnostic of the individual tool that I'm using.
And so if Anthropic has an outage, like they unfortunately have once in a while, I'm able to immediately switch to another tool without a hiccup. I can go over to Codex, which is my second favorite. Also, Google just released Antigravity as a new agentic IDE along with their release of Gemini 3; that's probably my third favorite right now. So there are a lot of fantastic tools out there, and there's not one that completely dominates the market like Claude Code did even a few months ago. But yeah, I would say don't focus on the individual tool; that's a little golden nugget of advice that I want to share. Claude Code is just the one that works best for me generally right now. >> Inside n8n, for anybody that doesn't know this and is starting to build out these projects, it is always good to have a fallback AI. In n8n, we have a fallback AI feature inside the agents, so that if OpenAI or something else has an outage, it falls back to another AI. As someone who's run an AI agency before: you get people calling you up because everything's broken, you don't know why it's broken, and then you realize it was just ChatGPT having an outage, and that was the issue. So fallbacks are super critical, whether you're doing it inside n8n or inside a coding environment. That's one piece of it.
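A minimal sketch of that fallback pattern in code, assuming the OpenAI and Anthropic Python SDKs; the model names are illustrative:

```python
import anthropic
from openai import OpenAI

openai_client = OpenAI()
anthropic_client = anthropic.Anthropic()

def ask_with_fallback(prompt: str) -> str:
    """Try the primary provider first; fall back if the call fails."""
    try:
        resp = openai_client.chat.completions.create(
            model="gpt-4o",  # illustrative primary model
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content
    except Exception:
        # Primary provider outage (or any failure): switch providers.
        resp = anthropic_client.messages.create(
            model="claude-sonnet-4-20250514",  # illustrative fallback model
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.content[0].text
```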
The other element I want to get into is not just the time window and the technology stack, but looking at this as an overarching system: you have context engineering to create the game plan, and then you take that game plan and put it into a development environment like Cursor or Claude Code or any of these other ones out there. And I know we're going to get into RAG in a hot second, but one thing I also want to double down on while we're talking about this is that people, humans, we're creatures of habit; we do the same things over and over again. And so a lot of people, a majority of people, still use ChatGPT to write code, and as you and I both know, it's not the best coding platform. What are your activities for staying at the cutting edge, to be able to say, okay, I was using Claude, but now there's this? What is your process, or what are your patterns, for staying up to date? >> Yeah, so the biggest thing to keep in mind is there is way too much fluff in the AI space. Every single day there are five new technologies released, and the biggest challenge is figuring out what I should actually pay attention to versus what I should scrap immediately: I'm not even going to look into this, because I know it's not something I'm going to evolve my stack into using. And so the biggest thing for me, whenever a new technology is released, is I'm going to do a lot of research on it, and on all the other things that were released recently, and I'll use a tool like Perplexity or OpenAI's deep research to help me with that as a starting point. Here are the five big things that I heard about this week; tell me more about them. And based on my preferences, because you can set up projects and things so it understands my development stack and what I'm building, it can suggest whether this is something that I should
actually care about. Not that I always trust its recommendations, but it's a good starting point. The other thing I'll say is that having an AI community is the most important thing for being able to cut through the fluff and learn what people are leveraging that's new that you would actually want to take advantage of yourself. I happen to be running an AI community myself, so I get to learn from everyone in the community as much as they're learning from me, because I'm seeing the things that they're playing around with. And a lot of times these people happen to have more time than me in the couple of days after something new comes out. So, for example, when Google released Antigravity, their agentic IDE that I mentioned earlier, I happened to be very busy during those couple of days working on some YouTube content. I didn't get to try it, but there were like four dozen people who were able to use it within my Dynamous community. And so I got to hear all about it before I even tried it out, and from what they said, I knew, okay, this is definitely worth paying attention to. And like I shared earlier, I still think Claude Code is better, but it's close. That kind of thing, okay, this might become the best tool for me to include in my stack very soon, I want to pay attention to. >> Yeah, it's hard to separate the hype from the helpful, especially in this whole world of marketing buzz. When someone says, "Oh, this is the n8n killer, this is the whatever killer," and they're going clickbaity, rage-baity, you don't know until you actually dive in and really do an analysis of it, because the data tells all. And I completely agree: when you have a collective, a community of people who are all deep-researching these categories, it surfaces things to the top, alongside doing your own
research. I think it's incredibly powerful. So, moving from context engineering into RAG in n8n: let's talk about everything from just getting started all the way up to the biggest RAG deployments, the ones that have to handle real scale. If you RAG a seven-page document, that is one thing, but if you have millions of documents, a different type of approach is needed. So talk to me about this process. >> Yeah, 100%. First of all, RAG with n8n was my introduction to n8n as a platform. The first video that I ever put on YouTube that got a significant amount of views, it's up to, I think, 150,000 right now, was me using n8n to build a RAG agent with Supabase. So it's very cool; this entire stack, and building RAG agents, has a special place in my heart. In that video I talk about building a very basic RAG agent, and there are a lot of strategies that we can get into; if we have time, I'll probably talk about a couple of them. But at a very basic level, what you have with RAG is you're giving your agent tools to search your knowledge base, your documents. Because the problem we have with agents is you can never fit all of your documents into the context of your LLM, unless you just have a very small number of documents. So what you have to do is give your agent the ability to search through your documents. But keyword search is usually not enough, because a lot of times you want to connect concepts together instead of just keywords. For example, say I'm interested in looking at some fast food options, and I have an agent that can help me because it understands all of the fast food options around me. If I search for "I want a hamburger," then I'd want it to recommend maybe the Big Mac from McDonald's, just to pick a really universal example. If you do a keyword search, you might not actually find that: you search for burger, and you might not find McDonald's and Big Mac, because there's no exact keyword match there. But with RAG you can connect concepts together: you have a special kind of model in RAG called an embedding model, and it's able to connect words or phrases together if they are conceptually related. A Big Mac is a burger; therefore, there's actually a lot of similarity there. And so you do what's called semantic search, where you're finding the
similarity between phrases and keywords instead of just asking, is this word equivalent to that word? That's a very fundamental kind of search, and you can combine them both together; even that is a specific RAG strategy. But essentially, what you do with RAG is you have your original data source. This could be a bunch of PDF documents you have locally; it could be a bunch of files in Google Drive, wherever that might be. And you have some kind of process that takes the documents from there and feeds them into what is called a vector database. That is where you use the embedding model to create that numerical representation of the data, so it can be searchable by your agents. If I could show a diagram right here, I could really walk through what this looks like technically, but for the sake of time, just know that at a high level, we're transforming our data into something that is searchable by our agents. And Supabase is my favorite database to store this information for RAG. They have this extension called pgvector that essentially turns Supabase, which is a PostgreSQL database, into what can be used as a vector database. So now, whenever we have a query that our agent produces, like "I want to search for the best burgers near me," it sends that into the vector database, the database finds the concepts that are similar, like McDonald's and Big Mac, and it returns those as context to the agent. And so now it can use that extra information to enhance its answer. That's why RAG is short for retrieval augmented generation: we are augmenting the agent's ability to generate answers for us thanks to the information it can pull from its search in the vector database. And vector databases pretty much scale to infinity. Take that with a grain of salt, it's not literally infinity, and there are a lot of considerations that go into it, but when you have a million documents in your knowledge base versus 10, the actual search is not affected that much; the speed of the search is not affected that much. And so you're able to pull, from a very large corpus of documents, exactly the information that you need as context for the LLM. That way you don't have to overwhelm the context of your large language model; you can't even fit a lot of documents into the context if you try, but you're still able to get the bit of information that you need. So that's RAG at a high level, the kind of thing I could explain for hours and really get into, but I'll leave it there for now.
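A minimal sketch of that semantic-search idea, assuming the OpenAI embeddings API; the model name is illustrative, and in production the nearest-neighbor lookup happens inside the vector database rather than in Python:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    """Turn phrases into their numerical representations (embeddings)."""
    resp = client.embeddings.create(
        model="text-embedding-3-small",  # illustrative embedding model
        input=texts,
    )
    return np.array([d.embedding for d in resp.data])

docs = ["Big Mac from McDonald's", "Caesar salad", "oil change coupon"]
doc_vecs = embed(docs)
query_vec = embed(["I want a hamburger"])[0]

# Cosine similarity: conceptually related phrases land close together in
# the vector space, even with no shared keyword ("hamburger" never
# appears in "Big Mac from McDonald's").
scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
print(docs[int(np.argmax(scores))])  # expected: the Big Mac entry
```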
>> You're right in terms of scaling. There's an analogy that I make for people who come from the design world or the photo world. You have a normal PNG photo, and whenever you scale that photo up, let's say it's a tiny little headshot and you scale it up, it becomes very blurry; the image loses resolution. If you turn it into a vector image, that turns the image into math, and you can scale a vector image up to infinity without losing resolution, because it's mostly just doing math to keep those ratios making sense. That's why you can take a tiny headshot that's a vector and turn it into a giant billboard, more or less: because you have the ability to turn it into math. And the same thing is happening with words: we turn them into math using vectorization, and that allows us to make them searchable for the AI, hence why it becomes something that can be done at scale. >> Yes, that's really good. That's actually a great analogy; I haven't heard that one before, but it makes a lot of sense. Because basically what you do with RAG is take the query that either the user or the agent produces, and you turn that into a mathematical representation, just like everything else that is already numerically represented in the vector database. You take the numbers that represent the query, you drop them into the vector space, and you just look at the other ideas or words that are close by in that space, the ones with a similar numerical representation. You drop it in, you pick out the things that are close, and that's what you send as context to the LLM. So it doesn't matter if there are 5 billion concepts somewhere else in the vector space; we only care about what's close by. It's not like we have to scan through everything in order to find what's relevant. That's why it scales so incredibly well. >> And there are ways to produce better results. I'm going to lean into one, and we can dial in on that; then we can also talk about scaling things up. There is the ability to attach metadata to anything in the vectorized database, which then allows us to do things like filtering and reranking, something that allows us to get
more accurate results. >> Can you speak more to that? >> Yeah. So metadata filtering is very powerful. Basically, for every single piece of information or concept that we've turned into a numerical representation, we can attach additional values that describe that information. One really good use case, people ask me about it all the time: could I use one vector database for multiple clients? Could I have all their information stored in one vector database? Now, there are a lot of reasons you might not want to, just so that you have full isolation. But theoretically, you can have a single vector database with all the information, because you have a piece of metadata that says which client each piece of information is for. This is also known as multi-tenancy. So if you have a SaaS application with all these different users storing information, it can just be a single database. Then, when a user wants to perform a search, you figure out, okay, what's their user ID or client ID? And then you apply a filter: I'm going to look in the vector database, but I'm only going to search through the records where the metadata value of the client or user ID is equal to the one for our search. So then we are only navigating through a subset of the information in the vector database, and we're absolutely guaranteed to have isolation for what we're actually searching through. That's a really powerful concept. There are a lot of other good examples of metadata filtering. Maybe you're operating in one business with different departments, and you only want to search through the sales knowledge base, or only the marketing knowledge base. Or there's the idea of hierarchical RAG, where you have different levels of categories, and you want to control how deep you go in your search, versus keeping it broad and searching through a higher-level set of information. You can use metadata for that. It gets quite extensive, because it's a very flexible component of RAG: you can store pretty much any information as metadata attached to each record, and then you can search through it in any way that you want.
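A minimal sketch of metadata filtering with pgvector, assuming a Postgres/Supabase table with a vector `embedding` column and a JSONB `metadata` column; the table name, column names, and connection string are hypothetical:

```python
import psycopg2

conn = psycopg2.connect("postgresql://user:pass@host:5432/db")  # placeholder

def search(query_embedding: list[float], client_id: str, k: int = 5):
    """Nearest-neighbor search restricted to one tenant's records."""
    vec = "[" + ",".join(str(x) for x in query_embedding) + "]"
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT content
            FROM documents                     -- hypothetical table
            WHERE metadata->>'client_id' = %s  -- the isolation filter
            ORDER BY embedding <=> %s::vector  -- pgvector cosine distance
            LIMIT %s
            """,
            (client_id, vec, k),
        )
        return [row[0] for row in cur.fetchall()]
```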
>> This hierarchical idea is very interesting, and I do want to double down on it for people listening who want to apply this in a business. A lot of AI agencies and other people will go, great, this makes sense, but is there an overall system, strategy, process, methodology for installing this into a business that has marketing, sales, operations, fulfillment, customer support, all these different elements? Talk to me a little bit about strategy. Is it all in one vector database? Do you break them up into different vector stores that then tell you which one to go into? How would you approach installing this into a business? >> So it depends on one really big thing, as far as whether I do one vector database or many. The way to explain it is: do I only ever want to search within one department, like I'm only looking at the sales knowledge or only searching through the marketing knowledge? Or do you maybe have the CEO of a company who wants to be able to search through everything at once? If you want that flexibility, then it has to be in one vector database, or at least it's optimal to have it in one vector database, and then you take advantage of metadata filtering. So say you have an internal dashboard where users sign in. If someone on the sales team signs into the dashboard, there's a piece of metadata attached to their user record: they are on the sales team. So now, when they talk to the agent that you've built into this internal dashboard, the metadata filter is going to be set to only search the records in the knowledge base from the sales team. You have this permission set up per user, because when the CEO logs into the dashboard, they maybe won't have any filter attached to their record. So when they do a search, it's going to search through all of the documents for all departments at the same time, and then you'd probably give them the option to apply a filter if they want to really hone in on something. And that's the idea of hierarchical RAG: the CEO is able to search very broad, or maybe there's going to be some kind of dropdown: let me ask this question, but only in the context of the sales documents, the sales data. >> Yeah, this is where user journeys and user stories make a ton of sense, because, for example, the CEO of the company might want to generate sales reports, but they also might want to see customer tickets.
They might want to look at the entire system of what's going on. So it's going to vary with their day-to-day operations: you have to understand their patterns of behavior, their user journey, and then design a system for that. So when you're doing this, when you go to define this metadata, what's your process for understanding a company's operational style, to know that the CEO's dashboard needs these five filters, or this sales director needs these three filters? Do you have a process for walking people through what kind of metadata structure to set up for this vectorized database? >> So, the most important thing when you're talking to a client, or just some company where you're going to set this up for them, is to not focus on the technicalities. I'm never going to explain metadata filtering to the CEO of a company, unless they ask about it; I'll get into it if they want to connect me with the engineering team, whatever. I've done that before, where I'll really get into the weeds on things. But at a high level, it's more like: let me ask them about their process, and in the back of my mind, I know how that translates to the technicalities. Oh, they just told me that they want complete isolation and they're never going to want to search across different departments; that tells me, okay, I can probably just create multiple smaller vector databases for this company. Or the CEO tells me, yes, this agent is going to be great for my team, but I also want to use it to connect ideas together, or to see how all my teams are doing, because maybe there are real-time documents being fed into this pipeline. In that case, that tells me, okay, it should be one vector database. So like I said, don't focus on the technicalities. I just ask the questions of how are you going to actually use this agent in the end? And sometimes they don't even know themselves, and that's when it gets trickier. But at that point, you just guide them through it: okay, here are my ideas for how you could use the agent; how does that work for you? In the end, the bottom line is this is going to save you time or make you money. So what should this agent do to get you to that point? >> That clarity of thought
is one of those things that almost feels like pulling teeth when you work with a business. They say, I want AI to do everything, and you have to say, okay, it depends on how you want to use it. And they come up with one, two, or three examples of how they'd use it, but then, once you get it fully built, later on they tell you they actually want to use it 15 different ways, and it wasn't built for that. It's so important to have those foundational elements: a clear mapping of how the data flows, and then, call them SOPs or user behavior journeys, the step-by-step processes of what you want done. I wake up in the morning, I'm in a panic because I think my company's running out of money, so as the CEO I go in and look it all up to see where my money is going. What are those patterns of behavior, and how do we extract them? That's one of the talents, I think, of a really high-level AI agency: the ability to extract that knowledge in the beginning. It's almost like context engineering, but you have to do it as team-developed context engineering so you can get that data. >> Yeah, it's context engineering for the business you're working with, making sure you're asking the right questions so that you know what they want and you can set the right expectations. Like you said, Dylan, you don't want to build it for three use cases and then have them expect 15 a month later. You establish what those use cases are, and then you set that as a clear expectation, like in the contract: this is what the agent is going to be able to do. If you want more, then, to be quite blunt, you pay me more to extend the agent or build out more systems. >> Is there anything else you want to mention in the category of RAG and these large-scale deployments? Anything else that people should be considering? For example, I need to extract all this information out of the team; we need to
go back and forth to pull out this data so I can understand all the use cases. In terms of the technical implementation, is there anything people should consider if they are doing this for a giant company? >> Yeah, so the biggest thing is you might be working with ugly data. We talked very high level: you have your data source, and then you feed that into your vector database. Now, what that feeding looks like can sometimes be really simple. In n8n, you might have a Google Drive trigger where you're watching for new files, and you're just working with Google Docs, so you have an extract-text node, and then that goes into the insert-into-Supabase-vector-database node. In an ideal scenario, it's basically that simple. There are a couple of other nodes to make sure you don't duplicate records and things like that, but technicalities aside, it can be that simple. Or you could be working with really messy data, where they have proprietary data formats, or PDFs from scanned physical paper copies that you need to apply a really powerful OCR model to. You could be extracting information from really ugly database schemas, where you have to figure out how to even navigate the database. There are so many possibilities when you get into the weeds with a company, especially with proprietary data formats; it's going to be something you've never seen before, and you just have to do that problem solving before you even get to the point where you have the data to build the agent on top of. So really, the very first thing when you're building a RAG agent for a business is to understand their data, and establish what it looks like to build what's called the RAG pipeline: taking information from their data sources and putting it into the place, the vector database, where the agent is actually going to be doing the search.
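A minimal sketch of such a RAG ingestion pipeline, assuming the OpenAI embeddings API and the supabase-py client; the table name, column names, and naive chunking are all hypothetical simplifications:

```python
from pathlib import Path

from openai import OpenAI
from supabase import create_client

openai_client = OpenAI()
supabase = create_client("https://YOUR_PROJECT.supabase.co", "YOUR_KEY")

def chunk(text: str, size: int = 1000) -> list[str]:
    """Naive fixed-size chunking; real pipelines split on structure."""
    return [text[i : i + size] for i in range(0, len(text), size)]

def ingest(path: Path) -> None:
    """Extract raw text, embed each chunk, and store it for search."""
    text = path.read_text()  # assumes text is already extracted (no OCR here)
    chunks = chunk(text)
    resp = openai_client.embeddings.create(
        model="text-embedding-3-small",  # illustrative embedding model
        input=chunks,
    )
    rows = [
        {"content": c, "embedding": d.embedding,
         "metadata": {"source": path.name}}
        for c, d in zip(chunks, resp.data)
    ]
    supabase.table("documents").insert(rows).execute()  # hypothetical table
```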
>> 100%. Yeah, messy data, unclean data; some databases can handle it. You're talking about ELT or ETL, right? Either you extract the data, transform the data, and then load the data where it needs to go, transforming it ahead of time so you can put it into the database cleanly; or, with some other databases that are a bit more forgiving, you can extract the data, load the data, and then transform it afterwards. You're right that understanding the data coming in, and ultimately what you want to do with it, comes down to how dirty the data is and how forgiving the pipeline you're putting it into is. >> Yeah, the end goal, to get something into a vector database, is to have the raw text. So when you use an image model, like an OCR model, to pull text out of a PDF, for example, you just want to have that string in your n8n workflow so that you can put it right into the Supabase vector store, or whatever it might be. >> Yeah, I was working with a client a long time ago, and they were working with their client, and they were having a chatbot upload the data, and all of a sudden there were all these variables. First it was text: okay, no problem. Then JPGs: okay, OCR. Then it was PDFs, right? And then there was one, I forget the name of the file format, but it was an email format; I can't remember the exact extension. >> Oh yeah. Yeah. >> All of a sudden, there's a custom format that just came out of the blue, and it broke the system. So it's, how can you make sure that you're accounting for all of these different types of data formats that are going to come in and could inherently break your system? You want to compensate for that. Very good point on why we want to know the step-by-step user behavior. Are they going to be
only uploading text? Are they going to upload images? What are they going to be doing that's going to affect the pipeline? >> Yep, exactly right. And one really important thing, since you never really know, and you can never really know, what file formats people are going to try to upload: it's good to have a fallback, kind of like we were talking about earlier with LLM fallbacks. If you get a file format that you don't recognize, then you maybe try to extract the text in the most standard way possible. If that fails, then you should have some way to report the failure, either back to the user, back into the system where you have your monitoring set up, or both. You have to handle it gracefully; you can't just have the application crash. So a lot of times, either in some JSON config file in my codebase or in a database somewhere, I'll store a list of the file extensions that are actually supported by the RAG pipeline. Then, as I evolve the system, I'll add to that list or change things. That way I'm always comparing against that list: when a file comes in, I extract the extension, I check whether it's something that I support, and if not, that's when I go down the fallback route that I was talking about. >> Yeah, looking at the extension is actually a really good tip. Look at the extension and ask, is this a file type that we support? If so, continue. If not, alert. >> That's right. Yeah.
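A minimal sketch of that extension check with a graceful fallback; the config layout and the helper functions are hypothetical stand-ins for whatever extraction and alerting your pipeline actually has:

```python
import json
from pathlib import Path

# Supported extensions live in config, not code, so the list can evolve
# without a redeploy (it could equally be a row in a database).
# e.g. config.json: {"extensions": [".txt", ".pdf", ".docx"]}
SUPPORTED = set(json.loads(Path("config.json").read_text())["extensions"])

def handle_upload(path: Path) -> str:
    ext = path.suffix.lower()
    if ext in SUPPORTED:
        return extract_text(path)    # hypothetical per-format extractor
    try:
        # Unknown format: attempt the most standard extraction possible.
        return path.read_text(errors="replace")
    except Exception as err:
        report_failure(path, err)    # hypothetical monitoring/alert hook
        raise
```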
>> So then, moving on from this: we've talked about context engineering, and we've talked about RAG pipelines to a good degree. I'm sure we could go much deeper, but let's shift over to another category. You've built out this awesome n8n workflow, and some people take that and bring it to scale; there are a lot of enterprise companies that use n8n at scale for certain use cases. And there are other people that want to bring it into a different environment, for whatever reason, to turn it into a different type of production-ready environment. Talk to me about the process of converting an n8n workflow into a codebase, into some other production-ready environment, so that you can bring it to scale elsewhere. >> Yeah. And this is something I do all the time. Like you said, Dylan, n8n can definitely be something that you scale to production, but a lot of times what I like to do, especially because I come from a very technical software engineering background myself, is build an n8n workflow as a proof of concept for something, because it really is the fastest way to build things. But then, when I want to really scale to production, I want to work in a codebase, so I go through a process of translating that workflow into code. Typically Python code; it can be TypeScript as well. Those are the two programming languages that I prefer. And going back to context engineering, we actually get to go a little full circle here, because I will take the n8n workflow, literally just download the JSON of the workflow, and include that as some of the context for the AI coding assistant. Again, the primary goal of context engineering is to reduce the number of assumptions that the LLM has to make when it's building something on our behalf, and the beauty of an n8n workflow is that it's really specific. When you download the JSON file, you have all the different nodes that outline the workflow for the agent or the RAG pipeline, whatever it is. You have all the parameters that you set. You have the system prompt that you added into the AI agent node. You have all these different things, and that removes a lot of the assumptions the LLM would otherwise make. It can read that JSON file, and, by the way, large language models are very good at understanding JSON; it's very structured for them to pull out the keys, the nodes and their attributes. It understands all that right out of the box, so it can take that to understand what you want to build. Then you just have to describe, as much as you can, how you generally want to translate it to code, plus anything else you want to add on top. Typically I'll start really simple: okay, let's just replicate exactly what we have in n8n, and then start adding on the other things we want, to evolve it more into that production version of the code.
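A minimal sketch of using an exported n8n workflow as context for a coding assistant, assuming the OpenAI Python SDK; the file name, model, and prompt wording are illustrative:

```python
import json

from openai import OpenAI

client = OpenAI()

# The exported workflow JSON carries the nodes, their parameters, and any
# system prompts, which removes most of the assumptions the model would
# otherwise have to make.
with open("rag_agent.n8n.json") as f:  # hypothetical exported workflow
    workflow = json.load(f)

prompt = (
    "Below is the JSON export of an n8n workflow. Replicate exactly this "
    "behavior as a Python application. Do not add features beyond it yet.\n\n"
    f"{json.dumps(workflow, indent=2)}"
)

resp = client.chat.completions.create(
    model="gpt-4o",  # illustrative; in practice this is your coding assistant
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)
```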
>> One of my favorite things with n8n is being able to copy the visual format and paste it somewhere else, and you can see all of the code right there; there aren't a lot of abstraction layers. There are some other automation platforms, which shall not be named, where you can't really do that, and that creates a lack of clarity. Those are great for working with people who are less technical, and maybe for not breaking the system. But if you really want to get into the weeds, to be able to evolve this and have a very flexible system, having the ability to convert the visual workflow into a written format like JSON, the way n8n does, is incredibly powerful. So when you take this code, this JSON, and put it into some coding assistant where you have all the context: do you then, as you said with context engineering, feed it a metaprompt around that and say, here's all this information? Do you also give it the use cases? Do you grab the SOPs from before, or the user journeys, and feed those in as well? How much additional context do you need, beyond just the scope of the code itself, in order to convert it into Python? >> Right, yeah. So typically what I do is treat the n8n workflow as the additional context, instead of the other way around. What you were getting at is, okay, start with the workflow as the main thing, and then which of the other things we talked about do you include, like the user story, the task list, and so on. I will actually start with all of that, and then, just as one of the things I want it to reference, I'll give it the n8n workflow. I will say that this is the primary context for you to reference, because it is very specific about what we're building out here. But I still want the checklist of tasks, and I still want the success criteria, even if the success criteria is molded around: you have implemented these things from the n8n workflow. So I have pretty much the exact same process. >> Makes sense. So you're grabbing the context initially, and then there's one step in your SOP, and when I say SOP, for anybody that doesn't know, it means standard operating procedure, just a step-by-step process for moving through a thing. And when you
grab this context, you bring it to the system, and then a line item in that SOP is to say: here is now the workflow, or one of several workflows, that I want you to convert over into Python. So let's say you do bring it over into Python, and it's inside your coding environment. How do you get this deployed? Let's talk about that. Is this something where you go Claude Code to GitHub, then GitHub to self-hosting? What is this? Now you have Python code; what's next? >> The way that I deploy an application does depend on the type of app. For front-end applications, if it's a Next.js app, I'll just deploy it to Vercel. If it's a React application, and if you've ever built with Lovable or Bolt, those are always React applications, the platform I really like using is called Render; you can host what are called static applications, which is what a React app is, completely for free on Render. Now, for backend applications, there are a couple of different platforms that I like. If you've ever self-hosted n8n, you're probably familiar with these: DigitalOcean, Hetzner, Hostinger. Being able to deploy applications on a machine in the cloud that you basically pay monthly for is a really good way to host your agents and your API endpoints, and you can use Render for that as well. Or, if you're going really enterprise level, sometimes clients will have certain requirements around compliance or SLAs, and at that point we want to go really enterprise. The big three cloud platforms for that are Google Cloud (GCP), AWS, and Microsoft Azure. GCP is usually the easiest for me, so that's the enterprise cloud option I'll go to; they have containerized serverless environments for your agents, and you can host static apps and things
there as well. >> Yeah, it's so funny navigating those ones, and I've built AI agent applications hosted on them. AWS is one of the hardest to navigate. Have you ever tried to give permissions to somebody in there? It's not as simple as "make admin"; you have to go through all of these processes. And one of the things that I've done before, and I don't think they make it available right now, I think they took it away, I hope, please, Google, if you're listening: they had, inside the whole Google AI area, the ability to screen share. I could share my screen and talk to the AI as I was doing it, and I was actually building out authentication. Okay, I'm inside AWS; how do I give this user these permissions? Where do I go? What do I do? Just being able to navigate that environment, it was like, oh, come on, man, why is this so hard? But it's also incredibly stable; it's what the big boys use, AWS and GCP and all those ones. So for anybody interested in those, just know there's a bit of a learning curve when you get inside. Fantastic. So now we've talked about the front end and the back end, and it's so funny, because, full circle, we're back at context engineering: you want to have the details up front, you want to get really clear on that. We talked about deploying that inside of AI coding environments, what that looks like, and how you use context engineering for it; then ultimately transitioning from workflows into this straight Python, code-based environment; and then deploying front end or back end depending on the services required. Is there anything else that you think is really topical or noteworthy to mention in this process, in the stack?
>> Yeah, so you mentioned GitHub for a sec, and I didn't focus on that, but GitHub is the glue for everything, because not only do I use Git to create safe states as I'm building out my application, it's also where I host all of the codebases that I'm deploying to the cloud. So it doesn't matter if I'm using DigitalOcean, Hostinger, Render, GCP, whatever: it's always going to be pulling from a GitHub repository, which is also how I manage my different environments, like dev, test, and production. I can roll back versions of things if I want to deploy an old version because something new I created broke, which unfortunately happens all the time, no matter how good of a developer you are. So Git and GitHub are a core part of my development workflow, and they have been for, like, every engineer since, I don't know, 1970; maybe it wasn't GitHub, but some kind of version control system was always really important. So I use that a lot, and I have a lot of security and testing built in, like whenever I make a commit to a GitHub repository. So that CI/CD part of the GitHub workflows is really important as well. For those of you who don't know what that is, continuous integration and continuous delivery is all about: whenever I make a change in my code, I automatically want to run my security checks and my unit tests and my integration tests, doing all these validation checks every single time I create one of those save states in my code. And deploying things automatically is really powerful as well; I haven't done a manual deployment in years now, because we have that built into these platforms. They're able to leverage GitHub to watch for changes to the codebase, and then, if it's on a certain branch, like our main branch, it's going to automatically deploy the new things to production. >> Yeah, hooking that up, especially if you're, I won't say vibe coding, but let's say you're using Cursor or Claude or one of these, being
able to connect that up to a GitHub account so that you can have multiple save states and iterate on that process, because part of it is you can do your best at the context engineering and follow a step-by-step process inside of there, but if you're able to upload to GitHub, you can have so many save states in there. One of the things, I did a video on the YouTube channel about how to actually take your n8n workflows and upload them as these folders and these artifacts that are then tagged and saved up in there, because there was a time I was using a hosting service for n8n, I won't name which one, and I uploaded a code node that wasn't really verified, and it crapped the whole system out, the whole thing. My entire setup was done; it was bricked. But because I had daily backups of all my workflows saved up on GitHub, I could simply restore them. >> Yeah. You only learn this, as someone who also develops this type of stuff, when you get there; it only takes one time. So if anybody's listening, please learn from our pains and lessons: keep these save states, keep these backups. There are proper methodologies for this. They say the difference between a mid-level developer and a senior is that mids are really fast and they're really good, right? Seniors usually do things right: they think about the process, they go deep in their thoughts. And so some people can get up and moving really quickly and feel like a mid, or they might be a mid, but you learn these painful lessons as time goes on that will then save you so much time on the back end, along with the context engineering and quick deployment and all that fun stuff. Fantastic. This has been incredible. Cole, it's been amazing having you on the podcast. Any final words before you
tell people how they can find you and get a hold of you? >> Yeah, I would just say that if a lot of these things aren't really concrete in your mind right now, what's my process for context engineering, or how do I build a RAG pipeline, just start really simple, and you build that muscle over time. You start to realize, oh, these are the things that I should be laying out in my context for my coding assistant right away, or here are the different file formats that I want to work with in RAG. You start really simple and you build the system for yourself. And as you start simple and evolve things, you also get to customize your system to what works for you, which is what I always tell people: the most powerful system is the one that you've customized to specifically what you're working on, and that applies to anything you're building and any process that you have for building as well. So don't be intimidated; just start really simple. And as a starting point, I teach a lot on my YouTube channel about RAG and context engineering and AI coding, so if you just go and search my name on YouTube, Cole Medin, you'll find it. I'm doing content every single week on building AI agents and leveraging AI coding assistants in ways that actually scale. >> Fantastic, Cole. It's been an honor and a pleasure, my friend. Much love. I'll see you on the other side. >> Thank you, Dylan.
In this episode, we sit down with Cole Medin, CTO of oTTomator and expert in applied AI, to dive deep into context engineering, RAG (Retrieval Augmented Generation), and building scalable AI workflows. Cole shares practical strategies for cutting through the noise in the AI space, designing effective prompts, and moving from prototypes to production-ready systems. Whether you're an AI builder, a developer, or just curious about the latest in automation, this conversation is packed with actionable insights and real-world advice.

00:00 – Introduction: The challenge of too much "fluff" in the AI space and how to focus on what matters.
00:22 – Meet Cole Medin: Background, expertise, and his mission in applied AI.
01:59 – What listeners will learn: Context engineering, RAG, and moving workflows to production.
02:40 – The origin of context engineering: Why treating prompts and context as engineered resources matters.
03:49 – Vibe coding vs. context engineering: The importance of specificity and reducing assumptions.
06:18 – Practical steps for context engineering: Mindset shift, planning, and using AI to ask clarifying questions.
08:47 – Success criteria and user journeys: How to define what "done" looks like for AI projects.
12:36 – How much time to spend on planning: Product requirements docs and upfront investment.
13:54 – Favorite AI coding tools: Claude Code, Codex, and Google's Antigravity.
15:23 – Staying up to date in AI: Research strategies and the value of community.
18:09 – Introduction to RAG (Retrieval Augmented Generation): What it is and why it matters.
20:41 – How RAG works: Embedding models, vector databases, and semantic search.
24:45 – Metadata filtering in RAG: Multi-tenancy, hierarchical search, and business use cases.
28:46 – Handling messy data: ETL/ELT pipelines and preparing data for AI agents.
32:06 – Scaling workflows: Moving from n8n prototypes to production code (Python/TypeScript).
34:38 – Deployment strategies: Frontend, backend, and cloud hosting options.
37:13 – The importance of version control: Using GitHub for safe states and CI/CD.
40:05 – Final advice: Start simple, build your process, and customize your system.
41:15 – Where to find more: Cole Medin's YouTube channel for more on RAG and context engineering.