Okay, so Anthropic just fixed one of the biggest issues with Model Context Protocol: how it deals with tool definitions when you connect an MCP server. By default, when you connect a new MCP server, it loads the definitions of all the tools available on that server, so a substantial portion of the context is occupied by these tools before you send even a single conversational message. Some servers are worse offenders than others. For example, GitHub MCP is a
good example of this phenomenon. It has about 91 different tools that are loaded on connecting to the MCP server, and it used to consume about 46,000 tokens, which is almost 22% of the whole context window available to Opus 4.5. It's a big problem that I have mentioned in a number of my videos, but they finally fixed it with MCP tool search, which can help you substantially reduce the context used by MCP. So, instead of loading all of the tool definitions, now
the tool search allows Claude to dynamically load tools into the context when they are needed. So, in this video, I want to show you what exactly this new tool search tool is and how to use it as a developer, whether you're building MCP clients or MCP servers. We already talked about what the problem is: connecting MCP servers with a lot of tools can really pollute your context window. Now, Anthropic previously recommended using code execution, which also
substantially reduces the number of tokens, and it looks for tools by treating the MCP as a file system. This work is really an extension of the previous code execution workflow, but using two different types of search mechanisms. Okay. So here is what your context window usually looks like if you connect, say, 5 to 10 different MCP servers with, say, 10 to 20 different tools. That is a substantial proportion of your context window. But with the tool search tool, you're going
to be dynamically loading specific tools by searching for them rather than pre-loading everything. But there are some things you need to keep in mind. This is only going to be used when your MCP tool definitions occupy more than 10% of the context window. You can theoretically get about 85% token reduction, and it dynamically loads three to five different tools at once. And here's what your context window is supposed to look like after using the tool search tool. Now here's how it
works. First, Claude or any client loads only the tool search tool as a single tool in the context window. When Claude creates a query, the system searches the tool catalog and tries to find three to five relevant tools, and after that it loads the full definitions of only those three to five relevant tools. Okay. So there are two different variants of this approach. The first one is regular expression based. In this case, Claude writes patterns
like weather or get_.*_data. These are best when your tools follow consistent naming conventions. On the other hand, you have something like BM25, which is a keyword-based search mechanism. Here Claude writes natural language queries like tools for weather or database operations. This is search with relevance ranking, but again, we are not using any embeddings here. It's mainly a keyword-based search mechanism, and it's better when your tool names and descriptions vary. So you'll have to pick
one of these approaches depending on whether your tool definitions follow consistent names or are more descriptive and vary in naming. Okay. Now let's talk about how you would think about implementing your tools, both on the server side and on the client side, if you want to use tool search. As a developer, your job is to make your tools findable. When Claude searches for tools, it needs to find the right ones, but how do we help it? The first thing
to consider is optimizing your tool descriptions. This is the biggest lever you have in your arsenal. So look at these two examples of before and after. These are two descriptions trying to convey exactly the same idea. The first one is repetitive and verbose, and it uses a lot of tokens. The second one is a lot more concise. It's basically get current weather, forecast, or historical data, so you're putting the functionality of this
tool first, and in the second part we simply have keywords that the search might be looking for. Now here are some best practices for tool descriptions. The first one is lead with function. This should be your first sentence. Second, keep it to one to two sentences. That's it. Third, add searchable keywords like fetch, get, retrieve, etc. Synonyms also help Claude find tools. Fourth, put constraints in the input schema, not in the description. Descriptions are for discovery. Schema is for validation. And
this is something that you really need to care about. And five, every word costs tokens, so be ruthless in optimizing the tool description. Now there's another field you should know about: server instructions. This is a system-prompt-like field that guides how Claude is going to use your tools. Look at this simple example: for PR operations, first check PR status and then view the PR or approve it. So you're telling Claude the workflow. The order here matters a lot when tool
search is enabled. This helps Claude understand when to search for which tool. Now, as a developer on the client side, if you're building an MCP client, here is exactly how to implement tool search. It's a simple four-step process. Okay. So, step one, you have to enable the beta. Add this to the headers of your API request. Without this header, tool search won't work. This is required at the moment. Now, step two, add the tool search tool to your tools
array. This is an additional tool that you're going to be using, and you can select either the regular expression based or the BM25 based tool search tool. Now, one of the most important things: do not set defer loading on this tool. It needs to be loaded immediately. Step three, mark your tools for deferred loading. You need to add this specific keyword to any tool that you don't want to load immediately when the MCP server is connected. So, this tells the system,
don't load this tool up front. Load it when Claude searches for it. And this is where you're going to get the most savings on token usage. Okay. So, step number four, keep your essential tools loaded. Not everything should be deferred. Keep three to five of your most frequently used tools without deferred loading. Now, here's what the pattern is going to look like. For essential tools, you want to set deferred loading to false; those are the tools that are going to be automatically
loaded. And then for the tools that you want to search for, you simply set deferred loading to true. The balance here is immediate access for common operations and search-based access for everything else. And you can do this on different tools within different MCP servers. Okay. So here's a quick implementation checklist if you are going to be using this. There are eight different steps. First, enable the header. Second, add the tool search tool, either regex or BM25. This is your
choice. Third, mark non-essential tools with deferred loading as true. Fourth, keep three to five essential tools without deferred loading. Fifth, optimize your tool descriptions: lead with function, add keywords. Six, add server instructions. This helps Claude understand workflows. Seven, test with 30 or more tools. That's where you will see the biggest improvement. And the last one is that you need to monitor context usage, so track it before and after. Okay, I'm going to create another
video on how to do this when it's more widely available. But when should you use tool search? If you have 10 or more MCP tools that are occupying more than 10% of your agent's context, then you should definitely consider using the tool search tool. Skip it if you only have three to five tools, if all your tools are frequently used, or if latency is absolutely critical, because search-based exploration will also increase latency. Okay. There's a common
pitfall that you need to consider, actually a few of them. The first one is don't defer load the tool search tool itself. This defeats the purpose. Second, don't make descriptions too short. Keywords matter for search, because something like get weather is worse than get current weather, forecast, or historical data. And third, don't keep too many tools without defer loading. If you defer load nothing, you get no benefit out of it at all. Okay, so that's it. That's the tool search tool. Go implement it. Your context
window will thank you. Now, if you have any questions about specific implementation details, drop them in the comments. I'll try to answer. And if you are building or using MCP tools and found this content useful, please consider subscribing to the channel. Anyways, I hope you found this video useful. Thanks for watching and as always, see you in the next
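The four client-side steps described in the video (beta header, tool search tool, deferred tools, a few essentials kept loaded) can be sketched as a request payload. This is a minimal sketch under assumptions, not the official client code: the header value, the tool type and name strings, and the defer_loading field name should be verified against the tool search documentation linked in the description, and the catalog, the tool names, and the build_tools helper are hypothetical.

```python
# Sketch only. The header value and the tool "type"/"name" strings below are
# assumptions; verify them against the tool search documentation before use.
BETA_HEADERS = {"anthropic-beta": "advanced-tool-use-2025-11-20"}  # step 1 (assumed value)

TOOL_SEARCH_TOOL = {
    "type": "tool_search_tool_regex_20251119",  # assumed; a BM25 variant is the alternative
    "name": "tool_search_tool_regex",
}


def build_tools(catalog, essential_names):
    """Build the tools array: search tool first, essentials loaded, the rest deferred."""
    tools = [dict(TOOL_SEARCH_TOOL)]  # step 2: the search tool itself is never deferred
    for tool in catalog:
        entry = dict(tool)  # copy so the caller's catalog is not mutated
        # Steps 3 and 4: defer everything except the frequently used tools.
        entry["defer_loading"] = tool["name"] not in essential_names
        tools.append(entry)
    return tools


# Hypothetical catalog; each description leads with function, then keywords.
EMPTY_SCHEMA = {"type": "object", "properties": {}}
catalog = [
    {"name": "get_weather",
     "description": "Get current weather, forecast, or historical data.",
     "input_schema": EMPTY_SCHEMA},
    {"name": "create_issue",
     "description": "Create a new issue. Keywords: open, file, bug, ticket.",
     "input_schema": EMPTY_SCHEMA},
    {"name": "merge_pull_request",
     "description": "Merge an approved pull request. Keywords: PR, land, squash.",
     "input_schema": EMPTY_SCHEMA},
]

tools = build_tools(catalog, essential_names={"get_weather"})
for tool in tools:
    print(tool["name"], tool.get("defer_loading", "always loaded"))
```

Sending BETA_HEADERS with the request and passing tools as the tools array is left out here, since the exact call depends on the client library you use.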
In this video, I discuss how Anthropic has addressed the issue of context window pollution caused by MCP server tool definitions. By using the new MCP tool search, you can dynamically load tools when needed, significantly reducing token usage. I explain how this new tool search works, its benefits, and step-by-step instructions for developers on both MCP clients and servers. I also share best practices for optimizing tool descriptions and highlight potential pitfalls. If you are an MCP developer or use MCP tools, this guide will help you make the most of your context window.

LINKS:
https://x.com/trq212/status/2011523109871108570
https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool
https://github.com/anthropics/claude-code/issues/7336

My voice to text App: whryte.com
Website: https://engineerprompt.ai/
RAG Beyond Basics Course: https://prompt-s-site.thinkific.com/courses/rag
Signup for Newsletter, localgpt: https://tally.so/r/3y9bb0

Let's Connect:
🦾 Discord: https://discord.com/invite/t4eYQRUcXB
☕ Buy me a Coffee: https://ko-fi.com/promptengineering
🔴 Patreon: https://www.patreon.com/PromptEngineering
💼 Consulting: https://calendly.com/engineerprompt/consulting-call
📧 Business Contact: engineerprompt@gmail.com
Become Member: http://tinyurl.com/y5h28s6h
💻 Pre-configured localGPT VM: https://bit.ly/localGPT (use Code: PromptEngineering for 50% off).

TIMESTAMPS:
00:00 The biggest Issue with MCP
01:24 Understanding the Context Issue
01:56 How Tool Search Solves the Problem
03:09 Variants of Tool Search Mechanisms
05:04 Best Practices for Tool Descriptions
06:12 Client-Side Implementation Steps
08:42 When to Use Tool Search
09:13 Common Pitfalls to Avoid