The Claude Code team have just fixed the biggest issue with MCP by adding tool search, a way to reduce context usage by up to 95%: the model searches for a tool by name before using it, instead of preloading all available tools into context, which could burn tens of thousands of tokens before you even write your first prompt. But why wasn't this the way it worked before? And did they steal this technique from Cloudflare? Hit subscribe and let's get into it. MCP servers are absolutely
everywhere. There's one for GitHub, Docker, Notion. There's even a Better Stack one, which I've heard is really good. And with people using Claude Code and LLMs for everything other than code, it seems like MCP isn't going anywhere anytime soon. But it has its problems: naming collisions, command injection, and the biggest of all, token inefficiency, because all the tools from a connected server typically get preloaded into the model's context window to give the model complete visibility. That means tool names, tool descriptions, and the full JSON schema documentation, with its optional and required parameters, their types, and any constraints. Basically, a lot of data. The Redis team used 167 tools from four different servers, which took up over 60,000 tokens before a single prompt was written. That's almost half of Opus's 200K context window, and that's even before skills and plugins. So if you have a lot of servers installed, they can eat up a substantial number of tokens. Yes, I know there are models out there, like Gemini, with a 1 million token window, but models tend to perform worse the more you add to their context. So what's the best way to fix this? Well, I've seen two popular paths online: the programmatic approach, which is what Cloudflare have done, and the search approach, which is what the Claude Code team have done. I'll talk about the programmatic approach a bit later, but first
let's talk about the search approach, which works like this. First, Claude checks whether the preloaded MCP tools would take up more than 10% of the context window; with a 200K-token window, that's 20K tokens. If not, nothing changes and the model uses the MCP tools as normal. But if they would, Claude dynamically discovers the right tools using natural language search and loads the three to five most relevant ones based on the prompt. Only those tools are fully loaded into context for the model to use as normal. This was actually the most requested feature on GitHub, and it works much like Agent Skills, which only load skill names and descriptions into context. When Claude finds a skill it thinks is relevant, or one that was mentioned in the prompt, it loads that entire skill into the context window. Progressive disclosure in a nutshell. Both
Anthropic and Cursor have seen great results applying this approach to MCP tools. But what about the programmatic approach? Here, models orchestrate tools through code instead of individual API calls. Say three tools need to run one after another, each depending on the previous response: instead of making three separate tool calls, Claude can write a Python script that does all of the orchestration, execute the code, and feed only the result back to the model. Cloudflare have taken this one step further by getting the model to write TypeScript definitions for all the available tools and then running the generated code in a sandbox, usually a Worker.
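Here's a minimal Python sketch of that orchestration idea. The three tool functions are stand-ins I've made up for illustration, not a real MCP SDK; the point is that one generated script chains the calls, so only the final result re-enters the model's context instead of every intermediate payload.

```python
# Hypothetical stand-ins for three MCP tools that must run in sequence,
# each depending on the previous response. Not a real MCP SDK.

def github_list_issues(repo):
    # Pretend this hits the GitHub MCP server and returns open issues.
    return [{"id": 1, "title": "MCP context bloat", "labels": ["bug"]}]

def notion_create_page(title):
    # Pretend this creates a tracking page via the Notion MCP server.
    return {"url": f"https://notion.example/{title.replace(' ', '-')}"}

def slack_post(channel, text):
    # Pretend this posts a message via a Slack MCP server.
    return {"ok": True, "channel": channel}

# One script orchestrates all three tools; instead of three model/tool
# round trips, only `result` needs to go back into the model's context.
issues = github_list_issues("acme/api")
page = notion_create_page(issues[0]["title"])
result = slack_post("#triage", f"Tracked at {page['url']}")
print(result["ok"])  # True
```

The intermediate issue list and page payload never touch the context window, which is exactly where the token savings come from.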
The Claude Code team actually tried the programmatic approach but found search worked better, which I find really hard to believe, considering Claude is very good at writing code.
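The search flow walked through earlier can be sketched roughly like this. The 10% threshold and "three to five tools" numbers come from the transcript; the token heuristic and keyword-overlap scoring are my own illustrative stand-ins, not Claude Code's actual implementation.

```python
# Illustrative sketch of the tool-search flow; names and scoring are
# assumptions, not Claude Code's real internals.

CONTEXT_WINDOW = 200_000   # Opus-class window from the transcript
THRESHOLD = 0.10           # preloaded tool definitions capped at 10%

def estimate_tokens(tool):
    # Crude heuristic: roughly one token per four characters of definition.
    return (len(tool["name"]) + len(tool["schema"])) // 4

def load_tools(all_tools, prompt):
    preload_cost = sum(estimate_tokens(t) for t in all_tools)
    if preload_cost <= THRESHOLD * CONTEXT_WINDOW:
        return all_tools   # cheap enough: preload everything as before
    # Too expensive: rank tools against the prompt, keep the top five.
    words = set(prompt.lower().split())
    ranked = sorted(
        all_tools,
        key=lambda t: len(words & set(t["name"].lower().split("_"))),
        reverse=True,
    )
    return ranked[:5]

# A small server set stays fully preloaded...
small = [{"name": "github_list_issues", "schema": "x" * 400}]
assert load_tools(small, "list issues") == small
# ...while a huge set (~25K tokens here) is narrowed to five tools.
huge = [{"name": f"tool_{i}", "schema": "x" * 2000} for i in range(50)]
print(len(load_tools(huge, "list issues")))  # 5
```

Only the tools that survive the search get their full schemas loaded, which is where the "up to 95%" reduction comes from.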
Also, the agent browser CLI, that headless Chromium thing Vercel have released, works very well in Claude Code. And I'm sure that if you could convert all MCP tools into CLI commands, using something like mcporter, it would be much easier and more context-efficient for models to run a specific CLI command for a tool instead of loading everything into context. But hey, that's just my opinion. Overall, I'm glad the issues with MCP servers are being looked into, and maybe it might just convince me to install more than one server.
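As a footnote, the tools-as-CLI idea mentioned above could look something like this sketch: the model shells out to a command on demand instead of carrying every schema in context. The JSON-over-stdin wrapper here is a hypothetical interface I've invented for illustration, not mcporter's real one.

```python
# Hypothetical tools-as-CLI wrapper: invoke a tool as a shell command,
# passing arguments as JSON on stdin and reading a JSON result on stdout.
import json
import subprocess
import sys

def run_tool(command, **kwargs):
    proc = subprocess.run(
        command,
        input=json.dumps(kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)

# Stand-in "tool": a tiny Python one-liner that echoes its input back.
echo_tool = [
    sys.executable, "-c",
    "import sys, json; print(json.dumps(json.load(sys.stdin)))",
]
print(run_tool(echo_tool, query="open issues"))  # {'query': 'open issues'}
```

The model only needs to know the command name and its argument convention, not a full preloaded JSON schema per tool.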
Claude Code Has FINALLY Solved the MCP Context Nightmare! The team just released MCP Tool Search, a game-changing update that dynamically loads tools into context when MCP tools would otherwise consume too much space. When Claude Code detects that MCP tool descriptions would use over 10% of context, tools are loaded via search instead of being preloaded. This directly addresses a major pain point where users were running 7+ servers consuming 67k+ tokens, making MCP development far more efficient without changing how tools work for end users.

Relevant Links
Tweet from Claude Code Team - https://x.com/trq212/status/2011523109871108570
Cursor MCP fix - https://cursor.com/blog/dynamic-context-discovery
Anthropic Article - https://www.anthropic.com/engineering/advanced-tool-use
Cloudflare Code Mode - https://blog.cloudflare.com/code-mode/

More about us
Radically better observability stack: https://betterstack.com/
Written tutorials: https://betterstack.com/community/
Example projects: https://github.com/BetterStackHQ

Socials
Twitter: https://twitter.com/betterstackhq
Instagram: https://www.instagram.com/betterstackhq/
TikTok: https://www.tiktok.com/@betterstack
LinkedIn: https://www.linkedin.com/company/betterstack