Okay, so MCP, or Model Context Protocol, is great in theory, but in practice, well, not so much. In fact, there's a new blog post from Anthropic which shows that it has probably been the wrong abstraction for agents. I have previously talked about security issues related to MCPs, but there is a bigger problem, and it all boils down to context management. MCPs are pretty bad at it. So, here is a quick example. Right now we have only two different MCPs connected. But if we look at the context, they're already taking almost 20,000 tokens, which is about 10% of the context window. And that's just the tool definitions. We haven't even sent a single message. So MCP servers are contributing a lot to context rot. But
there is actually more. When a tool gets called, the intermediate results are also sent into the context of your LLM or agent, and you can easily see that context management becomes a nightmare. So in this blog post, "Code execution with MCP: Building more efficient agents", Anthropic highlights these two problems: tool definitions overload the context window, and intermediate tool results consume additional tokens. So for example, here is a tool definition. Depending on how many tools are connected through MCPs, all of them are going to be loaded into the context of your agent, and that bloats your context even before you do anything meaningful. Similarly, when a tool is executed, it's going to bring in the tool results and the intermediate results, which may or may not be needed at all. This again is going to bloat or poison your context. As an example, they note that if you have a two-hour sales meeting, you are looking at about 50,000 additional tokens just from reading the transcript returned by a tool. You probably don't need all of that. Now the solution is
not to directly use tools through MCP, but rather to build code agents that interact with these tools through code. There is an earlier paper, "Executable Code Actions Elicit Better LLM Agents". Anthropic didn't cite this paper, but the idea is very similar. There's also very interesting work from Cloudflare, "Code Mode: the better way to use MCP", in which they say that most agents today use MCP by directly exposing the tools to the LLM. They tried something different: convert MCP tools into a TypeScript API, and then ask the LLM to write code that calls that API. Now, if you remember, within an MCP server, tools are basically wrappers around APIs. The idea was to standardize them, but we are kind of going back now: we are creating these code agents which are going to call those APIs or tools directly. According to the Cloudflare team, agents are able to handle many more tools, and more complex tools, when those tools are presented as a TypeScript API rather than being called directly as tools the way MCP servers do. And it's especially useful to use code agents if you have multiple different tools chained together. Okay. So how exactly do you do
this? Well, in this case, you treat the tools as a directory structure. So, every tool is implemented in a separate file. Here I have tried to visualize that: you have your AI agent, and then your MCP server. Within the MCP server, you have different tools available. But if you look at it, this just becomes a simple directory structure.
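To make the directory idea concrete, here is a minimal sketch. The file names, tool names, and the `callMCPTool` transport stub are all illustrative, not taken from the video: each MCP tool becomes one file exporting a typed function.

```typescript
// Hypothetical layout the agent can browse like any codebase:
//
//   servers/
//     google-drive/
//       getDocument.ts
//     salesforce/
//       updateRecord.ts
//
// Sketch of servers/google-drive/getDocument.ts -- one tool, one file.

interface GetDocumentInput { documentId: string }
interface GetDocumentResult { content: string }

// Stand-in for the MCP transport; a real implementation would forward
// the call to the connected MCP server instead of echoing the input.
async function callMCPTool<T>(toolName: string, input: object): Promise<T> {
  return { content: `(contents for ${JSON.stringify(input)})` } as unknown as T;
}

// Thin typed wrapper: the agent imports and calls this from generated code.
export async function getDocument(input: GetDocumentInput): Promise<GetDocumentResult> {
  return callMCPTool<GetDocumentResult>("google_drive__get_document", input);
}
```

The point of the wrapper is that the agent only ever sees a small typed function, not the full tool definition.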
Now, your agent is going to generate code in order to call different tools. But the question is, how exactly is it going to find a relevant tool? Well, you give your agent simple bash tools and grep commands. So in this case, every tool definition is implemented in a file that just becomes your API call, and then you give your agent the ability to do code execution, very similar to what Claude Code does.
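A minimal sketch of that discovery step, using an in-memory stand-in for the tool directory (the file paths and doc strings are made up for illustration):

```typescript
// On-demand tool discovery: instead of loading every tool definition into
// the model's context, search the tool tree the way `grep -ri` would.
const toolFiles: Record<string, string> = {
  "servers/google-drive/getDocument.ts": "// Fetch a document from Google Drive",
  "servers/google-drive/listFiles.ts":   "// List files in a Drive folder",
  "servers/salesforce/updateRecord.ts":  "// Update a Salesforce record",
};

function findTools(keyword: string): string[] {
  const needle = keyword.toLowerCase();
  return Object.entries(toolFiles)
    .filter(([path, doc]) => (path + " " + doc).toLowerCase().includes(needle))
    .map(([path]) => path);
}
```

Only the handful of matching files then gets read into context, instead of every tool definition from every connected server.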
Now, Codex also uses a very similar approach: it uses a combination of bash and grep commands to do retrieval on code. So instead of putting all of the tool definitions in the context of the LLM, you basically search for the tools that are going to be needed, and you do that through code execution. So here is a
simple workflow that I created. First, the user request comes in. Then the agent looks at the directory structure and tries to find tools that are relevant to the task. So instead of putting all the tool definitions in the context, we just do a simple search and find the most relevant tools. Then it writes code to execute those tools. Now, the beauty is that you can chain multiple different tool calls, but the results are not yet being directly exposed to the context of the LLM. So you execute them. If the agent needs to get information or data from external sources, it can do that as well, because this is just code execution. You get the data and you can do further processing on it. When everything is ready, you can just pass on the final results, plus the actions that were taken, to the LLM or your AI agent.
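As a rough sketch of that flow, using the meeting-transcript example from earlier (the `getTranscript` tool and its contents are invented for illustration): the large intermediate result is processed in code, and only the small final answer is handed back to the model.

```typescript
// Stand-in for a tool call that returns a very large payload,
// e.g. a two-hour meeting transcript.
function getTranscript(meetingId: string): string {
  return "Alice: we should ship on Friday. TODO: update the changelog. ".repeat(2000);
}

// Processing happens inside the execution sandbox, so the full
// transcript never enters the LLM's context.
function extractActionItems(transcript: string): string[] {
  const items = transcript.split(". ").filter((line) => line.startsWith("TODO"));
  return [...new Set(items)]; // deduplicate; only this goes back to the model
}

const transcript = getTranscript("meeting-42");     // huge, stays local
const finalResult = extractActionItems(transcript); // tiny summary for the LLM
```

The tens of thousands of transcript tokens never leave the sandbox; the model only receives the deduplicated action items.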
Now this way you'll be able to preserve the context of your LLM or agent without introducing too much noise into it. The Cloudflare team has built agents on Cloudflare following exactly the same pattern. So here is a quick overview of a traditional MCP server. You have a number of different tools available, and the MCP server provides the tool schema plus the definitions. You first get the list of tools, then the agent decides which tool to use and provides specific instructions, and the MCP server executes those tools. Now
here's the architecture that the Cloudflare team recommends. For traditional MCP servers, the MCP server provides the tool schema, which gets passed into the context of the agent, and then the agent decides which tool to use, emitting specific text sequences to invoke a certain MCP server. Again, this basically pollutes the context. Now, in code mode, you still have your tool schema, but instead of directly passing on the tool definitions, you are going to create a list of different API calls.
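A sketch of that conversion step, with a deliberately simplified schema shape (real MCP tools describe their parameters with JSON Schema in an `inputSchema` field; the flat `params` map here is a simplification):

```typescript
// Simplified tool schema; real MCP servers use JSON Schema for inputs.
interface ToolSchema {
  name: string;
  description: string;
  params: Record<string, string>; // param name -> TypeScript type
}

// Render a schema as a TypeScript-style API signature the model can
// read on demand, instead of receiving the raw tool definition up front.
function toApiSignature(tool: ToolSchema): string {
  const args = Object.entries(tool.params)
    .map(([name, type]) => `${name}: ${type}`)
    .join(", ");
  return `/** ${tool.description} */\ndeclare function ${tool.name}(${args}): Promise<unknown>;`;
}
```

A short signature like this is far cheaper in tokens than the full JSON tool definition, and it only enters the context when the agent actually looks it up.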
Now, Anthropic recommends doing search based on what the model or agent needs; essentially, it will search for which tools to use. Then it's going to write code to make those API calls, and the agent worker executes them in a separate sandbox, isolated from the context of your model or agent. You get the results, and those results are sent back to the LLM. So all the execution is happening within this dynamic, isolated sandbox, which is not rotting the context of your main agent. Now there are some
really great benefits to this. The first one is progressive disclosure: rather than loading everything up front, the agent looks at tool definitions on demand, and this results in much more efficient tool calls, because again you're not stuffing everything into the context of the model, and the tool execution is happening in a separate sandbox. This also brings us to privacy-preserving operations. Security is a big issue with MCPs, especially since tools from one MCP server can be invoked by another server, and there is also system-level access. On top of that, you can transform the results even before sending them to the LLM. One simple example they give: you can anonymize your data. Let's say you have personally identifiable information; before even sending it to the LLM, you can mask it in the code the agent writes, and then de-anonymize it afterwards. Another idea that they
have is state persistence and skills. Skills is a new concept recently introduced by Anthropic, which also progressively loads information rather than dumping everything all at once. So you could potentially use these code agents to directly interact with it. Now what exactly does it mean
for MCP servers? Because essentially you are replacing their purpose with code agents. The Cloudflare team tried to answer this. They say MCP is still useful because it is uniform. MCP was designed for tool calling, but it doesn't actually have to be used that way: the tools that an MCP server exposes are really just an RPC interface with attached documentation, and we don't have to present them as tools. We can instead write code that interacts with them directly as APIs. They say that it turns out MCP does something else that's really useful: it provides a uniform way to connect to and learn about an API. So an AI agent can use an MCP server even if the agent's developer never heard of that particular MCP server, and the MCP server's developer never heard of that particular agent. The idea is that you are going to be providing a uniform API interface. Even if you're using the directory structure for tool calling with a code agent, you can still benefit from this uniform interface that MCP provides, especially if the agent is writing code. And therefore, they say, we would like the AI agent to run in a sandbox such that it can only access the tools we give it; MCP makes it possible for agentic frameworks to implement this by handling connectivity and authorization in a standard way, independent of the AI code. So the main benefit of MCP is going to be this uniform interface it provides for the agent to interact with the tools or APIs via code. So we'll have to look at MCPs from a very different perspective moving forward.
Again, if you are looking at MCP servers, don't just use them directly as tools. Do let me know what you think and how your experience has been. In my personal experience, I have started using MCPs less and less, specifically because of the context rot they introduce, but this approach does seem able to reduce that. Anyways, I hope you found this video useful. Thanks for watching, and as always, see you in the next one.
MCP (Model Context Protocol) sounds great, but in practice it bloats an AI agent's context with tool definitions and intermediate results, burning tokens and adding noise. This video shows why code-first agents, using sandboxed code execution and on-demand tool discovery, reduce token bloat, improve privacy, and scale better than direct MCP tool calls.

https://www.anthropic.com/engineering/code-execution-with-mcp
https://x.com/AnthropicAI/status/1985846791842250860
https://blog.cloudflare.com/code-mode/