In this video, I'm going to be going over progressive disclosure within Claude Code. One of the interesting trends I've noticed over the past several months is that a lot of AI infrastructure and model companies, from Cloudflare and Anthropic to Vercel and Cursor, are all arriving at the same conclusion independently about how to build AI agents. And honestly, it's probably not what we would have expected six months ago. In this video,
I'm going to be touching on progressive disclosure, but also on bash and file systems generally. This is applicable to how you can use Claude Code, but you can also apply it within other systems and when developing agents. Right off the bat, I want to touch on a blog post that came out in September from Cloudflare, Code Mode, about a better way to use MCP. In
this article, they basically describe that the way we've been using MCP is wrong. When we load MCP servers directly as tools into the LLM, a number of issues come up: you might have those MCPs sitting in context and never actually use them. One of the interesting things with their approach was converting MCP servers to TypeScript. The realization was, effectively, that models are really good at writing code; they're not necessarily great at leveraging MCP directly. What this boils down to is: what if we just had the model write the code and find the MCP servers it needs, rather than having all of that sit within the context?
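As a rough illustration of the idea (not Cloudflare's actual implementation; the mcp binding and the tool names here are hypothetical), the model-written code might look something like this:

```typescript
// Sketch only: assumes a hypothetical sandbox binding `mcp` generated from
// MCP server schemas. Cloudflare's real bindings will differ in shape.
interface Issue {
  id: number;
  title: string;
  state: "open" | "closed";
}

declare const mcp: {
  github: { listIssues(repo: string): Promise<Issue[]> };
  slack: { postMessage(channel: string, text: string): Promise<void> };
};

// Instead of the model emitting one JSON tool call per step (with every
// intermediate result flowing back through its context), it writes a small
// program, and only the final string returns to the model.
export async function reportOpenIssues(repo: string): Promise<string> {
  const issues = await mcp.github.listIssues(repo);
  const open = issues.filter((issue) => issue.state === "open");
  await mcp.slack.postMessage("#eng", `${open.length} open issues in ${repo}`);
  return `Posted a summary of ${open.length} open issues to #eng.`;
}
```

The key point is that the loop over results happens in the sandbox, not in the model's context window.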
And just a couple of months later, Anthropic basically confirmed the same conclusion.
They released some product features that really didn't get a lot of attention when they originally came out. Basically, the idea is that instead of loading all of the tool definitions up front, the tool search tool discovers tools on demand, so Claude only sees the tools it actually needs for the current task. You can see the context window at the top here: the previous approach uses 77,000 tokens of context, while below, with the tool search tool, it only uses 8,700 tokens.
This represented an 85% reduction in
token usage while maintaining access to
your full tool library. And internal
testing showed significant accuracy
improvements on MCP evaluations when
working with large tool libraries. Opus
4 improved from 49% to 74% and Opus 4.5
improved from 79.5% to 88.1%. The idea is that if you only have the tools you actually need at the current moment, the context window is going to work much more effectively.
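To make that concrete, here's a rough conceptual sketch of on-demand tool discovery; it's not Anthropic's actual API, and the ToolDef shape and searchTools helper are assumptions for illustration:

```typescript
// Conceptual sketch only: the full tool library lives outside the context
// window (on disk or in a registry), and the model is only ever given a
// small search tool plus whatever definitions that search returns.
interface ToolDef {
  name: string;
  description: string;
  inputSchema: object;
}

const toolLibrary: ToolDef[] = [
  { name: "jira.createTicket", description: "Create a Jira ticket", inputSchema: {} },
  { name: "github.listIssues", description: "List issues in a GitHub repo", inputSchema: {} },
  // ...hundreds more, none of which sit in the prompt by default
];

// The one tool the model always sees: search the library, return only matches.
export function searchTools(query: string): ToolDef[] {
  const q = query.toLowerCase();
  return toolLibrary.filter(
    (t) => t.name.toLowerCase().includes(q) || t.description.toLowerCase().includes(q)
  );
}

// e.g. searchTools("issue") loads only the GitHub tool definition into context.
```

Only the matching definitions ever get loaded into the model's context; the rest of the library stays on the shelf.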
And what was interesting is that just last week, Cursor confirmed the same thing. Here's effectively the exact same chart I just showed you, and they explain the efficiency gains they get from doing this, which reduced total agent tokens by 46.9%. The
other thing that's really great about this is that when you put those MCP tools in context and don't necessarily use them all the time, you're spending an awful lot of money sending them along with every request just in case they happen to be needed. I think this paradigm has basically been confirmed by enough heavy hitters in the industry that it's a good pattern to follow. And the interesting thing
with this is that the focus used to be on GPUs, and now we're moving into an interesting area where I think sandboxes and file systems, at least at inference time when we're using these AI applications, are going to become increasingly the focus. Effectively, they all arrived at the same conclusion: progressively disclose to the model only what it needs, when it needs it. The industry trend in terms of how this is done is to give agents a file system as well as bash, and let them use those methods, like being able to grep and glob to find all the different files they need and load them only when needed. Now, in
terms of MCP, a lot of people have given it flak over time. I don't think it's going anywhere, but the way we're actually leveraging it, just like Cloudflare mentioned, is going to be the biggest change. Instead of burning tokens by keeping all of these different servers in the agent context, whether that's within Claude Code, Cursor, or the agentic products we're building, we're going to have a more effective way of managing MCPs. And honestly, in my
opinion, it's a great protocol. A lot of people have adopted it, there are easy ways to deal with authentication and things like that depending on the service, and a lot has been ironed out. I don't think it's going away anytime soon. But the big trend we're going to see is that instead of having those tool schemas sit in context, a lot of tools that would otherwise have lived in context will be progressively disclosed, because there are going to be plenty of tools you might only use once in ten turns of a conversation, maybe even less than that. Now, in terms of the
approaches for this, what's great is that it's actually really easy to set up. Skills are the most obvious way to use this within Claude Code. You set up a skill file, and the front matter is what gets disclosed to the model: a name and a description of when the model should invoke it. That front matter might be just 10, 20, 30, or 100 tokens. When a skill is used, the model loads the first file, which can contain references that are progressively disclosed within that skill, and it can also look through all the other skills it has and, in the same way, only read and load them as prompts into context when they're needed.
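For a sense of what that looks like, here's a minimal sketch of a skill file; the YAML front matter is what the model sees up front, and the file names referenced in the body are hypothetical:

```markdown
---
name: web-research
description: Research a topic on the web and produce a sourced summary. Use when the user asks for up-to-date information.
---

# Web research

1. Read references/search-strategy.md for how to pick and vet sources.
2. Run scripts/fetch.ts to pull down pages (usage notes are in that file).
3. Write the summary with inline links to every source used.
```

Only that handful of front-matter tokens sits in context by default; the body and the referenced files get read when the skill is actually invoked.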
The insight and the shift is really this: instead of loading everything, burning all the tokens, leaving less room for actual work, and degrading the results the model is capable of producing (if the context is sitting at 100,000 tokens, the results you get will be much worse than if it had maybe just 10,000), we're moving toward discovering things on demand and loading only what's needed. The other great thing is that we get massive token savings, which means faster applications, cheaper applications, and overall better results. Agents need file systems and bash, and then you can effectively get out of the way. Now, the thing with this
is that it's really a different, new architecture. I was actually uneasy with this type of idea initially, but I think what's really at the essence of what makes it nice is that it's actually really intuitive, right? You can have these different files, they can be progressively disclosed, they can live within directories, and they can encode knowledge. That's the interesting thing with this pattern: you don't need to equip an agent with the knowledge of how to use a Postgres database or how to use this or that. Every single agent out there knows how to use bash, and once you know how to use bash, you can update files, read files, and use all of these methods like skills, which are effectively progressive disclosure. It's the same type of idea throughout. The
insight that Cloudflare had was: instead of generating JSON tool calls, generate TypeScript code that runs in a sandbox, so the MCP server becomes a TypeScript API inside an isolated sandbox. The result they found was a 98.7% reduction in token usage. Back to the Anthropic blog post. Within
advanced tool use, there were a few different things they put out, and they all correlate with one another: the tool search tool, programmatic tool calling, and the memory tool. With programmatic tool calling, for instance, similar to what Cloudflare discovered, the model invokes tools inside a code execution environment.
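As a hedged sketch of why that matters (the tool name and the callTool helper here are hypothetical, not Anthropic's actual API), the model-written code can chew through a large intermediate result so that only a small summary ever reaches its context:

```typescript
// Sketch only: `callTool` stands in for whatever bridge the code execution
// environment exposes to the underlying tools.
type LogLine = { level: "info" | "warn" | "error"; message: string };

declare function callTool(name: "logs.query", args: { service: string }): Promise<LogLine[]>;

export async function summarizeErrors(service: string): Promise<string> {
  // This result could be tens of thousands of lines; it stays in the sandbox.
  const lines = await callTool("logs.query", { service });
  const errors = lines.filter((l) => l.level === "error");

  // Aggregate in code instead of pushing raw lines back through the model.
  const counts = new Map<string, number>();
  for (const e of errors) {
    counts.set(e.message, (counts.get(e.message) ?? 0) + 1);
  }
  const top = [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, 3);

  // Only this short summary is returned to the model's context.
  return top.map(([msg, n]) => `${n}x ${msg}`).join("\n");
}
```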
Then, in terms of memory, these are file-based, simple markdown files. It was a similar idea within Claude Code; it's something I've heard Boris Cherny mention, where instead of having all of the embeddings and vector search, you just have agentic search. It just felt better and it just works well. And I think if you've used Claude Code and seen how it reads different files, and sometimes just sections of files, that feels like a much better approach than all of the mechanics that go into a lot of these embedding-based systems. Next up,
this is a little piece of alpha. Right now, there's an experimental MCP CLI flag. This may have changed by the time you watch this: it might be a different flag, or it could be removed while they're working on it, but it's something they're actively trying out to get that tool search capability directly within Claude Code. What you can do with it is, instead of having all of that MCP tooling sit in Claude Code's context window, set the flag and you get the same tool search capability without loading all of those tokens into context. Now, is it perfect? I found it works quite well, but does it work quite as well as having the MCPs directly in context? I'm not entirely sure. It's still actively a work in progress, but if you want to try it out, it's a really simple flag within Claude Code. And the really wild thing with this
direction is that if you had a number of different MCP servers, that could easily add up to tens of thousands of tokens of context being passed directly to the model every single time, and this effectively brings it down to almost zero. There will be a little bit in the system prompt in terms of how they actually make this work, but it really is orders of magnitude less context that you're going to be using. And I think the big and exciting
thing with this is that all of a sudden we can be a lot more ambitious. We don't need to be bound to only 5, 10, or 20 MCP servers before performance degrades within our application or within Claude Code. Now we can have thousands, tens of thousands, maybe even hundreds of thousands of skills or MCPs or whatever it is, within a directory or a system that can easily look up and find what it needs at the time it needs it. Additionally, we can have hierarchical structures, similar to skills: you can have a flat directory of all the different skills, but you can also break them up into subskills, so the agent can read different pieces, discover that it needs a reference within a particular skill file, and go down that lineage to find what it needs.
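A hypothetical layout might look like this (the directory names are illustrative; check the Claude Code skills docs for the exact conventions):

```text
.claude/skills/
├── web-research/
│   ├── SKILL.md                  # only the front matter sits in context
│   ├── references/
│   │   └── search-strategy.md
│   └── scripts/
│       └── fetch.ts
└── code-review/
    ├── SKILL.md
    └── references/
        └── security-checklist.md
```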
There are a few different ways to architect this, but all in all, I think this is the paradigm shift we're going through right now, literally over the coming weeks and months, where all of a sudden we're going to have applications with access to a ton more capabilities that work quite effectively through some of these strategies. And I think what's
interesting with this is that whether it's Anthropic using sandboxes in their web app, or the sandbox products from Vercel, Cloudflare, and Daytona, or Lovable using a form of sandboxes, all of these different sandboxes allow us to have ephemeral file systems where we can read and write, spin up little applications, and shut them down as needed. I think this is going to be much more of the paradigm in 2026 for agentic development, and for how you can also leverage Claude Code. If you're running Claude Code in the cloud, it's a similar idea: they're spinning up a sandbox. But what's
interesting with Anthropic is that even within Claude, the consumer-facing web app, you'll notice it will also write to a file system for a lot of different operations, and it will write scripts as well for a lot of features, like if you're working with a spreadsheet or whatever it might be. All in all, if we boil it down, MCPs, file systems, and code execution: that might be the answer, at least as it stands right now. Just to run through the pattern
quickly, here's how you could use this within Claude Code. It has access to the file system: it can read, write, and search files, and we've all seen it leverage that in its core methods. It has bash as well: execute commands, run scripts, push things to git, whatever it might be. And now we can have code execution to call the MCP servers. The idea and the mindset to keep is: give the agent a file system and get out of the way. Tools become files, discovery becomes search, execution becomes code, and context remains small. Next up, another
interesting insight that Anthropic had is that Claude can automatically clear old tool results as you approach your limits. So this is another interesting idea: instead of keeping everything you've added to context, you can progressively remove things from context as they become less and less relevant. Now, in terms of memory, I
think the way to think about this is that it's just files. It can be your CLAUDE.md, different markdown files, different scripts, or skills. It's nothing too complicated: there are no embeddings, no complex retrieval. We can read it, edit it, and search it. Keep it simple; if it's simple for us, it's going to be simple for agents. Now, how do we actually
for agents. Now, how do we actually
leverage this? So, I think it's with
skills and progressive disclosure. So
within cloud code, you can create a
skills directory and you can put
different skills that you have. It might
be a web research skill. It could be a
code review skill. Within that skill
file, you can chain different references
to different scripts, different markdown
files. And that's going to be how you
can have these different files where it
will read and load up only at time of
when it needs it. Effectively how it
works, the agent will see the front
matter, of, the, skill, and, that's, going to
be what gets loaded within the static
context. Say it's a web research skill.
You could say okay within this skill I
have firecrawl or what have you within
that skill and then the agent will only
search and read that skill folder and
load up all of the context that it needs
when it actually needs it. So the idea
with this is you're going to be able to
scale to many more skills without that
additional context bloat. Okay. So all
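In other words, what the static context actually sees might amount to something like this (hypothetical skill names, a line or two each), no matter how large the bodies of those skills grow:

```text
Available skills (front matter only):
- web-research: Research a topic on the web and produce a sourced summary.
- code-review: Review a pull request against the team checklist.
- db-migrations: Write and verify Postgres schema migrations.
...and dozens more, each costing only a line or two of context.
```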
Okay, so all in all, I think you can now be more ambitious without worrying about burning context.
Agents can tackle bigger tasks. Before, we had to keep tasks fairly small, minimize tool use, watch for context limits, and worry about the context resetting. Now we can have things that run for multiple hours and use potentially dozens, hundreds, or maybe even thousands of different tool integrations. We can have complex workflows without complex orchestration. If the system knows how to effectively look up tools as well as skills, all of a sudden these systems become much more powerful and much more ambitious in terms of what we can build. We can build systems that potentially run for hours and run autonomously, as a result of some of these new patterns.
Context is potentially no longer the bottleneck. If we can offload context to memory, to these different files, we can leverage the tool search capabilities and the skill and progressive-lookup capabilities. All of this combined is a really effective way to manage context. We can have a system that has something like memory and working memory, can write helper scripts, and can update skills. There's a ton we can do by leveraging the file system and bash, and I think that's pretty exciting. All in all, I think the
trend is pretty clear. Cloudflare had a lot of really interesting ideas, then Anthropic came out with theirs, then Cursor, and now I think everyone is converging on: hey, this is actually a pretty good idea and pattern. It's a little counterintuitive, but it does work, and the industry is really converging around the same answer right now. Tools as files, loaded on demand; skills; progressive disclosure; bash is all you need. That's pretty much it for this video. If you found it useful, please comment, share, and subscribe. Otherwise, until the next one.
Progressive Disclosure in Claude Code

In this video, we explore the concept of progressive disclosure within Claude Code and its impact on building AI agents. Highlighting recent trends among top AI infrastructure companies like Cloudflare and Anthropic, we discuss the paradigm shift towards using file systems and bash to manage context effectively. Learn how to implement skills and progressive disclosure in Claude Code to save tokens, improve performance, and enable more ambitious AI applications.

Links:
https://blog.cloudflare.com/code-mode/
https://www.anthropic.com/engineering/advanced-tool-use
https://cursor.com/blog/dynamic-context-discovery
https://code.claude.com/docs/en/skills

00:00 Introduction to Progressive Disclosure in Claude Code
00:03 AI Infrastructure Trends and MCP Usage
00:38 Cloudflare's Approach to MCP and Token Efficiency
01:20 Anthropic and Cursor's Confirmation of MCP Efficiency
02:57 The Shift to File Systems and Bash for AI Agents
04:24 Implementing Progressive Disclosure in Claude Code
06:34 Advanced Tool Use and Programmatic Tool Calling
07:19 Experimental MCP CLI and Future Directions
08:49 Scaling Skills and Managing Context Efficiently
13:31 Conclusion and Final Thoughts