Hey, this is Lance from LangChain. I want to talk through a few general context engineering principles and how they show up in popular agents like Manus and Claude Code, as well as in our recently released DeepAgents package and CLI.
First, an agent can be thought of simply as an LLM calling tools in a loop: the LLM makes a tool call, the tool is executed, the observation from the tool goes back to the LLM, and this continues until some termination condition is met.
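To make that loop concrete, here's a minimal sketch in Python. It assumes an OpenAI-style chat client and a hypothetical `tool_registry` mapping tool names to functions; none of these names come from a specific framework.

```python
import json

# Minimal agent loop: the LLM requests tools until it stops asking for them.
def run_agent(client, model, messages, tools, tool_registry, max_turns=50):
    for _ in range(max_turns):
        response = client.chat.completions.create(
            model=model, messages=messages, tools=tools
        )
        msg = response.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:           # termination: no more tool calls
            return msg.content
        for call in msg.tool_calls:      # execute each requested tool
            result = tool_registry[call.function.name](
                **json.loads(call.function.arguments)
            )
            messages.append({            # observation goes back to the LLM
                "role": "tool",
                "tool_call_id": call.id,
                "content": str(result),
            })
    return None                          # hit the turn limit
```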
Now, the length of tasks that AI agents can perform is getting longer. A nice result from METR shows that task length is doubling around every seven months. The challenge is that as agents take on longer tasks, you accumulate more tool results. For example, Manus has mentioned that the average Manus task involves over 50 tool calls. Likewise, Anthropic has mentioned that production agents can often run for hundreds of turns. The problem is that as you populate the context window with results from all these tool calls, you're passing all those prior tool results back through the model at every turn, so the cost and latency associated with running your agent can really blow up. And not only that, performance can degrade: Chroma has a nice report on context rot that discusses how performance degrades with respect to context length.
And so, what we've seen is that agents
are increasingly being designed with a
few different principles to help address
this.
Of course, agents have a few common primitives: a model, prompting, tools, and often hooks.
Take Claude Code as an example, using Claude-series models. The system prompt is actually available; you can look at it at this link here, and I'll make sure that document is in the video description. It has around a dozen native tools, and it allows for hooks, which are basically scripts that can be programmatically run at different points in the agent lifecycle, for example before or after each tool call. Now, our DeepAgents package and DeepAgents CLI are similarly set up with these primitives. The package allows for any model provider; the CLI currently uses Anthropic models. You can see the prompts; it's all open source. It uses eight native tools in the package and 11 native tools in the CLI, which I'll show later in detail. And we also allow for hooks at various points in the agent lifecycle.
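As a rough illustration of the hook idea, here's a hedged sketch of running user-supplied callbacks around tool execution. The `pre_tool_hooks` / `post_tool_hooks` names are illustrative, not any specific framework's API.

```python
# Hooks: callbacks that run at fixed points in the agent lifecycle.
def execute_tool_with_hooks(tool, tool_name, args,
                            pre_tool_hooks=(), post_tool_hooks=()):
    for hook in pre_tool_hooks:
        hook(tool_name, args)                # e.g. block or log a risky call
    result = tool(**args)
    for hook in post_tool_hooks:
        result = hook(tool_name, result)     # e.g. redact or truncate output
    return result
```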
Now, with these primitives that make up what we call an agent harness in mind, what are the common techniques we see across different agents for managing the problem of context rot and of accumulating tokens across many turns of tool calls? Context engineering is the broad term that captures many of these principles. Karpathy outlines it very nicely here: it's the delicate art and science of filling the context with just the right information for the next step, which is very applicable to agents. You're trying to steer the agent to make the right next tool call along a trajectory of actions.
The three common principles I like to distill are offload, reduce, and isolate. Offloading is moving context from the LLM context window to something external, like a file system, where it can be selectively retrieved later as needed. Reducing is simply reducing the size of the context passed at each turn, and there are a bunch of different techniques to do that. And finally, isolating context: using separate context windows, or separate subagents, for individual tasks. I share some references here: I've talked about this on the Latent Space podcast, and I had a webinar with Manus where we talked through these principles and how Manus uses them. I'm going to review them here and also talk about how the DeepAgents package and CLI employ these ideas.
So first, offloading context. A trend that we've seen repeatedly is that giving agents access to a file system is very useful: it lets agents save and recall information during long-running tasks.
And this is pretty intuitive. I share a link here from Anthropic's multi-agent researcher, where they basically have the researcher write a plan to a file, go do a bunch of work, and then retrieve that plan after a bunch of subagents have done work, to make sure everything's been addressed. So you can just write to a file and read it back into context when you need to reinforce the plan that was laid out. This is very useful to ensure that you don't forget specific steps in the plan. By externalizing it to a file and reading it back into context, you ensure that it's persisted and that the agent can be more easily steered, since you're selectively pulling it back into the context window as needed to help keep the agent on track.
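A minimal sketch of that pattern, assuming two generic file tools exposed to the agent (the tool names and the trajectory below are illustrative):

```python
from pathlib import Path

# Two generic file tools the agent can call.
def write_file(path: str, content: str) -> str:
    Path(path).write_text(content)
    return f"Wrote {len(content)} characters to {path}"

def read_file(path: str) -> str:
    return Path(path).read_text()

# Trajectory sketch: write the plan early, recall it late to re-steer.
write_file("plan.md", "1. Collect sources\n2. Draft report\n3. Verify claims")
# ... many intermediate tool calls later ...
plan = read_file("plan.md")   # pulled back into context only when needed
```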
Now, another interesting thing about the file system is that it's often persistent across different agent invocations. For example, if you're running your agent locally on your laptop with Claude Code, Claude Code can always reference a CLAUDE.md file, which can live at various levels: at the project level, and there's also a global CLAUDE.md. This CLAUDE.md can store information that you want to persist across all your different interactions with Claude Code. Manus uses these same ideas. Of course, Manus runs remotely, so it uses a sandbox environment that contains a file system and gives the agent access to a computer, and it supports user memory. Now, the DeepAgents package allows for different backends: you can use the LangGraph state object, which is just in-memory, or you can use a file system backend, for example your local machine. And the DeepAgents CLI is a lot like Claude Code running on your laptop, in that it will just use your local file system as a backend. The DeepAgents CLI also supports memory, using a memories directory as well as an AGENTS.md file.
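As a rough sketch of how cross-session memory like this can work, here's one way to fold persistent files into the system prompt at startup. The file names mirror the conventions just mentioned, but the exact behavior of each harness varies.

```python
from pathlib import Path

# Read persistent memory files into the system prompt at session start.
def build_system_prompt(base_prompt: str, project_dir: str = ".") -> str:
    parts = [base_prompt]
    for name in ("CLAUDE.md", "AGENTS.md"):
        f = Path(project_dir) / name
        if f.exists():
            parts.append(f.read_text())      # persists across invocations
    memories = Path(project_dir) / "memories"
    if memories.is_dir():
        for m in sorted(memories.glob("*.md")):
            parts.append(m.read_text())
    return "\n\n".join(parts)
```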
The principle we've seen repeatedly is that giving agents the ability to offload context to a file system has a lot of benefits: you can persist information during long-running trajectories, and you can persist context across different invocations of the agent in things like a CLAUDE.md file, an AGENTS.md file, or, in the case of the DeepAgents CLI, a memories directory. Now, another benefit of the file system is that you can actually offload actions from tools to scripts. What do I mean by this? We want agents to perform actions. Let's say we want to give an agent 10 different actions. Often you think about that as: for every action, I define a unique tool and bind all those tools to the agent, so I have an agent with 10 different tools. Now the LLM in that agent has to determine when to use each of those 10 tools, and you also have to load all those tool instructions into the system prompt. So there are two problems: one is confusion over which tool to use, and two, you're bloating your instructions with a bunch of tool descriptions. Look, with three or four or even 10 tools, it's not a big issue. But if we're talking about hundreds of tools, this can be significant tokens spent just on tool descriptions.
So one principle, and in the webinar with Manus we cover this in depth, is keeping the function-calling layer very lightweight. Give the agent only a few functions to call, but make sure they are very general, atomic functions that can do lots of things, and push a lot of the actions out to something like scripts in a file system. For example, Manus gives the agent a bash tool and file system manipulation tools. With those two things, it can search a directory of scripts using various tools to navigate the file system, and execute any one of them using the bash tool. So with three or four simple tools for file manipulation and code execution, it can perform a very large number of actions, as specified by the scripts that you give it. That's a way to expand the action space of the agent significantly while only giving it access to a small number of tools. We see this principle repeatedly. If you look at Claude Code: Boris Cherny and Cat Wu, the engineering and product leads of Claude Code, were recently on a great podcast (I have the link here) where they mention that Claude Code only uses around a dozen tools. And when you're using it, you can see that it uses glob and grep, it uses bash, it uses fetch to grab URLs, but it's not using that many tools; it's only about a dozen. Manus uses fewer than 20 tools. With DeepAgents, we actually have only eight native tools, and with the DeepAgents CLI, we have 11 native tools. I'll show those below.
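Here's a sketch of that "small tools, big action space" idea: with just a listing tool and a bash tool, the agent can discover and run any script in a directory, so adding a script adds an action without adding a tool. The directory layout and script names are illustrative, and the sketch assumes a Unix-like environment.

```python
import subprocess

# Tool 1: let the agent see which scripted actions are available.
def list_scripts(directory: str = "scripts") -> str:
    out = subprocess.run(["ls", directory], capture_output=True, text=True)
    return out.stdout

# Tool 2: let the agent execute any one of them.
def bash(command: str, timeout: int = 60) -> str:
    out = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return out.stdout + out.stderr

# The agent might call, for example:
#   list_scripts()  ->  "fetch_data.py\nrender_report.py\n..."
#   bash("python scripts/fetch_data.py --since 2024")
```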
Now, a related idea is progressive disclosure of actions. Anthropic talks about this specifically in its recent release of skills, and there's an interesting quote from a nice blog post that I link here. Claude skills are, very simply, a skills folder with a bunch of subfolders, each of which is a specific skill. Each subfolder has a SKILL.md file, a markdown file with a header, and the header explains in very brief language what that skill does. The header is the only thing that's loaded into Claude Code initially; you can see in this diagram that that's exactly what they show here. So there's a brief snippet about each skill available.
Now, in the case of Claude skills, if Claude wants to use any given skill, it can then selectively read the full SKILL.md file. So again, just the header is read into the system prompt by default; if Claude wants to actually execute a skill, it reads the full SKILL.md file. That SKILL.md file can reference any other files in the same skill directory, so it could contain scripts or other files with more context. What's really nice is that Claude, with only its bash tool as an example, can go ahead and read the full SKILL.md file, and then if needed execute any scripts in that same skill directory or read any other files in. It's just a nice way of progressively disclosing actions to Claude without loading everything into the system prompt ahead of time, and, importantly, without binding all those different capabilities or skills as tools. Remember, in the simplest case you're only using the bash tool to read the SKILL.md file, execute any scripts in that skill folder, or read any other files in that folder. So I think about this as a very simple way to give agents access to different actions in a way that saves tokens, because they're progressively disclosed only if, in this case, Claude needs the skill.
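A minimal sketch of progressive disclosure, assuming each skill folder holds a SKILL.md whose front matter carries the brief description (the layout follows the convention described above; the front-matter handling here is simplified for illustration):

```python
from pathlib import Path

# Load only the brief header of each skill into the system prompt.
def load_skill_headers(skills_dir: str = "skills") -> str:
    snippets = []
    for skill_md in sorted(Path(skills_dir).glob("*/SKILL.md")):
        text = skill_md.read_text()
        # Take the front matter between the first two '---' markers.
        if text.startswith("---"):
            header = text.split("---")[1].strip()
            snippets.append(f"{skill_md.parent.name}:\n{header}")
    return "\n\n".join(snippets)

# Read the full skill file only when the agent decides to use it.
def load_full_skill(name: str, skills_dir: str = "skills") -> str:
    return (Path(skills_dir) / name / "SKILL.md").read_text()
```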
And it's only using simple built-in tools like the bash tool and maybe some file manipulation tools. Manus uses a very similar principle: the Manus agent has access to a large number of different scripts, and it can discover those scripts using its native file search and bash tools. Now, we don't yet have this notion of skills in the DeepAgents CLI, but I'm actually working on adding that right now, because I think it's a very nice way to give an agent access to lots of actions without bloating its context window with instructions and without having to bind additional tools. Now, I do just want to briefly make it even more crisp what specific tools are in the DeepAgents package, to highlight the point that we often see agents ship with small numbers of general, atomic tools. The DeepAgents package has only basic tools for file manipulation, a task tool for creating subtasks with subagents, and a to-dos tool to generate to-dos. The CLI extends this slightly with some search tools and a bash tool.
Now, let's talk about reducing context. There are three interesting ideas here: compaction, summarization, and filtering. First, I'll talk a little about what Manus does. Manus uses this idea of compaction. On the left, this is showing a trajectory of tool calls and tool results, and of course tool results can be quite token-heavy. What they do is compact old tool results by saving the full result to a file and just referencing that file in the message history. They only do this with what you might call stale tool results that have already been acted on, but it's a very nice way to reduce tokens in the message history.
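Here's a hedged sketch of that compaction step; the message shape and the "keep the last few results verbatim" policy are simplifying assumptions, not Manus's actual format.

```python
from pathlib import Path

# Offload stale tool results to files; leave a short pointer in history.
def compact_tool_results(messages, offload_dir="tool_results", keep_last=5):
    Path(offload_dir).mkdir(exist_ok=True)
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    stale = set(tool_indices[:-keep_last])   # keep recent results verbatim
    compacted = []
    for i, msg in enumerate(messages):
        if i in stale:
            path = Path(offload_dir) / f"result_{i}.txt"
            path.write_text(msg["content"])  # full result stays on disk
            msg = {**msg, "content":
                   f"[Tool result offloaded to {path}; read it to recover.]"}
        compacted.append(msg)
    return compacted
```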
And so this is kind of a neat diagram that they showed. Imagine your agent is running and performing many turns; after some number of turns, you get very close to the context window limit of the LLM, and that's when they apply compaction. They take all the historical tool results that are bloating the message history, compact them down, and offload them to the file system, which brings down the overall context utilization significantly. The agent keeps running, context progressively starts to saturate again, and then they apply summarization. Summarization looks at the entire message history, including the full tool result messages, and distills it all down to a much more compact summary, which the agent then carries forward. One interesting point is that the compaction step is actually reversible, because you can always go back and look at the raw tool results saved to those files; that's another benefit of using the file system. Summarization, though, is not reversible, so it's a step that needs to be carefully thought through: when you summarize, you necessarily lose information. Now, you see these ideas employed by Anthropic as well. Anthropic recently shipped context editing, which prunes the message history of old tool results in a configurable manner, and Claude Code applies summarization when you hit around 95% of the context window.
Now, the DeepAgents package applies summarization with summarization middleware, which automatically kicks in after a threshold (170,000 tokens) and preserves some number of recent messages. Of course, it's all open source and configurable.
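As a rough sketch of threshold-triggered summarization (not the actual middleware; `count_tokens` and `llm_summarize` are placeholder helpers):

```python
# When history exceeds a token budget, distill older messages into one
# summary message and keep the most recent ones verbatim.
def maybe_summarize(messages, count_tokens, llm_summarize,
                    max_tokens=170_000, keep_last=20):
    total = sum(count_tokens(m["content"]) for m in messages)
    if total < max_tokens:
        return messages                       # under budget: no-op
    old, recent = messages[:-keep_last], messages[-keep_last:]
    summary = llm_summarize(old)              # lossy: information is dropped
    return [{"role": "system",
             "content": f"Summary of earlier conversation:\n{summary}"}
            ] + list(recent)
```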
Now, one of the other things employed in the DeepAgents package and CLI is that the file system middleware will actually filter large tool results, which is a nice way to prevent excessively large tool results from being passed directly to the LLM.
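A minimal sketch of that filtering idea; the size limit and file naming are assumptions for illustration, not the middleware's actual behavior:

```python
from pathlib import Path

# If a tool result is too large, offload it and return a truncated preview.
def filter_large_result(result: str, name: str, limit: int = 20_000) -> str:
    if len(result) <= limit:
        return result
    path = Path(f"{name}_full_output.txt")
    path.write_text(result)                   # full result stays retrievable
    return result[:limit] + f"\n...[truncated; full output in {path}]"
```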
Now, finally, let's talk about context isolation. This is a technique we've seen employed repeatedly, and it's a pretty simple idea: many tasks performed by an agent can be assigned to a subagent. That subagent has its own context window, so it can start fresh on a particular task, particularly if that task is nicely self-contained, execute it, and just return the output back to the parent agent. That's the first pattern shown here, and this communication pattern was discussed by Manus as well: you have a parent or main agent that wants to spawn a subagent to do some task; it passes some instructions to that subagent; the subagent churns along and passes the result back to the main agent. That's a very common pattern. Now, there is some nuance here. Sometimes you want to share more context with the subagent, and Manus actually allows for sharing the full message history that the parent has with the subagent. Similarly, with DeepAgents and with the DeepAgents CLI, the subagent has access to the same file system as the parent, so there is some shared context between them.
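A minimal sketch of the spawn-a-subagent pattern, reusing the `run_agent` loop sketched earlier; the prompt text and function names are illustrative:

```python
# The parent passes only the task instructions into a fresh context
# window; only the final result flows back to the parent's history.
def spawn_subagent(run_agent, client, model, tools, tool_registry, task):
    sub_messages = [
        {"role": "system", "content": "You are a focused sub-agent."},
        {"role": "user", "content": task},   # instructions only, no history
    ]
    return run_agent(client, model, sub_messages, tools, tool_registry)
```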
So, just to summarize: agent harnesses typically employ at least three principles for managing context: offloading, reducing, and isolating. Some of the most common ideas in context offloading include using the file system; we see that across the board, as Claude Code, Manus, and the DeepAgents CLI all support use of the file system. Enabling user memories: this is, intuitively, the ability to remember information across agent invocations. Claude Code enables it with CLAUDE.md; the DeepAgents CLI has a memories directory as well as AGENTS.md; Manus also supports cross-session memory. Using minimal tools: this can significantly save tokens on tool descriptions and minimize the number of decisions the agent has to make across different tools. Claude Code uses only around a dozen tools, Manus fewer than 20, and the DeepAgents CLI 11. Giving the agent a computer, i.e., a bash tool: all these agent harnesses do that. Progressive disclosure of actions: Claude Code does this with skills; Manus does this by basically giving the agent access to a directory with a whole bunch of different scripts and letting it peruse that directory on an as-needed basis using its existing file system and bash tools; skills for the DeepAgents CLI are a work in progress. Now, this idea of compaction, basically pruning old tool messages: Manus certainly does it; the Claude SDK supports it through what they call context editing; and I assume it's being done in Claude Code, but I'm not positive, so I should probably flag that one as yellow, because I'm not entirely sure, though I imagine it is being done. We know for sure that Claude Code does summarization once you hit around 95% of the context window; Manus does this, and the DeepAgents CLI does this too. And all three support subagents for isolating different tasks to unique context windows.
Now, the DeepAgents CLI is open source, contributions are welcome, and it's fun to try to employ these ideas in an open-source harness that can be used with many different models. So hopefully this was a useful overview of how these principles operate across different popular agent harnesses and how they're being used in the DeepAgents CLI. Any questions or contributions are very welcome. Thanks.
This video covers the core principles of context engineering for AI agents and how they're implemented across popular frameworks like Claude Code, Manus, and LangChain's DeepAgents. As AI agents tackle increasingly complex tasks, managing context windows becomes critical. This video breaks down three key principles (offload, reduce, and isolate) and shows how leading agent frameworks implement them to handle longer tasks efficiently.

0:00 Introduction to Context Engineering
1:00 Agent Primitives & Harnesses
3:00 Context Engineering Principles
4:00 Offloading Context: File Systems
6:00 Offloading Actions to Scripts
8:00 Progressive Disclosure of Actions
11:00 DeepAgents Tool Overview
12:00 Reducing Context: Compaction
13:00 Reducing Context: Summarization
14:00 Reducing Context: Filtering
15:00 Isolating Context: Subagents
16:00 Summary & Comparison
17:00 Conclusion

Video notes: https://www.notion.so/Context-Engineering-for-Agents-2a1808527b17803ba221c2ced7eef508?source=copy_link