In this video, I'm going to show you how I use LangSmith's new agentic features to help me build agents. We're going to use LangSmith to build a simple assistant that can triage and respond to incoming emails. Every time I get a new email in my inbox, my agent is going to read it and figure out what to do with it, and we'll use LangSmith to make sure the agent is doing the right things. Let's start by looking at this agent.py file. In this single file, we're going to define a simple deep agent.
First, and also most importantly, we
have our system prompt. Most deep agents
have pretty extensive system prompts.
You move a lot of the complexity out of
the agent architecture and into the
actual prompt itself. This prompt has
some background information about who I
am, and it also has some criteria and
rules for how to handle incoming emails.
When an email comes in, the assistant
can take a bunch of different actions
with its tools. It can write an email
response. It can kick off new email
threads with new people, or it can call
a sub agent, which we'll look at more in
a second. It can also just mark the
email as read if it doesn't think I
actually need to look at it. Our sub
agent here is specifically focused on
interacting with my calendar. It has
specific prompting around how to find
meeting times where I'm available and
also book the meetings for me. And so
with these pieces, we can assemble our deep agent. We have that overall system prompt with some background about myself and directions for the different things I want the agent to do. We give the main agent some general tools to write emails, kick off new threads, or mark emails as read. These tools are all defined in a separate file, but they basically just connect to the Gmail API and let our agent actually take actions. Then we have that sub agent, the meeting scheduler, which has access to a few specific tools for calendar interactions: finding available times on my calendar and then scheduling the meetings. And so with just these few lines of code, we've built a simple personal assistant.
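For reference, here is a rough sketch of what that agent.py assembly can look like. It is illustrative only: the tool names (write_email, start_thread, mark_as_read, find_available_times, schedule_meeting) are assumed stand-ins for the Gmail and Calendar helpers, and the exact create_deep_agent parameters may differ by version, so check the deepagents package docs for the real signature.

```python
# Rough sketch of agent.py; tool and parameter names are assumptions.
from deepagents import create_deep_agent

# Assumed local module wrapping the Gmail and Calendar APIs.
from tools import (
    write_email,
    start_thread,
    mark_as_read,
    find_available_times,
    schedule_meeting,
)

SYSTEM_PROMPT = """You are my personal email assistant.
Background about me: ...
Criteria and rules for handling incoming emails: ...
"""

# Sub agent focused specifically on calendar interactions.
meeting_scheduler = {
    "name": "meeting-scheduler",
    "description": "Finds available times on my calendar and books meetings.",
    "prompt": "You schedule meetings. Check my availability before booking.",
    "tools": [find_available_times, schedule_meeting],
}

agent = create_deep_agent(
    tools=[write_email, start_thread, mark_as_read],
    system_prompt=SYSTEM_PROMPT,  # may be called `instructions` in some versions
    subagents=[meeting_scheduler],
)
```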
What I really want to emphasize here is that it's quick and easy to put together a deep agent. But this is really just the first step. The question now becomes: how do we make sure the deep agent actually works? How do we make sure it's doing exactly what I want in different scenarios? And how can I improve this deep agent over time?
The natural next thing to do is just to
run the agent. So, let's do it. We have
an example email here coming from a
friend of mine, Oliver Queen. Oliver is
emailing me about wanting to talk about
deep agents, and he suggested that we
meet at 8 a.m. next Monday. We can take
the agent we just created and invoke it
on a single input message. This message
just says, "Hey, an email came in.
Handle it to the best of your ability."
The agent will take this input email and
call whatever tools it needs to handle
it.
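As a sketch of what that invocation looks like (the message wording and email text here are paraphrased, and the import path is an assumption about how agent.py is laid out):

```python
# Invoke the agent once on a single "handle this email" message.
from agent import agent  # the compiled deep agent defined above

incoming_email = (
    "From: Oliver Queen\n"
    "Subject: Deep agents\n"
    "Want to chat about deep agents? How about 8 a.m. next Monday?"
)

result = agent.invoke(
    {
        "messages": [
            {
                "role": "user",
                "content": (
                    "An email came in. Handle it to the best of your ability.\n\n"
                    + incoming_email
                ),
            }
        ]
    }
)

# Print the final state output of the agent.
print(result)
```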
Now, we do have a print statement here
that'll print out the final state output
of the agent. Let's go ahead and wait
for it to finish.
We can see the printed output of the
agent here, and it's super hard to read.
It's really hard for me as a human to
parse through this.
As an alternative to this print debugging, I've actually set this agent up to trace to LangSmith. All I did was set my LangSmith API key as an environment variable and set LangSmith tracing to true. And so now, when I ran this code, all of the agent's decisions and outputs got logged to LangSmith.
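Enabling tracing really is just a couple of environment variables; they're set from Python here for illustration, but exporting them in the shell works the same way.

```python
import os

# Turn on LangSmith tracing and authenticate; no changes to agent.py needed.
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "<your-langsmith-api-key>"
# Optional: send runs to a named tracing project.
os.environ["LANGSMITH_PROJECT"] = "personal-assistant"
```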
Let's go take a look at the trace.
LangSmith is our observability and eval platform, and one of the first things you can do with it is set up tracing. Clicking into our most recent trace here, we can see exactly the input that came in, that same email thread from Oliver. We can also now see all of the actions that the agent is taking on it. One important thing to note about deep agents is that they have some built-in tools, including the ability to call sub agents through this task tool. The sub agent can then call its own tools, so this becomes a multi-layered, pretty long-running process.
This view is already way better than the printed text dump of my agent's outputs, but as a human, it's still going to take me a decent amount of time to click through each of these steps and see exactly what's going on. And that's where Polly comes in. Polly the parrot is a new tool in LangSmith that can read all the different runs here and lets me chat with this trace. I'm just going to use one of the default prompts: I'll ask it to summarize this trace, and we can watch for a second as Polly tackles it.
You can see that Polly is reading through exactly what went on, listing out the runs, and eventually, here we go, Polly has given us a nice summary. We can see that we received an email request from Oliver, we delegated it to the meeting scheduler sub agent, we checked my calendar availability for Monday and saw that the time was free, and so we sent an email response confirming the availability and accepting Monday as the time slot.
Cool. But to throw a wrench in this, let's say I actually didn't want the agent to do this. Let's say, truthfully, that I don't love waking up early and I don't actually want to take any meetings before, say, 9:00 a.m. Eastern time. So, how do I go about fixing this? If I navigate back to VS Code, there's a tool we have called LangSmith Fetch. It's just a package that you can install, and when you run LangSmith Fetch, you need to set its config to pull traces from a particular project. So I'm going to go back to LangSmith real quick, grab the ID of our personal assistant tracing project, and paste that in when I set the LangSmith Fetch config.
Now, when I run LangSmith Fetch, it's just going to pull the most recent trace. In a moment, we can see the full message history: the email that came in, what our assistant thought to do, the tool it called, the analysis it did, the confirmation that the time worked, the response it sent, and then marking the email as read.
Now, you might be thinking: why is this even useful? We just saw a much more detailed version of this in the LangSmith UI. The benefit is that by exposing this in the terminal, we now allow our coding agents to actually use this information. The coding agents can call LangSmith Fetch themselves and see what happened in the most recent traces. I'm going to show you how you can do this with our deep agent CLI, which kicks off our coding agent. If you haven't tried the CLI before, it's pretty similar to Claude Code, so the experience should be pretty familiar.
I'm just going to ask it: hey, can you tell me what happened in the most recent trace? Summarize it for me. The agent starts thinking, and I've actually given it some instructions already in a markdown file about how to approach this and that it can use LangSmith Fetch. So we can see the agent now wants to run LangSmith Fetch, and I'm going to let it. LangSmith Fetch is now executing in the background, and in a moment we get a nice summary from the agent: here's the email that triggered the run, the meeting was accepted and confirmed. Great.
So now, instead of manually changing the prompt myself, I can also just talk to the coding agent. I'm going to say, "Hey, I like sleeping in. I don't actually want to take early meetings, so can you make sure that in the future the agent only accepts meetings after 9:00 a.m.? Also, while you're at it, write a test to make sure that it remembers this successfully." We've given it a few tasks here, but because I just ran LangSmith Fetch, we have that trace information in the context window.
And so the agent is going to work for a
while. It's going to list out files.
It's going to read some of the files I
have locally. And then it's going to
start making some edits.
So here we go. The agent has come up
with a recommendation. It wants to edit
agent.py. Specifically, it wants to edit
the prompt. I'm going to accept this. It
looks good to me.
Now, it's going to read our test_assistant.py file and add a test. It comes up with a nice test, very similar to the example I just had, and so I'm going to approve adding it to test_assistant.py.
Now, as a quick sidebar, I want to talk about why I've chosen to write these tests in pytest. Deep agents can handle a variety of tasks, and the success of handling those tasks can be measured in a lot of different ways. Writing in pytest (or Vitest for JavaScript) gives us maximal flexibility. In these tests, I can assert that a certain tool call was made in a certain scenario. I can also, in the same test, use an LLM-as-judge to check that my final result, my final email in this case, followed some specific criteria. I've really found that when writing tests for deep agents, it's nice to have the flexibility to assert whatever you're specifically looking for given a particular input, and both Vitest and pytest afford a lot of flexibility here.
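To make that concrete, here is a minimal sketch of what such a test might look like. The agent import, the schedule_meeting tool name, and the commented-out judge helper are illustrative assumptions rather than the exact test the coding agent wrote.

```python
# test_assistant.py (sketch): assert on tool calls, then optionally judge the text.
from agent import agent  # the compiled deep agent

EARLY_MEETING_EMAIL = (
    "From: Oliver Queen\n"
    "Subject: Deep agents chat\n"
    "Can we meet at 8 a.m. next Monday?"
)


def test_early_meeting_is_not_accepted():
    result = agent.invoke(
        {
            "messages": [
                {
                    "role": "user",
                    "content": "An email came in. Handle it.\n\n" + EARLY_MEETING_EMAIL,
                }
            ]
        }
    )
    messages = result["messages"]

    # Hard assertion: no calendar booking should happen for an 8 a.m. slot.
    tool_calls = [tc for m in messages for tc in (getattr(m, "tool_calls", None) or [])]
    booked = [tc for tc in tool_calls if tc["name"] == "schedule_meeting"]  # assumed tool name
    assert not any("08:00" in str(tc["args"]) for tc in booked)

    # Soft assertion: an LLM-as-judge check on the outgoing email text, e.g.
    # "politely proposes a time at or after 9 a.m. Eastern" (judge helper is hypothetical).
    # final_email = messages[-1].content
    # assert judge(final_email, criteria="Declines or reschedules meetings before 9 a.m. Eastern")
```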
So now let's run the test. The beauty of this is that in running the test, we're also logging more information to LangSmith. So after we run it, whether or not it passes, we can ask the agent to pull the resulting trace with LangSmith Fetch, see what just happened, and see if it was acceptable. And if the test fails, we now have a programmatic agentic loop where the agent has a clear reward function: it can keep iterating on the prompt until the test passes.
And so, just to rehash the different things we covered in this video: deep agents are super easy to get started with. With just a prompt and a few tools, you can put together a pretty powerful agent. But that is really just the first step. From there, it's really important to figure out whether your agent is actually doing what you want it to, and the best way to do that is by setting up tracing to LangSmith so you have full visibility. In LangSmith, you can talk to Polly and ask it questions about what your agent actually did during execution. This is a quick way to speed things up; I've seen deep agents go for hundreds of turns and take several minutes, and that can be a pain to walk through yourself.
Then we have LangSmith Fetch, which works in the terminal to pull trace information in, and which is really powerful when you make it accessible to coding agents. These tools are all intended to make it really easy for us as developers to work with AI while building deep agents. You can try out Polly and LangSmith Fetch today. Let me know how it works for you. Peace.
In this video, we walk through how to build and observe a deep agent using LangSmith. We'll build a simple email assistant that reads incoming emails and decides how to handle them (triage, respond, or take action) using a prompt-driven approach.

You'll learn:
• How to define a deep agent in a single file
• Why most agent complexity lives in the system prompt (not the architecture)
• How to encode rules, context, and decision criteria into prompts
• How to use LangSmith to observe, validate, and debug agent behavior

This walkthrough is useful if you're building longer-running agents and want confidence that your agent is doing the right thing at each step.

- Learn more about LangSmith: https://docs.langchain.com/langsmith
- Learn more about debugging deep agents: https://blog.langchain.com./debugging-deep-agents-with-langsmith/
- Learn more about agent engineering: https://blog.langchain.com/agent-engineering-a-new-discipline/