Hi everybody, Professor Ghassemi here.
Welcome back to another lecture on AI
agents at Michigan State University. The
topic of today's lecture is on agentic
design patterns. Specifically, there's
three topics I'll be taking you through
today. The first is a motivation. So,
why should you care to learn about these
design patterns? The second is a survey
of some of the essential design patterns
that we're seeing used today across
common applications. And the third are
some of the emerging tools and methods,
things that you'll probably want to be
aware of if you are a researcher or
interested in creating innovations in
this space. Let's start with the
motivation. So first and foremost,
recall that an AI agent is a system that
integrates three things. There's a
generative AI system, that's typically a
large language model, and then there are
a set of tools, things like web search,
as well as memories. Think of a database
as an example. And it's the interface of
these three and the interaction with an
external environment, be it a user or
some other ecosystem that creates the AI
agent.
In order to leverage these three items,
the LLM, the tools, and the memory to
achieve a goal within this environment,
the best possible response or some other
optimization, there's three things that
the agent does. The first is perception,
trying to figure out what actually
matters from that input and importantly,
if anything was missing. There's
planning, which is figuring out what it
wants to achieve. How does it measure
the progress and what specific steps
does it take? And in order to do this
planning, this is where it leverages the
tools and the memory. The tools here are
the things it can use to help it as it
plans its specific steps. And the
memories are prior information that's
relevant or important as it wants to
plan those specific steps to achieve the
goal. Finally, of course, the last step is to take the action, generating the output that will change the environment over here toward the goal.
So to help us develop a more intuitive
understanding of what these key
components are, I think it helps to
ground it in a couple of examples. Let's
start with ChatGPT. So in the case of a commercial language model like ChatGPT,
you the user are the environment. You
provide inputs in the form of prompts.
And those prompts could be text, they
could be images, they could be images
plus text, or they could even be your
voice.
And when you provide those inputs, what's important to understand about this perceptual process is, at least as far as I know in 2025, ChatGPT is only really concerned with the text and the images you provide. So I'll
give you an example. If you use the
voice capability
on your ChatGPT app on your phone or on
your machine and you speak to it and you
ask it a question in a sarcastic tone
versus a regular tone, you don't get a
difference in the response type. And the
reason is because what is being
perceived, the perceptual part of ChatGPT is focused primarily on the text content
and the images. It's not really paying
attention to how you say things. It's
paying attention to what you say. Okay.
So, this is a real example of that
perceptual process of the agent in play.
It's trying to figure out what from the
input to pay attention to so that it can
perform its objectives as an agent.
After this perception is completed, it's
provided to a large language model which
has access to some tools and some
memories. The tools here, let's say, are
web search and the memories here are
some things that it's remembered about
you in the past. If you have used a
commercial LLM such as ChatGPT for any
period of time, you've probably noticed
that at the bottom of your chat,
every once in a while, you'll get a
message that says updating memories. And
this is when the LLM decides that
there's something about what you
revealed to it in the course of the
prompt that's worth remembering about
you: your profession, your age, where you live. It might be a set of things that are considered interesting or relevant if it wants to generate prospectively some responses
about you. Okay. But this LLM, using a combination of what it perceived and knowing what tools and memories are available, goes through a planning process. This could use
chain of thought like we discussed in
our previous lecture or some other
method.
And a typical planning process might be
to determine if we need to recall
certain memories based on what the input
was. Determine if we need to call any
tools like, hey, did the user ask me a
question that's going to require a web
search? Did they ask me to choose a good
restaurant for them as an example? In
which case, I'd need to look up
restaurants in their area, maybe
recalling where they live. I'd need to
do that web search and then of course
synthesize the results so that I can
respond to the user. Okay. So as you can see here, an AI agent like ChatGPT integrates the generative AI, the tools, and the memories to make sure it can act on the environment through certain subcomponents of the input that it perceives. It achieves that through planning.
So let's go through a second example
here that's probably a lot less familiar
to many of you which is an AI agent in
the context of health. So here the
environment is no longer a user who's
prompting the system but the environment
is the patient. So this patient instead
of providing words or prompts is
providing another kind of input. It's
providing labs, vitals, imaging, and
other information about the state of
their health. And just like in ChatGPT's case, there is a perceptual process that takes a subset of what was provided. Remember, we were ignoring some of the properties of the voice and were only interested in the text of what the user said. So too here, this AI agent
might only be interested in a subset of
the data that comes from the patient.
Let's say it's the labs, the vitals, and
the imaging. Now, once it collects this
data, this LLM wants to use a
combination of what it collected along
with some tools and some memory so that
it can perform planning and generate an
action that will ultimately help advance
the state of the patient. Let's say the
goal here is uh survival. I would
certainly hope that that's what a health
agent would want to achieve.
So, some tools in this case, you could see, may not be something as broad as a standard web search, but maybe it's a
search of very specific scientific
publications from a known authority like
PubMed.
Or maybe it's a tool that can take the
images that are passed in and turn those
into a text report of any anomalies that
were seen in the image, an issue in your
brain scan, something weird in your CT
scan, and so on. So you might call a tool like Viz.ai that takes those images and turns them into text so that this LLM can
reason on it. Memory in this context
might be elements from the medical
history. Do you have allergies that were
recorded from a previous visit? Do you
or your family members have a history of
cancer or other ailments? And so you can see here that there's a pattern here, right, that's similar between what ChatGPT does and what this AI agent does. There's a perception of certain elements from the environment. There's a generative AI system, an LLM, that's choosing to use some tools
and pull from some memories. It uses a
combination of the tools and the
memories to form a plan. Is the patient
stable? Is the diagnosis known? Should I order tests? Should I apply medication? So that, ultimately, it can create an action. In this case, what
should I do to help the patient survive?
So there's a clear parallel here, right?
Both systems are performing perception and planning to generate actions, and the way they're doing that is with a generative AI system as well as some tools and some memories.
The challenge here is that the proper
design of an AI agent depends on the
task. And just to give you an example here, let's say that this AI agent, despite having these tools and the memories and having carefully generated its plan, makes a mistake. That happens, right? If you've used an LLM, sometimes they lie, sometimes they hallucinate. And what you would not like to happen is for a wrong action to be taken in the context of a patient being cared for, because that error could be fatal. You could really harm someone if you make this mistake. And similarly, it might
not be the case, when you're pulling data from this authoritative source, that all of the content within that source is held equal. Some papers that get published academically are higher quality. Others are, you know, maybe not low quality, but, let's say, their quality is less high than others. And so,
given how consequential the outcome here is for this patient, this design pattern of performing the perception, letting the LLM use tools and memory to generate an action, and applying that action directly to the patient could be problematic. So
what's the solution here? Well, one
thing we could do is take the actions
that are coming out of this agent
and we could pass them by a doctor, a
human expert who can vet, yeah, you know
what, that action applied to the patient
makes sense or no, I want to decline it
and come up with something else. And
that would solve the problem of just
passing these actions through
automatically in a loop. This is an
example of a human in the loop design
pattern because as you can see within
the loop we have a human here that's
doing a validation.
And the design patterns that we will be
covering later in this lecture are
basically
different generalized solutions that
help solve common classes of problems.
This is one example, right? You have a problem where
making a mistake is highly
consequential. How you solve that
problem is by putting a human in the
loop. So now that we've covered the
motivation for these agent design
patterns,
let's go through a survey of some of the
essential patterns that are out there.
Starting with the simplest one, which is
the single agent pattern. This is the
one you all know and love. It's the
super basic interaction pattern that you have with GPT. When you're calling the API, you pass in an input. This could be a prompt that you have, and you specify in this agent a system prompt. It could be, for example,
performing sentiment analysis on some
text and you just call the agent with
some text you want to perform the
sentiment analysis on and it responds in
a way that you have specified per the
system prompt. Advantages of this are
it's very easy to implement and it's
highly predictable if you've engineered
your prompts correctly. The limitation
of course is that you can't handle
multi-step tasks, and every time you want to achieve a new task with this
agent, you're going to have to either
change the system prompt or create a new
agent with a different system prompt.
So, it's a little rigid. For the sake of
clarity, uh a system prompt in the case
of the sentiment analysis agent might
look something like what you're seeing
on the right hand side here. So, we
would specify that the task is to
classify input text. And we list some of
the classes we're interested in. These
are the usual suspects, positive,
negative, neutral, and mixed. We might
also indicate how we want the answer
format to look in the response.
Sentiment followed by colon and one of
the four options we have up here. So,
when a user provides an input, the
perceptual mechanism captures this
input. Lauren loves her camera
and the agent uses the system prompt
plus the input to generate an output.
Positive. Super simple. It's what you're
used to. The next pattern is chain of thought. What distinguishes the chain-of-thought pattern from the single agent is that the agent decomposes the problem into a sequence of reasoning steps. And at each of these reasoning steps, it makes, in most instances but not all, a call to an LLM that handles that part of the reasoning.
Okay, so we'll go through an example in
a minute, but at the highest level, this
is useful for step-by-step problem-solving tasks: math, logic, and so on. And that's because this setup, by decomposing the problem instead of handling it in one system prompt, allows you to do multi-step reasoning that is easier to debug and therefore easier to fix as you're doing the prompt or agent design (which is usually a little bit harder in the chain-of-thought pattern than it is with a standard system prompt). It's also
a little bit slower for obvious reasons.
If I'm calling an LLM three times because
I've decomposed my problem into three
steps, that's obviously going to take
more time than if I call it once. Let's
look through a real example so that you
have an intuition for functionally how
this chain of thought pattern is
implemented. So let's say I have a user
input, very simple. I have two cows,
Mary has two sheep. How many animals do
we have?
And then I've designed a chain-of-thought system prompt here, which is: you are an agentic chain-of-thought reasoner, and you work in these four phases that I have specified (by the way, you could put something else for your reasoning pattern): you plan, you solve, you critique, and then you finalize.
And you always output the most
appropriate phase
and then the content for that phase.
Okay, so that's the input and the system
prompt. So the first thing that you do
is
you would call this agent with your
input, right? This is exactly what was
specified above. And given this input
and the system prompt that the LLM has,
it should provide an output where it
gives you the phase in this case
planning which was the first step.
Remember up here look first step was
planning. So it will output plan and the
content that it will output are some
steps. I'm going to count cows. I'm
going to count sheep and then I'm going
to add them.
And it will then go and make a second
call to an LLM
independent of the first one. And this
time, notice what it does. It passes
both the task as well as the plan. This
is the same plan that was output from
the first step, right? See, count cows,
count sheep, add them. So this gets
input into the LLM with the task and the
plan and the same system prompt. What
comes after plan? Well, according to our
system prompt, the next step was solve.
And so it looks at these two and uses
this to generate the solve. In this
case, the results are: the count of cows is equal to two, the count of sheep is equal to two, and adding them results in four. Okay. And so it gives a candidate
answer to this question up here as four.
What's really important for me to note
here is that in the system prompt, you
may have noticed I didn't specify
to
make the steps in this particular
format. That was something the LLM came
up with on its own. You could, of
course, specify that format
with more clarity, or you could leave it up to the LLM. Usually specifying makes the performance more predictable, so I recommend it. Okay.
So anyway, we pass in the task and the plan into the second step, and we got the solve. This ends up coming into our input for the third call to the LLM, which is the critique phase. It takes these three and generates a critique. So it looks at what were
the steps. You wanted to count the cows.
You wanted to count the sheep, you wanted to add them, and it's supposed to
look at this input and basically
criticize if this makes sense or not. In
this case, the verdict that the LLM gave
was pass. So this now comes into the input to the next step. And if you recall, the last step after the critique phase was the finalization phase. So in this case this input comes into the LLM, and the output is: the answer is 4, the justification is provided down here, and this answer is what you would pass back to the user in the chain-of-thought pattern. Okay. So I wanted to break down
this example so that you understand that in a chain-of-thought pattern, particularly when you're dealing with an agentic treatment of the problem as opposed to just regular prompt engineering, you typically make a few calls to the LLM, and at each of the calls you might change things about the input to receive the appropriate output. This was one example. There are several ways that you could structure your system prompt to either impact the reasoning process itself or provide greater specificity about how the steps are articulated, formatted, and so on. Okay.
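To make the mechanics concrete, here is a minimal sketch in Python of the four-phase loop we just walked through. It assumes a hypothetical call_llm(system_prompt, user_message) helper standing in for whatever LLM API you use; the phase names mirror the example above and nothing here is a specific library's interface.

    # Minimal sketch of the agentic chain-of-thought loop described above.
    # call_llm(system_prompt, user_message) is a hypothetical helper that wraps
    # your LLM API of choice and returns the model's text response.
    SYSTEM_PROMPT = (
        "You are an agentic chain-of-thought reasoner. You work in four phases: "
        "PLAN, SOLVE, CRITIQUE, FINALIZE. Always output the content for the "
        "phase you are asked to perform."
    )

    def chain_of_thought(task: str) -> str:
        context = f"TASK: {task}"
        answer = ""
        for phase in ["PLAN", "SOLVE", "CRITIQUE", "FINALIZE"]:
            # Each call sees the task plus everything produced so far, so later
            # phases (like CRITIQUE) can check the earlier ones.
            answer = call_llm(SYSTEM_PROMPT, f"{context}\nPerform the {phase} phase.")
            context += f"\n{phase}: {answer}"
        return answer  # the FINALIZE content is what goes back to the user

    # Example: chain_of_thought("I have two cows, Mary has two sheep. How many animals do we have?")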
So let's look at another one of the
design patterns which is the tool using
agent pattern. And this is a case where
an agent leverages external tools or
APIs to enhance its capabilities. So you
have your environment out here.
This might be a query from a user. This agent receives a query like, hey, tell me what the average temperature is in Nova Scotia today. And
then this agent doesn't know that from
its pre-training data, right? The LLM
doesn't know what the temperature is
today because it's not seen that when it
was trained. So, it's smart enough to
know it doesn't know that. And it calls
an API, maybe something from the weather
channel or some other tool, gets the
result, and integrates a combination of
the result of the API
plus what it knows internally in order
to respond back to the user. Okay. One example of this that is actually pretty common is to use a calculator API to solve math problems. And the reason this is common is that large language models are not built to do mathematics. They're built for linguistic, or rather probabilistic, reasoning in the inductive sense over words; they're not really meant to do formal symbolic reasoning in the mathematical sense. So the advantage
of using this tool-using agent pattern is that, I would argue, it's actually essential for real-world
applications. If you want to build any
kind of system that's going to touch
users, you will probably have to
integrate at least one tool into that
system. It's also, if you design it correctly, very modular and extensible, because you can make this agent aware of a set of tools in the system prompt that it can call, and as you add
more tools
you can update the system prompt to make
the agent aware of those new tools and
so it's therefore modular. It's
extensible. It's flexible and for this
reason this is a really common and
useful agent design pattern. The
limitations are of course that you have
to have a system in place for error
handling when the tools fail. Some of these APIs might be under your control; it could be a database that you own. But some of them may not be. It might be a third-party
API that you call from the weather
channel. And if that fails, you need a
way for the agent to be aware of the
failure and either go to a backup tool
or respond back to the user that, hey, I
couldn't respond because there was an
error. There are also, depending on what the user passes here, some security considerations when you're calling external tools. For example, if a user here accidentally passes their credit card number forward, you may not want to put that credit card number and other information into a Google search or some other place that would retain a public record of it, as an example. Okay, so
that's the tool use pattern. Let's go
through a real example just like we did
previously to help you develop your
intuitions for how you would implement
the tool-use agent in practice. So you'd start with a system prompt, and it might look something like this: you are a tool-using agent, your available tools are... and then you'd list them. So maybe
we have a calculator API
and we have the post request as well as
what's expected in the body of the post
request to this API as well as how we
want the result to be returned. And then
maybe we have something similar for the
weather API. We have an example API endpoint here. In this case it requires a GET request, and we specify again the response format.
We also specify down here some rules. We
want the output to be JSON. If it's a question that involves math,
which obviously the agent will have to
assess, then we want to call the
calculator. If it's a question about the
weather, we want to call the weather
API. And when calling the tool, we want
to return Python code as a string that
we can execute.
Uh, and finally, if we're passed a JSON
with a tool result, we want to forward
that back to the user. That's because we
got a loop here, right? The user is
going to ask a question,
the agent is going to call some tools,
and then those tools are going to
respond with results. And we don't want
to just get stuck in this loop. We want
to be able to pass those forward to the
human being. And with the system prompt
specified, the first step might be for
the LLM to choose the right tool given
the user input. So let's say the user
input is what is 5 + 2. Well, what we'd
expect given the way we wrote our system
prompt where it had the weather API and
the calculator to choose from is that it
might return an output that looks
something like this. You might notice it's JSON formatted, as we specified, and it says: you need to call this Python code, import requests, then response = requests.post with that API endpoint, passing in the JSON expression in the format that we had specified within the system prompt, and it prints, let's say, the response that's returned.
Of course, after the agent does that, we have to, using the system now, call that Python code. So let's say that
the system in this case is a Python
environment. We'd run exec and we would
execute exactly the string that we were
provided by the large language model.
We would get a result from the tool.
Let's say the result is, hopefully, 7 if it's a calculator API. And then we'd pass that result back to the LLM: the task, what the tool was, and the tool result, 7. And this informs what we return back to the user: 7. Okay. So that's a concrete example of a tool-use pattern.
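For intuition, here is a minimal sketch of that loop in Python. The system prompt, the example endpoint, and the call_llm helper are all assumptions made up for illustration; they are not a real service or a specific library's API. In practice you would also sandbox or validate any model-generated code before executing it.

    import json

    # Sketch of the tool-use loop from the example above. call_llm is a
    # hypothetical helper and the calculator endpoint is a made-up URL.
    SYSTEM_PROMPT = (
        "You are a tool-using agent. Available tool: a calculator API at "
        "https://example.com/calc (POST, body {'expression': '...'}).\n"
        "Rules: always respond in JSON. For math questions return "
        "{'tool_call': '<python code as a string>'}. If you are given a tool "
        "result, return {'answer': '<final answer for the user>'}."
    )

    def run_tool_agent(user_input: str) -> str:
        decision = json.loads(call_llm(SYSTEM_PROMPT, user_input))
        if "tool_call" in decision:
            # The lecture's example exec's the returned code string; a real
            # system would sandbox this and handle tool failures explicitly.
            namespace = {}
            exec(decision["tool_call"], namespace)
            tool_result = namespace.get("response")
            followup = json.loads(call_llm(
                SYSTEM_PROMPT, f"TASK: {user_input}\nTOOL RESULT: {tool_result}"))
            return followup["answer"]
        return decision.get("answer", "")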
A memory-augmented pattern, or what's sometimes synonymously called a retrieval-augmented pattern, is something I think of as an extension of this tool-use agent, where one of the tools that you have down here is a database or some other storage of information that's relevant for generating your response.
An example of an agent that might need
to use this is something that navigates
the web, let's say to dominoes.com, and
orders your favorite pizza. Why? Because
in order to make the order, it has to
remember first what your favorite pizza
is. It would then also have to be able
to take your credit card information,
put it in the appropriate place, put
your address, and so on. So, it needs some memories about you, right? It needs to be able to not only have those memories about you, but to be able to retrieve the memories as a function of, let's say, what the web page contents look like. Okay, so memory-augmented or retrieval-augmented patterns are also
really common when you're building real
world applications where you need
personalization,
the ability to learn and adapt over
time, or you want to reduce mistakes. Of course, there are some unique challenges that come with these memory-augmented systems. The first is that you
have to deal with how you fetch from
this memory. In later lectures we're going to be talking about some of those retrieval strategies, like: if you have a
database of contents you want to pull
from that database to surface the most
relevant information to the agent. How
do you do that? That's content we're
going to be covering later in the
semester. But suffice it to say, you
have to handle this when you're doing a
retrieval augmented pattern. And what
comes with that is the risk of pulling the wrong information, pulling outdated information, or, again, surfacing information to this external tool (in the case of the pizza-ordering agent) that places you at a security risk. Maybe you give a social security number instead of a credit card, as an example.
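As a rough illustration, here is a deliberately naive sketch of that idea: retrieve the few stored memories most relevant to the current request and prepend them to the prompt. The keyword-overlap scoring is a crude stand-in for the real retrieval strategies (like embedding search) covered later in the semester, and call_llm is again a hypothetical helper.

    # Naive sketch of a memory-augmented agent: fetch the most relevant stored
    # memories for the request and inject them into the prompt.
    MEMORIES = [
        "Favorite pizza: large pepperoni, thin crust",
        "Delivery address: 123 Main St, East Lansing, MI",
        "Payment: card ending in 1234 (stored securely elsewhere)",
    ]

    def retrieve(query: str, k: int = 2) -> list[str]:
        # Crude keyword-overlap scoring; real systems use embedding-based retrieval.
        q_words = set(query.lower().split())
        ranked = sorted(MEMORIES, key=lambda m: -len(q_words & set(m.lower().split())))
        return ranked[:k]

    def memory_augmented_agent(user_input: str) -> str:
        context = "\n".join(retrieve(user_input))
        prompt = f"Relevant memories:\n{context}\n\nUser request: {user_input}"
        return call_llm("You are a helpful assistant that can order food online.", prompt)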
Okay. Another one of these patterns is
the ReAct pattern. And this is really
a combination of two of the patterns
we've seen previously. In fact, even the
previous one was just an extension of
the tool calling pattern that we saw,
which is why I didn't take you through a
specific example.
In the case of the ReAct pattern, the agent alternates between taking reasoning steps and taking actions, often using tools or APIs.
So, you already saw chain of thought.
That's the reasoning. And the difference
between the ReAct pattern and chain of thought is that at the end of a sequence of reasoning steps, you have an action step. And that action step could be calling a tool. It could be responding. It could be accessing a memory. And you do this in series, right? Either reasoning followed by action multiple times, up until a point of termination, or, at its simplest, just one reasoning-followed-by-action loop that results in your final answer.
A common example use case here is
reasoning about a question, querying a
database, reasoning again, calling a
tool and doing that in a pattern until
you can uh answer a user question.
Really important for complex multi-step tasks; a lot of the more performant general-purpose AI systems that people are building actually use this ReAct pattern.
The limitations, of course, are exactly what you'd expect. It's more complex, and what comes with the complexity is additional overhead. In this case,
there is a state management problem that you have to handle. So, if you're performing a multi-step reasoning and action task and that lasts for a while (you may recall when we did the chain of thought that each time we were passing the previous state forward into the next stage of the reasoning process), you could imagine that the input side after a few steps could get quite long. And as we discussed in
the first lecture, if you pass too much
information
that's irrelevant
to an LLM on the input side, you end up
hurting your responses. So you have to take care of things like state management: figuring out how you store the intermediary results after each reasoning step and the consequences of the action, and surfacing the subset of those steps and actions that are relevant for the next steps and next actions you have to take. So there are these additional overheads, which result, of course, in additional needs for robust error handling and, understandably, some cost to speed.
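Before moving on, here is a minimal sketch of what a ReAct-style loop can look like, including the state management just mentioned: the running transcript of reasoning, actions, and observations is carried forward into each call. call_llm and the tool functions are placeholders, and the REASON/ACT/FINAL format is just one possible convention.

    # Sketch of a ReAct-style loop: reason, optionally act with a tool, feed the
    # observation back, and repeat until the model emits a final answer.
    SYSTEM_PROMPT = (
        "Alternate reasoning and actions. To use a tool, output "
        "'ACT: <tool_name>: <input>'. When you are done, output 'FINAL: <answer>'."
    )

    def react_agent(task: str, tools: dict, max_steps: int = 6) -> str:
        transcript = f"TASK: {task}"
        for _ in range(max_steps):
            step = call_llm(SYSTEM_PROMPT, transcript)
            transcript += "\n" + step                 # keep the growing state
            if step.startswith("FINAL:"):
                return step.removeprefix("FINAL:").strip()
            if step.startswith("ACT:"):
                _, tool_name, tool_input = step.split(":", 2)
                observation = tools[tool_name.strip()](tool_input.strip())
                transcript += f"\nOBSERVATION: {observation}"
        return "Stopped without reaching a final answer."

    # Example: react_agent("Average temperature in Nova Scotia today?",
    #                      tools={"web_search": web_search})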
The human-in-the-loop pattern is
actually a very simple extension
typically of the ReAct pattern, where you have an agent that's performing the reasoning and taking an action. But prior to taking that action (which is now moved over here, compared to the previous slide where it was here), in this case,
after the reasoning step is complete,
you ask a human to verify, either
decline and kind of go back to square
one or approve so that the action can be
taken.
Okay, this might be a design pattern to use if you had a tool that you wanted to extract findings from clinical notes, for example, findings that were going to be used to schedule follow-ups with patients, CT scans, or whatever. This helps ensure
that all actions taken by the system are
validated which is great. Um you can
also configure this so that you bypass
human verification when the confidence
is high. You may recall from our prompt
engineering lecture that
there are some very simple techniques
and also some more advanced ones that
will help you assess
how confident an LLM is in its response.
The simplest of those is to just ask it
the same question five or six times and see
if it changes its response or if it's
very consistent. So you could imagine a
situation here where you pass information into the reasoning process. You go through that reasoning process three or four times. If there's a contradiction
somewhere in the outcomes of the
reasoning, then you have a human step in
to adjudicate basically the differences.
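A rough sketch of that confidence gate, under the same hypothetical call_llm helper and an assumed ask_human review function: sample the model several times and only route the case to a human when the samples disagree.

    from collections import Counter

    # Sketch of confidence-gated human review: sample the model several times;
    # fully consistent answers bypass the human, disagreements get escalated.
    # call_llm and ask_human are placeholders for a real LLM call and a real
    # review interface.
    def answer_with_human_gate(task: str, n_samples: int = 5) -> str:
        answers = [call_llm("Answer concisely.", task) for _ in range(n_samples)]
        top_answer, count = Counter(answers).most_common(1)[0]
        if count == n_samples:
            return top_answer                       # high confidence: no review needed
        return ask_human(task, candidates=answers)  # contradiction: human adjudicates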
Okay. The third advantage of this is
that when you combine this with proper
memory management, you can actually
enable an agent to learn from its
mistakes. So imagine we have a loop here
where the agent proposes a reasoning
path, a human declines, but then we store that declining of the proposed action within the agent. And then we have the agent learn from that as a memory the next time it generates its reasoning path. For example, in a few-shot sense, we inject this as an example of the wrong reasoning path. That would make sure
that the next time we encounter a
similar problem, we're more likely to
achieve the approval. Okay, the
limitations of this of course are every
time we add a new block to a diagram
like this, we're adding additional steps that require debugging.
In the case of the unique challenges that exist with human verification: if you don't have enough that flows through this system automatically, without the human needing to look at it, you risk the system being seen as redundant. That is, this human could say, well, if I'm going to have to approve everything and look at everything anyway, why don't I just look at the input directly? Why do I need your AI agent? And so thinking through what you pass to the human and what you bypass is very important for this
kind of AI agent pattern. A last thing
to note here is that human beings also make mistakes. And so you might have a circumstance, depending on how reliable the human user is, where even if they decline an action, it may or may not mean that the action was the wrong one. So there's
sort of an interesting meta problem to
be solved when you're dealing with human
in the loop.
Okay, the last of these that I want to
cover is the agent orchestration
pattern. This one's actually very easy,
I think, to understand and it's related
to what you're doing in your first
homework assignment for this class. This
is where you have one agent called an orchestrator, and that agent can basically call a bunch of simpler, prefabricated, pre-specified agents here, and this orchestrator can figure out how it wants to call these agents in sequence. One of those agents could be a human, by the way. The orchestrator's job is to orchestrate the responses across the set in order to aggregate and return the results back to the user.
This is actually pretty good. It's a simple pattern and it's pretty good at handling complex multi-step tasks. For the level of complexity, depending on how you design it, you can also get really nice parallelism out of this pattern. So if you have, for example,
a task where you need three things to
happen in parallel and you've designed
your agents appropriately, you can
distribute the tasks outward, collect
the results and aggregate in the
orchestrator in less time than
sequentially putting them through. Of course, the main challenge of this approach is that you need this orchestrator to either be prompted, fine-tuned, or provided with really high-quality few-shot examples for how it does the orchestration of these agents, to get high-quality results here. As part of that, you may end up dealing with more complex state management problems in a design pattern like this.
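Here is one minimal way such an orchestrator could be sketched. The specialist agents, the JSON plan format, and call_llm are illustrative assumptions rather than a prescribed interface; the thread pool shows where the parallelism mentioned above can come from.

    import json
    from concurrent.futures import ThreadPoolExecutor

    # Sketch of an orchestrator: one call decomposes the request into subtasks,
    # each subtask goes to a specialist agent (possibly in parallel), and a final
    # call aggregates the results. call_llm and the specialists are placeholders.
    SPECIALISTS = {
        "sentiment": lambda text: call_llm("Classify the sentiment of the text.", text),
        "summarize": lambda text: call_llm("Summarize the text briefly.", text),
    }

    def orchestrate(user_request: str) -> str:
        plan = json.loads(call_llm(
            'Decompose the request into subtasks. Return JSON like '
            '[{"agent": "sentiment" or "summarize", "input": "..."}].',
            user_request))
        with ThreadPoolExecutor() as pool:  # independent subtasks run in parallel
            results = list(pool.map(
                lambda t: SPECIALISTS[t["agent"]](t["input"]), plan))
        return call_llm("Aggregate these results into one answer for the user.",
                        f"Request: {user_request}\nResults: {results}")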
So now that we've covered the motivation
for these design patterns in AI agents
and we've also gone over some of the basic or essential design patterns that are used today, let's shift our
attention to some of the emerging tools
and methods that we're seeing in the
scientific literature.
The source that we're going to be using for the coverage today is the arXiv paper that you see linked here. The primary goal of that arXiv paper was to cover a review of prompt engineering methods. But they have a really wonderful section, I think it's section four in the paper, that covers some agents that they reviewed which have had an impact: they've been highly cited by the community or have been widely used to inform secondary activities that are under development in
the research community. So the four
groups in the taxonomy that were
provided in the paper were tool-use agents, code-based agents, observation-based agents, and retrieval-augmented generation agents.
And you can see on the right hand side
here a set of the tools that are covered
within the paper. I do encourage you to
go read through all of those. For the
sake of our lecture today, I just want
to highlight four of these, specifically
the four that I'm showing here on the
furthest right, so that you have an idea
of some of the more innovative
approaches that people are taking for
the design of agents that I suspect will
be making their way into the next
generation of AI agent design patterns.
Let's start with the first one, MRKL
systems, which stands for modular
reasoning, knowledge, and language
systems. Specifically, I'd like to cover a really neat paper which disclosed a tool called Toolformer, which was trained to decide, in a piece of text that was provided (so, in a prompt), which API calls to make, when to call them, and what arguments to pass. And what I'm
showing you here are two figures that
come from the paper. On the left hand
side, I've got the first figure which is
showing some text that someone might put in a prompt. Let's say here we have: the New England Journal of Medicine is a registered trademark of... and then you can see there's this purple text, the MMS. So this purple text is actually what the Toolformer model is generating. You
pass in the text up to this point. The
New England Journal of Medicine is a
registered trademark of, and Toolformer figures out that it can
call an API on its own just based on the
text. It can call an API to answer this
question about who owns the registered
trademark of the New England Journal of
Medicine. In this case, it calls the QA
tool
and it passes in this query and it gets
the answer Massachusetts Medical Society
which you can see matches the actual answer in the text here that they're showing, the MMS. Another
example here is: out of the 1400 participants, 400 or... And it's smart enough to understand that in order to answer this question you need to call the calculator API and take 400 / 1400, which yields 29%. And as you can see in the actual text that they trained on, 29% was in fact the right answer. So this sort of very genius
trick that they used to generate Toolformer was
taking data sets very similar to what
you see here where a question implicitly
was asked and the response was generated
and generating a data set where, as they stepped through this text, at key moments they made a set of API calls to a collection of APIs that they have in the back: wiki search, calculator, and so on. And then they figured out, basically, how close the response of the API was to the authoritative text that they trained on. Okay. So, how close was the
response Massachusetts Medical Society
to the MMS? And the idea is that if
if the API generates a response that's
very close to the MMS or in the case of
the second example, if the API generates
a response that's very close to 29%,
then we know that for a question that
looks like this in the text, that the
right thing to do is to insert an API
response here.
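To give a feel for that filtering step, here is a deliberately simplified sketch. The real paper scores candidate calls by how much they help the language model predict the following text; the word-overlap score below is a crude stand-in for that idea, and the API functions are placeholders.

    # Simplified sketch of the Toolformer-style filtering idea: at a candidate
    # position in the training text, try each API and keep the call only if its
    # response closely matches the text that actually follows. The paper's real
    # criterion is loss-based; word overlap here is just an illustration.
    def overlap(api_response: str, following_text: str) -> float:
        resp = set(api_response.lower().split())
        follow = set(following_text.lower().split())
        return len(resp & follow) / max(len(resp), 1)

    def annotate_position(query: str, following_text: str, apis: dict, threshold: float = 0.5):
        candidates = {name: api(query) for name, api in apis.items()}
        name, response = max(candidates.items(), key=lambda kv: overlap(kv[1], following_text))
        if overlap(response, following_text) >= threshold:
            return f"[{name}({query}) -> {response}]"  # keep this call in the training data
        return None                                    # no API helped; insert nothing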
This is useful from a training
perspective because now we can ask any
open question
and Toolformer gives us a way to
suggest which API calls to make to
answer the question that we're
interested in. So, a really neat idea.
Their GitHub repository was starred over
2,000 times. Definitely suggest you take
a look at it and read the full paper if
you're a graduate student in this
course. A phenomenal take, I think, on self-supervised development of a transformer that not
only knows how to generate the next
token but also knows when to call a tool
to help it answer a question. Okay, so
we've covered the first of these
systems. Let's move to the second one.
Program-aided language modeling. Now, this one is so straightforward that it won't even take me much time to cover, and in fact we covered sort of a
rendition of this when we went over the
prompt engineering lecture.
There's a paper that was published where they described this program-aided language model, and the idea very simply
was to take any input that a user gives
and to train the system on how you
translate this input into code. So in
the case of this example, Roger has five
tennis balls. He buys two more cans of
tennis balls. Each can has three tennis
balls. How many tennis balls does he
have?
The idea is to turn this input into a series of Python commands that could be used to compute the answer, and then to simply execute the Python commands, as they're showing here in the output. Okay, so very straightforward,
very simple idea. Procedurally, how you
do this is by asking a language model to
determine whether this kind of task or question that's being asked by your user is better suited for deductive or symbolic reasoning, and, if the answer to that question is yes, in your chain of thought to resort to casting the problem into a programmatic form, Python for example, and executing the code so that you can perform the symbolic reasoning more effectively. Okay, so that's an example of program-aided language models.
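A minimal sketch of that procedure, using a hypothetical call_llm helper: ask the model to emit Python for the word problem, then execute it. The code in the comment is the kind of program the model might produce for the tennis-ball example (5 + 2 * 3 = 11); as always, exec'ing model output should be sandboxed in a real system.

    # Sketch of the program-aided approach: translate the word problem into
    # Python, then run the Python to get the answer. call_llm is a placeholder.
    PAL_PROMPT = (
        "Translate the word problem into Python that computes the result and "
        "stores it in a variable named `answer`. Return only code."
    )

    def program_aided_answer(problem: str):
        code = call_llm(PAL_PROMPT, problem)
        # For the Roger example, the model might return something like:
        #   tennis_balls = 5
        #   bought_balls = 2 * 3
        #   answer = tennis_balls + bought_balls
        namespace = {}
        exec(code, namespace)       # sandbox this in any real deployment
        return namespace["answer"]  # 11 for the tennis-ball problem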
Let's move to the third example, of lifelong learning agents, specifically Voyager. Now, this paper, I actually thought it was a fun example because it deviates away from text and is thinking about an AI agent that exists in the context of the game Minecraft. And specifically, what this agent is trying to do is propose exploratory tasks to take, or actions to do, in the context of Minecraft, execute those tasks, and save the consequences that led to a good outcome as memory.
How does it do this? Well, it starts by
defining basically some of the things
that the agent should try to achieve. It should mine wood. It should craft a table. It should combat zombies. And eventually it should mine for diamonds. So, it's sort of given some guidance that it needs to do these things as it's trying to progress through this sequence in order to get
ultimately to the point of mining
diamonds in the game.
It can propose exploratory tasks that it can take, in this case in the form of code that can be executed as actions in the game, so that it can attack the zombie, as an example. And then it checks how well it was able to proceed through this set. Did it progress? Basically, did it defeat the zombie? Did it effectively create the crafting table? Did it increase its inventory of wood, and so on? And as the actions that it takes succeed or fail, it will update its memory of certain actions as skills that it stores in a skill
library. And this skill library can be
used either to pull the best skill for a task or, in an exploratory sense, to propose new tasks that you learn from when you're interacting
with the world. So the key idea of
Voyager, which I thought was really
cool, is the proposing of new
exploratory tasks, executing them, and
saving the consequences as memories, so that the agent isn't just responding to the environment, but is trying to actively explore the environment as well. And this was done in the context of Minecraft, but you could anticipate this being done in other ecosystems as well.
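As a very rough sketch of that propose-execute-store loop (with call_llm, the environment execution, and the success check all standing in as placeholders for what Voyager actually does inside Minecraft):

    # Rough sketch of a Voyager-style lifelong-learning step: propose an
    # exploratory task, have the LLM write code for it, run it, and store code
    # that succeeded as a reusable skill. call_llm and execute_in_environment
    # are placeholders for the real LLM and game interface.
    skill_library = {}   # task description -> code that achieved it

    def lifelong_learning_step(environment_state: str) -> None:
        task = call_llm("Propose one useful exploratory task to attempt next.",
                        environment_state)
        known_skills = "\n".join(skill_library.values())
        code = call_llm("Write code for this task. Reusable skills so far:\n" + known_skills,
                        task)
        succeeded = execute_in_environment(code)   # e.g., did we defeat the zombie?
        if succeeded:
            skill_library[task] = code             # save the consequence as a skill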
Okay, let's come to the last one here, which is iterative retrieval augmentation. This is based on a paper where they described a method called FLARE, where they iteratively predict upcoming sentences, and if they were uncertain about one of the tokens in those sentences, they queried for an answer. So I need to start by
motivating why I think this is
interesting. When you're dealing with a
memory or a retrieval augmented
generation system, so that's where you
have the agent and it calls a database
to collect information. You always have
this devilish problem of when do you
call the database? Basically, when do
you phone the friend? When do you go to
the memory bank and try to pull it?
And so the innovation in this paper
which I really liked
was they came up with a principled way
to determine when you should call the
database and it works sort of like this.
They have an input, let's say: generate a summary about Joe Biden. And
then they can have their language model
generate an output. Joe Biden was born
on November 20th, 1942 and is the 46th
president of the United States. Okay,
these are their examples, not mine.
What you do after this is, if the LLM was very confident about all of the tokens in this sequence, and you can measure this by using the logits, which give you the probability of each token given all the previous tokens in the sequence, then you accept this sentence. You can see it says, "Hey, this looks like it's pretty likely. So, I'm going to move forward."
Let's say that the next sentence that it
wants to generate in the summary is Joe
Biden attended the University of
Pennsylvania where he earned a law
degree. But these two parts of the
sequence, University of Pennsylvania and
law degree,
it's less certain about those two. So the token probabilities, derived from the logits, associated with the sequence of tokens that generate these are low. Then
this becomes the triggering event where
as you can see on the figure on the
right hand side here a search query is
performed against some database some
tool
that returns the correct university, in this case the University of Delaware, and in the case of the law degree, a bachelor of arts in history and political science. Okay. So a very simple idea, but very elegant too. Try
to figure out every time the LLM is
generating a portion of its response
where it was confident and where it was
less confident. In the places where it
was not confident, you want to go do a
retrieval of information from an
authoritative source, a database, a
tool, and so on. In the places where it
was confident, you can let it operate.
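A minimal sketch of that trigger, in the spirit of FLARE rather than its exact implementation: generate a sentence along with its token probabilities, and only query the knowledge source when some token falls below a confidence threshold. generate_with_logprobs, retrieve, and regenerate_with_context are assumed helpers, not real APIs.

    import math

    # Sketch of confidence-triggered retrieval: keep sentences whose tokens are
    # all high probability; for low-confidence sentences, retrieve evidence and
    # regenerate. The three helper functions are placeholders.
    def generate_summary(query: str, max_sentences: int = 10, threshold: float = 0.6) -> str:
        output = ""
        for _ in range(max_sentences):
            sentence, token_logprobs = generate_with_logprobs(query, output)
            if not sentence:
                break
            min_prob = min(math.exp(lp) for lp in token_logprobs)
            if min_prob < threshold:
                # Low-confidence span: fetch evidence and regenerate the sentence.
                evidence = retrieve(sentence)
                sentence = regenerate_with_context(query, output, evidence)
            output += sentence + " "
        return output.strip()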
That's it for today's lecture. See you
in the next video.
This lecture, delivered by Dr. Mohammad Ghassemi at Michigan State University (CSE 491/895), introduces the concept of Agentic Design Patterns in the context of building AI agents. The talk covers three main areas:
1. Motivation – why design patterns are essential for agent development, especially when integrating generative AI with tools, data, memory, and planning components.
2. Survey of Agentic Design Patterns – a review of core design strategies such as the Single Agent, Chain-of-Thought, and Tool-Using patterns, with concrete examples and system prompts to illustrate their strengths and limitations.
3. Emerging Tools and Methods – how new frameworks and approaches can improve reliability, scalability, and safety of AI systems.
The lecture emphasizes the importance of aligning agent design with task requirements (for instance, why improper design in medical AI could lead to harmful outcomes) and highlights reusable design structures that solve common classes of problems.