Hi everybody, Professor Ghassemi here.
Welcome back to another lecture on AI
agents at Michigan State University. The
topic of today's lecture is on agentic
design patterns. Specifically, there's
three topics I'll be taking you through
today. The first is a motivation. So,
why should you care to learn about these
design patterns? The second is a survey
of some of the essential design patterns
that we're seeing used today across
common applications. And the third are
some of the emerging tools and methods,
things that you'll probably want to be
aware of if you are a researcher or
interested in creating innovations in
this space. Let's start with the
motivation. So first and foremost,
recall that an AI agent is a system that
integrates three things. There's a
generative AI system, that's typically a
large language model, and then there are
a set of tools, things like web search,
as well as memories. Think of a database
as an example. And it's the interface of
these three and the interaction with an
external environment, be it a user or
some other ecosystem that creates the AI
agent.
In order to leverage these three items,
the LLM, the tools, and the memory to
achieve a goal within this environment,
the best possible response or some other
optimization, there's three things that
the agent does. The first is perception,
trying to figure out what actually
matters from that input and importantly,
if anything was missing. There's
planning, which is figuring out what it
wants to achieve. How does it measure
the progress and what specific steps
does it take? And in order to do this
planning, this is where it leverages the
tools and the memory. The tools here are
the things it can use to help it as it
plans its specific steps. And the
memories are prior information that's
relevant or important as it wants to
plan those specific steps to achieve the
goal. Finally, of course, the last step is to take the action, generating the output that will change the environment over here toward the goal.
So to help us develop a more intuitive
understanding of what these key
components are, I think it helps to
ground it in a couple of examples. Let's
start with ChatGPT. So in the case of a commercial language model like ChatGPT,
you the user are the environment. You
provide inputs in the form of prompts.
And those prompts could be text, they
could be images, they could be images
plus text, or they could even be your
voice.
And when you provide those inputs, what's important to understand about this perceptual process is, at least as far as I know in 2025, ChatGPT is only really concerned with the text and the images you provide. So I'll
give you an example. If you use the
voice capability
on your ChatGPT app on your phone or on
your machine and you speak to it and you
ask it a question in a sarcastic tone
versus a regular tone, you don't get a
difference in the response type. And the
reason is because what is being
perceived, the perceptual part of ChatGPT is focused primarily on the text content
and the images. It's not really paying
attention to how you say things. It's
paying attention to what you say. Okay.
So, this is a real example of that
perceptual process of the agent in play.
It's trying to figure out what from the
input to pay attention to so that it can
perform its objectives as an agent.
After this perception is completed, it's
provided to a large language model which
has access to some tools and some
memories. The tools here, let's say, are
web search and the memories here are
some things that it's remembered about
you in the past. If you have used a
commercial LLM such as ChatGPT for any
period of time, you've probably noticed
that at the bottom of your chat,
every once in a while, you'll get a
message that says updating memories. And
this is when the LLM decides that
there's something about what you
revealed to it in the course of the
prompt that's worth remembering about
you: your profession, your age, where you live. It might be a set of things that are considered interesting or relevant if it wants to generate prospectively some responses
about you. Okay. But this LLM, using a combination of what it perceived and knowing what tools and memories are available, goes through a planning process. This could use
chain of thought like we discussed in
our previous lecture or some other
method.
And a typical planning process might be
to determine if we need to recall
certain memories based on what the input
was. Determine if we need to call any
tools like, hey, did the user ask me a
question that's going to require a web
search? Did they ask me to choose a good
restaurant for them as an example? In
which case, I'd need to look up
restaurants in their area, maybe
recalling where they live. I'd need to
do that web search and then of course
synthesize the results so that I can
respond to the user. Okay. So as you can see here, an AI agent like ChatGPT integrates the generative AI, the tools, and the memories to make sure it can act on the environment through certain subcomponents of the input that it perceives. It achieves that through planning.
So let's go through a second example
here that's probably a lot less familiar
to many of you which is an AI agent in
the context of health. So here the
environment is no longer a user who's
prompting the system but the environment
is the patient. So this patient instead
of providing words or prompts is
providing another kind of input. It's
providing labs, vitals, imaging, and
other information about the state of
their health. And just like in ChatGPT's case, there is a perceptual process that takes a subset of what was provided. Remember, we were ignoring some of the properties of the voice and were only interested in the text of what the user said. So too here, this AI agent
might only be interested in a subset of
the data that comes from the patient.
Let's say it's the labs, the vitals, and
the imaging. Now, once it collects this
data, this LLM wants to use a
combination of what it collected along
with some tools and some memory so that
it can perform planning and generate an
action that will ultimately help advance
the state of the patient. Let's say the
goal here is uh survival. I would
certainly hope that that's what a health
agent would want to achieve.
So, some tools in this case, you could see, may not be something as broad as a standard web search, but maybe it's a
search of very specific scientific
publications from a known authority like
PubMed.
Or maybe it's a tool that can take the
images that are passed in and turn those
into a text report of any anomalies that
were seen in the image, an issue in your
brain scan, something weird in your CT
scan, and so on. So you might call a tool like Viz.ai that takes those images and turns them into text so that this LLM can
reason on it. Memory in this context
might be elements from the medical
history. Do you have allergies that were
recorded from a previous visit? Do you
or your family members have a history of
cancer or other ailments? And so you can see here that there's a pattern here, right, that's similar between what ChatGPT does and what this AI agent does. There's a perception of certain elements from the environment. There's a generative AI system, an LLM, that's choosing to use some tools
and pull from some memories. It uses a
combination of the tools and the
memories to form a plan. Is the patient
stable? Is the diagnosis known? Should I order tests? Should I apply medication? So that, ultimately, it can create an action. In this case, what
should I do to help the patient survive?
So there's a clear parallel here, right?
Both systems are performing perception and planning to generate actions, and the way they're doing that is with a generative AI system as well as some tools and some memories.
The challenge here is that the proper
design of an AI agent depends on the
task. And just to give you an example here, let's say that this AI agent, despite having these tools and the memories and having carefully generated its plan, makes a mistake. That happens, right? If you've used an LLM, sometimes they lie, sometimes they hallucinate. And what you would not like to happen is for a wrong action to be taken in the context of a patient being cared for, because that error could be fatal. You could really harm someone if you make this mistake. And similarly, it might
not be the case, when you're pulling data from this authoritative source, that all of the content within that source is held equal. Some papers that get published academically are higher quality. Others are, you know, maybe not low quality, but, let's say, their quality is less high than others. And so,
given how consequential the outcome here is for this patient, this design pattern of performing the perception, letting the LLM use tools and memory to generate an action, and applying that action directly to the patient could be problematic. So
what's the solution here? Well, one
thing we could do is take the actions
that are coming out of this agent
and we could pass them by a doctor, a
human expert who can vet, yeah, you know
what, that action applied to the patient
makes sense or no, I want to decline it
and come up with something else. And
that would solve the problem of just
passing these actions through
automatically in a loop. This is an
example of a human in the loop design
pattern because as you can see within
the loop we have a human here that's
doing a validation.
And the design patterns that we will be
covering later in this lecture are
basically
different generalized solutions that
help solve common classes of problems.
This is one example, right? You have a problem where
making a mistake is highly
consequential. How you solve that
problem is by putting a human in the
loop. So now that we've covered the
motivation for these agent design
patterns,
let's go through a survey of some of the
essential patterns that are out there.
Starting with the simplest one, which is
the single agent pattern. This is the
one you all know and love. It's the
super basic interaction pattern that you have with GPT. When you're calling the API, you pass in an input. This could be a prompt that you have, and you specify in this agent a system prompt. It could be, for example,
performing sentiment analysis on some
text and you just call the agent with
some text you want to perform the
sentiment analysis on and it responds in
a way that you have specified per the
system prompt. Advantages of this are
it's very easy to implement and it's
highly predictable if you've engineered
your prompts correctly. The limitation
of course is that you can't handle
multi-step tasks, and every time you want to achieve a new task with this
agent, you're going to have to either
change the system prompt or create a new
agent with a different system prompt.
So, it's a little rigid. For the sake of
clarity, uh a system prompt in the case
of the sentiment analysis agent might
look something like what you're seeing
on the right hand side here. So, we
would specify that the task is to
classify input text. And we list some of
the classes we're interested in. These
are the usual suspects, positive,
negative, neutral, and mixed. We might
also indicate how we want the answer
format to look in the response.
Sentiment followed by colon and one of
the four options we have up here. So,
when a user provides an input, the
perceptual mechanism captures this
input. Lauren loves her camera
and the agent uses the system prompt
plus the input to generate an output.
Positive. Super simple. It's what you're
used to. The next pattern is chain of thought. What distinguishes the chain-of-thought pattern from the single agent is that the agent decomposes the problem into a sequence of reasoning steps. And at each of these reasoning steps, it makes, in most instances but not all, a call to an LLM that handles that part of the reasoning.
Okay, so we'll go through an example in
a minute, but at the highest level, this
is useful for step-by-step problem-solving tasks: math, logic, and so on. And that's because this setup, by decomposing the problem instead of handling it in one system prompt, allows you to do multi-step reasoning that is easier to debug and therefore easier to fix as you're doing the prompt or agent design (which is usually a little bit harder in the chain-of-thought pattern than it is with a standard system prompt). It's also
a little bit slower for obvious reasons.
If I'm calling an LLM three times because
I've decomposed my problem into three
steps, that's obviously going to take
more time than if I call it once. Let's
look through a real example so that you
have an intuition for functionally how
this chain of thought pattern is
implemented. So let's say I have a user
input, very simple. I have two cows,
Mary has two sheep. How many animals do
we have?
And then I've designed a chain-of-thought system prompt here, which is: you are an agentic chain-of-thought reasoner, and you work in these four phases that I have specified (by the way, you could put something else for your reasoning pattern): you plan, you solve, you critique, and then you finalize.
And you always output the most
appropriate phase
and then the content for that phase.
Okay, so that's the input and the system
prompt. So the first thing that you do
is
you would call this agent with your
input, right? This is exactly what was
specified above. And given this input
and the system prompt that the LLM has,
it should provide an output where it
gives you the phase in this case
planning which was the first step.
Remember up here look first step was
planning. So it will output plan and the
content that it will output are some
steps. I'm going to count cows. I'm
going to count sheep and then I'm going
to add them.
And it will then go and make a second
call to an LLM
independent of the first one. And this
time, notice what it does. It passes
both the task as well as the plan. This
is the same plan that was output from
the first step, right? See, count cows,
count sheep, add them. So this gets
input into the LLM with the task and the
plan and the same system prompt. What
comes after plan? Well, according to our
system prompt, the next step was solve.
And so it looks at these two and uses
this to generate the solve. In this
case, the results are: the count of cows is equal to two, the count of sheep is equal to two, and adding them results in four. Okay. And so it gives a candidate
answer to this question up here as four.
What's really important for me to note
here is that in the system prompt, you
may have noticed I didn't specify
to
make the steps in this particular
format. That was something the LLM came
up with on its own. You could, of
course, specify that format
with more clarity, or you could leave it up to the LLM. Usually specifying makes the performance more predictable, so I recommend it. Okay.
So anyway, we pass in the task and the plan into the second step, and we got the solve. This ends up coming into our input for the third call to the LLM, which is the critique phase. It takes these three and generates a critique. So it looks at what were
the steps. You wanted to count the cows.
You wanted to count the sheep, you wanted to add them, and it's supposed to
look at this input and basically
criticize if this makes sense or not. In
this case, the verdict that the LLM gave
was pass. So this now comes into the input to the next step. And if you recall, the last step after the critique phase was the finalization phase. So in this case this input comes into the LLM, and the output is: the answer is 4, the justification is provided down here, and this answer is what you would pass back to the user in the chain-of-thought pattern. Okay. So I wanted to break down
this example so that you understand that in a chain-of-thought pattern, particularly when you're dealing with an agentic treatment of the problem as opposed to just regular prompt engineering, you typically make a few calls to the LLM, and at each of the calls you might change things about the input to receive the appropriate output. This was one example. There are several ways that you could structure your system prompt to either impact the reasoning process itself or provide greater specificity about how the steps are articulated, formatted, and so on. Okay.
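To make the mechanics concrete, here is a minimal sketch in Python of the four-phase loop we just walked through. It assumes a hypothetical call_llm(system_prompt, user_message) helper standing in for whatever LLM API you use; the phase names mirror the example above and nothing here is a specific library's interface.

    # Minimal sketch of the agentic chain-of-thought loop described above.
    # call_llm(system_prompt, user_message) is a hypothetical helper that wraps
    # your LLM API of choice and returns the model's text response.
    SYSTEM_PROMPT = (
        "You are an agentic chain-of-thought reasoner. You work in four phases: "
        "PLAN, SOLVE, CRITIQUE, FINALIZE. Always output the content for the "
        "phase you are asked to perform."
    )

    def chain_of_thought(task: str) -> str:
        context = f"TASK: {task}"
        answer = ""
        for phase in ["PLAN", "SOLVE", "CRITIQUE", "FINALIZE"]:
            # Each call sees the task plus everything produced so far, so later
            # phases (like CRITIQUE) can check the earlier ones.
            answer = call_llm(SYSTEM_PROMPT, f"{context}\nPerform the {phase} phase.")
            context += f"\n{phase}: {answer}"
        return answer  # the FINALIZE content is what goes back to the user

    # Example: chain_of_thought("I have two cows, Mary has two sheep. How many animals do we have?")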
So let's look at another one of the
design patterns which is the tool using
agent pattern. And this is a case where
an agent leverages external tools or
APIs to enhance its capabilities. So you
have your environment out here.
This might be a query from a user. This agent receives a query like, hey, tell me what the average temperature is in Nova Scotia today. And
then this agent doesn't know that from
its pre-training data, right? The LLM
doesn't know what the temperature is
today because it's not seen that when it
was trained. So, it's smart enough to
know it doesn't know that. And it calls
an API, maybe something from the weather
channel or some other tool, gets the
result, and integrates a combination of
the result of the API
plus what it knows internally in order
to respond back to the user. Okay. One example of this that is actually pretty common is to use a calculator API to solve math problems. And the reason this is common is that large language models are not built to do mathematics. They're built for linguistic, or rather probabilistic, reasoning in the inductive sense over words; they're not really meant to do formal symbolic reasoning in the mathematical sense. So the advantage
of using this tool-using agent pattern is that, I would argue, it's actually essential for real-world
applications. If you want to build any
kind of system that's going to touch
users, you will probably have to
integrate at least one tool into that
system. It's also, if you design it correctly, very modular and extensible, because you can make this agent aware of a set of tools in the system prompt that it can call, and as you add
more tools
you can update the system prompt to make
the agent aware of those new tools and
so it's therefore modular. It's
extensible. It's flexible and for this
reason this is a really common and
useful agent design pattern. The
limitations are of course that you have
to have a system in place for error
handling when the tools fail. Some of these APIs might be under your control; it could be a database that you own. But some of them may not be. It might be a third-party
API that you call from the weather
channel. And if that fails, you need a
way for the agent to be aware of the
failure and either go to a backup tool
or respond back to the user that, hey, I
couldn't respond because there was an
error. There are also, depending on what the user passes here, some security considerations when you're calling external tools. For example, if a user here accidentally passes their credit card number forward, you may not want to put that credit card number and other information into a Google search or some other place that would retain a public record of it, as an example. Okay, so
that's the tool use pattern. Let's go
through a real example just like we did
previously to help you develop your
intuitions for how you would implement
the tool-use agent in practice. So you'd start with a system prompt, and it might look something like this: you are a tool-using agent, your available tools are... and then you'd list them. So maybe
we have a calculator API
and we have the post request as well as
what's expected in the body of the post
request to this API as well as how we
want the result to be returned. And then
maybe we have something similar for the
weather API. We have an example API endpoint here. In this case it requires a GET request, and we specify again the response format.
We also specify down here some rules. We
want the output to be JSON. If it's a question that involves math,
which obviously the agent will have to
assess, then we want to call the
calculator. If it's a question about the
weather, we want to call the weather
API. And when calling the tool, we want
to return Python code as a string that
we can execute.
Uh, and finally, if we're passed a JSON
with a tool result, we want to forward
that back to the user. That's because we
got a loop here, right? The user is
going to ask a question,
the agent is going to call some tools,
and then those tools are going to
respond with results. And we don't want
to just get stuck in this loop. We want
to be able to pass those forward to the
human being. And with the system prompt
specified, the first step might be for
the LLM to choose the right tool given
the user input. So let's say the user
input is what is 5 + 2. Well, what we'd
expect given the way we wrote our system
prompt where it had the weather API and
the calculator to choose from is that it
might return an output that looks
something like this. You might notice it's JSON formatted, as we specified, and it says: you need to call this Python code, import requests, then response = requests.post with that API endpoint, passing in the JSON expression in the format that we had specified within the system prompt, and it prints, let's say, the response that's returned.
Of course, after the agent does that, we have to, using the system now, call that Python code. So let's say that
the system in this case is a Python
environment. We'd run exec and we would
execute exactly the string that we were
provided by the large language model.
We would get a result from the tool.
Let's say the result is, hopefully, 7 if it's a calculator API. And then we'd pass that result back to the LLM: the task, what the tool was, and the tool result, 7. And this informs what we return back to the user: 7. Okay. So that's a concrete example of a tool-use pattern.
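For intuition, here is a minimal sketch of that loop in Python. The system prompt, the example endpoint, and the call_llm helper are all assumptions made up for illustration; they are not a real service or a specific library's API. In practice you would also sandbox or validate any model-generated code before executing it.

    import json

    # Sketch of the tool-use loop from the example above. call_llm is a
    # hypothetical helper and the calculator endpoint is a made-up URL.
    SYSTEM_PROMPT = (
        "You are a tool-using agent. Available tool: a calculator API at "
        "https://example.com/calc (POST, body {'expression': '...'}).\n"
        "Rules: always respond in JSON. For math questions return "
        "{'tool_call': '<python code as a string>'}. If you are given a tool "
        "result, return {'answer': '<final answer for the user>'}."
    )

    def run_tool_agent(user_input: str) -> str:
        decision = json.loads(call_llm(SYSTEM_PROMPT, user_input))
        if "tool_call" in decision:
            # The lecture's example exec's the returned code string; a real
            # system would sandbox this and handle tool failures explicitly.
            namespace = {}
            exec(decision["tool_call"], namespace)
            tool_result = namespace.get("response")
            followup = json.loads(call_llm(
                SYSTEM_PROMPT, f"TASK: {user_input}\nTOOL RESULT: {tool_result}"))
            return followup["answer"]
        return decision.get("answer", "")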
A memory-augmented pattern, or what's sometimes synonymously called a retrieval-augmented pattern, is something I think of as an extension of this tool-use agent, where one of the tools that you have down here is a database or some other storage of information that's relevant for generating your response.
An example of an agent that might need
to use this is something that navigates
the web, let's say to dominoes.com, and
orders your favorite pizza. Why? Because
in order to make the order, it has to
remember first what your favorite pizza
is. It would then also have to be able
to take your credit card information,
put it in the appropriate place, put
your address, and so on. So, it needs some memories about you, right? It needs to be able to not only have those memories about you, but to be able to retrieve the memories as a function of, let's say, what the web page contents look like. Okay, so memory-augmented or retrieval-augmented patterns are also
really common when you're building real
world applications where you need
personalization,
the ability to learn and adapt over
time, or you want to reduce mistakes. Of course, there are some unique challenges that come with these memory-augmented systems. The first is that you
have to deal with how you fetch from
this memory. In later lectures we're going to be talking about some of those retrieval strategies, like: if you have a
database of contents you want to pull
from that database to surface the most
relevant information to the agent. How
do you do that? That's content we're
going to be covering later in the
semester. But suffice it to say, you
have to handle this when you're doing a
retrieval augmented pattern. And what
comes with that is the risk of pulling the wrong information, pulling outdated information, or, again, surfacing information to this external tool (in the case of the pizza-ordering agent) that places you at a security risk. Maybe you give a social security number instead of a credit card, as an example.
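As a rough illustration, here is a deliberately naive sketch of that idea: retrieve the few stored memories most relevant to the current request and prepend them to the prompt. The keyword-overlap scoring is a crude stand-in for the real retrieval strategies (like embedding search) covered later in the semester, and call_llm is again a hypothetical helper.

    # Naive sketch of a memory-augmented agent: fetch the most relevant stored
    # memories for the request and inject them into the prompt.
    MEMORIES = [
        "Favorite pizza: large pepperoni, thin crust",
        "Delivery address: 123 Main St, East Lansing, MI",
        "Payment: card ending in 1234 (stored securely elsewhere)",
    ]

    def retrieve(query: str, k: int = 2) -> list[str]:
        # Crude keyword-overlap scoring; real systems use embedding-based retrieval.
        q_words = set(query.lower().split())
        ranked = sorted(MEMORIES, key=lambda m: -len(q_words & set(m.lower().split())))
        return ranked[:k]

    def memory_augmented_agent(user_input: str) -> str:
        context = "\n".join(retrieve(user_input))
        prompt = f"Relevant memories:\n{context}\n\nUser request: {user_input}"
        return call_llm("You are a helpful assistant that can order food online.", prompt)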
Okay. Another one of these patterns is
the ReAct pattern. And this is really
a combination of two of the patterns
we've seen previously. In fact, even the
previous one was just an extension of
the tool calling pattern that we saw,
which is why I didn't take you through a
specific example.
In the case of the ReAct pattern, the agent alternates between taking reasoning steps and taking actions, often using tools or APIs.
So, you already saw chain of thought.
That's the reasoning. And the difference
between the ReAct pattern and chain of thought is that at the end of a sequence of reasoning steps, you have an action step. And that action step could be calling a tool. It could be responding. It could be accessing a memory. And you do this in series, right? Either reasoning followed by action multiple times, up until a point of termination, or, at its simplest, just one reasoning-followed-by-action loop that results in your final answer.
A common example use case here is
reasoning about a question, querying a
database, reasoning again, calling a
tool and doing that in a pattern until
you can uh answer a user question.
Really important for complex multi-step tasks; a lot of the more performant general-purpose AI systems that people are building actually use this ReAct pattern.
The limitations, of course, are exactly what you'd expect. It's more complex, and what comes with the complexity is additional overhead. In this case,
there is a state management problem that you have to handle. So, if you're performing a multi-step reasoning and action task and that lasts for a while (you may recall when we did the chain of thought that each time we were passing the previous state forward into the next stage of the reasoning process), you could imagine that the input side after a few steps could get quite long. And as we discussed in
the first lecture, if you pass too much
information
that's irrelevant
to an LLM on the input side, you end up
hurting your responses. So you have to take care of things like state management: figuring out how you store the intermediary results after each reasoning step and the consequences of the action, and surfacing the subset of those steps and actions that are relevant for the next steps and next actions you have to take. So there are these additional overheads, which result, of course, in additional needs for robust error handling and, understandably, some cost to speed.
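Before moving on, here is a minimal sketch of what a ReAct-style loop can look like, including the state management just mentioned: the running transcript of reasoning, actions, and observations is carried forward into each call. call_llm and the tool functions are placeholders, and the REASON/ACT/FINAL format is just one possible convention.

    # Sketch of a ReAct-style loop: reason, optionally act with a tool, feed the
    # observation back, and repeat until the model emits a final answer.
    SYSTEM_PROMPT = (
        "Alternate reasoning and actions. To use a tool, output "
        "'ACT: <tool_name>: <input>'. When you are done, output 'FINAL: <answer>'."
    )

    def react_agent(task: str, tools: dict, max_steps: int = 6) -> str:
        transcript = f"TASK: {task}"
        for _ in range(max_steps):
            step = call_llm(SYSTEM_PROMPT, transcript)
            transcript += "\n" + step                 # keep the growing state
            if step.startswith("FINAL:"):
                return step.removeprefix("FINAL:").strip()
            if step.startswith("ACT:"):
                _, tool_name, tool_input = step.split(":", 2)
                observation = tools[tool_name.strip()](tool_input.strip())
                transcript += f"\nOBSERVATION: {observation}"
        return "Stopped without reaching a final answer."

    # Example: react_agent("Average temperature in Nova Scotia today?",
    #                      tools={"web_search": web_search})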
The human-in-the-loop pattern is
actually a very simple extension
typically of the ReAct pattern, where you have an agent that's performing the reasoning and taking an action. But prior to taking that action (which is now moved over here, compared to the previous slide where it was here), in this case,
after the reasoning step is complete,
you ask a human to verify, either
decline and kind of go back to square
one or approve so that the action can be
taken.
Okay, this might be a design pattern to use if you had a tool that you wanted to extract findings from clinical notes, for example, findings that were going to be used to schedule follow-ups with patients, CT scans, or whatever. This helps ensure
that all actions taken by the system are
validated which is great. Um you can
also configure this so that you bypass
human verification when the confidence
is high. You may recall from our prompt
engineering lecture that
there are some very simple techniques
and also some more advanced ones that
will help you assess
how confident an LLM is in its response.
The simplest of those is to just ask it
the same question five or six times and see
if it changes its response or if it's
very consistent. So you could imagine a
situation here where you pass information into the reasoning process. You go through that reasoning process three or four times. If there's a contradiction
somewhere in the outcomes of the
reasoning, then you have a human step in
to adjudicate basically the differences.
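A rough sketch of that confidence gate, under the same hypothetical call_llm helper and an assumed ask_human review function: sample the model several times and only route the case to a human when the samples disagree.

    from collections import Counter

    # Sketch of confidence-gated human review: sample the model several times;
    # fully consistent answers bypass the human, disagreements get escalated.
    # call_llm and ask_human are placeholders for a real LLM call and a real
    # review interface.
    def answer_with_human_gate(task: str, n_samples: int = 5) -> str:
        answers = [call_llm("Answer concisely.", task) for _ in range(n_samples)]
        top_answer, count = Counter(answers).most_common(1)[0]
        if count == n_samples:
            return top_answer                       # high confidence: no review needed
        return ask_human(task, candidates=answers)  # contradiction: human adjudicates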
Okay. The third advantage of this is
that when you combine this with proper
memory management, you can actually
enable an agent to learn from its
mistakes. So imagine we have a loop here
where the agent proposes a reasoning
path, a human declines, but then we store that declining of the proposed action within the agent. And then we have the agent learn from that as a memory the next time it generates its reasoning path. For example, in a few-shot sense, we inject this as an example of the wrong reasoning path. That would make sure
that the next time we encounter a
similar problem, we're more likely to
achieve the approval. Okay, the
limitations of this of course are every
time we add a new block to a diagram
like this, we're adding additional steps that require debugging.
In the case of the unique challenges that exist with human verification: if you don't have enough that flows through this system automatically, without the human needing to look at it, you risk the system being seen as redundant. That is, this human could say, well, if I'm going to have to approve everything and look at everything anyway, why don't I just look at the input directly? Why do I need your AI agent? And so thinking through what you pass to the human and what you bypass is very important for this
kind of AI agent pattern. A last thing
to note here is that human beings also make mistakes. And so you might have a circumstance, depending on how reliable the human user is, where even if they decline an action, it may or may not mean that the action was the wrong one. So there's
sort of an interesting meta problem to
be solved when you're dealing with human
in the loop.
Okay, the last of these that I want to
cover is the agent orchestration
pattern. This one's actually very easy,
I think, to understand and it's related
to what you're doing in your first
homework assignment for this class. This
is where you have one agent called an orchestrator, and that agent can basically call a bunch of simpler, prefabricated, pre-specified agents here, and this orchestrator can figure out how it wants to call these agents in sequence. One of those agents could be a human, by the way. The orchestrator's job is to orchestrate the responses across the set in order to aggregate and return the results back to the user.
This is actually pretty good. It's a simple pattern and it's pretty good at handling complex multi-step tasks. For the level of complexity, depending on how you design it, you can also get really nice parallelism out of this pattern. So if you have, for example,
a task where you need three things to
happen in parallel and you've designed
your agents appropriately, you can
distribute the tasks outward, collect
the results and aggregate in the
orchestrator in less time than
sequentially putting them through. Of course, the main challenge of this approach is that you need this orchestrator to either be prompted, fine-tuned, or provided with really high-quality few-shot examples for how it does the orchestration of these agents, to get high-quality results here. As part of that, you may end up dealing with more complex state management problems in a design pattern like this.
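Here is one minimal way such an orchestrator could be sketched. The specialist agents, the JSON plan format, and call_llm are illustrative assumptions rather than a prescribed interface; the thread pool shows where the parallelism mentioned above can come from.

    import json
    from concurrent.futures import ThreadPoolExecutor

    # Sketch of an orchestrator: one call decomposes the request into subtasks,
    # each subtask goes to a specialist agent (possibly in parallel), and a final
    # call aggregates the results. call_llm and the specialists are placeholders.
    SPECIALISTS = {
        "sentiment": lambda text: call_llm("Classify the sentiment of the text.", text),
        "summarize": lambda text: call_llm("Summarize the text briefly.", text),
    }

    def orchestrate(user_request: str) -> str:
        plan = json.loads(call_llm(
            'Decompose the request into subtasks. Return JSON like '
            '[{"agent": "sentiment" or "summarize", "input": "..."}].',
            user_request))
        with ThreadPoolExecutor() as pool:  # independent subtasks run in parallel
            results = list(pool.map(
                lambda t: SPECIALISTS[t["agent"]](t["input"]), plan))
        return call_llm("Aggregate these results into one answer for the user.",
                        f"Request: {user_request}\nResults: {results}")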
So now that we've covered the motivation
for these design patterns in AI agents
and we've also gone over some of the basic or essential design patterns that are used today, let's shift our
attention to some of the emerging tools
and methods that we're seeing in the
scientific literature.
The source that we're going to be using for the coverage today is the arXiv paper that you see linked here. The primary goal of that arXiv paper was to cover a review of prompt engineering methods. But they have a really wonderful section, I think it's section four in the paper, that covers some agents that they reviewed which have had an impact: they've been highly cited by the community or have been widely used to inform secondary activities that are under development in
the research community. So the four
groups in the taxonomy that were
provided in the paper were tool-use agents, code-based agents, observation-based agents, and retrieval-augmented generation agents.
And you can see on the right hand side
here a set of the tools that are covered
within the paper. I do encourage you to
go read through all of those. For the
sake of our lecture today, I just want
to highlight four of these, specifically
the four that I'm showing here on the
furthest right, so that you have an idea
of some of the more innovative
approaches that people are taking for
the design of agents that I suspect will
be making their way into the next
generation of AI agent design patterns.
Let's start with the first one, MRKL
systems, which stands for modular
reasoning, knowledge, and language
systems. Specifically, I'd like to cover a really neat paper which disclosed a tool called Toolformer, which was trained to decide, in a piece of text that was provided (so, in a prompt), which API calls to make, when to call them, and what arguments to pass. And what I'm
showing you here are two figures that
come from the paper. On the left hand
side, I've got the first figure which is
showing some text that someone might put in a prompt. Let's say here we have: the New England Journal of Medicine is a registered trademark of... and then you can see there's this purple text, the MMS. So this purple text is actually what the Toolformer model is generating. You
pass in the text up to this point. The
New England Journal of Medicine is a
registered trademark of, and Toolformer figures out that it can
call an API on its own just based on the
text. It can call an API to answer this
question about who owns the registered
trademark of the New England Journal of
Medicine. In this case, it calls the QA
tool
and it passes in this query and it gets
the answer Massachusetts Medical Society
which you can see matches the actual answer in the text here that they're showing, the MMS. Another
example here is: out of the 1400 participants, 400 or... And it's smart enough to understand that in order to answer this question you need to call the calculator API and take 400 / 1400, which yields 29%. And as you can see in the actual text that they trained on, 29% was in fact the right answer. So this sort of very genius
trick that they used to generate Toolformer was
taking data sets very similar to what
you see here where a question implicitly
was asked and the response was generated
and generating a data set where, as they stepped through this text, at key moments they made a set of API calls to a collection of APIs that they have in the back: wiki search, calculator, and so on. And then they figured out, basically, how close the response of the API was to the authoritative text that they trained on. Okay. So, how close was the
response Massachusetts Medical Society
to the MMS? And the idea is that if
if the API generates a response that's
very close to the MMS or in the case of
the second example, if the API generates
a response that's very close to 29%,
then we know that for a question that
looks like this in the text, that the
right thing to do is to insert an API
response here.
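To give a feel for that filtering step, here is a deliberately simplified sketch. The real paper scores candidate calls by how much they help the language model predict the following text; the word-overlap score below is a crude stand-in for that idea, and the API functions are placeholders.

    # Simplified sketch of the Toolformer-style filtering idea: at a candidate
    # position in the training text, try each API and keep the call only if its
    # response closely matches the text that actually follows. The paper's real
    # criterion is loss-based; word overlap here is just an illustration.
    def overlap(api_response: str, following_text: str) -> float:
        resp = set(api_response.lower().split())
        follow = set(following_text.lower().split())
        return len(resp & follow) / max(len(resp), 1)

    def annotate_position(query: str, following_text: str, apis: dict, threshold: float = 0.5):
        candidates = {name: api(query) for name, api in apis.items()}
        name, response = max(candidates.items(), key=lambda kv: overlap(kv[1], following_text))
        if overlap(response, following_text) >= threshold:
            return f"[{name}({query}) -> {response}]"  # keep this call in the training data
        return None                                    # no API helped; insert nothing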
This is useful from a training
perspective because now we can ask any
open question
and Toolformer gives us a way to
suggest which API calls to make to
answer the question that we're
interested in. So, a really neat idea.
Their GitHub repository was starred over
2,000 times. Definitely suggest you take
a look at it and read the full paper if
you're a graduate student in this
course. A phenomenal take, I think, on self-supervised development of a transformer that not
only knows how to generate the next
token but also knows when to call a tool
to help it answer a question. Okay, so
we've covered the first of these
systems. Let's move to the second one.
Program-aided language modeling. Now, this one is so straightforward that it won't even take me much time to cover, and in fact we covered sort of a
rendition of this when we went over the
prompt engineering lecture.
There's a paper that was published where they described this program-aided language model, and the idea very simply
was to take any input that a user gives
and to train the system on how you
translate this input into code. So in
the case of this example, Roger has five
tennis balls. He buys two more cans of
tennis balls. Each can has three tennis
balls. How many tennis balls does he
have?
The idea is to turn this input into a series of Python commands that could be used to compute the answer, and then to simply execute the Python commands, as they're showing here in the output. Okay, so very straightforward,
very simple idea. Procedurally, how you
do this is by asking a language model to
determine whether this kind of task or question that's being asked by your user is better suited for deductive or symbolic reasoning, and, if the answer to that question is yes, in your chain of thought to resort to casting the problem into a programmatic form, Python for example, and executing the code so that you can perform the symbolic reasoning more effectively. Okay, so that's an example of program-aided language models.
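A minimal sketch of that procedure, using a hypothetical call_llm helper: ask the model to emit Python for the word problem, then execute it. The code in the comment is the kind of program the model might produce for the tennis-ball example (5 + 2 * 3 = 11); as always, exec'ing model output should be sandboxed in a real system.

    # Sketch of the program-aided approach: translate the word problem into
    # Python, then run the Python to get the answer. call_llm is a placeholder.
    PAL_PROMPT = (
        "Translate the word problem into Python that computes the result and "
        "stores it in a variable named `answer`. Return only code."
    )

    def program_aided_answer(problem: str):
        code = call_llm(PAL_PROMPT, problem)
        # For the Roger example, the model might return something like:
        #   tennis_balls = 5
        #   bought_balls = 2 * 3
        #   answer = tennis_balls + bought_balls
        namespace = {}
        exec(code, namespace)       # sandbox this in any real deployment
        return namespace["answer"]  # 11 for the tennis-ball problem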
Let's move to the third example, of lifelong learning agents, specifically Voyager. Now, this paper, I actually thought it was a fun example because it deviates away from text and is thinking about an AI agent that exists in the context of the game Minecraft. And specifically, what this agent is trying to do is propose exploratory tasks to take, or actions to do, in the context of Minecraft, execute those tasks, and save the consequences that led to a good outcome as memory.
How does it do this? Well, it starts by
defining basically some of the things
that the agent should try to achieve. It should mine wood. It should craft a table. It should combat zombies. And eventually it should mine for diamonds. So, it's sort of given some guidance that it needs to do these things as it's trying to progress through this sequence in order to get
ultimately to the point of mining
diamonds in the game.
It can propose exploratory tasks that it can take, in this case in the form of code that can be executed as actions in the game, so that it can attack the zombie, as an example. And then it checks how well it was able to proceed through this set. Did it progress? Basically, did it defeat the zombie? Did it effectively create the crafting table? Did it increase its inventory of wood, and so on? And as the actions that it takes succeed or fail, it will update its memory of certain actions as skills that it stores in a skill
library. And this skill library can be
used either to pull the best skill for a task or, in an exploratory sense, to propose new tasks that you learn from when you're interacting
with the world. So the key idea of
Voyager, which I thought was really
cool, is the proposing of new
exploratory tasks, executing them, and
saving the consequences as memories, so that the agent isn't just responding to the environment, but is trying to actively explore the environment as well. And this was done in the context of Minecraft, but you could anticipate this being done in other ecosystems as well.
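As a very rough sketch of that propose-execute-store loop (with call_llm, the environment execution, and the success check all standing in as placeholders for what Voyager actually does inside Minecraft):

    # Rough sketch of a Voyager-style lifelong-learning step: propose an
    # exploratory task, have the LLM write code for it, run it, and store code
    # that succeeded as a reusable skill. call_llm and execute_in_environment
    # are placeholders for the real LLM and game interface.
    skill_library = {}   # task description -> code that achieved it

    def lifelong_learning_step(environment_state: str) -> None:
        task = call_llm("Propose one useful exploratory task to attempt next.",
                        environment_state)
        known_skills = "\n".join(skill_library.values())
        code = call_llm("Write code for this task. Reusable skills so far:\n" + known_skills,
                        task)
        succeeded = execute_in_environment(code)   # e.g., did we defeat the zombie?
        if succeeded:
            skill_library[task] = code             # save the consequence as a skill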
Okay, let's come to the last one here, which is iterative retrieval augmentation. This is based on a paper where they described a method called FLARE, where they iteratively predict upcoming sentences, and if they were uncertain about one of the tokens in those sentences, they queried for an answer. So I need to start by
motivating why I think this is
interesting. When you're dealing with a
memory or a retrieval augmented
generation system, so that's where you
have the agent and it calls a database
to collect information. You always have
this devilish problem of when do you
call the database? Basically, when do
you phone the friend? When do you go to
the memory bank and try to pull it?
And so the innovation in this paper
which I really liked
was they came up with a principled way
to determine when you should call the
database and it works sort of like this.
They have an input, let's say: generate a summary about Joe Biden. And
then they can have their language model
generate an output. Joe Biden was born
on November 20th, 1942 and is the 46th
president of the United States. Okay,
these are their examples, not mine.
What you do after this is, if the LLM was very confident about all of the tokens in this sequence, and you can measure this by using the logits, which give you the probability of each token given all the previous tokens in the sequence, then you accept this sentence. You can see it says, "Hey, this looks like it's pretty likely. So, I'm going to move forward."
Let's say that the next sentence that it
wants to generate in the summary is Joe
Biden attended the University of
Pennsylvania where he earned a law
degree. But these two parts of the
sequence, University of Pennsylvania and
law degree,
it's less certain about those two. So the token probabilities, derived from the logits, associated with the sequence of tokens that generate these are low. Then
this becomes the triggering event where
as you can see on the figure on the
right hand side here a search query is
performed against some database some
tool
that returns the correct university, in this case the University of Delaware, and in the case of the law degree, a bachelor of arts in history and political science. Okay. So a very simple idea, but very elegant too. Try
to figure out every time the LLM is
generating a portion of its response
where it was confident and where it was
less confident. In the places where it
was not confident, you want to go do a
retrieval of information from an
authoritative source, a database, a
tool, and so on. In the places where it
was confident, you can let it operate.
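A minimal sketch of that trigger, in the spirit of FLARE rather than its exact implementation: generate a sentence along with its token probabilities, and only query the knowledge source when some token falls below a confidence threshold. generate_with_logprobs, retrieve, and regenerate_with_context are assumed helpers, not real APIs.

    import math

    # Sketch of confidence-triggered retrieval: keep sentences whose tokens are
    # all high probability; for low-confidence sentences, retrieve evidence and
    # regenerate. The three helper functions are placeholders.
    def generate_summary(query: str, max_sentences: int = 10, threshold: float = 0.6) -> str:
        output = ""
        for _ in range(max_sentences):
            sentence, token_logprobs = generate_with_logprobs(query, output)
            if not sentence:
                break
            min_prob = min(math.exp(lp) for lp in token_logprobs)
            if min_prob < threshold:
                # Low-confidence span: fetch evidence and regenerate the sentence.
                evidence = retrieve(sentence)
                sentence = regenerate_with_context(query, output, evidence)
            output += sentence + " "
        return output.strip()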
That's it for today's lecture. See you
in the next video.
This lecture, delivered by Dr. Mohammad Ghassemi at Michigan State University (CSE 491/895), introduces the concept of Agentic Design Patterns in the context of building AI agents. The talk covers three main areas:
1. Motivation – why design patterns are essential for agent development, especially when integrating generative AI with tools, data, memory, and planning components.
2. Survey of Agentic Design Patterns – a review of core design strategies such as the Single Agent, Chain-of-Thought, and Tool-Using patterns, with concrete examples and system prompts to illustrate their strengths and limitations.
3. Emerging Tools and Methods – how new frameworks and approaches can improve reliability, scalability, and safety of AI systems.
The lecture emphasizes the importance of aligning agent design with task requirements (for instance, why improper design in medical AI could lead to harmful outcomes) and highlights reusable design structures that solve common classes of problems.