There are 20 agentic design patterns
that separate pros from beginners. A
Google engineer recently released a
400-page book on agentic design patterns.
And in this video, I'm going to give you
a summary of that book in plain English.
I spent hours trying to simplify these
architectures to make them as easy to
understand as possible. No jargon, no
theory, just practical patterns that you
can use today. And each one of these
patterns could solve an actual problem
that you're facing right now. And if you
watch till the end of the video, you'll
have a deeper understanding of agentic
structures and I'm going to give you
tons of stuff for free to help upskill
you as well. So without further ado,
let's dive into it. So I have 20 agentic
patterns that we're going to dive
through today. And each one has a set of
visuals. So it's not going to be just cut-and-dried plain text. We're
going to walk through actual workflows
where I did my best to label what would
happen in each pattern in plain English.
And the first one we're starting with is
prompt chaining. Now, for each one of
these, I'm going to give you a summary
TL;DR (too long; didn't read) of what is
involved in this design pattern, and
then I'll walk through very quickly when
you could use it, where you could use
it, the pros and cons, and some actual
applications for that pattern. And my
idea is whether you watch this or you
take the transcript of this video, you
could essentially feed your Claude Code or your Cursor or what have you this
transcript and it would understand which
pattern it should employ for what kind
of problem. So prompt chaining is where
you break a big task into smaller steps
and you run one after the other. And the
good thing about prompt chaining is it
gives multiple areas to basically catch
a failure before it happens because each
step in that chain, that's why it's
called a chain, validates the output of
the one before it before it passes data
to the next one. So you can think of it
like an assembly line where each station
completes its part, checks quality, then
hands it off to the next station. So
tactically you would have some form of
user input and then that user input
would be broken down into subtasks. Once
the data contracts or contracts between
these tasks are created then you go and
execute task one. When you execute task one, you want to validate its output. So task two will first validate that the test passed, or that the data actually passed properly from output one, and then we go to output two, and it keeps going. If a step fails, it retries until it finally passes. And in this case, we only go
through three executions but
theoretically prompt chaining could be
infinite assuming your budget is
infinite for LLM costs, but there are diminishing returns. So if you put 50
different language model chains together
at some point you're either adding too
much or you're basically pushing it to
the limit where it starts hallucinating
on something it wouldn't have
hallucinated before. It starts to
overthink. So there is a magic number
depending on your workflow where it
could be from three to five different
parts of that chain where it's good
enough to do that validation. And the idea is that you merge all the
results. You assemble the final output.
You log all the artifacts. So anything
that happened throughout the entire
chain. So if something does go wrong or
your output looks a little bit
suspicious, you can go back through the
entire chain and see exactly where that
happened. So in terms of when to use, I
use prompt chaining a lot in all kinds
of flows, whether it be an automation,
an agentic automation or both. And you
can think of it as very useful with
complex multi-step processes, data
transformation. So imagine you have
really dirty data or data that's just
not standardized or fully structured.
You could have a pipeline with a mix of
generative AI and non-generative AI. So
let's say Pythonic or JavaScript where
it goes through and each part has a
pass. So let's say you had awful
columns; they're not properly labeled.
Step one could be let's label all the
columns based on the first couple rows
of data using GenAI and then assuming
that it makes sense it passes it goes to
the next step where now maybe we clean
and make sure that each row has the
proper type of data in the right
expected structure. These multi-step processes are where prompt chaining can help a lot.
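To make that chain-with-validation idea concrete, here's a minimal sketch in Python. Everything in it, the step prompts, the validators, and the call_llm helper, is a hypothetical stand-in for whatever your real prompts and model calls would be:

```python
# Hypothetical sketch of a prompt chain: each step validates the
# previous step's output before the next prompt runs.

def call_llm(prompt: str) -> str:
    """Stand-in for your actual model call."""
    raise NotImplementedError

def run_chain(user_input: str, steps: list[dict], max_retries: int = 3) -> str:
    data = user_input
    for step in steps:
        for _ in range(max_retries):
            output = call_llm(step["prompt"].format(input=data))
            if step["validate"](output):   # the data contract between steps
                data = output              # only validated output moves forward
                break
        else:
            raise RuntimeError(f"Step {step['name']} failed validation {max_retries} times")
    return data

# Example chain for the messy-columns ETL case described above:
steps = [
    {"name": "label_columns", "prompt": "Label these columns: {input}",
     "validate": lambda out: len(out.strip()) > 0},
    {"name": "clean_rows", "prompt": "Clean and type-check these rows: {input}",
     "validate": lambda out: "error" not in out.lower()},
]
```

The key design choice is that nothing reaches the next step unless its validator passes, which is exactly where you catch a failure before it propagates.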
So given that, it's helpful for things like document processing, data ETL, code generation, and one that will basically reemerge over and over again: content creation. A lot of these design patterns, depending on your flow, are very helpful for content creation. So
the pro of using this design pattern is
it's modular, meaning you can swap in or
swap out different parts of the chain
and not necessarily break the entire
flow, assuming that the chain you input
is very similar to what is expected from
the other parts of that chain. But one
of the major cons for this is context
explosion. And this is essentially
because you're going from maybe step one
to step seven. And what if you were
carrying over all the context from step
one all the way to step seven? So you
could theoretically have a lot of data
depending on what kind of structure this
data is. So imagine it's JSON,
JavaScript object notation or something
that's a payload structure. Those are
very expensive on the token side. So if
you keep looping those and adding it to
the next part of the chain, you can now
end up at step seven where you have all
of this context and all the prompts and
now your new prompt for step seven and
now the likelihood for hallucination
could beat or defeat the whole purpose
of doing this to begin with. And the
next thing is if your prompts are not
very well tested, then it could pass. So
let's say you go from chain one to chain
two to chain three and somehow it
passes, but there's an actual error or
it shouldn't have passed to begin with.
So that's a prompting issue. You then
have error propagation where every
single node or every single part of that
chain is inheriting the first mistake
you made. And obviously this is going to
be slower because you have multiple
points of inference which in plain
English means multiple points where a
language model is going to have to be
intervening. So you have to wait for
that to run before the next step. And
then applications of this could be legal
document analysis, e-commerce product
descriptions, academic research
assistance and anything related to
report generation. The next design
pattern is routing. And routing in plain
English is where you have incoming
requests that get analyzed and sent to
the right specialist agent based on what
they need. And you can think of it like
a smart receptionist or an operator who
listens to what you need and directs you
to the right person or the right
department depending on tech support,
accounting, etc. And the key thing here
is if the operator is unsure, they
should go back and ask you some more
clarifying questions to better
understand where to route that request.
So using that as a segue, you have a
user request. The AI then analyzes the
intent and the context behind that
request. So once it does that, it has to
make a decision and that decision will
be whether or not it should be going to
the technical support agent, the sales
agent, the account management agent or
any of the other agents in your
workflow. And if it doesn't know, the
key thing here is it's going to request
clarification until the confidence is
higher. Now confidence could be a number
generated by an LLM where it goes
through the decision and you ask in the
prompt out of 10 tell me how sure you
are. Now again you open yourself up here
to hallucination because what if it says
it's an eight but actually meant six and
if you ran it again it would have been
five or six. So in this case it could be
helpful to add something deterministic
where you have something statistical
that takes a look at the decisions and
assesses it in some way that gives you a
number that you can rely on to go back
in that loop. And once you get a
response from any of the agents that it
ends up routing to, it goes to either a
success or a failure. And then it comes
to some form of decision and delivers
the end result. Whether that be a piece
of information, a summary or what have
you. This makes sense when you have
multiple domains. So you have like we
said technical, we have accounting, we
have finance, we have different
departments or specializations for our
agents that we'd have to basically
distribute to. And it's also helpful
because if you have a specific tool that
can only or should only be invoked with
a specific path or a specific
department, this is helpful to segregate
all those different paths. And it'll
also help prevent misfires where an
agent uses a tool it shouldn't have used
or thought it should have used or ends
up doing a whole process without
executing the very tool it needed to
come to that conclusion. If you use something like n8n, you'll notice
all the time if you actually watch
what's happening with the AI agent node,
it will sometimes use a tool then not
use the tool then decide to end up using
the tool last minute and then you get
the end result assuming that the whole
process was correct to begin with. So
like the example, this is great for
customer service, enterprise automations
and healthcare triage, especially if you
have some form of healthcare front-end receptionist that is a voice agent that takes calls and either routes them to a specific department, does a booking, or answers questions along the lines of when are you open and what services do you have. So,
these could be different parts of that
specific chain. The pros here are that
you have specialization, scalability,
and efficiency. But on the con side,
because you have multiple possible
paths, you can always route to the wrong
path. And in the real world, it's less
likely for that to happen because if you
have someone on the phone asking you
clarifying questions, they literally
won't let you pass until they know
exactly where to route you. And if they
don't know, they'll probably ask their
manager. So with that same analogy, it
might make sense to add to that workflow
some form of manager agent that assesses
the decision of the initial agent. But
one of the many things to look out for
is this specific one here, which is
being prone to edge cases. So you could
have a case that comes out of nowhere.
And it's good to have some form of
confidence interval or confidence
marker. So you can basically quarantine
or add in a human in the loop if there's
one case that just can't be properly
tagged. And again, one of the best
applications for this is likely in
customer service or anything that's
front-facing from a business standpoint.
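Here's a minimal, hypothetical sketch of that routing idea: an intent classifier that only routes when its confidence clears a threshold and otherwise asks a clarifying question. The agents and the classify_intent call are stand-ins, not a real library:

```python
# Hypothetical router: classify intent, route above a confidence
# threshold, otherwise ask the user a clarifying question.

AGENTS = {
    "technical_support": lambda req: f"[tech support handles] {req}",
    "sales": lambda req: f"[sales handles] {req}",
    "account_management": lambda req: f"[accounts handles] {req}",
}

def classify_intent(request: str) -> tuple[str, float]:
    """Stand-in for an LLM call returning (department, confidence between 0 and 1)."""
    raise NotImplementedError

def route(request: str, threshold: float = 0.8, max_clarifications: int = 2) -> str:
    for _ in range(max_clarifications + 1):
        department, confidence = classify_intent(request)
        if department in AGENTS and confidence >= threshold:
            return AGENTS[department](request)
        # Low confidence: ask a clarifying question and try again
        request += " | clarification: " + input("Can you tell me a bit more? ")
    return "Escalating to a human operator."
```

As mentioned above, you could swap the self-reported confidence for something more deterministic if hallucinated scores become a problem.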
Now, the next one is parallelization,
which in plain English means splitting a
large job into independent chunks that
can be processed at the same time by
multiple workers. And when we say
workers here, that is a proxy for
agents. And the analogy here is like
having 10 people each read different
chapters of a book simultaneously, then
combining all the summaries at the end.
So each one works on one separate
chapter, then we put it together to
create the book. In practice, this looks
meaty, but it's actually not too
difficult. So you have some form of
large input. Then that input is
analyzed. Then that analysis determines
how you're going to split that big task.
So imagine you're working at a company
and the CEO of that company tells you go
and reduce our customer churn for our SaaS platform by 20%.
Now that 20% is a huge ambitious goal
and what you have to do is take that
goal break it down into independent
units in this case subtasks that can
lead to accomplishing that goal. So for
example you could run some form of
survey across those customers to see why
are people leaving. Maybe you have some
exit interviews so you have a better
understanding of why or what problems
might lie in the underlying SaaS
platform. In the same way, the agent
here has to check resources, see what
resources it has available to it, and
then once it sees what it's dealing
with, it can spawn up parallel workers.
And each of these workers can work on
subtasks that lead to accomplishing that
bigger goal. So you can think of each
one of these workers as employee agents
where each one retries and works until
it succeeds at its specific task. And if it
fails, it keeps going in a loop until it
goes through. And then once you collect
all the results from all of the workers,
you then normalize them, which just in
plain English means you make them into a
same format. So it's like having apples,
oranges, and pineapples. You want to
make sure that all of them are apples or
all of them are pineapples or all of
them are oranges. And then you merge
those results. You simplify it to a
single output. Then you generate a
summary. And provenance here means you're citing which parts of this final output came from which workers. So
if you understand where the failure
point is, you can go and have a
conversation with that specific worker,
which in this case means adjusting the
prompt or adjusting the system for that
worker to make sure you get the right
coordinated result. So this is helpful
to use with large-scale data,
time-sensitive operations where you need
to break something down very quickly and
you want some way to draft some agents
to help you break it up. And then web
scraping is a good example because web
scraping has multiple processes. You can
go on a page, inspect the elements, see
whether or not it's HTML or how you want
to use JavaScript. Then you break it
down into different processes. Maybe you
crawl different pages. So you can think
of a very meaty process in this regard.
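As a rough sketch of that fan-out and fan-in shape, here's what it could look like with a thread pool. The worker function, the retry policy, and the merge step are all hypothetical placeholders:

```python
# Hypothetical parallelization sketch: split a big job into chunks,
# fan out to workers with retries, then normalize and merge the results.
from concurrent.futures import ThreadPoolExecutor

def run_worker(subtask: str) -> str:
    """Stand-in for one employee agent working on one independent chunk."""
    raise NotImplementedError

def run_with_retries(subtask: str, max_retries: int = 3) -> str:
    for attempt in range(max_retries):
        try:
            return run_worker(subtask)
        except Exception:
            if attempt == max_retries - 1:
                raise
    return ""

def parallelize(subtasks: list[str]) -> str:
    with ThreadPoolExecutor(max_workers=max(1, len(subtasks))) as pool:
        results = list(pool.map(run_with_retries, subtasks))
    normalized = [r.strip().lower() for r in results]   # same format for every worker's output
    return "\n".join(normalized)                         # merge into a single output
```

In a real system you would also record which chunk each line came from, which is the provenance idea mentioned earlier.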
Where it fits are things like document
processing, data enrichment, research
automation, and testing frameworks. And
in terms of the pros and cons if you put
them side by side. So pros are, like I
said before, specialization, it can
scale because technically like a
company, you can keep adding more
employees, but in this case, you don't
need to raise venture capital funding to
do that. You can just add more
resources. But you'll see this con happen over and over again in these agentic design patterns: as you add more employees, a normal company naturally adds more complexity, more layers, more drama, and you might need to hire HR. So in the same way, you might need to bring in an agentic version of HR to now manage these workers better. And then, obviously, unifying all of the outputs from the workers sounds easy when I say it, but when you're tactically doing this in the real world, it's sometimes hard to equate or equalize all of those same parameters and variables. So from the real world
applications, I would say news aggregation services are the ones I've seen, as well as document intelligence systems. The next one is reflection. And
reflection is exactly what you would
anticipate it to be where you generate a
first draft, then you have a critic
review it against quality standards and
then based on the feedback, you revise
and you improve. And you essentially
repeat this until you meet your quality
standards. So the analogy here would be
like writing an essay, having a teacher
review it for you, and then making
improvements until you finally get a
passing grade. In terms of general
structure, not too tricky. So you have
some form of initial request, you
generate a first draft, then you have
initial output, then you have this
critic agent that goes through the
output and assesses what needs to happen
to make this better. So in this case, we
could apply quality rubrics where you
literally create a rubric for this agent
to assess the output. You run unit tests
which are predefined tests that you've
put together for edge cases and things
that you're looking for. And then let's
say it is the essay example. In this
case, it would be grammar and logic
check. So assuming that all of these
pass and meet the quality bar, then if
it meets the criteria, we go down this
path here. You accept the output. You
record success patterns. You update any
prompts or rules if anything's needed
and you're good to go. But if it doesn't
pass, then you generate some structured
feedback. Very similar. So imagine
having one agent generate structured
feedback that goes back into that loop
to the original agent and goes back and
forth until you finally reach all of
those quality standards. Now let's say
there's a fundamental flaw in your
workflow or automation you're building.
You then have to ideally have a max
count. So loop through until you meet
the standards. Maybe you have three tries, very similar to school, where you couldn't endlessly submit an essay in grade five or six. You would have maybe one or two tries, and then after that the grade is the grade.
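Here's a minimal, hypothetical sketch of that generate, critique, revise loop with a capped number of rounds. The draft, critique, and revision calls are stand-ins for your actual prompts:

```python
# Hypothetical reflection loop: draft, critique against a rubric,
# revise, and stop after a maximum number of rounds.

def generate_draft(request: str) -> str:
    raise NotImplementedError  # stand-in for the writer agent

def critique(draft: str, rubric: str) -> tuple[bool, str]:
    """Stand-in for the critic agent: returns (meets_bar, structured_feedback)."""
    raise NotImplementedError

def revise(draft: str, feedback: str) -> str:
    raise NotImplementedError  # stand-in for the reviser agent

def reflect(request: str, rubric: str, max_rounds: int = 3) -> str:
    draft = generate_draft(request)
    for _ in range(max_rounds):
        meets_bar, feedback = critique(draft, rubric)
        if meets_bar:
            return draft                  # accept the output
        draft = revise(draft, feedback)   # feed structured feedback back into the loop
    return draft                          # max rounds hit: the grade is the grade
```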
So when to use this: if you have to really keep track of quality
control and you have complex reasoning
tasks or tasks that are more creative
where you want to use the chaotic
feature of a language model but you
don't necessarily want to have a chaotic
result that's unpredicted every single
time. Where this fits is anything around
really content generation. So content, legal and academic writing, product descriptions. So imagine you had an agentic system: you have an Amazon FBA store, you have thousands of products, you're trying to find ways to
write descriptions for each one of them
that's not formulaic and AI slop. This
could be an example where you adjust and
use this pattern. The pro is
essentially you're focusing on quality,
which is awesome. What is not awesome is
the cost. Any form of API throttling. So
let's say you're ripping requests over
and over again for each product and you
have 10,000 products and you're running
them all at the same time. You can have
ways where the API will just time out.
So this specific pattern needs a lot of
planning. And like I said, anything
related to content generation is where
this would be really helpful. So the
next one is tool use. And this one is
straightforward as well. So when the AI
needs external information or actions,
it discovers available tools, checks
permissions, and then calls the right
tool with proper parameters. So it's
like a chef who needs ingredients,
checks what's available in the pantry,
then verifies they can use it, and then
retrieves and actually uses it in the
recipe. So tactically, you have a user
make a request. You analyze those
specific requirements. You discover what
tools you have available. So in this
case, let's say we have an agentic tool
that has access to the web search API,
database query tool, calculator
function, file system access, and other
APIs. We then select which tool should
be used. And this alone can be tricky
depending on what kind of generative AI,
if you're using a reasoning model or
what have you. And then you match the
capabilities to the need. So you do a
safety check. Did I basically choose the
right tool for the job? If it passes,
you prepare the tool call. You execute
the tool call. If it doesn't work for
whatever reason, again adding some logic
here as to whether or not or how many
times it should loop through. And then
as you go through, you have the parse
tool output. You have a fallback method.
You can do normalization with the
language model where you have the
language model take the outputs of this
automation and then basically configure
in a way or format it in a way that's
easier for it to interpret or use. Now,
if you don't use the right tool and you
fail, then ideally there's some form of
reason. So, you deny access to using the
tool with a reason saying you use the
wrong tool and let's log this so that
someone like me can intervene and add
some more flavor and change the
structure of the agentic workflow to
work a little bit better.
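To make that loop concrete, here's a minimal, hypothetical sketch of tool use: discover the available tools, select one, run a safety check, execute with retries, and log a denial if the wrong tool was chosen. The registry and the select_tool call are placeholders:

```python
# Hypothetical tool-use sketch: discover, select, safety-check,
# execute with retries, and log denials for later human review.

TOOLS = {
    "web_search": lambda params: f"searching for {params}",
    "database_query": lambda params: f"querying: {params}",
    "calculator": lambda params: str(sum(float(x) for x in params.split("+"))),  # toy example
}

def select_tool(request: str) -> tuple[str, str]:
    """Stand-in for the LLM deciding which tool to call and with what parameters."""
    raise NotImplementedError

def is_allowed(tool_name: str) -> bool:
    """Safety check: is this a known tool we're permitted to use?"""
    return tool_name in TOOLS

def use_tool(request: str, max_retries: int = 2) -> str:
    tool_name, params = select_tool(request)
    if not is_allowed(tool_name):
        print(f"DENIED: {tool_name} not allowed, logging for human review")
        return "fallback: answer without tools"
    for attempt in range(max_retries):
        try:
            return TOOLS[tool_name](params)   # execute and parse the tool output
        except Exception as err:
            print(f"attempt {attempt + 1} failed: {err}")
    return "fallback: answer without tools"
```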
So, tool use is used all over the place, and the
applications are endless. So, I won't
touch on that too much, but anywhere
where it's multi-step is much more
helpful. Where it fits: research
assistance, data analysis, customer
service, content management, and in
terms of pros and cons, you have the
quality improvement, you have error
reduction, and on the con side, you
have, like we said before, if we have a
misfire and we use a tool and it says it
passed, but it shouldn't have passed,
then you carry that same mistake over
through the entire workflow. So you can
think of this as starting a math
equation back in elementary school and
you're doing division and you divide
incorrectly in step one. Everything you
do after that point will be wrong
because your essential first step was
wrong too. Now for the rest of them
since you're getting the gist of this
I'm going to just glance through the
real world applications. You'll
basically get it into where it fits.
Planning is straightforward again: you have some form of big goal and you create a step-by-step plan. This is what I do personally when I use things like Claude Code or Cursor. I don't write
code for like 40 minutes or AI doesn't
write code for 40 to 50 minutes. I plan
and plan and plan until it's ready to go
and I know exactly what's going to
happen. Then I let it run. And even
then, sometimes the AI manages to still
hallucinate some parts, but it's a much
better way to compartmentalize
everything you're trying to do and
execute it in the most efficient way
possible. So this one is like planning a
road trip with checkpoints, monitoring
traffic and routing where needed. So in
this case, you have some form of goal
input and then you break it down into
milestones. You create what's called a
dependency graph and then you check your
constraints. So in this case, if it's
data oriented, it could be data
availability. It could be authorization,
could be budget limits, could be
deadlines of any form. And then you
generate a step-by-step plan. You assign
which agent or agents should be used and
what tools of those agents and then you
just go and execute each step. So
similar structure to prompt chaining
except you're not necessarily carrying
over the output of the previous one to
the next one. You're basically going
through sequentially until you get to
step number n. N could be six steps, 10
steps. You track your progress and
assuming your goals are met, then you go
through the acceptance criteria and
you're good to go. Otherwise, like we
said, you're going to see this theme
recurring. You have some form of backup
where if this doesn't work, you analyze
what happened and assess whether or not
there's new information. If so, if this
is an edge case, like I said before,
maybe it deserves a human in the loop.
Otherwise, you want to escalate that
issue, handle the exception, and then
you're good to go. So, this is
especially helpful with things like goal
oriented workflows where you have again
ambitious goal, but you want to break it
up into substeps to get to that goal. So
this is good for project management,
software development or research
projects. And in terms of the pros and
cons, the big pro is that you have very
strategic execution because the more
time you spend planning or the more time
the agent spends planning, it has more
clarity on exactly what it should do.
And by nature of that, it makes your
entire workflow or automation a lot more
adaptable to new variables, new
environments. The biggest con though is
the setup and the complexity and
coordinating all those agents to make
sure that each one has the right tool,
each one has the right system prompt and
that you have the proper fallback
mechanism if things don't go right.
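Here's a rough, hypothetical sketch of that plan-then-execute shape: break a goal into milestones with dependencies, order them, check constraints, and execute step by step with an escalation path. The helper calls and the example graph are placeholders:

```python
# Hypothetical planning sketch: dependency-ordered milestones,
# constraint checks, sequential execution, escalation on failure.
from graphlib import TopologicalSorter

def check_constraints(step: str) -> bool:
    """Stand-in for budget limits, deadlines, data availability, authorization."""
    return True

def execute_step(step: str) -> bool:
    """Stand-in for the agent (and its tools) assigned to this step."""
    raise NotImplementedError

def run_plan(dependency_graph: dict[str, set[str]]) -> None:
    for step in TopologicalSorter(dependency_graph).static_order():
        if not check_constraints(step):
            print(f"constraint violated at {step}, escalating to a human")
            return
        if not execute_step(step):
            print(f"{step} failed, handling the exception and replanning")
            return
        print(f"{step} done, progress tracked")

# Example dependency graph: each step maps to the steps it depends on.
plan = {
    "gather_requirements": set(),
    "draft_design": {"gather_requirements"},
    "implement": {"draft_design"},
    "review": {"implement"},
}
```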
This next one is multi-agent collaboration, and this is one that you would expect and see all the time, especially with those humongous n8n workflows with
seven agents, six sub agents and you
have that whole network. So the crux of
this one is that you have multiple
specialized agents working together on a
different part of a complex task
coordinated by some central manager. In
many cases they share a common memory, which is important here, because if you share a common memory, then your memory mechanism, whatever that is, an MCP server or any form of function, has to be well structured so that all the memories don't overlap and you focus on the proper memories that need to persist.
And my analogy here is like having a
film crew where the director coordinates
while camera, sound, and lighting
specialists each handle their part
sharing the same script and timeline. On
the multi-agent side, you have some form
of complex task and then because of that
task, you have to define specialist
roles. So you might have an agent that
literally just decides which other agent should be chosen, which is similar to the idea before where we had that operator, but in this case the operator has to be an agent: depending on what we're looking for in the task, it decides, okay, let's use the research agent or the analysis agent or what have you. And then
in your infrastructure you should have
some form of shared resources whether
that be shared memory stores artifacts
version control. And then, once you have the coordination protocol, as it's referred to, you have your orchestrator go through, and the coordinator manages the flow. It assigns each task to the right agent. So imagine you had an Asana board or a Jira board and you have a
bunch of tickets. The coordinator is
essentially tagging each ticket to one
or more agents, and after each one finishes, there's some proverbial contract before it goes to the next agent: each one goes through and checks off that contract. Again, using my analogy of tickets, each ticket on project management software has acceptance criteria. Assuming that
acceptance criteria has been met, then
you can go to the next stage. And then
there's an overall acceptance test. If
it passes, you're good to go. If it
fails, maybe you go and run a
simulation. You loop back. You make sure
and see where did the coordination fail.
And again, you can set some form of max
here where it doesn't keep retrying for
infinity. So out of all of these, one of
the best applications of this is for
iterative refinement, which really lends
itself well to AI or general product
development where there's multiple
phases, multiple tickets, and then
different ways to solve the same
problem. So software development,
product development, financial analysis,
content production or creation and
research projects is where this shines.
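As a rough, hypothetical sketch of that coordinator-plus-specialists shape, here's a coordinator assigning tickets to specialist agents, checking each ticket's acceptance criteria, and writing results into a shared memory. Every name here is a placeholder:

```python
# Hypothetical multi-agent coordination sketch: a coordinator assigns
# tickets to specialists, checks acceptance criteria, shares memory.

SHARED_MEMORY: dict[str, str] = {}   # common memory store every agent can read

SPECIALISTS = {
    "research": lambda ticket: f"research notes for {ticket}",
    "analysis": lambda ticket: f"analysis of {ticket}",
    "writing": lambda ticket: f"draft for {ticket}",
}

def pick_specialist(ticket: str) -> str:
    """Stand-in for the coordinator agent deciding who gets the ticket."""
    raise NotImplementedError

def acceptance_test(ticket: str, result: str) -> bool:
    """Stand-in for the ticket's acceptance criteria."""
    return bool(result)

def coordinate(tickets: list[str], max_retries: int = 2) -> dict[str, str]:
    for ticket in tickets:
        role = pick_specialist(ticket)
        for _ in range(max_retries):
            result = SPECIALISTS[role](ticket)
            if acceptance_test(ticket, result):
                SHARED_MEMORY[ticket] = result   # persist only what passed the contract
                break
        else:
            SHARED_MEMORY[ticket] = "FAILED: needs human review"
    return SHARED_MEMORY
```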
And in terms of the pros and cons, the
pros like before you have the ability to
specialize and you have parallel
processing. But on the con side, once
again, all of these systems need to be
set up and tested and tested over time
as these language models evolve and
drift. The last section acts as a great
segue for memory management where this
is classifying incoming information as
short-term conversation, episodic
events, or long-term knowledge. And you
store each type appropriately with
metadata like recency and relevance. And
this is exactly how your brain keeps track of things: some things briefly, some as specific memories, and some as permanent knowledge, things that you will never forget. And one thing I would say here
is that there are so many tools and MCP
servers trying to solve this. And I've
yet to find something perfect because I
noticed that depending on what you're
trying to build agentically, memory
management is really contextually
specific on what you're trying to
remember and what is not worth
remembering. But the main idea is you
have some form of user interaction. You
capture information and then you decide
what kind of memory would this be? Is
this something I have to remember in the
long run? Is this short-term memory? Is
this knowledge that I have to store in
perpetuity forever? So is this episodic
memory? Is this long-term memory? Is
this something I'm just going to keep
for the remainder of this session? So
that's why it says here, is context
window full? If yes, then you compress
whatever it is you're trying to remember
or you compress your existing memories
because you don't have to hold on to
them anyway after the session. Now, if
you do need to store them, then you need
to index them and then add metadata, add
a recency score, create frequency or
topic tags so it's easy to retrieve
those. Think of something like a vector
database of sorts where you need some
way to generate your top five results based on a single question. And on the
shorter term memory side of things, you
want to retrieve a memory if it's
relevant. You want to query your memory
store. Maybe you want to apply some
filters by rule, time horizon, or topic
match. Then you pick the right memories
that you should use. And then you
process the request. And if privacy is
an issue, this is where you deal with
that. Whether you redact anything from
that memory or you save a different
version of that memory, then if so, you
update your memories and then you
continue the interaction. So this is not an agentic pattern of its own. This is more so a subset of what you'd use in other agentic patterns. So the main
use case with long-term memory
management is conversational continuity.
So ideally if you talk to Claude, you'd
be able to have that conversation with ChatGPT with the exact same context without having to re-explain who you are,
what you do, or what you're trying to
accomplish. This is awesome for
experiences that require tailoring. So
customer service, personal assistance.
I'd say the biggest application that I
could see is educational assistance or
platforms where they learn that you
struggled with concept A. So when you go
to concept B, it knows that you have a
weakness with concept A. So it basically
over-explains parts of concept B that are
dependent on understanding concept A.
The pro is obviously context
preservation over time. But on the con
side, you want to make sure that as you
store memories, you're not compromising
security. You're not over storing
memories. You have a way to flush out
older memories or you have a system to
determine when a memory is indeed old.
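Here's a minimal, hypothetical sketch of that classify, store, retrieve flow: tag each piece of information as short-term, episodic, or long-term, store it with recency and topic metadata, and retrieve by filters. The classifier call is a placeholder:

```python
# Hypothetical memory management sketch: classify, store with metadata,
# and retrieve memories filtered by type, topic, and recency.
import time

MEMORY_STORE: list[dict] = []

def classify_memory(text: str) -> str:
    """Stand-in for an LLM deciding: 'short_term', 'episodic', or 'long_term'."""
    raise NotImplementedError

def store_memory(text: str, topics: list[str]) -> None:
    MEMORY_STORE.append({
        "text": text,
        "type": classify_memory(text),
        "topics": topics,
        "stored_at": time.time(),   # recency score comes from this timestamp
    })

def retrieve(topic: str, max_age_seconds: float = 7 * 24 * 3600, top_k: int = 5) -> list[str]:
    now = time.time()
    candidates = [
        m for m in MEMORY_STORE
        if topic in m["topics"]
        and (m["type"] == "long_term" or now - m["stored_at"] < max_age_seconds)
    ]
    candidates.sort(key=lambda m: m["stored_at"], reverse=True)   # most recent first
    return [m["text"] for m in candidates[:top_k]]
```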
The next one is learning and adaptation
where this is collecting feedback from
user corrections, ratings, and outcomes.
You want to clean and validate the data
to remove noise and then you use it to
update prompts, policies, or examples.
It's like adjusting a recipe based on
customer feedback and taste tests. So
essentially, you'd have some form of
system operation and you collect
feedback from a feedback source. That
could be some form of correction from a
user, quality ratings, automated
evaluations, or some form of rubric or
task outcomes. You then take these
quality signals or these feedback
signals. You do a quality check and then you either denoise it or clean it.
If it's something malicious, like you're
a restaurant and they say there's
cockroaches everywhere, but there's not
a cockroach in sight. So, you maybe
disregard but you log that specific
review. Additionally, you want to make
sure that your main system doesn't have
any fluff or noise in it. So, you go
through this process and then you decide
how is this going to be quote unquote
learned? Am I going to update the
prompts associated with my workflow? Am
I going to update my policies or examples
in the prompts? If you're doing a
multi-shot prompt, am I going to update
existing preferences in the tool or
product itself? Am I going to fine-tune
a model? This is very rare that you'd
want to do that, but it is an option
that you can use. Then after this, you
do some form of A/B testing. You monitor
the performance after taking in the
feedback to see has this course
corrected this agent to do a better job
at whatever it's doing.
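A rough, hypothetical sketch of that loop might look like this: collect feedback signals, filter out noise, fold the clean ones into the prompt's examples, and A/B test the updated prompt before adopting it. All the helpers are placeholders:

```python
# Hypothetical learning-and-adaptation sketch: clean the feedback,
# update the prompt's examples, and A/B test before adopting the change.

def is_noise(feedback: dict) -> bool:
    """Stand-in quality check, e.g. flag malicious or unsupported reviews."""
    raise NotImplementedError

def update_prompt(base_prompt: str, feedback_items: list[dict]) -> str:
    examples = "\n".join(f"- {f['correction']}" for f in feedback_items)
    return base_prompt + "\n\nRecent corrections to respect:\n" + examples

def ab_test(old_prompt: str, new_prompt: str) -> float:
    """Stand-in: run both prompts against an eval set, return new score minus old score."""
    raise NotImplementedError

def learn(base_prompt: str, raw_feedback: list[dict]) -> str:
    clean = [f for f in raw_feedback if not is_noise(f)]   # denoise first
    candidate = update_prompt(base_prompt, clean)
    if ab_test(base_prompt, candidate) > 0:                # adopt only if it actually helps
        return candidate
    return base_prompt
```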
So naturally this is a great system to use if you need feedback
incorporation and you need to have some
way to have a feedback loop and stimuli
taken into the system so that the system
learns in whatever way learning is for
you whether it's the prompt itself or
the knowledge base or any form of policy
that your agents adhere to. So where it
fits is similar to memory management
anywhere where there is a tailored
service where you are receiving feedback
from a customer or an avatar. So the pro
and con is that you have continuous
improvement but on the con side you have
training costs right. So every single
time you're updating a prompt you're
having probably a language model do
that. So these things become a
combinatorial cost problem where as you
keep adding more and more checks or
feedback loops you're also adding more
cost. And now you could learn something
wrong. Right? So what if someone said
the restaurant is full of cockroaches
and now your system learns that it has
cockroaches. So it says something like
warn people before they book with your
restaurant that there are cockroaches
everywhere. So you could learn the wrong
thing. So you want to make sure that you
have some checks and balances against
that. The next one is goal setting and
monitoring. And this one is basically
defining specific measurable goals. A
lot of times they call these SMART goals: specific, measurable, achievable, relevant, and time-bound. So you have measurable goals with
deadlines and budgets and then as work
progresses, you continuously monitor
metrics and compare to targets and it's
like having a GPS that sets a
destination, monitors your progress, and
then recalculates when you're off
course. So, how the system works is you
have some form of objective that is
defined. Then you create these smart
goals. So, you have everything that I
mentioned before. You set your
constraints. Let's say your main
constraints that you deem as the most
important are time and resource and
budget. Then you define some metrics or
KPIs, key performance indicators for
this agent. Then you go through some
quality gates. Quality gates are essentially just double-checking that
everything's in line. You begin your
execution. You go through continuous
monitoring. You track progress, create
checkpoints, have status events. You
collect those metrics. You compare them
to targets. And then you go through this
entire rest of the system. If the system
starts to drift because it's not
adhering to your KPIs or your metrics,
that's where we go and analyze the cause
and you decide what needs to happen. Do
you need more resources? Do you need to
adjust the plan? Do you need to modify
the scope in any way? And then if it
does pass, then you continue the
execution. You make sure that your goal
is achieved, whatever that goal was. And
then if it isn't achieved, you escalate
it. Otherwise, it's successful. And
theoretically here, you could generate
some form of report summarizing
everything that happened. Then you have
the goal achieved.
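A tiny, hypothetical sketch of the monitoring half could look like this: define targets and constraints up front, compare collected metrics to targets at each checkpoint, and escalate on drift. The names, targets, and thresholds are made up for illustration:

```python
# Hypothetical goal monitoring sketch: compare metrics to SMART-style
# targets at each checkpoint and flag drift for cause analysis.

GOAL = {
    "metric_targets": {"churn_rate": 0.08, "response_time_s": 2.0},   # upper bounds
    "budget_usd": 5000,
    "deadline_days": 90,
}

def collect_metrics() -> dict[str, float]:
    """Stand-in: pull current KPI values from wherever you track them."""
    raise NotImplementedError

def checkpoint(spend_usd: float, days_elapsed: int) -> str:
    metrics = collect_metrics()
    for name, target in GOAL["metric_targets"].items():
        if metrics.get(name, float("inf")) > target:
            return f"drift on {name}: analyze the cause, adjust the plan or the scope"
    if spend_usd > GOAL["budget_usd"] or days_elapsed > GOAL["deadline_days"]:
        return "constraint breached: escalate"
    return "on track: continue execution"
```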
In terms of where and when to use this, this is a more
advanced technique. So you'd use this
for complex projects, really autonomous
operations you're going for and
strategic execution. And on where it
fits, it's for, let's say, sales
pipelines, very sophisticated pipelines,
system optimization or cost management.
I would likely use it only in these two
occurrences. There's probably simpler
ways to create a sales pipeline. On the pro side, you try to be as efficient
as possible with your resources. But on
the con side, you have potentially goal
conflicts or rigid constraints
throughout your system where you have to
run this quite a few times to catch any
not only edge cases but any rigidities
that pop up depending on the variability
of your input. Next up, we have
exception handling and recovery. And I
could summarize this whole sticky note
in one line, which is this is just the
way that you catch errors in your
agentic workflows. So this is an agentic
pattern to help catch issues in your
other agentic patterns. So essentially
what you're trying to do is you do
something, you add safety checks, you
make the call to these services or tools
or both. Then you assess whether or not
it worked. If it didn't work, you take
that error, you catch it, and then you
have to assess and classify what kind of
error is it. Is it a permanent error?
Meaning it's something that's not going
to resolve itself. If so, it's good to
have a plan B in your workflow. If it's
a temporary error, then try again. Wait
a bit. So, sometimes we call this backoff, or exponential backoff: it waits, say, one minute and then tries again. Let's say there's a timeout with an API or you've sent too many requests; it backs off, and each retry waits longer than the last, which is why it's called exponential backoff.
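Here's a small sketch of what exponential backoff with a retry cap could look like, assuming a hypothetical call_service function and treating timeouts as the temporary error:

```python
# Hypothetical exponential backoff sketch: retry temporary failures with
# a growing wait, and fall back to plan B after a capped number of tries.
import time

def call_service(payload: str) -> str:
    """Stand-in for the API or tool call that might time out."""
    raise NotImplementedError

def call_with_backoff(payload: str, max_attempts: int = 4, base_delay: float = 60.0) -> str:
    for attempt in range(max_attempts):
        try:
            return call_service(payload)
        except TimeoutError:
            if attempt == max_attempts - 1:
                break                                  # treat it as permanent, go to plan B
            time.sleep(base_delay * (2 ** attempt))    # wait 1 min, 2 min, 4 min, ...
    return "plan B: use saved data, a default answer, or alert a human"
```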
But obviously, you want to cap out how many times it should try
because it could be that what you think
is a temporary failure is a permanent
failure. So, for a critical error,
you'd have an emergency response. You'd
save your current work, alert the team,
determine whether or not it's safe to
continue, and then you keep going until
you get to the point where you can
continue working and you're unblocked.
Otherwise, you need to maybe do a full
stop and reassess the entire system and
see where the issue lies. In terms of
backup options, this could be using a
simpler method, using saved data, using
default answers, or getting again that
human in the loop to assist. Then you
start the recovery process, which flows
into the exact same recovery process
from before. Now in terms of when to use
this, you can use this pretty much in
every pattern, but the specific ones are where you need a lot of focus on error handling, where errors are more
prone to happen like systems that are
actually in production, quality
assurance, cost management, and anything
where there are vital and critical
mistakes that you have to account for.
So this is one of those patterns that
would be a good use case for enterprise
AI deployments because there are so many
fail safes and plan B's and C's. So the
pros and the cons are obviously that you
have more performance visibility. You
can see exactly what's happening, what's
failing, why it's failing, and have
areas of recourse if it fails. And then
you have more user trust naturally
because you have more fallbacks in
place. But on the con side, there is a
lot of infrastructure and complexity to
make this happen. And sometimes you
might have a lot of false alarms. So
depending on how many times you get an
alert, you should be very judicious or
very specific about when or what is
worth an alert. So you don't get alert
fatigue. It's kind of like the story about the boy who cried wolf: he cried wolf so many times that when the wolf was actually there, they ignored him. So the same analogy applies here.
This next one is human in the loop which
like the name says is adding a human in
the loop where there's low to high risk
depending on the situation or most
importantly edge cases. So this one we
can kind of like breeze through because
it's pretty straightforward. You have
some form of agent processing, you have
a decision point and one of those
decisions could be that a review is
needed or you need to actually step in
and intervene. So a good actual tactical
example is imagine you're using some
form of agentic browser or agent mode in
ChatGPT. At some point it will realize
it needs you to step in to add your
credentials to log in to your email to
Upwork to whatever service it is. And
that's where you have a review queue and
then you prioritize by urgency if it's
multiple things. You present it in the UI, like agent mode in ChatGPT, where it physically tells you: Mark, please
intervene and take over and then give me
back control once you're done. It shows
full context displays differences.
There's usually some form of timer. And
on the human side they can decide
whether to deny, edit or take over or
whether to approve. And assuming that
the human approves, it goes through the
rest of the workflow. And assuming that
no more intervention is required, then
this process is complete.
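A minimal, hypothetical sketch of that approval gate: anything above a risk threshold goes to a human reviewer, who can approve, edit, or deny before the workflow continues. The risk scoring is a placeholder:

```python
# Hypothetical human-in-the-loop gate: high-risk actions wait for a
# human to approve, edit, or deny before the agent proceeds.

def risk_score(action: str) -> float:
    """Stand-in for a classifier scoring risk from 0 (safe) to 1 (high risk)."""
    raise NotImplementedError

def human_review(action: str) -> str:
    print(f"Review needed for: {action}")
    decision = input("approve / edit / deny? ").strip().lower()
    if decision == "edit":
        return input("Enter the edited action: ")
    return "" if decision == "deny" else action

def run_action(action: str, threshold: float = 0.7) -> str:
    if risk_score(action) >= threshold:
        action = human_review(action)      # blocks until the human responds
        if not action:
            return "action denied by reviewer"
    return f"executing: {action}"          # low risk or approved: continue
```

Note that the blocking input call is exactly the latency trade-off discussed below: the system waits as long as the human does.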
So you want to use human in the loop anywhere you have a high-stakes decision, you have regulatory
compliance, where you can't leave it up
to a generative AI model to hallucinate,
and when you want to catch things like
edge cases. So it fits everywhere.
There's only a few examples here like
content moderation and medical
diagnosis, but it fits everywhere you
can imagine. So the pro is that you have
more trust in the system because you
know exactly where the failure points
are and what the next course of action
is with a human when you reach that
failure point. When you're adding human
in the loop, you're naturally adding
more latency or more buffer time to that
system because it has to wait for
you. So if you take 10 minutes to
intervene, then that's the system
running 10 minutes longer than it should
be. This next pattern is very familiar
to a lot of you where it's basically
just RAG. It's knowledge retrieval. So
just to define it, it's indexing
documents by parsing, chunking, and
creating searchable embeddings.
Literally RAG. So it's like having a
librarian and you want to categorize or
index a series of information and
systems. So this one is pretty
straightforward where you have a user
query. You have some sources that you've
ingested. You've parsed those documents,
categorized them, embedded them, which
again means in plain English, you take
words, you turn them into vectors, and you store the vectors in a library. So when you
ask a question, you try to match the
vector of the question to the vectors in
your library with the closest match. And
then in terms of chunking strategies,
you have fixed size chunks, semantic
boundaries, context aware chunks.
There's all kinds of different ways to
do this. Then you generate embeddings
like I said, you store it in something
like a vector database. You get the
query. If there's any form of rewriting for that query to make sure that you can get a better match, then you would do that in the system. You would retrieve the top matches. So this
is called top K. You could have five
matches, you could have 10 matches. Just
be aware that the more matches you add
to the system, the more that the
language model can choose and
hallucinate from. In terms of reranking,
this is where you would reassess all the
vectors and better organize them through
scoring them and optimizing them to get
better matches. So you can have more
grounded responses. You have citations
potentially. You obviously have to test
your RAG, and if it fails then you have
to go through adjust whatever parameters
need to be adjusted. Then if it passes
you deliver the response and then maybe
you have some form of metrics that you
score on like precision or recall. Then
you optimize the system, and then your RAG technically would be complete.
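To make the retrieval half concrete, here's a small, self-contained sketch of chunking, a toy bag-of-words embedding, and top-k retrieval by cosine similarity. In a real system the embedding would come from an embedding model and the store would be a vector database; everything here is a simplified stand-in:

```python
# Hypothetical RAG retrieval sketch: chunk, embed (toy bag-of-words),
# and return the top-k chunks by cosine similarity to the query.
import math
from collections import Counter

def chunk(text: str, size: int = 50) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    return Counter(text.lower().split())   # real systems use an embedding model here

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, documents: list[str], top_k: int = 5) -> list[str]:
    chunks = [c for doc in documents for c in chunk(doc)]
    ranked = sorted(chunks, key=lambda c: cosine(embed(query), embed(c)), reverse=True)
    return ranked[:top_k]   # the grounded context you hand to the language model
```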
So you want to use this wherever you have
document knowledge needs and that could
be small or large really depends on the
scenario. So this fits anywhere where
you have enterprise search, customer
support, research assistance, any form
of documentation you need to split up
and use. But RAG is something a lot of
us know about. So this is not too hard
to wrap your head around. So the pros is
that you can add more accuracy and
scalability to your system, but it can
come at the cost that you have to not
only build infrastructure but maintain
that infrastructure, which means
maintaining those vectors that you
accumulate over time. This next one is
definitely worth refining which is
called inter-agent communication. And
this is basically where you have agents
communicate through a structured
messaging system with defined protocols
and then you have message including IDs
for tracking expiration times and
security checks. So analogy here is it's
like an office email system with red
receipts, security clearances and spam
filters that prevent reply all
disasters. So this is where you have
language models talking to other
language models. So from a system
perspective, this is where you could
have multiple AI agents speak to one
another and then you have to decide how
they should communicate. So either they
have one boss, one that manages all the
other agents, which is sometimes really
helpful to have because you have a single point of control that everything can report to. The next one is that
everyone is equal, meaning everyone has
a say at the table. It is a pure agentic
democracy which sounds great but in
practice really hard to dial in because
you're always dealing with the risk of
hallucination and misfiring. And then
you have potentially like a big thread.
Imagine you have a school community for
agents and all of them are looking at
the board or the pinned posts and that's
how they communicate. They communicate
as comments on those pinned posts. So in this case you set up communication
rules, how they can speak, how they can
object, what happens when they have
conflict with one another. In terms of
message rules themselves, you have tracking numbers for each message and you create an expiration for each message. So let's say you have a conversation and you're now at 100 messages between all of the agents. You probably don't want to maintain or store the third message from a singular agent unless that's one of the only things it said. If there are important
messages, then you need a system to mark
which one is important. So this is where
you can get a lot of spaghetti where you
have agents on tops of agents. Then you
have language models assessing those
agents, so the number of potential points of failure is very high.
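A minimal, hypothetical sketch of that structured messaging layer: each message carries an ID, sender, expiration, and importance flag, and delivery checks the sender's permissions before the receiving agent ever sees it. All the fields and checks are placeholders:

```python
# Hypothetical inter-agent message envelope: IDs for tracking,
# expiration times, and a permission check before delivery.
import time
import uuid
from dataclasses import dataclass, field

ALLOWED_SENDERS = {"coordinator", "research_agent", "analysis_agent"}

@dataclass
class Message:
    sender: str
    recipient: str
    body: str
    important: bool = False
    ttl_seconds: float = 3600.0
    msg_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    created_at: float = field(default_factory=time.time)

    def expired(self) -> bool:
        return time.time() - self.created_at > self.ttl_seconds

def deliver(msg: Message, inbox: list[Message]) -> bool:
    if msg.sender not in ALLOWED_SENDERS:      # identity and permission check
        print(f"blocked message {msg.msg_id} from {msg.sender}")
        return False
    if msg.expired():
        return False                           # drop stale messages from the context
    inbox.append(msg)
    return True
```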
So what's interesting here is that you can
even create a system where you can
designate which agent is allowed to
speak. Then you verify their identity.
You check what they can or can't do.
Depending on that, you can give them the
green light for communication. They send
a message, deliver it to a prescribed
agent. The agent gets the message,
processes that message, and then it
determines, do I need to reply or do I
need to execute the thing that the agent
told me to do? But this is where it gets
messy where you want to assess, do you
have any problems? If so, you could have
an endless loop and when do you stop
that loop? If all the agents keep
talking to each other and it just
doesn't stop, you need some mechanism to
make it stop. If an agent gets stuck, do
you have a mechanism to unstick the
agent? If it's stuck at firing a tool or
it's stuck on one particular point that
it keeps looping back to. If there are
messages that are way too long, then
maybe you remove those old messages from
the context to keep it going. Then you
alert a human. And this is again where
human in a loop is very helpful because
you might need one to just push the
conversation along. Otherwise, if life
is all good, you can keep going. You can
save the conversation history. You can
create an activity report. But genuinely
looking at the system, I've never seen a company that has implemented this: one, at all; two, properly; or three, in a really scalable manner. This would make a
really good YouTube video, but not a
really good production system. So, in
terms of when to use this, I would
personally tell you you probably don't
want to do this, even though it looks
beautiful. It sounds great. Unless
you're trying to build a prototype
system of automating an entire company
with just agents, I'm sure it's possible
with some implementation. But you
probably could do much more useful
things with that time that are more
deterministic and reliable because as
language models change, you'd have to
basically create your own framework for
how all of these systems should work. I
don't think you'll be able to pull
something off the shelf like CrewAI
and be like, "Cool, this is the system
that we're going to depend our whole
company on." If you are going to do it, then the enterprise level makes sense, because you need tons of resources, engineers, and proper production. And for the other ones you can see here, one of the use cases that popped up was smart city systems. So this is very
complex. This is at a very very high
level. Now the one key pro here I want
to dial in on is fault isolation. So in
this system if you manage it properly
you can know exactly which agent is the
culprit for a particular issue or what
happens when all the agents go back and
forth and have conflict. You can basically root-cause everything that happens.
Whereas in a real company, sometimes you
can't pin down exactly why something
didn't work. Was it a personnel issue?
Was it issues within the personnel? You
can't necessarily have full big brother
intelligence over what's happening. But
here you can. And the cons speak for
themselves where you have a lot of
complexity, a lot of debugging. You have
to see all the states of the agents at a
particular point in time. You have to
keep track that the context of the
conversation isn't getting overloaded,
that the agents themselves are speaking
the same language literally or at least
the language that you've designated them
to have. This next pattern might be new
to a lot of you and it's called
resource-aware optimization. What it
means is analyzing a task complexity and
then routing to appropriate resources.
So simple tasks use cheap, fast models,
but complex tasks use powerful but
expensive models. Think of something
like GPT-5 where there was a huge uproar
because we lost all of our models. Then
we got either quick thinking, kind of
thinking, hard thinking or like
professional thinking. Each one of those
would route your request in ChatGPT to
the model that it thought would be the
best suited for that particular outcome.
So the analogy here is a playful one
where it's like choosing between
walking, a bus or a taxi depending on
the distance, the urgency or the budget.
So you get a task and then based on that
task's complexity, you set a budget.
That budget could be a token limit, a
time constraint, a money budget on how
much you're willing to spend for that
kind of inference or that kind of API.
Then you have a router agent classify
that complexity, whether it's simple,
medium, complex, and if it's unknown, it
has to run a quick test to maybe check
the confidence of how sure it is as to
whether it's simple or complex. If it's
simple, then maybe it goes to a small
model. If it's medium, then a standard
model. And naturally, if it's more
complex, then a reasoning model of
sorts. And then once you execute the
tasks, you monitor resources. You look
at the token count and response time as
well as API costs. Maybe you have some
form of function that's keeping track of
the rolling cost. And as long as it's
within limits, then you're good to go.
It keeps processing until the
task is complete. And then you finally
get whatever the outcome is, a report or
something along those lines. And if
you're not within the limit, then you
need some form of optimization. So
either you need to cut away from the
context in your prompts for the agents
or you need to take advantage of
something called prompt caching where
essentially you have a language model
physically cache results for up to an
hour. So you can keep referencing and
sending that prompt over without having
to send all that context over and over
again. Then naturally, one of the best
fallbacks would be just to switch to a
cheaper model across the board, if you're finding that even your complex cases could be solved by potentially chaining multiple LLMs. And this is where you start having a combination of design patterns, where knowing about that prompt chaining in number one is helpful, because now you have different ways that you can pivot and implement your system.
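Here's a small, hypothetical sketch of that router: classify task complexity, pick a model tier, and keep a rolling cost tally against a budget. The model names, prices, and classifier are made-up placeholders:

```python
# Hypothetical resource-aware routing sketch: route by complexity,
# track rolling cost, and fall back to a cheaper tier near the budget.

MODEL_TIERS = {
    "simple": {"model": "small-model", "cost_per_call": 0.001},
    "medium": {"model": "standard-model", "cost_per_call": 0.01},
    "complex": {"model": "reasoning-model", "cost_per_call": 0.10},
}

def classify_complexity(task: str) -> str:
    """Stand-in for a cheap router model returning 'simple', 'medium', or 'complex'."""
    raise NotImplementedError

def route_task(task: str, spent: float, budget: float) -> tuple[str, float]:
    tier = classify_complexity(task)
    choice = MODEL_TIERS[tier]
    if spent + choice["cost_per_call"] > budget:
        choice = MODEL_TIERS["simple"]    # fallback: cheapest model across the board
    spent += choice["cost_per_call"]      # rolling cost tracking
    return choice["model"], spent
```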
So this is useful to use when you have
cost-sensitive operations, high volume
processing, or you have budget
constraints and you have a very large
system where you need to keep track of
every single dollar being passed through
because maybe you're running this at
thousands or millions. And you won't
most likely see this workflow for a
mom-and-pop business or a small or medium
business. This will be more enterprise
and larger size platforms. The pro is
naturally cost reduction. That was what all the uproar around GPT-5 was: it was seen as a cost-cutting act to route as
many requests as possible to the
cheapest language model while still
charging you that 20 bucks a month. So
that was a pro. But in terms of the con
is you have complexity increase. You
have tuning challenges. It's hard to
necessarily know how often it's going to
go to simple versus complex. So your
system and your rubric for what is
complex and what is simple has to be
really robust and airtight. And at the
same time, you'll still have edge cases.
So what does that system that looks at
confidence interval look like? All of
this needs a lot of planning, a lot of
resources, and a lot of testing. The
next one is reasoning techniques. So
this one in plain English means choosing
the right method for the right problem.
So chain of thought for step-by-step
logic. Tree of thought, a very
interesting technique. It's actually one
of my favorite for different use cases
that need creativity and imagination for
exploring multiple paths. So this one is
like solving a puzzle by trying
different strategies until one finally
works. So while you might not find this
fun, I find this one particularly fun.
So you have a complex problem and then
you want to find a reasoning method to
help you solve said problem. So you can
either go sequential where you have
chain of thought which is very similar
to prompt chaining where you break it up
into steps. You do step one, you think,
you reason, then you conclude and then
step two or the possible second path
could be branching where you have tree
of thought. Very interesting technique
for you to take a look at. You generate
literal branches of thought. You explore
each one of those paths. You evaluate
which one seems the most viable. And
then you do what's called pruning. And
pruning is essentially if you have many
branches, you cut off the dead branches
or the ones you want to be dead because
you have a path forward you've decided
on. And then you have a few other
methods where you combine multiple of
these methods and you combine it with
self-consistency. You generate multiple
solutions. So multiple solution paths
and you score them. And then you have a
few other ones, like adversarial, where you have a debate method with a proponent agent and an opponent agent. It's kind of like
having your mini parliament where you
have two agents go back and forth until
one wins and exchange arguments and then
based on those arguments you decide what
is the best path forward. So the key
thing here is that you basically do all
of these and then you score all the
solutions here and then based on the
rubric that you come up with, you run
tests, you validate logic, you rank the
candidates of which method is the best
based on your specific complex problem.
You then select the best one and you can
either combine all of them or you can
create one single one. So you could say
I just want to use tree of thought based
on this rubric, or, you know what, I think I'll do the prompt chaining and then tree of thought because I see some synergies here. Little disclaimer here
is knowing exactly how these methods
work is very fundamental to actually
making this work. So this is on the advanced end of the spectrum, in my opinion. So in terms of when to use
this, like I said, advanced technique.
So only for very complex things,
mathematical reasoning, strategic
planning at scale if you really need it.
But nine times out of 10, you won't need
it. But this is a very interesting
workflow to get into once you graduate
to that level of prompt engineering. So
out of all of these, one of the most
interesting applications could be both
legal analysis and medical diagnosis
because some of these problems in both
of these domains are very meaty and very
complex and need very creative ways to
break them down. The pros of this method
is that you're very exhaustive and
robust in your process. But the con is that you have a lot of token consumption and complexity. There is such a thing as
overthinking with language models the
same way that you and I can overthink as
well. So that can increase your latency,
explode your cost and combinations. So
even though this is cool, it's not
necessarily cool for every use case.
It's not cool for 90% plus of use cases.
To me, this is highly experimental and
you do this if you have a lot of
bandwidth or free time or willing to put
some resources behind this to see
whether or not it makes sense.
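If you do want to experiment, here's a tiny, hypothetical sketch of the tree-of-thought loop: generate a few candidate branches, score them, prune everything but the best, and expand from there. The generation and scoring calls are stand-ins for LLM prompts:

```python
# Hypothetical tree-of-thought sketch: branch, score, prune, expand,
# and return the best reasoning path found within a depth limit.

def generate_branches(problem: str, path: list[str], n: int = 3) -> list[str]:
    """Stand-in for an LLM proposing n possible next thoughts."""
    raise NotImplementedError

def score_path(problem: str, path: list[str]) -> float:
    """Stand-in for an LLM or rubric scoring how promising a path looks."""
    raise NotImplementedError

def tree_of_thought(problem: str, depth: int = 3, keep: int = 2) -> list[str]:
    frontier: list[list[str]] = [[]]                  # start from an empty path
    for _ in range(depth):
        candidates = [
            path + [thought]
            for path in frontier
            for thought in generate_branches(problem, path)
        ]
        candidates.sort(key=lambda p: score_path(problem, p), reverse=True)
        frontier = candidates[:keep]                  # pruning: cut the dead branches
    return frontier[0]                                # best path after exploration
```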
This next one is about evaluation and monitoring.
And we're finally back to normal English
words that we can understand. So this is
about setting up quality gates and
golden tests before deployment and
continuously monitoring accuracy,
performance, cost and drift in
production. And drift is when you have the same model or the same suite of models producing a response, but over time that response degrades, gets worse, or becomes more unpredictable. In terms of
the analogy to conceptualize it, you can
think of it as a factory quality control
system that checks products at every
stage. So you can imagine an assembly
line where one person is taking care of
the wheels or one person is taking care
of the door and making sure that the
actual cover of the car is proper etc.
So how this could work is you could have
some system deployed and then you define
some quality gates. So the quality
criteria could be accuracy metrics, it
could be performance SLAs, it could be
compliance, it could be user experience.
Then for each one you have the specific
metric. So for the accuracy metric there
should be some golden test sets. For
performance, it could be some
performance benchmarks. And then you
keep going depending on the specific
type of metric. And depending on what
you decide on, it could be all of them.
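As a rough illustration of what an accuracy quality gate could look like, here is a small sketch that runs a golden test set, blocks the release if accuracy falls below a threshold, and raises a drift alert when the score lands well outside the historical baseline. The run_model function, the golden examples, the gate value, and the baseline numbers are all made-up placeholders for whatever your system actually uses.

```python
# Hypothetical quality-gate sketch: golden test set plus a simple drift check.
from statistics import mean, stdev

def run_model(prompt: str) -> str:
    raise NotImplementedError("call your deployed system here")

GOLDEN_SET = [  # (input, expected output) pairs you trust
    ("What is 2 + 2?", "4"),
    ("Capital of France?", "Paris"),
]

ACCURACY_GATE = 0.95                         # release is blocked below this
BASELINE_SCORES = [0.97, 0.96, 0.98, 0.97]   # accuracy from past runs

def evaluate() -> float:
    hits = sum(expected.lower() in run_model(q).lower() for q, expected in GOLDEN_SET)
    return hits / len(GOLDEN_SET)

def check_release() -> None:
    accuracy = evaluate()
    if accuracy < ACCURACY_GATE:
        raise SystemExit(f"Quality gate failed: accuracy={accuracy:.2%}")
    # Drift/regression check: flag scores well below the historical mean.
    mu, sigma = mean(BASELINE_SCORES), stdev(BASELINE_SCORES)
    if accuracy < mu - 2 * sigma:
        print("ALERT: possible drift, route to the team for investigation")
    print(f"Gate passed: accuracy={accuracy:.2%}")
```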
You create a test suite where you do
unit tests, contract tests, integration
tests, you have some critical path
tests. And this is, again, a very comprehensive and robust testing system. And you want to assess whether
or not your case actually deserves
something like this. And in terms of
analyzing patterns, the whole point of this is to do things like detecting drift and finding regressions from the mean. In plain English: the mean is your average result, and if the usual behavior stops happening, or you find something that's two standard deviations away or very different from what's expected, that's what's called a regression. And this
also gives you the ability to look for
anomalies, identify trends, and then you
can set a threshold as to when you
decide that any of these or all of these
have failed. And if so, you can do
something like alerting a team. They
investigate the issue. Again, you have a
human in the loop there. And you keep
going. And ideally, you conduct periodic
audits to make sure that your systems,
your mechanisms, your evaluation sets
are all up to date and behaving as expected. So,
this is definitely some form of quality
assurance that you'd want to employ with
production grade systems. And where this
might make sense again is enterprise,
SaaS, healthcare, especially the finance
industry might benefit from this and
very large scale e-commerce. So, one of
the biggest pros here is naturally that
you have more reliability, but the
corresponding con is not only alert
fatigue, but also performance impact, where you need a system robust enough to handle this level of scrutiny and testing at a very large scale. So,
when I hear things like AI is going to
take everyone's job, I start laughing
because I don't know of a single AI
framework that could do this kind of
infrastructure setup at scale. I've
never seen it. I don't think we will see
it from just language models, at least
for a long time. So guardrails and safety patterns are somewhat derivative of what we just saw before. So this
one's about checking all the inputs for
harmful content, personal info or
injection attacks. So this is much more top-of-funnel of that entire infrastructure. So you're classifying risk levels and applying appropriate controls. So the main analogy here is
airport security where you have multiple
checkpoints where someone asks you for
things like your passport, your boarding
pass, and then as you go through their
job is to make sure to look for threats.
So when it comes to your input being
received, you then have to sanitize that
input. Then you want to check what that
input is. Is it some form of personally identifiable information, or PII, in
which case you want to detect it and
maybe redact it. So if it's someone's
SIN number, maybe you take off the whole
SIN number or you hash it or you replace
all the numbers with apostrophes or
asterisks or whatever, but you want to
find a way to mask anything that's very
secure that shouldn't be going into your
system. The next one could be injection
detection. They rhyme for a reason. So
if someone's trying to break into your
system and get access to your tables by doing something like a SQL injection, where they try to retrieve all the data in your application's tables. And this could be related
also to malicious content. So in both
cases, you either want to filter this or
you want to block it entirely. And this
is where you do risk classification
where you assess whether this is low risk, medium, or high. And if it's high, nine times out of 10 you should involve a human in the loop. And then, depending on whether the severity is low or medium, you could either process it normally or process it with additional conditions or constraints, and then you execute the task.
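To ground that flow, here is a minimal, hypothetical sketch of the input-side checks just described: sanitize the input, redact obvious PII, flag injection-looking patterns, and classify a risk level so that high-risk requests get escalated to a human. The regex patterns and keyword list are illustrative only; a production system would rely on proper PII and injection classifiers.

```python
# Hypothetical input-guardrail sketch: sanitize, redact PII, flag injections,
# then classify risk so high-risk inputs go to a human reviewer.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "sin_or_ssn": re.compile(r"\b\d{3}[- ]?\d{3}[- ]?\d{3}\b|\b\d{3}-\d{2}-\d{4}\b"),
}
INJECTION_HINTS = ("drop table", "union select", "ignore previous instructions")

def sanitize(text: str) -> str:
    return text.strip()[:4000]  # trim whitespace and cap length

def redact_pii(text: str) -> tuple[str, bool]:
    found = False
    for pattern in PII_PATTERNS.values():
        if pattern.search(text):
            found = True
            text = pattern.sub("***", text)  # mask instead of passing it downstream
    return text, found

def classify_risk(text: str, had_pii: bool) -> str:
    if any(hint in text.lower() for hint in INJECTION_HINTS):
        return "high"
    return "medium" if had_pii else "low"

def handle_input(raw: str) -> dict:
    text = sanitize(raw)
    text, had_pii = redact_pii(text)
    risk = classify_risk(text, had_pii)
    if risk == "high":
        return {"action": "escalate_to_human", "text": text}
    constraints = ["extra_moderation"] if risk == "medium" else []
    return {"action": "process", "text": text, "constraints": constraints}
```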
You have some form of output moderation where you check the policies, the ethical guidelines, the compliance, the brand safety. You create a safety score, and if that score is above a threshold, then you apply tool restrictions or you put it in a sandbox environment. And if nothing is above the threshold, then you just allow it through and the system keeps on going. So a system like this
would be used especially when PR is on
the line. Something public facing, a big
system representing the government would
be on the line. This is where you'd need
all of these checks and balances to make
sure that very few people can send an
input that is malicious that won't be
caught downstream in the system.
Ideally, the more upstream you can find
the issue, the sooner you can make sure
that the rest of the system is not
compromised. So from having built for
enterprise, I can tell you that one of
the best vectors for malicious
injections is anything with an open text
box or chatbots, which is why I, as well as my team, typically recommend that if you create an application that's customer-facing and you have thousands or tens of thousands of users, then doing a pre-prompted strategy is probably better, where you have already canned responses or canned prompts you can click on and there's no open text box. You can only go through a series of clicks through a journey. The pros are that
you definitely get a lot more risk
mitigation. This is better for
compliance and brand protection and user
safety most importantly. But the cons are that you could have some false positives, where things that look malicious aren't malicious, and vice versa. You're obviously going to have some user frustration if the system is adding way too much friction in the
process. But you have to balance that
level of friction with your need for safety, and obviously safety should take precedence. All right, we're almost
there. This is the second-to-last design pattern, which is prioritization. So this
is about scoring tasks based on value,
risk, effort and urgency. So the
strategy in this pattern is you build
something called a dependency graph to
understand what needs to happen first.
What needs to happen in sequence before the next actions can follow. And if you want one of my
beautiful analogies, it's like having an
emergency room triage system that
handles the most critical cases first,
but it makes sure that everyone gets
seen eventually. So basically you have a
task and then you build this dependency
graph and this is what it could look
like where you have a task list. You
have task 1, 2, 3, 4, all the way to not
infinity but maybe 100 tasks. Then you
score each task based on a series of
scoring factors. So some of these
factors could be dependency count, meaning how many things are affected by this task being solved or not solved, plus time sensitivity, effort required, risk level, and business value, and all of them
go together to get some form of overall
priority score. To yield that priority score, you have something where you multiply value and effort times urgency by risk. And
obviously, you can make this priority
formula whatever you want. But in this
case, this is the template you can use
to do that. So then you rank the tasks
based on the scores. You have an initial
order right here. Then you have a
scheduling strategy where either you're
doing something like load balancing,
task aging, applying quotas, and
depending on what it is we're actually
applying this for, it goes through this
process. It gets prioritized in a queue.
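Here is a small, hypothetical sketch of that scoring-and-queueing loop. The priority formula below, which rewards value and urgency and discounts effort and risk, is just one plausible reading of the weighting described above, so swap in whatever formula fits your workflow. Note that the queue is re-ranked after every task, which is exactly the reassessment step covered next.

```python
# Hypothetical prioritization sketch: score tasks, rank them, execute the top
# one, then re-score because finishing a task can shift the other priorities.
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    value: float        # business value
    effort: float       # estimated effort
    urgency: float      # time sensitivity
    risk: float         # risk level
    depends_on: list[str] = field(default_factory=list)

def priority(task: Task) -> float:
    # One possible formula: reward value and urgency, discount effort and risk.
    return (task.value * task.urgency) / max(task.effort * task.risk, 0.01)

def runnable(task: Task, done: set[str]) -> bool:
    # Respect the dependency graph: only tasks whose dependencies are finished.
    return all(dep in done for dep in task.depends_on)

def run(tasks: list[Task]) -> None:
    done: set[str] = set()
    while len(done) < len(tasks):
        ready = [t for t in tasks if t.name not in done and runnable(t, done)]
        if not ready:
            break  # blocked or cyclic dependencies
        top = max(ready, key=priority)   # re-ranked every loop, so new
        print(f"executing {top.name}")   # events can reshuffle the queue
        done.add(top.name)
```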
Then you execute the top task, you go
through, you then double check whether
or not priorities have shifted after
changing the first task. So once you
execute the top task, you shouldn't
necessarily go to the next sequence of
events. You should assess whether or not
there is a new priority. If there is a
new priority, then maybe you push
forward whatever was next. You save the
state and then you go to the new event
section. You recalculate the priorities
accordingly. So, to make this a lot more tactical and less airy-fairy:
Imagine you were starting your day out
and your number one goal was to go to
the gym, then come back home and eat,
and then go to work. But what if you
went to the gym, you left the gym, there
was a huge accident on the highway or
the street, and now you're an hour late.
Maybe you skip going back home to
nourish yourself and you go to a
drive-thru along the way before you go
to work. In this case, doing action
number one presented environment number
two where you had to reassess the state
and then change the course of action.
So, using that example as a segue,
dynamic environments could be one major application where this makes sense, where your initial plan might change
because the first thing you do might
cause a ripple effect of additional
variables coming into the equation that
change the next natural action that you
should do. So this would make sense in
task management systems, customer
service, manufacturing, healthcare and
devops. So the key value here is
obviously adaptability and transparency.
But the downside would be something like context switching, where maybe you reassess every single time, especially if you're using generative-AI-based agents, and the agent sees the next natural action or the new priority in a different light from one run to the next. So not
having a deterministic way to assess
whether or not you should go off course
and reassess the priorities becomes the
hardest part of the system, especially
if the environment or the dynamic
environment you're applying this in has
edge cases and variables all the time.
And last but not least, you have
exploration and discovery where this in
plain English is starting by broadly
exploring the knowledge space across
papers, data, and expert sources and
identifying patterns and clustering them
into themes. And this one is like a
detective gathering clues from
everywhere, finding patterns, then
focusing on the most promising leads. So
this one starts out with a research
goal. Then from that goal, you explore
your sources, whether it's domain
experts, data sets, academic papers.
Then you compile all that information
into one spot. You map what's called the knowledge space. You identify the key
areas of interest and then you go to
cluster the themes. And what clustering
means in plain English is that you have
a series of data points that you can
converge and bring together to be able
to assess apples to apples, oranges to
oranges and see if any patterns exist. Once you assess those
patterns, you then go through some
selection criteria. You look at some form
of a novelty score, potential impact,
knowledge gaps, feasibility. And the
whole point of this is to pick where you
should actually explore and you should
target. And once you know that and you
dial in, this is kind of like a research
agent design pattern where it's just
researching what is worth pursuing. And
once you do that deep investigation, you
extract some artifacts. These artifacts
could be conceptual models, they could
be expert contacts, they could be
curated data sets, bibliographies,
whatever it is contextually specific
that you're doing. And once you synthesize these insights, you extract the key insights, add some open questions, and maybe generate some hypotheses, and you loop until you come to a
point where you conclude your
exploration and you have a generated
report if that is the output you're
looking for. You document your findings
and then you recommend the next steps.
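As a rough sketch of that selection step, here is a hypothetical scorer that ranks candidate research themes by novelty, potential impact, knowledge gap, and feasibility so the agent knows where to dig deeper. The weights and the Theme fields are invented for illustration and would be tuned to your own domain.

```python
# Hypothetical exploration sketch: score candidate themes on the selection
# criteria (novelty, impact, knowledge gap, feasibility) and pick where to dig.
from dataclasses import dataclass

@dataclass
class Theme:
    name: str
    novelty: float      # 0-1: how new is this direction?
    impact: float       # 0-1: potential impact if it pans out
    gap: float          # 0-1: how big is the knowledge gap?
    feasibility: float  # 0-1: can we actually investigate it?

WEIGHTS = {"novelty": 0.3, "impact": 0.3, "gap": 0.2, "feasibility": 0.2}

def score(theme: Theme) -> float:
    return (WEIGHTS["novelty"] * theme.novelty
            + WEIGHTS["impact"] * theme.impact
            + WEIGHTS["gap"] * theme.gap
            + WEIGHTS["feasibility"] * theme.feasibility)

def pick_targets(themes: list[Theme], top_k: int = 2) -> list[Theme]:
    # Deep investigation only happens on the highest-scoring themes;
    # the rest stay in the report as open questions.
    return sorted(themes, key=score, reverse=True)[:top_k]
```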
So if I zoom out for a second, you can
imagine this as the system responsible
for things like Perplexity Deep Research or Claude Deep Research. Anything that has to go that extra mile will take 40 minutes, spin up multiple
agents to execute that research and
scope out what is worth looking at
versus what's not worth looking at.
Which citations are worth including in
the final analysis versus not. So this
is a full research agent design pattern.
And with that, there should be no
surprise that the best place to use this
is for research projects as well as
anywhere where you need to do some form
of really detailed competitive analysis.
And where it fits is research of all
kinds including academic R&D
departments. And one really cool use
case is drug discovery. Now the key
thing here is innovation enablement
where the agent can decide what is worth
pursuing or what topic and what angle of
that topic is worth diving into. And
then on the con side, the obvious con is
that it's time-intensive, very resource
heavy. There's a lot of generative AI
being used here and also sifting through
very large documents and zooming through
to see what is relevant and what's not
relevant. And I know this was a longer
video, but now that sums up 20 different
design patterns and there are 21 in the
book itself, but I excluded MCP just because I have covered it over and over
again. But wait, we're not done yet. I
do have a free gift for all of you. So,
all of this work I put together is in
this repository that I made available in
the second link in the description
below. It includes all the patterns I
mentioned from this book as well as a
series of ASCII art. And ASCII art is one of the new things I'm nerding out on, where it basically breaks down what this looks like step by step. And then if you go back to the last one, if you go to the Mermaid diagrams, this covers a lot of
the diagrams that I went through in
detail. So you have access to everything
that I put together and it will help you
really level up your agentic
understanding so you can apply this and
be a master of the craft. Now if this
very long video was helpful for you and it helped save you the time to read the book, then I'd super appreciate it if you
left a comment down below so that the
algo can give this some extra love. And
the best thank you you can give me is
sharing this with someone else to
increase the visibility of the video as
well as the channel. And if you want to
go even deeper on things like agentic
patterns and prompt engineering and
everything that's involved in becoming
the super AI generalist of your dreams,
then check out the first link in the
description below. I run a community
where I put my heart and soul into it and do all kinds of stuff like this pretty much every single day. So check out that first
link and maybe I'll see you inside. I'll
see you in the next one.
Join My Community to Level Up ➡ https://www.skool.com/earlyaidopters/about
🚀 Gumroad Link to Assets in the Video: https://bit.ly/4nzkIVy
📅 Book a Meeting with Our Team: https://bit.ly/3Ml5AKW
🌐 Visit Our Website: https://bit.ly/4cD9jhG

🎬 Core Video Description
What if you could learn the 20 agentic design patterns the pros actually use—without wading through a 400-page manual? In this practical 63-minute breakdown, I translate a Google engineer's book into plain English and show you exactly how to apply each pattern in real workflows. You'll see where each architecture shines, the tradeoffs that matter (cost, latency, failure modes), and quick ways to combine patterns for robust systems—so you can ship reliable agents faster and avoid over-engineering rabbit holes. Expect concise TL;DRs, labeled visuals, and a free repo packed with diagrams, ASCII flows, and Mermaid files to help you implement immediately.

[Main Topic]: A practical, plain-English guide to 20 agentic design patterns
[Key Benefits or Outcomes]: Understand when/why to use each pattern, reduce hallucinations and cost, add safety/quality gates, route work across models/agents, and ship production-ready automations with fewer retries and rollbacks
[Tools or Techniques Covered]: Prompt chaining, routing, parallelization, reflection loops, tool use, planning/orchestration, multi-agent collaboration, memory management, learning/feedback, goal tracking, exception handling, human-in-the-loop, RAG, inter-agent comms, resource-aware model routing, reasoning strategies (CoT/ToT, debate), evaluation & monitoring, guardrails/safety, prioritization, exploration/discovery

⏳ TIMESTAMPS:
00:00 – Intro: Why agentic patterns separate pros from beginners
00:36 – What you'll get: TL;DRs, visuals, free resources
00:54 – Pattern 1: Prompt Chaining (assembly-line steps & validations)
05:42 – Pattern 2: Routing (smart triage to specialist agents)
09:30 – Pattern 3: Parallelization (split, normalize, merge)
13:16 – Pattern 4: Reflection (critic → revise → pass)
15:51 – Pattern 5: Tool Use (discover, authorize, execute, fallback)
18:19 – Pattern 6: Planning (milestones, dependencies, constraints)
20:49 – Pattern 7: Multi-Agent Collaboration (manager + roles + shared memory)
23:45 – Pattern 8: Memory Management (short/episodic/long-term, retrieval)
26:42 – Pattern 9: Learning & Adaptation (feedback → prompts/policies/tests)
29:17 – Pattern 10: Goal Setting & Monitoring (KPIs, drift, course-correct)
31:34 – Pattern 11: Exception Handling & Recovery (classify, backoff, fallbacks)
34:11 – Pattern 12: Human-in-the-Loop (review cues & approvals)
36:01 – Pattern 13: Retrieval (RAG): parse, chunk, embed, rerank
38:14 – Pattern 14: Inter-Agent Communication (protocols, IDs, expiry)
43:08 – Pattern 15: Resource-Aware Optimization (route by cost/complexity)
46:35 – Pattern 16: Reasoning Techniques (CoT, ToT, self-consistency, debate)
49:57 – Pattern 17: Evaluation & Monitoring (golden sets, SLAs, drift)
52:44 – Pattern 18: Guardrails & Safety (PII, injection, sandboxing)
56:04 – Pattern 19: Prioritization (value×effort×urgency×risk, re-order)
59:29 – Pattern 20: Exploration & Discovery (map space, cluster, probe)
62:17 – Free Repo & Diagrams (ASCII + Mermaid)
63:08 – Final CTA: Share, comment, and join the community

#AgenticAI #AIAgents #PromptEngineering #RAG #LLMEngineering #Automation #MCP #AIDesignPatterns #Evaluation #Guardrails #Routing #Parallelization #Reflection #AIforBusiness #WorkflowAutomation