Hello everyone, welcome to ML 105. This
is going to be a quick overview of
agentic AI and agents.
So my name is Rola and I'm a machine learning architect at Tech42. We are a consulting company that specializes in generative AI and machine learning. I have a PhD in neuroscience and bioinformatics from McGill University, completed in 2017. You can find some of my peer-reviewed articles on PubMed, Google Scholar, bioRxiv, and arXiv. I transitioned into industry about 5 years ago now and stayed within the data, AI, ML, and cloud ecosystem. I'm an AWS Hero and a Gold Jacket ambassador. And on a more personal note, I'm a mother of two young girls.
What I wanted to do today for this session is to give you an overview of generative AI and then move into the specifics of what agents are, when to use them and when not to, how to design an agent, how to implement one, talk a little bit about architectural patterns and evaluations, and then we'll talk about how they might or might not affect careers.
So artificial intelligence is not a new field. It is thought to have started in the 1940s, with some of the earliest papers attributed to McCulloch and Pitts. In 1943, they wrote an article called "A Logical Calculus of the Ideas Immanent in Nervous Activity," which proposed a mathematical model of biological neurons. In 1950, Alan Turing wrote his paper "Computing Machinery and Intelligence," which proposed the Turing test, a benchmark for machine intelligence. In 1956, the Dartmouth workshop established AI as a field of study, and John McCarthy coined the term artificial intelligence.
We saw an AI winter between the 1950s and the 1980s, followed by a machine learning renaissance in the 1990s. Geoffrey Hinton was awarded his PhD in artificial intelligence from the University of Edinburgh in 1978. I've read some books that stated that when this young PhD candidate was deciding on his field of study, AI was thought to be career suicide because it was a niche topic, and a lot of people advised him against it. Of course, fast forward roughly 50 years to today: Geoffrey Hinton is one of the godfathers of AI. He's gone on to win the Turing Award and the Nobel Prize, among many other awards, and he's gone on to change the world, really.
Starting in the 1990s, we saw machine systems winning all sorts of games against human contenders. In 1997, IBM's Deep Blue defeated Garry Kasparov in chess. And in 2011, IBM Watson won the TV game show Jeopardy. Of course, all of that increased enthusiasm about the potential of the field.
In the 2010s, we saw a deep learning boom. In 2012, AlexNet, a deep learning network that came from Geoffrey Hinton's lab, won the ImageNet image recognition competition by a wide margin. This, of course, revived the field and the excitement around all of the potential that neural nets can have. In 2016, DeepMind's AlphaGo defeated the reigning champion in the game of Go.
Today, we're living through a generative AI boom. I'm sure you've noticed from the two letters "AI" being plastered on every product that you purchase. Some notable events: in 2017, the transformer paper was released by Google researchers and researchers at the University of Toronto. This became the blueprint for a lot of the large language models we use today. And in November 2022, OpenAI released ChatGPT to the general public, which really got a lot of attention.
And I remember the deep learning boom. I was in the field back then, and I remember all the excitement that everybody in data or computer systems had around AI. Of course, it was still very niche, because it required very technical skill sets to be able to build these models yourself, train them, and use them. But the generative AI boom is very different. It has really popularized AI in ways that made it percolate across all of society. Even my mother, who has never touched a computer, knows what ChatGPT is and uses it on her phone to discuss certain things. I've heard young students brag on the street about how they've used ChatGPT to cheat on some exams, maybe, and people in various jobs tell me how they've used ChatGPT to help with resumes and ideas and just making things more efficient. So unlike previous decades of machine learning and AI, the generative AI boom is really more popular and democratized.
Last year I gave a freeCodeCamp lecture on machine learning fundamentals that went more into the technical detail of what machine learning is. If you're interested in that, you can find it on YouTube as well. I don't want to repeat a lot of the information I went over in that course, so if you find that there's background information missing here that you need, you might very well find it there.
So what differentiates the generative AI boom that we're seeing today from traditional ML at a technical level? Artificial intelligence sits on three pillars. There's the pillar of algorithms: these are the mathematical models that map input to output. Then you've got data, which is used to tune those algorithms, for these models to learn from. And then you've got compute, to be able to run these training and inference systems. So artificial intelligence really relies on data, algorithms, and compute. And I want to show you how machine learning and generative AI differ across these three pillars.
At the level of data, training data sets have grown by orders of magnitude in size. In machine learning, we used to talk about megabytes to gigabytes of data. If you had about a million examples, that was usually thought to be sufficient. Of course, that depends on the size of your model, how clean your data is, and what task you're trying to learn, but generally we used to talk about megabytes to gigabytes of data, and a million examples used to be plenty. Today, in generative AI, these models are trained on the internet, practically all of the human knowledge we've collected, and that's on the order of terabytes to petabytes of data. It's estimated that these LLMs see about 15 trillion tokens. So again, that's orders of magnitude greater than what we have in machine learning.
In terms of model size, again, machine learning models are usually in the thousands to millions of parameters, whereas the generative AI models are in the billions to trillions of parameters. GPT-4, for example, is estimated to be about 1.8 trillion parameters large.
And at the level of compute, we've seen advancements in chips, in CPUs and GPUs and how fast they can go. But I think the biggest advancement has been in being able to run sequential models in parallel. In machine learning, sequence models like RNNs or LSTMs needed to run serially, and that limited their scale and scope. One of the biggest advantages of the transformer paper, I think, was being able to parallelize the training, which of course made all of this possible.
This is one of my favorite resources in this domain. If you take a look at it, it's a few years old now; I think it stops at 2020. But it shows you some of the advancements that had to happen on all three pillars, the algorithmic pillar, the data pillar, and the compute pillar, for us to be able to get to where we are today.
So what happens when you supercharge data, algorithms, and compute all at the same time? Well, you get magic. You get these Hulk models, these big foundational models, as we're calling them, that seem to make sense of human knowledge, or at least human language. They seem to be able to use human language well.
We've all had chats with ChatGPT or Anthropic's Claude or any of the models out there, and they make sense. You can have a very good conversation, and I'm not only talking in terms of sentence structure; they are able to have a normal conversation like another human would. Now, of course, there are issues we deal with, like hallucinations and inaccuracies, and there's a lot of debate on whether these models understand the world or language or human culture, a huge debate with people on either side of the spectrum, and on whether they're capable of reasoning. But we can agree that they're able to use language quite well. They can read, they can write, and they can extract information, for example.
And this, I think, was surprising to a lot of people: that we would get to models that can do this by simply making the models bigger and shoving a whole lot of data into them.
The other thing that has happened is we've gone from tasks that are specific, in machine learning, to tasks that are general, in generative AI. In machine learning, the way we work is you think of a very specific task you want to solve, you collect data specific to that task, you choose a model that is capable of doing that task, and you train that model. It's usually task by task. With generative AI, because these models understand human language, a generality has emerged. There are a lot of emergent tasks these models are capable of doing by the simple fact that they understand human language. They can read, write, and summarize. We've seen them used as assistants because they can take instructions. There are a lot of tasks they can do. So we've moved from specific tasks to more general task execution.
And we've seen the rise of model as a service. Training a machine learning model used to require technical skills: again, you chose a model, you cleaned up your data, you trained your model, you optimized it, and then you ran inference on it. You needed a lot of skills to do that, but it was affordable, because everything fit on local hardware. Today, these generative AI models have a price tag of anywhere between a few hundred thousand and a few billion dollars, which has made it prohibitive for most people to create their own. And so we've got big labs, big companies like Anthropic, OpenAI, Amazon, and others that have the capability to train these models, and they put them out as a service. These foundational models, as we call them, are capable of doing a lot of things, and these companies are putting them out as a pay-as-you-go pricing service.
And I want to point out that to get here, we needed advancements in all three pillars. If you had a really large model but not enough data, then you wouldn't be able to tune the parameters correctly for that model to learn enough. If you had a lot of data but a very small model, then that model would be incapable of storing and learning from all that data, which is what we call bias in ML. And if you had the data and the algorithms but still had to run these things serially, it would have taken more time than would be feasible to push the advancements as fast as we have. GPT-4, for example, is estimated to have run for about 3 months in training, and that's with the parallelization. So we are here today because we can take these very large models, put a whole lot of data into them, and train them on these compute systems in parallel.
So: generative AI, agentic systems, and the spectrum of autonomy. If you Google agency, agency is thought to be the ability to make choices, act intentionally, and have some sort of control. All these systems in generative AI have some level of agency or autonomy. In an LLM, the agency comes in the output. When you invoke an LLM, one of these foundational Hulk models, the response it gives you is very open-ended. These models are token-to-token probabilistic generators, and depending on certain configurations, there can be vast changes in that output. So there is a lot of control the LLM has over the output.
As these systems evolved, we started to use these LLMs in a loop, mimicking a chatbot. Then we started to see workflows, where we use them in bigger systems with predefined steps. And then 2025 was called the year of agents. We'll talk a lot more about what specifically agents are and how they're different from workflows, but these are autonomous systems that have more control over the flow of an application.
Today we're seeing deep agents, which again have more control: over your file system, they can spawn other agents, and they might have control over a browser, and so on. And of course, we're going to extend the spectrum of these autonomous agentic systems by giving them more and more capabilities, agency, and autonomy.
And of course, the pinnacle of the field is AGI, artificial general intelligence. I find that there's not a clear definition or agreement in the field about what AGI is or is not. Even experts in the field can't agree on whether AGI is possible, or what it is, or what it is we're looking for. But let's put that as the pinnacle of the field, and as things become clearer, I will let you know next year if we have a better definition.
So as we move through these systems, there is more and more system autonomy. We're giving them more agency, all the way from agency over the output of an LLM to control over the flow in an agent. And of course, as the system gains more control, there's less and less human intervention. The level of control or autonomy is, of course, a spectrum, and to avoid bickering over what an agent is or where that line of autonomy lies, Andrew Ng coined the term agentic systems as an umbrella term for all of these systems, to acknowledge that there's some agency everywhere.
Some agentic milestone timelines: I mentioned that the transformer paper, which is the foundational algorithm for a lot of these models today, was released in 2017. ChatGPT was released in 2022. Agents started to appear more in 2023 with the ReAct paper, which merged reasoning and action together. And today, as I'm filming, we are in January 2026, so really this is about 2 to 3 years old, not more than that. The reason I'm showing you this is because I want you to realize that this is a cutting-edge field. Things are evolving before our eyes, right? There is not an authority that can decide on certain things. What we're seeing is many different companies coming up with different ideas. There's a lot in the literature, there are a lot of companies trying different things, and the field is still very young and maturing before our eyes. We're practically building this bridge as we cross it. And the reason this matters is, one, you have to be very careful about how you use it in applications, understanding the vagueness that comes with systems that are still evolving. The other thing I want to point out is that the timestamp really matters. What we know today might not have been very clear 6 months ago; something we thought was important 6 months ago might not be super important today. So when you look at any resource in generative AI or agents, look at the timestamp, because knowledge is evolving very fast. The generative AI story in general has been very compressed, meaning there is a lot of excitement, a lot of hype, and a lot of money being poured into this, and that causes really fast evolution in the space. So really, look at the timestamp and understand that this is all unfolding before our eyes.
So what is an agent? I looked at a lot of definitions from different resources, and again, there are many different ways of looking at it, but this is the one I like: a generative AI agent is a software entity designed to perceive its environment, make decisions, and take actions to achieve specific goals. So this is a software system, and the brain of the system is an LLM, a foundational model, which helps it plan. You give it a task: you say, solve this for me, or give me this answer, or help me with this. And it uses the LLM to plan and decompose that task.
It's able to act: it has tools. And it's able to observe the output of those tools, and then it goes back and it plans, acts, and observes. So it's a loop of plan, act, observe, plan, act, observe, until the solution is achieved. In pseudocode, we've got a while loop. It takes user input, which is the actual task it's supposed to do. Then it invokes an LLM, which again is the brain of the agent, and this is supposed to give you a plan of how to solve the task; it decomposes the task into its subtasks. Then, if the response wants a tool, if it wants an action, that action is invoked and executed, and the result is sent back to the LLM. So the LLM is both the observer and the planner. That loop goes on and on until there's no more action to be done, and then you get your final response.
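As a rough sketch in Python, with all helper names (get_user_input, call_llm, has_tool_call, execute_tool) as hypothetical placeholders for the model call and tool plumbing shown later:

```python
# Hypothetical sketch of the plan-act-observe loop
user_input = get_user_input()          # the actual task to solve
response = call_llm(user_input)        # plan: the LLM decomposes the task

while has_tool_call(response):         # the LLM wants an action
    result = execute_tool(response)    # act: run the requested tool
    response = call_llm(result)        # observe: feed the result back

print(response)                        # final response, no more actions
```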
To show you what this looks like from another perspective: there's a user, and they invoke an agent with a specific request. That agent goes into a loop, and this loop invokes an LLM with the task. The LLM gives a response: a reasoning step or a tool call. If it's a tool call, the tool is executed, and the result is sent back to the LLM to see what further action needs to be taken. We keep looping in this system until there is no more action to be done and the answer is resolved, and then it's sent back to the user.
So how is an agent different from a workflow? I want to demonstrate this by giving you a task as an example. Let's say I traveled with my kids to a new city, and what I want to do is fill our time there with activities. We have a list of activities that we like to do, and what I want to do is check the activities available in that particular city, go to the websites of those activities and check if they're available for a particular time and date, and then check our own calendars for availability at that time and date. If there is a match, I want to book the activity and pay for it, and then add the activity to the calendar. Of course, I could code all of this up in any language. But let's say I want to do it in a workflow that involves an LLM. What I would do is ask the LLM for all of the popular activities in Montreal with their websites. I would take that list and call a function, activity_availability, that goes and scrapes each website, or calls an API for that website, and tells me if a particular time slot is available. Then I would check our own calendars for availability, and keep doing that loop until I find activities that fit our schedule. Then I would call a book_activity function, and then update the calendar with another function. So a workflow is a set of predetermined steps in a particular sequence that's coded up, as sketched below.
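As a minimal sketch, with hypothetical helpers (ask_llm, candidate_slots, activity_availability, calendar_is_free, book_activity, update_calendar) standing in for the real scraping, calendar, and booking logic:

```python
# Predetermined steps in a fixed, coded sequence: a workflow
activities = ask_llm("List popular activities in Montreal with their websites")

for activity in activities:
    for slot in candidate_slots:                       # times/dates we're considering
        if activity_availability(activity, slot) and calendar_is_free(slot):
            book_activity(activity, slot)              # book and pay
            update_calendar(activity, slot)            # add it to our calendar
```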
Now, if I want to do this with an agent, what I'm going to do is take an agent and say: you are an activity-booking agent; choose activities based on customer preferences, book an activity, and update the calendar; you have the following tools. So I give it all of the functions and tools that I had preset in the workflow, but I just tell it that these tools are available. I don't tell it how to solve the problem. I don't tell it how to do anything. What I do tell it is: book me activities in Montreal for these particular days. And it's going to do everything it needs to do, using its tools, to solve that problem.
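Sketched with the same hypothetical tools and a generic create_agent helper (in the spirit of the framework call shown later), the agent version is just a goal, a toolbox, and a request:

```python
# The agent gets the tools and the goal, but not the sequence of steps
agent = create_agent(
    llm=llm,
    tools=[activity_availability, calendar_is_free,
           book_activity, update_calendar],
    system_prompt=(
        "You are an activity-booking agent. Choose activities based on "
        "customer preferences, book an activity, and update the calendar."
    ),
)

agent.invoke("Book me activities in Montreal for these particular days.")
```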
Again, I'll show you an example of this later. So the main difference between an agent and a workflow is that an agent has dynamic control flow over the execution, devised by the LLM at runtime. These are not predetermined, pre-coded paths; they are determined by the agent, by the LLM, at runtime. Workflows, on the other hand, are static, predefined, coded graphs. If you're going to take anything from this lecture, it's this: agents have dynamic control flow devised by the LLM at runtime, whereas workflows are predefined coded graphs.
So agents are becoming very popular. Of course, they are a general-purpose technology; they can fit across verticals, and so we are seeing them in customer service, HR, R&D, across the board at different companies. Some pros of agents, and these are true for computer systems in general: they're available, 24/7, 365 days a year; they don't need breaks beyond maintenance windows. They're multilingual: these LLMs support over 200 languages right out of the box, so you really don't need any extra work to support different languages. Efficiency: they do improve response times, because a system that's well set up, with all of its resources and data organized efficiently, is going to be faster than humans looking through databases and HR systems. They're consistent: although the variability in an agent's execution path is quite large, it is still smaller than human-to-human variation. They're convenient: they offer self-service options at any time. And they scale: any computer system that's well built should be able to scale fast and well. And in terms of cost, computer systems should be cheaper than humans.
Now, in terms of cons, one of the bigger things, I think, is that they're not human. I think this is understated. I've seen a lot of companies go for AI-first customer service this year, which I have to say has been very, very frustrating. These systems are good, they're powerful, but if you want to set them up, you have to set them up very well. An agent that breaks, does not work, and is slow is very frustrating, which defeats the point. And, from my perspective anyway, I really enjoy human contact. These agents are not human, and to me that's a con.
The technology is still maturing. The LLMs themselves are still increasing in potential; we're still learning what they can and cannot do and how we can optimize them further. A few months ago, we were dealing with hallucinations, which I think we see less and less of, but we're still dealing with things like context window size and all sorts of other issues. The application space is also still maturing. Not only is the technology itself moving and evolving, we're still learning how to use it: what are the things it doesn't do very well that we need to cover for? How do we build safe, ethical applications on top of it? How do we get the user experience to the point where it needs to be? Again, the technology is maturing, but the application space is also still maturing. And cost: it's true that a computer system is usually cheaper than a human, but an agent is usually more expensive than other software systems. It depends on the size of the system, how you architect it, and what you do, but usually they do come out a little more expensive than other systems.
So, patterns and anti-patterns for agents: when to use or not to use them. We've got our workflow on one side and the agent on the other. Again, as I mentioned, the workflow is predetermined steps, whereas the agent has control over the execution path. If you have a mission-critical or error-sensitive application or field, you should be leaning more towards a workflow. As you move towards agents, the agent has more control and humans have less control, so if you are in a mission-critical setting, you want more human control and should lean towards workflows. If you're in a regulated industry or need deterministic outcomes, you should probably lean towards workflows. If you're latency sensitive, agents do add some latency, so again, workflows might be a better option. If you're cost sensitive, it's easier to estimate the costs of a workflow than of an agent. But if you're looking for performance, agents do tend to perform better on average, just because of that loop over the information. And of course, if you want flexibility, or you don't know exactly how to solve a problem, then an agent might be a better option. And if model-driven decision-making is acceptable, or even appreciated, then an agent is a better solution. Now again, all of these concerns can be dealt with in either system; there are ways to get around all of them. This is just an idea of what would work better.
So these are some questions to ask. Is the application mission-critical, error-sensitive, or in a highly regulated industry? Is the task path predictable, or can it be predefined? In other words, do you know how to solve the problem and the sequence of events that need to happen? Is the value of the task worth the cost? That's very important; I think people underestimate this question. And is latency critical? Depending on the answers, I would say: use an agent in cases where error is tolerable, the task is open-ended, the execution path is harder to code, cost is not an issue, and latency can be tolerated.
Okay, so: components of an agent. Let's look at agents in more of a deep dive and look at what actually makes an agent. I've looked at a lot of different references, and again, the field is still evolving; you see a lot of references and a lot of people trying to define things in their own words. What I've tried to do is compile a list of resources, and what I want to show you is the elements that have consistently come up over and over in many references. Almost everybody agrees that an agent needs to have a purpose or a goal: it's solving a task, so it has to have that goal. It needs to be capable of reasoning or planning: it needs to be able to decompose a task into its subtasks and plan the execution. It needs to have memory, to be able to hold a long discussion. And it needs to have tools or actions, to be able to solve things on your behalf. Now, there are a lot of extra things you see in different references: you can have guardrails, communication for multi-agent systems, learning so that these systems learn from experience, and many other things you can add. But I think the four mentioned above are the ones most people agree on. The agent needs to have a purpose, it needs to be able to reason and plan, it needs to have memory, and it needs to have some tools or actions.
So what does that look like? We've got this computer system, and we're saying it needs to be able to reason or plan. What this comes down to is an LLM: that part of the system is an LLM. We're saying it needs to have a purpose, a goal or mission or identity, a task it needs to solve. This comes in the form of a system prompt, and again, I'll give you examples of what that looks like. The tools or actions that it has usually come down to functions or API calls. And memory can come in different forms: it could be short-term memory or long-term memory.
So let's dig into each of these four in a bit more detail. Choosing an LLM: again, the LLM is the brain of the agent. It's going to help the agent understand the task and break it down, and it's going to help it evaluate the outputs of the tools. So think of it as the brain of the system. How do you choose an LLM? Well, you need to consider several criteria. The task complexity matters: if you're using an agent to solve simpler tasks, then maybe you can choose a smaller model, from a cost point of view; if it's a more complex task, then you might need a bigger model. These models should have reasoning capabilities; not all models are trained to reason, and reasoning capabilities of course make for better agentic models. The context window matters, because you want to be able to fit more information in for the agent. Some models have tool-calling capability; this is the way the model chooses a tool call. If a model does not have tool-calling capabilities, that's not really too much of a problem, in that you would have to explain to it how to return a tool call. But if it does, that just makes life a little easier.
You can look at the latency of models. This is going to be part of an application, so the latency of the model affects the overall application latency, and faster models might be better for your application. And of course, cost: these are foundational models that you usually pay for per token, unless you're hosting your own. So you want to have an idea of the cost of using that model and whether you're comfortable with it. Compliance and data privacy also play a role, in that you want to know if your field has any compliance regulations. That could mean you would host your own model, or, if you use a service, that you would need to read the provider agreement to see if it aligns with your requirements.
Now, there's a lot of information online, and there are a lot of different benchmarks and leaderboards that can help. Some of them are very specific to agents, which can be helpful. This one from Hugging Face, for example, is an agent leaderboard, and it will tell you, depending on the verticals or other criteria you care about, which models are working best for agents.
So we said that this agent needs a character or an identity. This usually comes in the form of a system prompt. The system prompt is kind of a definition. Imagine you've got a junior intern: the system prompt is really that one-pager you give the intern about who they are, what they're expected to do, and what tasks they need to do. So, for example, here we've got an agent and we say: you are a financial adviser; you are eloquent and professional. For this one, we say: you are a medical assistant; you are caring and empathetic. And here we say: you are a teen adviser; you are young and hip. These are instructions in plain English; they're just natural language statements. Here I've put only two sentences each, but in production these are longer. They can include anything from the tone of voice, to who you are, what to do, what not to do, and very specific instructions for specific situations. So usually these system prompts are a lot larger in production. A system prompt, then, is the agent's character and persona, plus its purpose, task, and instructions.
So, an agent needs memory, and the reason it needs memory is that LLMs are stateless. I'm going to demo this when we start playing with code, but LLMs themselves do not retain information as you talk to them. The agent has three types of memory. There's intrinsic memory: these are the model parameters, the information that was retained from the training process. Unless you retrain the model, the intrinsic memory is stable; it doesn't change, though it does differ from model to model. Another form of memory is short-term memory. This is within-session memory, and what it looks like in practice is the context window. What usually happens, and I'm going to demo this with code, is that you append the conversation to the end of the context window, so the agent has a running JSON of the conversation and can recall earlier information. There's an art and a science to context management: what goes in the context window, what information needs to be retained, and what needs to be dropped, summarized, or compressed, because the context window is limited in the number of tokens you can put in it. What you put in it becomes important, and that's called context management; a naive version of the idea is sketched below.
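Here's a naive sketch of that idea, assuming messages in the Bedrock Converse format used later; real systems summarize or compress rather than simply dropping turns:

```python
MAX_TURNS = 20  # illustrative context budget

def update_context(messages, role, text):
    """Append a turn, then apply a crude context-management policy."""
    messages.append({"role": role, "content": [{"text": text}]})
    # drop the oldest turns once we exceed the budget
    if len(messages) > MAX_TURNS:
        del messages[: len(messages) - MAX_TURNS]
    return messages
```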
Another form of memory is long-term memory, which is cross-session. Let's say you've got a customer service agent talking to a customer: you want to be able to collect long-term information about that customer, what their complaint was last time, what their preferences are, what their information is. That usually comes in the form of external storage that the agent has access to. And as short-term memory gets clogged, information can be moved to that external storage. What information is stored, how it's indexed, and how it's retrieved again become very important.
Moving on to tools: agents have tools, and tools are interesting because LLMs have limitations that tools can help overcome. These LLMs are stateless, and we fix that with the memory component. But the LLMs themselves also have a knowledge cutoff date from their training. If an LLM was trained in 2022, for example, then it doesn't know any information beyond 2022. It doesn't know who the president is, if he or she was elected after 2022. It doesn't have information about the time or the date or the weather anywhere. All of these limitations of the LLM we can overcome by giving the LLM tools that can provide that information.
Agents can take several kinds of actions. There are capability extensions: a function call or an API call to do something. There's knowledge augmentation: retrieving data or context from databases. And there's orchestration: calling other agents or communicating with other systems. Tools can come in different forms: function calls in any language, API calls, data retrieval from an external database. We're seeing a lot of browser actions, code execution, and file system control. And as we push the agency and autonomy spectrum further, we're going to be able to give these systems more and more control.
So, implementing an agent: what does that look like from a code point of view? I've written some very basic code in Python to show you a few things. I want to show you a single LLM call, how you invoke one of these LLMs if you have not done so before. I want to show you how you could put that in a loop and mimic a chatbot, for example. I want to show you a very simple agent, again from scratch in Python, no frameworks, and how you would add memory. Then I want to show you how you would build an agent in one of the frameworks, like LangChain, add memory to that, and then I'll show you some architectures. This repository is open source; you can find all of that code there.
So this is what's in the repository. We're going to start with the LLM call, and what I want you to see here is that there are no frameworks; it's just boto3. I'm using AWS, I'm using the model Anthropic Claude 4.5, and I'm using the Bedrock API to call that model. I'm taking a user input and sending it to the LLM through the Converse API. The Converse API from Bedrock is really nice because all these models come from different providers, and so they have different expectations for input, different parameters, and different JSON structures. What the Converse API tries to do is standardize that for you, which makes it a lot easier to switch models without changing the LLM call. So we're calling this model with the Converse API and our query, then taking the output and just printing it out. It's very simple, basic code.
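For reference, here's a minimal sketch of that single call, assuming AWS credentials are configured and boto3 is installed; the model ID is a placeholder, so check Bedrock for the exact one:

```python
import boto3

# Bedrock runtime client; the Converse API normalizes I/O across providers
client = boto3.client("bedrock-runtime", region_name="us-east-1")

MODEL_ID = "anthropic.claude-sonnet-4-5"  # placeholder model ID

user_input = input("You: ")

response = client.converse(
    modelId=MODEL_ID,
    messages=[{"role": "user", "content": [{"text": user_input}]}],
)

print(response["output"]["message"]["content"][0]["text"])
```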
So let me run this. This is a single call, so I'm going to say: tell me more about Montreal. And it comes back and tells me Montreal is Canada's second largest city, it's in Quebec, and so on. And as you see, I get my shell prompt back, because this is a single call.
Now what I want to do is take a look at the second piece of code, which is in a loop. This is the exact same code: exact same model, exact same API. The only difference is a while-True loop, so now we're running in a loop until the user enters "quit," and everything else is the same.
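The looped version is the same Converse call wrapped in while True; note that each turn sends only the current message, so the model has no memory of earlier turns:

```python
# Same Converse call as before, now in a loop
while True:
    user_input = input("You: ")
    if user_input.strip().lower() == "quit":
        break
    response = client.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": user_input}]}],
    )
    print(response["output"]["message"]["content"][0]["text"])
```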
So now if I run this, I'm going to say: "Hello, my name is Rola." And I get my answer.
Let me run this again: "Hello, my name is Rola." The system greets me: "Hi Rola, nice to meet you. What can I do for you?" And this time I didn't get my shell prompt back; I can speak to it again. I'm going to say: "Tell me more about Montreal," and it tells me a little more about the city, and again I don't get the prompt back. Now I want to demonstrate a couple of things. First, there is no memory; these things are stateless. I gave it my name at the beginning, and it greeted me by name. But now if I ask, "What is my name?", it says: I don't know your name, you haven't told me yet. So these things are stateless. The other thing I want to show you is that it has no understanding of things beyond its cutoff point. "What is the day today?" It says: I don't have access to the current date; I don't have real-time information. The same is true if we ask what time it is, or what the weather is like in Montreal: again, "I do not have access to real-time information."
So now we're creating an agent, and again, this is the exact same code with a few changes; I'm trying to make incremental changes to the code so that you see the difference. I'm using the same model, the same Converse call, the same Bedrock API. What I've created here are some tools: a calculator tool, very simple; a mock get-weather tool, which just returns some canned information based on the city; a get-date tool; and a get-time tool. Again, we've asked the model what the date is, what the time is, and what the weather is in New York right now, and it doesn't have that information because of its cutoff point; it doesn't have access to real-time information.
The Converse call we made before is the same one; I've just packaged it in a call_llm function. Then we've got a tool execution function to execute the tool if the system decides it needs one. And this is the system prompt: you are a helpful personal assistant; based on the user's message, decide if you need a tool or can respond directly; you have access to these tools.
And so what we're going to do is call the LLM with the user input and the system prompt, then parse the output, and if the output has a tool call, execute that tool call, and then go back and do the same thing. This is in a while-True loop: we take the user input, send it to the LLM, see if the LLM needs a tool executed, and if it does, execute that tool and return the result back to the LLM, in a loop, until we get an answer.
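Here's a simplified sketch of that scratch agent loop; call_llm wraps the Converse call from before, and wants_tool / parse_tool_call are hypothetical helpers that parse the model's reply for a tool request:

```python
# Tool registry: names the LLM can ask for, mapped to Python functions
TOOLS = {"calculator": calculator, "get_weather": get_weather,
         "get_date": get_date, "get_time": get_time}

SYSTEM_PROMPT = ("You are a helpful personal assistant. Based on the user's "
                 "message, decide if you need a tool or respond directly. "
                 "You have access to these tools: " + ", ".join(TOOLS))

while True:
    user_input = input("You: ")
    if user_input == "quit":
        break
    response = call_llm(user_input, SYSTEM_PROMPT)
    while wants_tool(response):                  # the LLM asked for an action
        name, args = parse_tool_call(response)
        result = TOOLS[name](**args)             # execute the tool
        response = call_llm(f"Tool {name} returned: {result}", SYSTEM_PROMPT)
    print(response)
```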
So let me run this. Again, I'm going to do the same thing: "Hello, my name is Rola." It greets me. Then I ask, "What time is it?" When I say that, it uses a tool, the get-time tool, and it tells me it's 1:43. "What date is it today?" Again, it calls the get-date tool and gives me the date: it's January 2nd, 2026.
Now, if I ask, "Where is Montreal located?", you can see it didn't use a tool; this is information from the model itself. You can see here that when it uses a tool, I print this symbol; here it's answering from the actual parameters of the model itself. And I can ask, "What is the weather like in New York now?", and it calls the get-weather tool. Now, I want to point out that I didn't do anything special here. I created some tools with reasonable names and good docstrings, but I didn't plug them in in any particular way. I just told the agent that these exist, and somehow it knows how to use them. What we've done in the loop is ask some questions about the date, the time, and the weather that the LLM couldn't answer, and we've supplemented it with tools to create this agent. Now, I still want to ask, "What is my name?" And you can see: I don't know your name; we haven't been introduced yet. And so we're going to add memory.
Here, again, is the exact same code from before, with all of the same tools, the call_llm function, and the tool execution function. And then there's a function to update memory. It's doing a very simple thing: it's appending every turn of the conversation, every user or assistant message, back onto the JSON. It's a crude way of doing it, where you're just tagging the previous conversation onto the end of the current one. So it's the exact same code, but now when we call the LLM, we're passing in the history.
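The memory version, sketched; update_memory just tags each turn onto the running history, and call_llm (hypothetical, as before) receives the whole history on every call:

```python
history = []  # the running conversation, i.e., the context window

def update_memory(role, text):
    # append each turn to the end of the message history
    history.append({"role": role, "content": [{"text": text}]})

while True:
    user_input = input("You: ")
    if user_input == "quit":
        break
    update_memory("user", user_input)
    reply = call_llm(history)        # the full history is sent every time
    update_memory("assistant", reply)
    print(reply)
```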
So we're going to run this now, and we're going to say: "Hello, my name is Rola." It greets me. "What is the date today?" It's January 2nd. "What time is it?" Again, it's using all the tools we gave it. If I ask it something it knows from its internal parameters, its internal knowledge base, then it does not use a tool: I ask about Montreal, and it tells me it's a fascinating city, which is true, I agree. Now, here's what I wanted to show you: I ask, "What is my name?", and now it says: your name is Rola; you told me that at the beginning of the conversation. And that's true. One more thing: let's say, "Summarize our interactions." It says: here's a summary of our conversation; you introduced yourself and we greeted each other, you asked about the date and time, then you asked about the weather, and then you asked me about Montreal. So now it has an understanding of all the steps that happened previously, and the way that works is that we've appended these conversation turns to the context window.
Okay. So now I want to show you how we would create this agent with a framework. All of this so far has been plain Python, but there are frameworks that help us build these agents; we don't have to do it from scratch. So we import create_agent from LangChain. Again, same model, still going through Bedrock, but now we have these tools with a decorator added to them. Same stuff. Then we've got a tool list; I've just added all of my tools into an array. And then we create the agent with the create_agent function: we give it an LLM, we give it the tools, and we give it a system prompt. And this is what I want you to understand from the components of an agent: what matters is to understand what is important for an agent. All of these frameworks that allow you to create an agent will have slots for those important components. You can see here the LLM, which is the brain of the agent, the tools, and the system prompt. We've not added memory here, but we're going to add it in the second example. And when you go to the documentation, you'll see that you have slots for all of the things you can do.
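A sketch of the framework version, assuming LangChain 1.x's create_agent; the model string and exact parameters may differ in your setup:

```python
from datetime import datetime

from langchain.agents import create_agent       # LangChain 1.x
from langchain_core.tools import tool

@tool
def get_time() -> str:
    """Return the current time."""
    return datetime.now().strftime("%H:%M")

agent = create_agent(
    model="bedrock_converse:anthropic.claude-sonnet-4-5",  # placeholder ID
    tools=[get_time],            # plus get_date, get_weather, calculator, ...
    system_prompt="You are a helpful personal assistant.",
)

result = agent.invoke(
    {"messages": [{"role": "user", "content": "What time is it?"}]}
)
print(result["messages"][-1].content)
```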
And so here we invoke the agent. What the framework does is, of course, make it a lot easier to build these. There's no tool execution step; it takes care of that. Here I've removed the tool execution, so it removes all of the little glue code you need, because it's handled behind the scenes. We just create the agent and then invoke it with the user input. So again: "What time is it?" It uses the tool, executes it, and gives me the time. And again, you can test all of the examples; you have access to this code. But the idea is to see that these frameworks make life easier.
Now, what's important to know about frameworks is that, because the ecosystem is evolving, their stability can be in question. LangChain, for example, changed their version to 1.0 about a month ago, and that changed a lot of how the code is written and what is supported. Of course, they are going to support the previous versions for, I think, a year or two, but you have to know that the models themselves can be deprecated as new models come in, and the frameworks themselves can change very dramatically. As an architect, when I build a system based on good foundational services, I know that the expiry date of the system is quite far in the future; I know that if it's based on good foundational architecture and good services, it's going to work well for a very long time. That is not true when I build a generative AI system, and that's not through a fault of my architecture or engineering. It's really about the models themselves being deprecated or changed, and the frameworks themselves changing, because there's just a lot of evolution in the field. So just be mindful of that.
Okay. So now we're going to create the same exact agent, but we're going to add memory. We've got this InMemorySaver, and you can see that everything is the same, but here we're adding a checkpointer. This is the short-term memory we're adding.
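Sketched, assuming the LangGraph InMemorySaver and that create_agent accepts a checkpointer; the thread_id keys the session:

```python
from langgraph.checkpoint.memory import InMemorySaver

checkpointer = InMemorySaver()   # in-process short-term memory

agent = create_agent(
    model="bedrock_converse:anthropic.claude-sonnet-4-5",  # placeholder ID
    tools=[get_time],
    system_prompt="You are a helpful personal assistant.",
    checkpointer=checkpointer,
)

# all calls sharing a thread_id share the same conversation history
config = {"configurable": {"thread_id": "session-1"}}
agent.invoke({"messages": [{"role": "user",
                            "content": "Hello, my name is Rola."}]}, config)
```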
So let me run the one with memory. "Hello." It greets me. "What time is it?" It calls the tool. Then I say, "What is my name?", and it can recall my name. So again, it's a lot simpler to add memory with these frameworks.
So again, this code is online and you can play with it. What I've shown you is very simple, because I want you to get the fundamentals. But of course, there are a lot of extra complexities and layers on top of this. There's a lot that goes into model selection, prompt engineering, context engineering, data management (what data you store, how you rank it, how you index it, how you retrieve it), memory and what types of memory you want, tooling, interfaces, architectural choices, deployment approaches, security and compliance, and orchestration. At the end of the day, these are systems like any other system, and you have to make quite a few decisions for that system.
Okay. So: agentic architectural patterns. If you know the predefined sequence of events that needs to happen to solve a particular problem, then you can code that up in a workflow. If you don't, if the solution space is too large, or if you think an agent would do a better job, then you can build a single agent, and we just built one using Python and LangChain. Or, if the problem is too complex, you can build a multi-agent system. There are a lot of different patterns out there, a lot of anecdotal examples across different companies of what they're doing, but very few are emerging as repeated patterns. This is, again, a space that is evolving, and within 6 months we should be able to see more things repeating across different companies. For now, I'm just showing you the two that we're seeing over and over. You can build hierarchical supervisor systems: a system where there's a supervisor agent that speaks to more specialized agents, but the specialized agents usually cannot speak to each other. Or there's the swarm pattern, where you've got several agents, all of which can speak to one another. I want to show you the difference this makes. What I want to do with this notebook is show you the difference between the swarm architecture and the supervisor architecture. I'm just going to run all of these cells so we don't have to wait for them: we install and import some libraries, we import pandas and set up its printing settings, and there are some utility functions that help me extract information and print it out.
What I've got here is a single agent. Again, we're using LangChain, and I've got three functions: add, multiply, and divide. We create an agent that uses Claude 4.5 as its brain and has access to these tools. We tell it, "You are a math expert," and I invoke it with a long expression. Now, if the math expression is not too long, the LLM can do it without using any tools or agents, so I made it a little longer. And you see it solves it and shows us the breakdown.
Here, what happens is I'm creating a supervisor architecture with three agents. I've got the add tool, the multiply tool, and the divide tool, and I create an agent with each: an addition agent that only has the add tool, a multiplication agent that has the multiply tool, and a division agent that has the divide tool. Then I create a supervisor agent that has access to these three agents, and I tell it: you are a team supervisor managing math experts; for addition, use the add agent; for multiplication, use the multiply agent; and for division, use the divide agent. And then I invoke it with the exact same expression.
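A hedged sketch of that setup, assuming the langgraph-supervisor package; the exact signatures may vary with versions, and the other two specialists are built the same way as the first:

```python
from langgraph_supervisor import create_supervisor

# one specialist agent per tool (multiplication_agent and division_agent
# are built the same way with their respective tools)
addition_agent = create_agent(
    model, tools=[add], name="add_agent",
    system_prompt="You are an addition expert.",
)

supervisor = create_supervisor(
    agents=[addition_agent, multiplication_agent, division_agent],
    model=model,
    prompt=("You are a team supervisor managing math experts. For addition "
            "use add_agent, for multiplication use multiply_agent, and for "
            "division use divide_agent."),
).compile()

result = supervisor.invoke({"messages": [{"role": "user",
                                          "content": expression}]})
```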
What I want to show you here is what actually happens. We've got a supervisor that gets the question and transfers it to the addition agent; the addition agent does its job and throws it back to the supervisor. Then the supervisor sends it to the multiply agent, which does its job and sends it back to the supervisor. Then the supervisor sends it to the divide agent, which again sends it back to the supervisor, which gives us the final result. The way it works, again, is that the three specialized agents don't interact with each other; they can only speak to the supervisor. Things go down the chain, the task gets executed, it goes back up, a decision is made about where to go next, and you get this hopping back and forth between the specialized agents and the supervisor. This interaction cost us about 16 total hops, with 10 agent actions and 6 transfers, and about 8,000 input tokens and 700 output tokens. Now let's solve that same problem with the swarm architecture.
problem with the swarm architecture. So
here again I've got the add multiply
divide and again I've created agents
that are um have one tool. So this is
the add agent. It has the add agent but
it can also speak to the multiply agent
and the divide agent. This is the um
multiply agent. It has the multiply tool
but it can also speak to the add agent
and divide agent. And then the divide
agent has the divide tool and it can
speak to the other two agents. Okay. And
then we create the swarm architecture
with the three agents with the default
active agent being the ad agent. So we
are speaking the first agent to to to
speak to is the ad agent and then we
compile it and then we invoke it again
with the exact same uh examples. And so
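A hedged sketch using the langgraph-swarm package; handoff tools give each agent the ability to transfer to the others:

```python
from langgraph_swarm import create_swarm, create_handoff_tool

add_agent = create_agent(
    model,
    tools=[add,
           create_handoff_tool(agent_name="multiply_agent"),
           create_handoff_tool(agent_name="divide_agent")],
    name="add_agent",
    system_prompt="You are an addition expert.",
)
# multiply_agent and divide_agent are defined symmetrically

swarm = create_swarm(
    agents=[add_agent, multiply_agent, divide_agent],
    default_active_agent="add_agent",   # the first agent to receive the query
).compile()

result = swarm.invoke({"messages": [{"role": "user", "content": expression}]})
```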
What we see here is that the add agent receives our query, adds what it needs to add, and then transfers to the multiply agent. The multiply agent does what it needs to do and transfers to the divide agent, and the divide agent solves its part and gives us the output. Because everybody can speak to everybody, this interaction cost us eight hops, with only two transfers, about 5,000 input tokens, and 500 output tokens. So you can see the same problem with the two different architectures and what each would cost.
So which one should you use? Well, a lot of people discourage multi-agent systems. What I showed you in the code is that we created a single agent with three tools, and then three different agents, each with one tool, in a supervisor (hierarchical) architecture and in a swarm architecture. Of course, these are toy examples; we wouldn't really create an agent per tool, but it lets you see the difference. People discourage the use of multi-agent systems because there's a lot of overhead in transfers, in managing memory, and in managing information across systems. The advice, I think, is that if you can get away with a single agent, you should, just because of the overhead of setting up memory and everything else in a multi-agent system. However, if the task gets too complex for a single agent, such that specialized agents might be useful, then you can move to a multi-agent system. Remember that there are real limitations on the context window: if you clog it with too many tools and too many instructions, we do see performance degradation, and so specialized agents might be the better choice. Now, within the multi-agent domain, should you use supervisor or swarm? I've heard different opinions.
Smaller teams with simple tasks prefer swarm, just because of what I showed you: less overhead in transfers up and down. However, I've also heard that as the task gets bigger, the solution space in a swarm is a lot larger than with a supervisor, just because there are so many different paths that can be taken when more transfers are possible. So with complexity, it does seem like a supervisor solution might be easier to debug. But it depends on the size of your task and how you set it up; it's best to experiment with your own example. And again, architectures are changing. We're seeing agents that can spawn other agents, we're seeing examples of agent-as-a-tool, and we're seeing a lot of different patterns emerging. But nothing has stuck yet, so maybe in another 6 months I will revise this.
So let's talk about agent interface protocols, standardization, and interoperability. Agents need to interface with tools: they need to be able to call tools and use their outputs. They need to interface with data sources and databases. They need to be able to talk to a user, and they need to be able to talk to other agents in multi-agent systems. In the examples we've just built, a lot of this is based on the English language: in selecting a tool, the agent relies only on the name of the function and its docstring. So if tools become very similar, it can get confusing.
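As a concrete illustration, here is the kind of tool definition the agent actually sees, a minimal sketch assuming LangChain's @tool decorator; the tool itself is a hypothetical stub. The only signals the model gets for tool selection are the function name, the type hints, and the docstring:

```python
from langchain_core.tools import tool

@tool
def get_exchange_rate(base: str, quote: str) -> float:
    """Return the current exchange rate from `base` currency to `quote` currency.

    The agent never sees this function body; it picks the tool (and fills in
    the arguments) based purely on the name, signature, and this docstring.
    """
    # Hypothetical stub; a real tool would call a rate API here.
    return 1.42
```

If two tools end up with near-identical names and docstrings, the model has almost nothing left to distinguish them by, which is exactly the confusion described above.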
And so what companies are doing is trying to create standardization protocols at the interfaces of agents. For agents using tools and data, Anthropic released the Model Context Protocol (MCP), which is meant to standardize how we use tools and data at the agent interface. Google released the Agent2Agent (A2A) protocol for agent-to-agent communication. And between humans and agents, there's the Agent-User Interaction protocol (AG-UI), which came from a collaboration between CopilotKit, CrewAI, and LangChain. Again, these are also changing, and there may be more out there. I think the one that's really interesting is MCP. Last month, Anthropic donated MCP to the Linux Foundation under a new sub-organization that is also chaperoned by OpenAI. So OpenAI, Anthropic, and the Linux Foundation are going to be incubating MCP, and that's definitely going to make it more popular soon.
The main purpose of all of these is really to ensure smooth handoffs and to ensure interoperability and reuse. When you're building your own agent, you're building your own tools from scratch and setting up your own systems; if you then want to build another system, you'd have to set a lot of that up all over again. If instead we build tools with the same interface, you can think of it as plug-and-play, like a USB or HDMI port: if the interfaces are the same, we can plug different systems together. And today we do have MCP hubs where people publish tools so others can use them, which helps the ecosystem grow a lot faster.
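As a sketch of what that plug-and-play looks like in practice, here is a minimal MCP server assuming the official `mcp` Python SDK and its FastMCP helper; the server name and tool are placeholders. Any MCP-compatible client could discover and call this tool without custom integration code:

```python
from mcp.server.fastmcp import FastMCP

# A named MCP server; clients discover its tools over the protocol.
mcp = FastMCP("calculator")

@mcp.tool()
def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # serve the tool over MCP's standard transport (stdio by default)
```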
Evaluating agents. So how do we evaluate agents? Well, there are several layers to agentic systems. At the heart of it, the brain of the agent is an LLM, a foundation model. On top of that is the agent system: beyond the LLM you have tools, memory, maybe some guardrails you've added, and communication protocols. And on top of that, the agentic system becomes a deployed application: you package it up and put it in front of users. So you have to understand the layers of the onion as you evaluate, because there are evaluations that can be done at each of these levels. From the LLM point of view, you want to know: is the LLM following instructions? Is it capable of doing the task? Is it accurate? Is it hallucinating? Is it consistent? Is it toxic; do I need guardrails? Then there are questions about the agentic system: is there proper decomposition of the task? Is the execution efficient? Is it choosing the correct tools? Is it retrieving the correct information, if it's retrieving information? Is it completing tasks successfully? And so on. At the application level, these are just software systems like any other, so you'd look at overall performance, error rate, latency, scalability, cost efficiency, access and identity, UI/UX, and so on.
In terms of evaluating the output of LLMs or agents, there are three main ways to do it. You can use code-based evals, which means coding up evaluations the same way we do in workflows or any other software. There's LLM-as-a-judge, where you use an LLM to judge the output of another LLM. And there are human evaluators or annotators. There are different ways of doing this, and you can choose whichever you're comfortable with. Of course, code-based evals are going to be a lot cheaper, more consistent, and more repeatable than an LLM judge or a human. Some important questions: is your output quantitative or qualitative? Is it discrete? Do you know what the output should be ahead of time? Is it deterministic? Is there a ground truth you can compare against? Are you cost-sensitive? If you know in advance what your output needs to be, or you have something to compare it to, then I highly recommend a code-based eval, just because of the price tag and the consistency. If not, you may be able to use LLM-as-a-judge or humans, with LLM-as-a-judge likely being cheaper than human evaluators.
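Here is a minimal sketch of a code-based eval for the case where a ground truth exists; the test cases, the stand-in agent, and the tolerance are made up for illustration:

```python
# A tiny code-based eval harness: compare agent outputs against known ground
# truth. Works when the expected answer is deterministic and comparable.

def run_agent(query: str) -> float:
    """Stand-in for your real agent call; returns a numeric answer."""
    return eval(query)  # placeholder only -- never eval untrusted input

test_cases = [
    ("(3 + 5) * 4 / 2", 16.0),
    ("10 / 4", 2.5),
]

TOLERANCE = 1e-6
passed = 0
for query, expected in test_cases:
    actual = run_agent(query)
    ok = abs(actual - expected) <= TOLERANCE
    passed += ok
    print(f"{query!r}: expected {expected}, got {actual}, "
          f"{'PASS' if ok else 'FAIL'}")

print(f"{passed}/{len(test_cases)} passed ({passed / len(test_cases):.0%})")
```

Because it's plain code, this runs in milliseconds, costs nothing per run, and gives the same verdict every time, which is exactly the repeatability argument for code-based evals.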
Now, agent challenges. In terms of models, like I said, the models themselves are evolving, the application space is evolving, and what we're learning over time is changing. One of the challenges with models is output evaluation: these models have a mind of their own, and their output is quite open-ended. Sometimes a single word in a sentence can change the meaning drastically, and evaluating open-ended outputs is not easy. There are still model limitations in terms of what they do or don't understand, their context window, what they're capable of, and their knowledge cutoff. Hallucinations can still be a problem, for example when an agent is deciding the path it needs to take to solve a particular problem. Context management is also an active field of study right now. Debugging can be convoluted because of the layers of the system and the freedom the agent has in devising its own solution. Price estimation for agents can be an issue because of the loop: there's a loop, and it can go around as many times as it needs to figure out the problem, so estimating a price tag can be difficult. There can be compounding error: if the task is really large and the agent takes a wrong turn, it can compound its errors and never reach a proper solution. It can get stuck in loops. We might have integration issues with the tools: we might get errors, or it might not choose the right tool. But a lot of these things are dealt with by the framework; LangChain and others have ways to stop an agent from getting stuck in loops and to handle other issues that might occur.
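One common guard, sketched here in plain Python rather than any particular framework, is simply to cap how many iterations the agent loop can take (frameworks expose similar knobs, such as recursion limits); the step logic below is a hypothetical stand-in:

```python
MAX_STEPS = 10  # hard cap so a confused agent can't loop (and bill you) forever

def agent_step(state: dict) -> dict:
    """Hypothetical stand-in for one think/act/observe iteration."""
    state["steps"] = state.get("steps", 0) + 1
    if state["steps"] >= 3:          # pretend the agent finishes on step 3
        state["done"] = True
    return state

state = {"query": "What is (3 + 5) * 4 / 2?", "done": False}
for step in range(MAX_STEPS):
    state = agent_step(state)
    if state["done"]:
        print(f"finished in {state['steps']} steps")
        break
else:
    # for/else fires only if we never hit `break`: the loop cap was reached.
    raise RuntimeError(f"agent did not converge within {MAX_STEPS} steps")
```

A cap like this also makes the worst-case cost of a run predictable, which helps with the price-estimation problem mentioned above.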
Framework stability, like I mentioned, can also be an issue, because it's an evolving field: models get deprecated, and libraries can change quite drastically from version to version. And one of the bigger challenges is business value. There's a lot of debate about whether these systems are bringing the business value everybody expects of them, but that is yet to be seen.
We saw a lot of issues in 2025. For example, you might have heard that the Replit agent reportedly deleted a production database even after a code freeze. The agent's response, when I read it, was pretty funny; it said something like, "I'm sorry, it was a bad decision, but I panicked," something about anxiety and stress. And I'm like, you're an agent, you don't have anxiety and stress. But that's the thing about reading so much human literature and human language: they now reply the way a human would.
There's a lawsuit against OpenAI from parents who claim that ChatGPT helped their son commit suicide. I think there are now guardrails against that; the last time I asked GPT about something that had to do with suicide, it sent a lot of information about the suicide hotline and other resources, so I think that has been addressed in many models. Air Canada was found liable for its chatbot's bad advice: Air Canada did not want to pay the difference the chatbot had promised, and a tribunal ruled they were liable and had to pay it. And then there are claims that money is being set on fire, with $40 billion of generative AI and agent products not bringing value back. If you're interested in these stories and in understanding more of what's going on, there is an AI Incident Database that tries to keep track of all of this; you can see they're at 1,323 incidents. Some are minor, hallucinations and inaccuracies, and some are major, things that have to do with ethics, legality, and human health and well-being.
All of this is to say that the AI potential is there; I think everybody sees there's a lot of potential these systems can bring, on many different levels. But the technology is still maturing. We're not there yet from a technological point of view, both from the model and technology perspective and from the application space: how do we use these models? Progress is rarely linear. We'll have to dabble with the technology, try different things, hit a few walls, deal with the consequences, and then buffer against the issues these systems have.
And so it's best to use AI as a junior assistant. It's not ready yet; it's still maturing. It can be very powerful and can give you ideas, but treat it as a junior assistant. So: start with read-only access to tools and systems, add human approvals for critical steps, and enable comprehensive logging so you can see the traces of what is going on.
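Here is a minimal, framework-free sketch of those three guardrails together: a tool wrapper that logs every call, lets read-only tools through, and asks a human to approve anything that writes. The tool names and the console-prompt approval are placeholders for whatever approval flow you'd actually use:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent.tools")

READ_ONLY = {"search_docs", "get_balance"}  # hypothetical read-only tool names

def guarded_call(tool_name: str, tool_fn, *args, **kwargs):
    """Log every tool call and require human approval for write actions."""
    log.info("tool=%s args=%s kwargs=%s", tool_name, args, kwargs)
    if tool_name not in READ_ONLY:
        answer = input(f"Agent wants to run {tool_name}{args}. Approve? [y/N] ")
        if answer.strip().lower() != "y":
            log.warning("tool=%s denied by human reviewer", tool_name)
            return "DENIED: a human reviewer blocked this action."
    result = tool_fn(*args, **kwargs)
    log.info("tool=%s result=%r", tool_name, result)
    return result

# Usage: the agent requests a write; a human must say yes before it runs.
def delete_record(record_id: int) -> str:
    return f"record {record_id} deleted"

print(guarded_call("delete_record", delete_record, 42))
```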
Will agents take my job? There are a lot of questions about whether AI will replace humans and at what rate, and of course this is all still unfolding. Maybe it will; maybe not just yet. It depends on what your job is and its complexity. There is a research article that Microsoft Research put out in July 2025 in which they looked at occupations and gave each one an index of how likely it is to be affected by AI. You can see the top 40 occupations with the highest AI applicability score, the ones that might be replaced, and the list seems to skew cognitive and intellectual: proofreaders, editors, mathematicians, data scientists, analysts, web developers. And then there's the list of the bottom 40 occupations, which skews toward physical or manual labor: nursing assistants, dishwashers, roofers, floor sanders and finishers. Now, what does this mean? Would I tell my girls not to be mathematicians and to look into floor sanding? No. Not yet. I don't think so.
I think change is coming, for sure. Every huge technology has changed the job market in different ways, and the only constant is change; jobs have changed in different ways across societal evolution. But it's not clear yet how things will change, only that they will. Do I realistically think AI will replace mathematicians? No, I do not. I think mathematics, at least cutting-edge mathematics, bordering on philosophy, is a bit more complex and is not something that will be replaced, in my own opinion. But I do think that technical writers and analysts and so on are going to see a change in how they work.
At least this seems to align with Moravec's paradox, the idea that what is hard for humans is easy for AI, and vice versa. Humans start to crawl at a few months old, start to walk by their first year, and most can jump by their second or third year of life. The same is true for animals; some animals can walk moments after they're born. Meanwhile, we think of the more intellectual things, philosophy, or games like chess, the more cognitive abilities, as more selective: not everybody can do philosophy or play chess very well, but almost every human can jump. This goes back to evolution: in animals and humans alike, sensorimotor skills are among the oldest things we evolved to do, whereas intellectual and cognitive abilities are a more recent addition. AI, however, seems to be the opposite, in that some of its biggest wins were in cognitive and intellectual pursuits. Some of the first wins were against chess, Atari games, and Go, whereas we still struggle to make robots that can walk across a stage without fumbling.
How does that change things? Does it change how you should think about careers? I don't think so. It's very interesting, but we'll just have to keep an eye on how things are moving.
I do want to talk specifically about software development, because I'm in the field, and it's one of the fields seeing drastic changes. There's a very interesting talk Andrej Karpathy gave at Y Combinator in which he mentions that software development was stable for about 70 years and then saw two very fast changes in the last two decades. We started with software 1.0, writing code, in, let's say, the 1940s, when we became able to program computers: you take a system, set a series of rules, create a function, say, get_sentiment, and write the rules and conditionals. If this, then do that; while this is true, do that. That gives you the result.
Then in the 2010s we got into what's being called software 2.0, a term also coined by Karpathy himself, where you don't code the rules. Instead, you take a model, an algorithm, give it the inputs and the outputs, and it learns the rules. In most frameworks you'd call model.fit to train the model and then model.predict to get the model's output.
And today we're seeing software 3.0, or generative AI, where you use the English language to program a system. With these LLMs, you speak to the model and say, "Give me the sentiment of this paragraph," you give it the paragraph, and it comes back with an answer. So software development has changed from writing the code yourself, to training models that figure out the transformation between input and output, to foundation models you can talk to in natural language.
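Here is a compact sketch of the same sentiment task in all three paradigms; the word lists, the scikit-learn model choice, and the prompt are illustrative assumptions, not Karpathy's own examples:

```python
# Software 1.0: hand-written rules.
def get_sentiment_v1(text: str) -> str:
    positive, negative = {"great", "love", "good"}, {"bad", "hate", "awful"}
    words = text.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# Software 2.0: learn the rules from (input, output) pairs.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts  = ["I love this", "this is awful", "pretty good", "I hate it"]
labels = ["positive", "negative", "positive", "negative"]
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)                   # the model learns the rules
print(model.predict(["what a good day"]))  # -> ['positive']

# Software 3.0: program the system in English.
prompt = "Give me the sentiment (positive/negative/neutral) of this paragraph: ..."
# response = some_llm_client.generate(prompt)  # hypothetical LLM call
```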
Today, if you're a young student learning computer science and coming out into the job market, you should understand that these three paradigms of working now coexist, and you should have an idea of how to use each of them. Again, this is a really interesting talk by Andrej, which you can find here. The sentiment was also echoed in Werner Vogels' re:Invent 2025 talk, "The Renaissance Developer," which goes through how the field is changing and how to deal with the changes.
So, weathering the storm: what should you do to protect yourself against all of the changes that are happening? My first piece of advice is to learn AI. Don't fear it. AI does not have to replace people; it can amplify them. At the end of the day, AI is a tool, and how we use it, as individuals and as a society, will write the future. So really try to learn AI. It's not super complicated, especially generative AI; there are technical aspects to it, but it's being popularized to the point where it's accessible, there's a lot of really good free material out there, and it's worth a try.
The other thing I find important is that fundamentals don't fade. Physics, math, biology, chemistry: these are not going away. The foundations of our world are not going away. How we build systems, how we architect, the fundamentals of every field, those are not going away. We might get a productivity boost through AI, but the foundations will always be important. We will always need databases to store and retrieve information; we're always going to need to understand networking, identity, access, and so on. And what you need to keep in mind when using AI is that the systems are not there yet, so you need to know your fundamentals to be able to direct these systems toward the task.
My third piece of advice is to move up the abstraction ladder. You need to define the problem, design the solution, and own the outcomes; AI is not going to do that. What you have with AI today are really good junior assistants that are really good with syntax. As of early 2026, they do write code, but they tend to write code that can be convoluted and very long. They write very defensively, wrapping everything in try/catch blocks, trying to guard against everything, and the result is far longer than necessary, which creates tech debt in some ways. So you define the problem, you design the solution, you own the outcomes, and you use this assistant with very clear instructions. Then you read the output and make sure it's concise and doing the right thing. For that, you need to understand your foundations and your system.
Think in systems. You need to be able to see the bigger picture: understand your system components, the integration points, what can and cannot be done, and what should and should not be done. The context window today is not large enough to put all of that information, a full code base, into the system and have AI do it; as the context window fills up, we see serious degradation. The way a lot of us have tried to use AI is to dump in the whole code base, and it fixes something here and breaks something there, and you keep going, fixing this and breaking that. So the best way to work with AI is for you to own your outcomes and see the bigger picture. You are managing a junior helper that's good with syntax but still needs a lot of guidance; that's how you should think about it. Be a polymath. This is advice from Werner Vogels' lecture: you need to broaden your base of knowledge. Again, think of yourself as a supervisor of these helpers, so you need to understand what they're doing, learn very fast, and have a broader skill set.
Some niches are more difficult for AI. AI can only know what's in its training data, so it doesn't do cutting edge very well, and it doesn't come up with novel ideas. You have to remember these are token-to-token probabilistic generators: the model takes its compressed view of the whole human knowledge base, uses it to estimate the probability of different tokens, and constructs sentences token by token, which of course can cause all sorts of problems. It's greedy; there's no planning step.
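To make "greedy, token-by-token" concrete, here is a toy sketch of greedy decoding over a made-up bigram table; the probabilities are invented, and real LLMs use a neural network over a huge vocabulary, but the generation loop has the same shape:

```python
# Toy next-token model: P(next | current) as a hand-written bigram table.
P = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"<end>": 1.0},
    "ran": {"<end>": 1.0},
}

def greedy_decode(token: str) -> list[str]:
    """Pick the single most likely next token at every step: no lookahead, no plan."""
    out = [token]
    while token in P:
        token = max(P[token], key=P[token].get)  # greedy argmax over next tokens
        if token == "<end>":
            break
        out.append(token)
    return out

print(greedy_decode("the"))  # -> ['the', 'cat', 'sat']
```

Each choice is locally best, never revisited, and never made with the end of the sentence in mind, which is the core of the "no planning" criticism.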
There are many different ideas about how the next generation of models should work, and Dr. Yann LeCun has been arguing for world models for a long time. He keeps saying that these LLMs are an off-ramp on the highway of AI research. They are impressive, we have to say, in what they're able to generate, and a lot of applications have been built on top of them, but they are still token-to-token probabilistic generators. There's no thought behind them; they don't understand our world, they don't understand physics, they don't understand a lot of things. Many of them are still based on text rather than other modalities like images, which is how we learn. So there's a new class of models that I think we're going to see emerge, and Dr. LeCun left Meta this year to create his own startup focusing on world models, which he calls AMI. I'm super excited to see what happens there. But this is all to say that these token-to-token generators are unlikely to come up with cutting-edge, novel ideas, which is why I'm saying they're not going to replace mathematicians anytime soon. Of course they can do math we already know, but I don't think they're going to come up with new physics and math theories. That's highly unlikely; I'd be surprised if I see that.
And my last piece of advice is to focus on the human element. Like I said, a con of agents and agentic systems, from my perspective, is that they are not human. I would still rather deal with a person to solve my customer complaints. The reason we as a species have taken over the planet, compared with any other species, is our ability to work together collectively. We create this human fabric that learns from each other both horizontally and vertically across generations, and we regulate each other. There's a lot about human nature that is very special. I'm very interested in humans as a species, in what makes us who we are: some humans are angels without wings, and some make questionable decisions, and humanity is very, very interesting. We've evolved together in a special way, and what makes us human is part of our nature. So I would suggest you focus on the human element: build trust with your clients, build connections with your teams, network with people. AI is not going to do that, not yet at least; it is an outsider to the human race. If you want to get ahead in an AI world, focus on the human element.
These are some references that I really like and that I've used. All of the references I've used within the slides are there for you, but these are some I particularly like. I highly respect Dr. Andrew Ng; he's one of the few people I follow, and I have a lot of respect for him. He's one of those people who sits at the border of academic research, industry, and applications, and he has a really cool course called Agentic AI on DeepLearning.AI, which I highly, highly recommend; of all the resources I've seen, it's the one I've liked the most. I like Chip Huyen very much. I have both her books, one on machine learning and one on AI engineering. She writes very simply and down to earth, but she's very comprehensive. I absolutely love her; I have her books in probably every form, print, PDF, and audiobook. She is definitely somebody to check out. I follow Andrej Karpathy as well: a brilliant person, down to earth. There's a lot of hype in AI, a lot of people sowing confusion, a lot of things out there just for the drama; Andrej is not one of those. So Andrew, Chip, and Andrej are people who are really down to earth, who aren't just fanning flames and drama, and Andrej is very good at explaining very complex things. He has his YouTube channel, and he's building an educational institution, or service, which I'm super excited about. There are also some really good courses on Coursera that you might like, and all of the important players in AI have their own academies: LangChain, Anthropic, NVIDIA, and DeepLearning.AI all have free material you can check out. I really like the DeepLearning.AI course catalog; I find it very useful. And I really like Stanford University classes; they put a lot of their technical computer science classes on YouTube, which I find super useful.
And then this is the ML introduction I did last year, if you're interested, and this is the Linux Foundation article on the donation of MCP. And that's it. I really hope this was useful. You can find me on LinkedIn, and the code base and the slides are all online on my GitHub, along with the course from last year. Again, I work for Tech42; we specialize in GenAI and agents. If you want to talk agents with a set of nerds, you can find me and my colleagues at Tech42. And that's it.
Slides and Labs: https://github.com/rdali/ML105_Agents
Profile: https://www.linkedin.com/in/roladali/