Hello everyone, welcome to ML 105. This
is going to be a quick overview of
agentic AI and agents.
So my name is Rola and I'm a machine learning architect at Tech42. We are a consulting company that specializes in generative AI and machine learning. I have a PhD in neuroscience and bioinformatics from McGill University, completed in 2017. You can find some of my peer-reviewed articles on PubMed, Google Scholar, bioRxiv, and arXiv. I transitioned into industry about 5 years ago now and stayed within the data, AI, ML, and cloud ecosystem. I'm an AWS Hero and a Gold Jacket ambassador. And on a more personal note, I'm a mother of two young girls.
What I wanted to do today for this session is to give you an overview of generative AI and then move into the specifics of what agents are, when to use them and when not to, how to design an agent, how to implement one, talk a little bit about architectural patterns and evaluations, and then we'll talk about how they might or might not affect careers.
So artificial intelligence is not a new field. It is thought to have started in the 1940s, with some of the earliest papers attributed to McCulloch and Pitts. In 1943, they wrote an article called "A Logical Calculus of the Ideas Immanent in Nervous Activity," which proposed a mathematical model of biological neurons. In 1950, Alan Turing wrote his paper "Computing Machinery and Intelligence," which proposed the Turing test, a benchmark for machine intelligence. In 1956, the Dartmouth workshop established AI as a field of study, and John McCarthy coined the term artificial intelligence.
We saw an AI winter between the 1950s and the 1980s, followed by a machine learning renaissance in the 1990s. Geoffrey Hinton was awarded his PhD in artificial intelligence from the University of Edinburgh in 1978. I've read some books that stated that when this young PhD candidate was deciding on his field of study, AI was thought to be career suicide because it was a niche topic, and a lot of people advised him against it. Of course, fast forward roughly 50 years to today: Geoffrey Hinton is one of the godfathers of AI. He's gone on to win the Turing Award and the Nobel Prize, among many other awards, and he's gone on to change the world, really.
Starting in the 1990s, we saw machine systems winning all sorts of games against human contenders. In 1997, IBM's Deep Blue defeated Garry Kasparov in chess. And in 2011, IBM Watson won the TV game show Jeopardy. Of course, all of that increased enthusiasm about the potential of the field.
In the 2010s, we saw a deep learning boom. In 2012, AlexNet, a deep learning network that came from Geoffrey Hinton's lab, won the ImageNet image recognition competition by a wide margin. This, of course, revived the field and the excitement around all of the potential that neural nets can have. In 2016, DeepMind's AlphaGo defeated the reigning champion in the game of Go.
Today, we're living through a generative AI boom. I'm sure you've noticed from the two letters "AI" being plastered on every product that you purchase. Some notable events: in 2017, the transformer paper was released by Google researchers and researchers at the University of Toronto. This became the blueprint for a lot of the large language models we use today. And in November 2022, OpenAI released ChatGPT to the general public, which really got a lot of attention.
And I remember the deep learning boom. I was in the field back then, and I remember all the excitement that everybody in data or computer systems had around AI. Of course, it was still very niche, because it required very technical skill sets to be able to build these models yourself, train them, and use them. But the generative AI boom is very different. It has really popularized AI in ways that made it percolate across all of society. Even my mother, who has never touched a computer, knows what ChatGPT is and uses it on her phone to discuss certain things. I've heard young students brag on the street about how they've used ChatGPT to cheat on some exams, maybe, and people in various jobs tell me how they've used ChatGPT to help with resumes and ideas and just making things more efficient. So unlike previous decades of machine learning and AI, the generative AI boom is really more popular and democratized.
Last year I gave a freeCodeCamp lecture on machine learning fundamentals that went more into the technical detail of what machine learning is. If you're interested in that, you can find it on YouTube as well. I don't want to repeat a lot of the information I went over in that course, so if you find that there's background information missing here that you need, you might very well find it there.
So what differentiates the generative AI boom that we're seeing today from traditional ML at a technical level? Artificial intelligence sits on three pillars. There's the pillar of algorithms: these are the mathematical models that map input to output. Then you've got data, which is used to tune those algorithms, for these models to learn from. And then you've got compute, to be able to run these training and inference systems. So artificial intelligence really relies on data, algorithms, and compute. And I want to show you how machine learning and generative AI differ across these three pillars.
At the level of data, training data sets have grown by orders of magnitude in size. In machine learning, we used to talk about megabytes to gigabytes of data. If you had about a million examples, that was usually thought to be sufficient. Of course, that depends on the size of your model, how clean your data is, and what task you're trying to learn, but generally we used to talk about megabytes to gigabytes of data, and a million examples used to be plenty. Today, in generative AI, these models are trained on the internet, practically all of the human knowledge we've collected, and that's on the order of terabytes to petabytes of data. It's estimated that these LLMs see about 15 trillion tokens. So again, that's orders of magnitude greater than what we have in machine learning.
In terms of model size, again, machine learning models are usually in the thousands to millions of parameters, whereas the generative AI models are in the billions to trillions of parameters. GPT-4, for example, is estimated to be about 1.8 trillion parameters large.
And at the level of compute, we've seen advancements in chips, in CPUs and GPUs and how fast they can go. But I think the biggest advancement has been in being able to run sequential models in parallel. In machine learning, sequence models like RNNs or LSTMs needed to run serially, and that limited their scale and scope. One of the biggest advantages of the transformer paper, I think, was being able to parallelize the training, which of course made all of this possible.
This is one of my favorite resources in this domain. If you take a look at it, it's a few years old now; I think it stops at 2020. But it shows you some of the advancements that had to happen on all three pillars, the algorithmic pillar, the data pillar, and the compute pillar, for us to be able to get to where we are today.
So what happens when you supercharge data, algorithms, and compute all at the same time? Well, you get magic. You get these Hulk models, these big foundational models, as we're calling them, that seem to make sense of human knowledge, or at least human language. They seem to be able to use human language well.
We've all had chats with ChatGPT or Anthropic's Claude or any of the models out there, and they make sense. You can have a very good conversation, and I'm not only talking in terms of sentence structure; they are able to have a normal conversation like another human would. Now, of course, there are issues we deal with, like hallucinations and inaccuracies, and there's a lot of debate on whether these models understand the world or language or human culture, a huge debate with people on either side of the spectrum, and on whether they're capable of reasoning. But we can agree that they're able to use language quite well. They can read, they can write, and they can extract information, for example.
And this, I think, was surprising to a lot of people: that we would get to models that can do this by simply making the models bigger and shoving a whole lot of data into them.
The other thing that has happened is we've gone from tasks that are specific, in machine learning, to tasks that are general, in generative AI. In machine learning, the way we work is you think of a very specific task you want to solve, you collect data specific to that task, you choose a model that is capable of doing that task, and you train that model. It's usually task by task. With generative AI, because these models understand human language, a generality has emerged. There are a lot of emergent tasks these models are capable of doing by the simple fact that they understand human language. They can read, write, and summarize. We've seen them used as assistants because they can take instructions. There are a lot of tasks they can do. So we've moved from specific tasks to more general task execution.
And we've seen the rise of model as a service. Training a machine learning model used to require technical skills: again, you chose a model, you cleaned up your data, you trained your model, you optimized it, and then you ran inference on it. You needed a lot of skills to do that, but it was affordable, because everything fit on local hardware. Today, these generative AI models have a price tag of anywhere between a few hundred thousand and a few billion dollars, which has made it prohibitive for most people to create their own. And so we've got big labs, big companies like Anthropic, OpenAI, Amazon, and others that have the capability to train these models, and they put them out as a service. These foundational models, as we call them, are capable of doing a lot of things, and these companies are putting them out as a pay-as-you-go pricing service.
And I want to point out that to get here, we needed advancements in all three pillars. If you had a really large model but not enough data, then you wouldn't be able to tune the parameters correctly for that model to learn enough. If you had a lot of data but a very small model, then that model would be incapable of storing and learning from all that data, which is what we call bias in ML. And if you had the data and the algorithms but still had to run these things serially, it would have taken more time than would be feasible to push the advancements as fast as we have. GPT-4, for example, is estimated to have run for about 3 months in training, and that's with the parallelization. So we are here today because we can take these very large models, put a whole lot of data into them, and train them on these compute systems in parallel.
So: generative AI, agentic systems, and the spectrum of autonomy. If you Google agency, agency is thought to be the ability to make choices, act intentionally, and have some sort of control. All these systems in generative AI have some level of agency or autonomy. In an LLM, the agency comes in the output. When you invoke an LLM, one of these foundational Hulk models, the response it gives you is very open-ended. These models are token-to-token probabilistic generators, and depending on certain configurations, there can be vast changes in that output. So there is a lot of control the LLM has over the output.
As these systems evolved, we started to use these LLMs in a loop, mimicking a chatbot. Then we started to see workflows, where we use them in bigger systems with predefined steps. And then 2025 was called the year of agents. We'll talk a lot more about what specifically agents are and how they're different from workflows, but these are autonomous systems that have more control over the flow of an application.
Today we're seeing deep agents, which again have more control: over your file system, they can spawn other agents, and they might have control over a browser, and so on. And of course, we're going to extend the spectrum of these autonomous agentic systems by giving them more and more capabilities, agency, and autonomy.
And of course, the pinnacle of the field is AGI, artificial general intelligence. I find that there's not a clear definition or agreement in the field about what AGI is or is not. Even experts in the field can't agree on whether AGI is possible, or what it is, or what it is we're looking for. But let's put that as the pinnacle of the field, and as things become clearer, I will let you know next year if we have a better definition.
So as we move through these systems, there is more and more system autonomy. We're giving them more agency, all the way from agency over the output of an LLM to control over the flow in an agent. And of course, as the system gains more control, there's less and less human intervention. The level of control or autonomy is, of course, a spectrum, and to avoid bickering over what an agent is or where that line of autonomy lies, Andrew Ng coined the term agentic systems as an umbrella term for all of these systems, to acknowledge that there's some agency everywhere.
Some agentic milestone timelines: I mentioned that the transformer paper, which is the foundational algorithm for a lot of these models today, was released in 2017. ChatGPT was released in 2022. Agents started to appear more in 2023 with the ReAct paper, which merged reasoning and action together. And today, as I'm filming, we are in January 2026, so really this is about 2 to 3 years old, not more than that. The reason I'm showing you this is because I want you to realize that this is a cutting-edge field. Things are evolving before our eyes, right? There is not an authority that can decide on certain things. What we're seeing is many different companies coming up with different ideas. There's a lot in the literature, there are a lot of companies trying different things, and the field is still very young and maturing before our eyes. We're practically building this bridge as we cross it. And the reason this matters is, one, you have to be very careful about how you use it in applications, understanding the vagueness that comes with systems that are still evolving. The other thing I want to point out is that the timestamp really matters. What we know today might not have been very clear 6 months ago; something we thought was important 6 months ago might not be super important today. So when you look at any resource in generative AI or agents, look at the timestamp, because knowledge is evolving very fast. The generative AI story in general has been very compressed, meaning there is a lot of excitement, a lot of hype, and a lot of money being poured into this, and that causes really fast evolution in the space. So really, look at the timestamp and understand that this is all unfolding before our eyes.
So what is an agent? I looked at a lot of definitions from different resources, and again, there are many different ways of looking at it, but this is the one I like: a generative AI agent is a software entity designed to perceive its environment, make decisions, and take actions to achieve specific goals. So this is a software system, and the brain of the system is an LLM, a foundational model, which helps it plan. You give it a task: you say, solve this for me, or give me this answer, or help me with this. And it uses the LLM to plan and decompose that task.
It's able to act: it has tools. And it's able to observe the output of those tools, and then it goes back and it plans, acts, and observes. So it's a loop of plan, act, observe, plan, act, observe, until the solution is achieved. In pseudocode, we've got a while loop. It takes user input, which is the actual task it's supposed to do. Then it invokes an LLM, which again is the brain of the agent, and this is supposed to give you a plan of how to solve the task; it decomposes the task into its subtasks. Then, if the response wants a tool, if it wants an action, that action is invoked and executed, and the result is sent back to the LLM. So the LLM is both the observer and the planner. That loop goes on and on until there's no more action to be done, and then you get your final response.
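As a rough sketch in Python, with all helper names (get_user_input, call_llm, has_tool_call, execute_tool) as hypothetical placeholders for the model call and tool plumbing shown later:

```python
# Hypothetical sketch of the plan-act-observe loop
user_input = get_user_input()          # the actual task to solve
response = call_llm(user_input)        # plan: the LLM decomposes the task

while has_tool_call(response):         # the LLM wants an action
    result = execute_tool(response)    # act: run the requested tool
    response = call_llm(result)        # observe: feed the result back

print(response)                        # final response, no more actions
```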
To show you what this looks like from another perspective: there's a user, and they invoke an agent with a specific request. That agent goes into a loop, and this loop invokes an LLM with the task. The LLM gives a response: a reasoning step or a tool call. If it's a tool call, the tool is executed, and the result is sent back to the LLM to see what further action needs to be taken. We keep looping in this system until there is no more action to be done and the answer is resolved, and then it's sent back to the user.
So how is an agent different from a workflow? I want to demonstrate this by giving you a task as an example. Let's say I traveled with my kids to a new city, and what I want to do is fill our time there with activities. We have a list of activities that we like to do, and what I want to do is check the activities available in that particular city, go to the websites of those activities and check if they're available for a particular time and date, and then check our own calendars for availability at that time and date. If there is a match, I want to book the activity and pay for it, and then add the activity to the calendar. Of course, I could code all of this up in any language. But let's say I want to do it in a workflow that involves an LLM. What I would do is ask the LLM for all of the popular activities in Montreal with their websites. I would take that list and call a function, activity_availability, that goes and scrapes each website, or calls an API for that website, and tells me if a particular time slot is available. Then I would check our own calendars for availability, and keep doing that loop until I find activities that fit our schedule. Then I would call a book_activity function, and then update the calendar with another function. So a workflow is a set of predetermined steps in a particular sequence that's coded up, as sketched below.
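As a minimal sketch, with hypothetical helpers (ask_llm, candidate_slots, activity_availability, calendar_is_free, book_activity, update_calendar) standing in for the real scraping, calendar, and booking logic:

```python
# Predetermined steps in a fixed, coded sequence: a workflow
activities = ask_llm("List popular activities in Montreal with their websites")

for activity in activities:
    for slot in candidate_slots:                       # times/dates we're considering
        if activity_availability(activity, slot) and calendar_is_free(slot):
            book_activity(activity, slot)              # book and pay
            update_calendar(activity, slot)            # add it to our calendar
```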
Now, if I want to do this with an agent, what I'm going to do is take an agent and say: you are an activity-booking agent; choose activities based on customer preferences, book an activity, and update the calendar; you have the following tools. So I give it all of the functions and tools that I had preset in the workflow, but I just tell it that these tools are available. I don't tell it how to solve the problem. I don't tell it how to do anything. What I do tell it is: book me activities in Montreal for these particular days. And it's going to do everything it needs to do, using its tools, to solve that problem.
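Sketched with the same hypothetical tools and a generic create_agent helper (in the spirit of the framework call shown later), the agent version is just a goal, a toolbox, and a request:

```python
# The agent gets the tools and the goal, but not the sequence of steps
agent = create_agent(
    llm=llm,
    tools=[activity_availability, calendar_is_free,
           book_activity, update_calendar],
    system_prompt=(
        "You are an activity-booking agent. Choose activities based on "
        "customer preferences, book an activity, and update the calendar."
    ),
)

agent.invoke("Book me activities in Montreal for these particular days.")
```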
Again, I'll show you an example of this later. So the main difference between an agent and a workflow is that an agent has dynamic control flow over the execution, devised by the LLM at runtime. These are not predetermined, pre-coded paths; they are determined by the agent, by the LLM, at runtime. Workflows, on the other hand, are static, predefined, coded graphs. If you're going to take anything from this lecture, it's this: agents have dynamic control flow devised by the LLM at runtime, whereas workflows are predefined coded graphs.
So agents are becoming very popular. Of course, they are a general-purpose technology; they can fit across verticals, and so we are seeing them in customer service, HR, R&D, across the board at different companies. Some pros of agents, and these are true for computer systems in general: they're available, 24/7, 365 days a year; they don't need breaks beyond maintenance windows. They're multilingual: these LLMs support over 200 languages right out of the box, so you really don't need any extra work to support different languages. Efficiency: they do improve response times, because a system that's well set up, with all of its resources and data organized efficiently, is going to be faster than humans looking through databases and HR systems. They're consistent: although the variability in an agent's execution path is quite large, it is still smaller than human-to-human variation. They're convenient: they offer self-service options at any time. And they scale: any computer system that's well built should be able to scale fast and well. And in terms of cost, computer systems should be cheaper than humans.
Now, in terms of cons, one of the bigger things, I think, is that they're not human. I think this is understated. I've seen a lot of companies go for AI-first customer service this year, which I have to say has been very, very frustrating. These systems are good, they're powerful, but if you want to set them up, you have to set them up very well. An agent that breaks, does not work, and is slow is very frustrating, which defeats the point. And, from my perspective anyway, I really enjoy human contact. These agents are not human, and to me that's a con.
The technology is still maturing. The LLMs themselves are still increasing in potential; we're still learning what they can and cannot do and how we can optimize them further. A few months ago, we were dealing with hallucinations, which I think we see less and less of, but we're still dealing with things like context window size and all sorts of other issues. The application space is also still maturing. Not only is the technology itself moving and evolving, we're still learning how to use it: what are the things it doesn't do very well that we need to cover for? How do we build safe, ethical applications on top of it? How do we get the user experience to the point where it needs to be? Again, the technology is maturing, but the application space is also still maturing. And cost: it's true that a computer system is usually cheaper than a human, but an agent is usually more expensive than other software systems. It depends on the size of the system, how you architect it, and what you do, but usually they do come out a little more expensive than other systems.
So, patterns and anti-patterns for agents: when to use or not to use them. We've got our workflow on one side and the agent on the other. Again, as I mentioned, the workflow is predetermined steps, whereas the agent has control over the execution path. If you have a mission-critical or error-sensitive application or field, you should be leaning more towards a workflow. As you move towards agents, the agent has more control and humans have less control, so if you are in a mission-critical setting, you want more human control and should lean towards workflows. If you're in a regulated industry or need deterministic outcomes, you should probably lean towards workflows. If you're latency sensitive, agents do add some latency, so again, workflows might be a better option. If you're cost sensitive, it's easier to estimate the costs of a workflow than of an agent. But if you're looking for performance, agents do tend to perform better on average, just because of that loop over the information. And of course, if you want flexibility, or you don't know exactly how to solve a problem, then an agent might be a better option. And if model-driven decision-making is acceptable, or even appreciated, then an agent is a better solution. Now again, all of these concerns can be dealt with in either system; there are ways to get around all of them. This is just an idea of what would work better.
So these are some questions to ask. Is the application mission-critical, error-sensitive, or in a highly regulated industry? Is the task path predictable, or can it be predefined? In other words, do you know how to solve the problem and the sequence of events that need to happen? Is the value of the task worth the cost? That's very important; I think people underestimate this question. And is latency critical? Depending on the answers, I would say: use an agent in cases where error is tolerable, the task is open-ended, the execution path is harder to code, cost is not an issue, and latency can be tolerated.
Okay, so: components of an agent. Let's look at agents in more of a deep dive and look at what actually makes an agent. I've looked at a lot of different references, and again, the field is still evolving; you see a lot of references and a lot of people trying to define things in their own words. What I've tried to do is compile a list of resources, and what I want to show you is the elements that have consistently come up over and over in many references. Almost everybody agrees that an agent needs to have a purpose or a goal: it's solving a task, so it has to have that goal. It needs to be capable of reasoning or planning: it needs to be able to decompose a task into its subtasks and plan the execution. It needs to have memory, to be able to hold a long discussion. And it needs to have tools or actions, to be able to solve things on your behalf. Now, there are a lot of extra things you see in different references: you can have guardrails, communication for multi-agent systems, learning so that these systems learn from experience, and many other things you can add. But I think the four mentioned above are the ones most people agree on. The agent needs to have a purpose, it needs to be able to reason and plan, it needs to have memory, and it needs to have some tools or actions.
So what does that look like? We've got this computer system, and we're saying it needs to be able to reason or plan. What this comes down to is an LLM: that part of the system is an LLM. We're saying it needs to have a purpose, a goal or mission or identity, a task it needs to solve. This comes in the form of a system prompt, and again, I'll give you examples of what that looks like. The tools or actions that it has usually come down to functions or API calls. And memory can come in different forms: it could be short-term memory or long-term memory.
So let's dig into each of these four in a bit more detail. Choosing an LLM: again, the LLM is the brain of the agent. It's going to help the agent understand the task and break it down, and it's going to help it evaluate the outputs of the tools. So think of it as the brain of the system. How do you choose an LLM? Well, you need to consider several criteria. The task complexity matters: if you're using an agent to solve simpler tasks, then maybe you can choose a smaller model, from a cost point of view; if it's a more complex task, then you might need a bigger model. These models should have reasoning capabilities; not all models are trained to reason, and reasoning capabilities of course make for better agentic models. The context window matters, because you want to be able to fit more information in for the agent. Some models have tool-calling capability; this is the way the model chooses a tool call. If a model does not have tool-calling capabilities, that's not really too much of a problem, in that you would have to explain to it how to return a tool call. But if it does, that just makes life a little easier.
You can look at the latency of models. This is going to be part of an application, so the latency of the model affects the overall application latency, and faster models might be better for your application. And of course, cost: these are foundational models that you usually pay for per token, unless you're hosting your own. So you want to have an idea of the cost of using that model and whether you're comfortable with it. Compliance and data privacy also play a role, in that you want to know if your field has any compliance regulations. That could mean you would host your own model, or, if you use a service, that you would need to read the provider agreement to see if it aligns with your requirements.
Now, there's a lot of information online, and there are a lot of different benchmarks and leaderboards that can help. Some of them are very specific to agents, which can be helpful. This one from Hugging Face, for example, is an agent leaderboard, and it will tell you, depending on the verticals or other criteria you care about, which models are working best for agents.
So we said that this agent needs a character or an identity. This usually comes in the form of a system prompt. The system prompt is kind of a definition. Imagine you've got a junior intern: the system prompt is really that one-pager you give the intern about who they are, what they're expected to do, and what tasks they need to do. So, for example, here we've got an agent and we say: you are a financial adviser; you are eloquent and professional. For this one, we say: you are a medical assistant; you are caring and empathetic. And here we say: you are a teen adviser; you are young and hip. These are instructions in plain English; they're just natural language statements. Here I've put only two sentences each, but in production these are longer. They can include anything from the tone of voice, to who you are, what to do, what not to do, and very specific instructions for specific situations. So usually these system prompts are a lot larger in production. A system prompt, then, is the agent's character and persona, plus its purpose, task, and instructions.
So, an agent needs memory, and the reason it needs memory is that LLMs are stateless. I'm going to demo this when we start playing with code, but LLMs themselves do not retain information as you talk to them. The agent has three types of memory. There's intrinsic memory: these are the model parameters, the information that was retained from the training process. Unless you retrain the model, the intrinsic memory is stable; it doesn't change, though it does differ from model to model. Another form of memory is short-term memory. This is within-session memory, and what it looks like in practice is the context window. What usually happens, and I'm going to demo this with code, is that you append the conversation to the end of the context window, so the agent has a running JSON of the conversation and can recall earlier information. There's an art and a science to context management: what goes in the context window, what information needs to be retained, and what needs to be dropped, summarized, or compressed, because the context window is limited in the number of tokens you can put in it. What you put in it becomes important, and that's called context management; a naive version of the idea is sketched below.
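Here's a naive sketch of that idea, assuming messages in the Bedrock Converse format used later; real systems summarize or compress rather than simply dropping turns:

```python
MAX_TURNS = 20  # illustrative context budget

def update_context(messages, role, text):
    """Append a turn, then apply a crude context-management policy."""
    messages.append({"role": role, "content": [{"text": text}]})
    # drop the oldest turns once we exceed the budget
    if len(messages) > MAX_TURNS:
        del messages[: len(messages) - MAX_TURNS]
    return messages
```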
Another form of memory is long-term memory, which is cross-session. Let's say you've got a customer service agent talking to a customer: you want to be able to collect long-term information about that customer, what their complaint was last time, what their preferences are, what their information is. That usually comes in the form of external storage that the agent has access to. And as short-term memory gets clogged, information can be moved to that external storage. What information is stored, how it's indexed, and how it's retrieved again become very important.
Moving on to tools: agents have tools, and tools are interesting because LLMs have limitations that tools can help overcome. These LLMs are stateless, and we fix that with the memory component. But the LLMs themselves also have a knowledge cutoff date from their training. If an LLM was trained in 2022, for example, then it doesn't know any information beyond 2022. It doesn't know who the president is, if he or she was elected after 2022. It doesn't have information about the time or the date or the weather anywhere. All of these limitations of the LLM we can overcome by giving the LLM tools that can provide that information.
Agents can take several kinds of actions. There are capability extensions: a function call or an API call to do something. There's knowledge augmentation: retrieving data or context from databases. And there's orchestration: calling other agents or communicating with other systems. Tools can come in different forms: function calls in any language, API calls, data retrieval from an external database. We're seeing a lot of browser actions, code execution, and file system control. And as we push the agency and autonomy spectrum further, we're going to be able to give these systems more and more control.
So, implementing an agent: what does that look like from a code point of view? I've written some very basic code in Python to show you a few things. I want to show you a single LLM call, how you invoke one of these LLMs if you have not done so before. I want to show you how you could put that in a loop and mimic a chatbot, for example. I want to show you a very simple agent, again from scratch in Python, no frameworks, and how you would add memory. Then I want to show you how you would build an agent in one of the frameworks, like LangChain, add memory to that, and then I'll show you some architectures. This repository is open source; you can find all of that code there.
So this is what's in the repository. We're going to start with the LLM call, and what I want you to see here is that there are no frameworks; it's just boto3. I'm using AWS, I'm using the model Anthropic Claude 4.5, and I'm using the Bedrock API to call that model. I'm taking a user input and sending it to the LLM through the Converse API. The Converse API from Bedrock is really nice because all these models come from different providers, and so they have different expectations for input, different parameters, and different JSON structures. What the Converse API tries to do is standardize that for you, which makes it a lot easier to switch models without changing the LLM call. So we're calling this model with the Converse API and our query, then taking the output and just printing it out. It's very simple, basic code.
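For reference, here's a minimal sketch of that single call, assuming AWS credentials are configured and boto3 is installed; the model ID is a placeholder, so check Bedrock for the exact one:

```python
import boto3

# Bedrock runtime client; the Converse API normalizes I/O across providers
client = boto3.client("bedrock-runtime", region_name="us-east-1")

MODEL_ID = "anthropic.claude-sonnet-4-5"  # placeholder model ID

user_input = input("You: ")

response = client.converse(
    modelId=MODEL_ID,
    messages=[{"role": "user", "content": [{"text": user_input}]}],
)

print(response["output"]["message"]["content"][0]["text"])
```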
So let me run this. This is a single call, so I'm going to say: tell me more about Montreal. And it comes back and tells me Montreal is Canada's second largest city, it's in Quebec, and so on. And as you see, I get my shell prompt back, because this is a single call.
Now what I want to do is take a look at the second piece of code, which is in a loop. This is the exact same code: exact same model, exact same API. The only difference is a while-True loop, so now we're running in a loop until the user enters "quit," and everything else is the same.
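The looped version is the same Converse call wrapped in while True; note that each turn sends only the current message, so the model has no memory of earlier turns:

```python
# Same Converse call as before, now in a loop
while True:
    user_input = input("You: ")
    if user_input.strip().lower() == "quit":
        break
    response = client.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": user_input}]}],
    )
    print(response["output"]["message"]["content"][0]["text"])
```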
So now if I run this, I'm going to say: "Hello, my name is Rola." And I get my answer.
Let me run this again: "Hello, my name is Rola." The system greets me: "Hi Rola, nice to meet you. What can I do for you?" And this time I didn't get my shell prompt back; I can speak to it again. I'm going to say: "Tell me more about Montreal," and it tells me a little more about the city, and again I don't get the prompt back. Now I want to demonstrate a couple of things. First, there is no memory; these things are stateless. I gave it my name at the beginning, and it greeted me by name. But now if I ask, "What is my name?", it says: I don't know your name, you haven't told me yet. So these things are stateless. The other thing I want to show you is that it has no understanding of things beyond its cutoff point. "What is the day today?" It says: I don't have access to the current date; I don't have real-time information. The same is true if we ask what time it is, or what the weather is like in Montreal: again, "I do not have access to real-time information."
So now we're creating an agent, and again, this is the exact same code with a few changes; I'm trying to make incremental changes to the code so that you see the difference. I'm using the same model, the same Converse call, the same Bedrock API. What I've created here are some tools: a calculator tool, very simple; a mock get-weather tool, which just returns some canned information based on the city; a get-date tool; and a get-time tool. Again, we've asked the model what the date is, what the time is, and what the weather is in New York right now, and it doesn't have that information because of its cutoff point; it doesn't have access to real-time information.
The Converse call we made before is the same one; I've just packaged it in a call_llm function. Then we've got a tool execution function to execute the tool if the system decides it needs one. And this is the system prompt: you are a helpful personal assistant; based on the user's message, decide if you need a tool or can respond directly; you have access to these tools.
And so what we're going to do is call the LLM with the user input and the system prompt, then parse the output, and if the output has a tool call, execute that tool call, and then go back and do the same thing. This is in a while-True loop: we take the user input, send it to the LLM, see if the LLM needs a tool executed, and if it does, execute that tool and return the result back to the LLM, in a loop, until we get an answer.
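Here's a simplified sketch of that scratch agent loop; call_llm wraps the Converse call from before, and wants_tool / parse_tool_call are hypothetical helpers that parse the model's reply for a tool request:

```python
# Tool registry: names the LLM can ask for, mapped to Python functions
TOOLS = {"calculator": calculator, "get_weather": get_weather,
         "get_date": get_date, "get_time": get_time}

SYSTEM_PROMPT = ("You are a helpful personal assistant. Based on the user's "
                 "message, decide if you need a tool or respond directly. "
                 "You have access to these tools: " + ", ".join(TOOLS))

while True:
    user_input = input("You: ")
    if user_input == "quit":
        break
    response = call_llm(user_input, SYSTEM_PROMPT)
    while wants_tool(response):                  # the LLM asked for an action
        name, args = parse_tool_call(response)
        result = TOOLS[name](**args)             # execute the tool
        response = call_llm(f"Tool {name} returned: {result}", SYSTEM_PROMPT)
    print(response)
```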
So let me run this. Again, I'm going to do the same thing: "Hello, my name is Rola." It greets me. Then I ask, "What time is it?" When I say that, it uses a tool, the get-time tool, and it tells me it's 1:43. "What date is it today?" Again, it calls the get-date tool and gives me the date: it's January 2nd, 2026.
Now, if I ask, "Where is Montreal located?", you can see it didn't use a tool; this is information from the model itself. You can see here that when it uses a tool, I print this symbol; here it's answering from the actual parameters of the model itself. And I can ask, "What is the weather like in New York now?", and it calls the get-weather tool. Now, I want to point out that I didn't do anything special here. I created some tools with reasonable names and good docstrings, but I didn't plug them in in any particular way. I just told the agent that these exist, and somehow it knows how to use them. What we've done in the loop is ask some questions about the date, the time, and the weather that the LLM couldn't answer, and we've supplemented it with tools to create this agent. Now, I still want to ask, "What is my name?" And you can see: I don't know your name; we haven't been introduced yet. And so we're going to add memory.
Here, again, is the exact same code from before, with all of the same tools, the call_llm function, and the tool execution function. And then there's a function to update memory. It's doing a very simple thing: it's appending every turn of the conversation, every user or assistant message, back onto the JSON. It's a crude way of doing it, where you're just tagging the previous conversation onto the end of the current one. So it's the exact same code, but now when we call the LLM, we're passing in the history.
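The memory version, sketched; update_memory just tags each turn onto the running history, and call_llm (hypothetical, as before) receives the whole history on every call:

```python
history = []  # the running conversation, i.e., the context window

def update_memory(role, text):
    # append each turn to the end of the message history
    history.append({"role": role, "content": [{"text": text}]})

while True:
    user_input = input("You: ")
    if user_input == "quit":
        break
    update_memory("user", user_input)
    reply = call_llm(history)        # the full history is sent every time
    update_memory("assistant", reply)
    print(reply)
```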
So we're going to run this now, and we're going to say: "Hello, my name is Rola." It greets me. "What is the date today?" It's January 2nd. "What time is it?" Again, it's using all the tools we gave it. If I ask it something it knows from its internal parameters, its internal knowledge base, then it does not use a tool: I ask about Montreal, and it tells me it's a fascinating city, which is true, I agree. Now, here's what I wanted to show you: I ask, "What is my name?", and now it says: your name is Rola; you told me that at the beginning of the conversation. And that's true. One more thing: let's say, "Summarize our interactions." It says: here's a summary of our conversation; you introduced yourself and we greeted each other, you asked about the date and time, then you asked about the weather, and then you asked me about Montreal. So now it has an understanding of all the steps that happened previously, and the way that works is that we've appended these conversation turns to the context window.
Okay. So now I want to show you how we would create this agent with a framework. All of this so far has been plain Python, but there are frameworks that help us build these agents; we don't have to do it from scratch. So we import create_agent from LangChain. Again, same model, still going through Bedrock, but now we have these tools with a decorator added to them. Same stuff. Then we've got a tool list; I've just added all of my tools into an array. And then we create the agent with the create_agent function: we give it an LLM, we give it the tools, and we give it a system prompt. And this is what I want you to understand from the components of an agent: what matters is to understand what is important for an agent. All of these frameworks that allow you to create an agent will have slots for those important components. You can see here the LLM, which is the brain of the agent, the tools, and the system prompt. We've not added memory here, but we're going to add it in the second example. And when you go to the documentation, you'll see that you have slots for all of the things you can do.
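A sketch of the framework version, assuming LangChain 1.x's create_agent; the model string and exact parameters may differ in your setup:

```python
from datetime import datetime

from langchain.agents import create_agent       # LangChain 1.x
from langchain_core.tools import tool

@tool
def get_time() -> str:
    """Return the current time."""
    return datetime.now().strftime("%H:%M")

agent = create_agent(
    model="bedrock_converse:anthropic.claude-sonnet-4-5",  # placeholder ID
    tools=[get_time],            # plus get_date, get_weather, calculator, ...
    system_prompt="You are a helpful personal assistant.",
)

result = agent.invoke(
    {"messages": [{"role": "user", "content": "What time is it?"}]}
)
print(result["messages"][-1].content)
```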
And so here we invoke the agent. What the framework does is, of course, make it a lot easier to build these. There's no tool execution step; it takes care of that. Here I've removed the tool execution, so it removes all of the little glue code you need, because it's handled behind the scenes. We just create the agent and then invoke it with the user input. So again: "What time is it?" It uses the tool, executes it, and gives me the time. And again, you can test all of the examples; you have access to this code. But the idea is to see that these frameworks make life easier.
Now, what's important to know about frameworks is that, because the ecosystem is evolving, their stability can be in question. LangChain, for example, changed their version to 1.0 about a month ago, and that changed a lot of how the code is written and what is supported. Of course, they are going to support the previous versions for, I think, a year or two, but you have to know that the models themselves can be deprecated as new models come in, and the frameworks themselves can change very dramatically. As an architect, when I build a system based on good foundational services, I know that the expiry date of the system is quite far in the future; I know that if it's based on good foundational architecture and good services, it's going to work well for a very long time. That is not true when I build a generative AI system, and that's not through a fault of my architecture or engineering. It's really about the models themselves being deprecated or changed, and the frameworks themselves changing, because there's just a lot of evolution in the field. So just be mindful of that.
Okay. So now we're going to create the same exact agent, but we're going to add memory. We've got this InMemorySaver, and you can see that everything is the same, but here we're adding a checkpointer. This is the short-term memory we're adding.
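Sketched, assuming the LangGraph InMemorySaver and that create_agent accepts a checkpointer; the thread_id keys the session:

```python
from langgraph.checkpoint.memory import InMemorySaver

checkpointer = InMemorySaver()   # in-process short-term memory

agent = create_agent(
    model="bedrock_converse:anthropic.claude-sonnet-4-5",  # placeholder ID
    tools=[get_time],
    system_prompt="You are a helpful personal assistant.",
    checkpointer=checkpointer,
)

# all calls sharing a thread_id share the same conversation history
config = {"configurable": {"thread_id": "session-1"}}
agent.invoke({"messages": [{"role": "user",
                            "content": "Hello, my name is Rola."}]}, config)
```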
So let me run the one with memory. "Hello." It greets me. "What time is it?" It calls the tool. Then I say, "What is my name?", and it can recall my name. So again, it's a lot simpler to add memory with these frameworks.
So again, this code is online and you can play with it. What I've shown you is very simple, because I want you to get the fundamentals. But of course, there are a lot of extra complexities and layers on top of this. There's a lot that goes into model selection, prompt engineering, context engineering, data management (what data you store, how you rank it, how you index it, how you retrieve it), memory and what types of memory you want, tooling, interfaces, architectural choices, deployment approaches, security and compliance, and orchestration. At the end of the day, these are systems like any other system, and you have to make quite a few decisions for that system.
Okay. So: agentic architectural patterns. If you know the predefined sequence of events that needs to happen to solve a particular problem, then you can code that up in a workflow. If you don't, if the solution space is too large, or if you think an agent would do a better job, then you can build a single agent, and we just built one using Python and LangChain. Or, if the problem is too complex, you can build a multi-agent system. There are a lot of different patterns out there, a lot of anecdotal examples across different companies of what they're doing, but very few are emerging as repeated patterns. This is, again, a space that is evolving, and within 6 months we should be able to see more things repeating across different companies. For now, I'm just showing you the two that we're seeing over and over. You can build hierarchical supervisor systems: a system where there's a supervisor agent that speaks to more specialized agents, but the specialized agents usually cannot speak to each other. Or there's the swarm pattern, where you've got several agents, all of which can speak to one another. I want to show you the difference this makes. What I want to do with this notebook is show you the difference between the swarm architecture and the supervisor architecture. I'm just going to run all of these cells so we don't have to wait for them: we install and import some libraries, we import pandas and set up its printing settings, and there are some utility functions that help me extract information and print it out.
What I've got here is a single agent. Again, we're using LangChain, and I've got three functions: add, multiply, and divide. We create an agent that uses Claude 4.5 as its brain and has access to these tools. We tell it, "You are a math expert," and I invoke it with a long expression. Now, if the math expression is not too long, the LLM can do it without using any tools or agents, so I made it a little longer. And you see it solves it and shows us the breakdown.
Here, what happens is I'm creating a supervisor architecture with three agents. I've got the add tool, the multiply tool, and the divide tool, and I create an agent with each: an addition agent that only has the add tool, a multiplication agent that has the multiply tool, and a division agent that has the divide tool. Then I create a supervisor agent that has access to these three agents, and I tell it: you are a team supervisor managing math experts; for addition, use the add agent; for multiplication, use the multiply agent; and for division, use the divide agent. And then I invoke it with the exact same expression.
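A hedged sketch of that setup, assuming the langgraph-supervisor package; the exact signatures may vary with versions, and the other two specialists are built the same way as the first:

```python
from langgraph_supervisor import create_supervisor

# one specialist agent per tool (multiplication_agent and division_agent
# are built the same way with their respective tools)
addition_agent = create_agent(
    model, tools=[add], name="add_agent",
    system_prompt="You are an addition expert.",
)

supervisor = create_supervisor(
    agents=[addition_agent, multiplication_agent, division_agent],
    model=model,
    prompt=("You are a team supervisor managing math experts. For addition "
            "use add_agent, for multiplication use multiply_agent, and for "
            "division use divide_agent."),
).compile()

result = supervisor.invoke({"messages": [{"role": "user",
                                          "content": expression}]})
```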
What I want to show you here is what actually happens. We've got a supervisor that gets the question and transfers it to the addition agent; the addition agent does its job and throws it back to the supervisor. Then the supervisor sends it to the multiply agent, which does its job and sends it back to the supervisor. Then the supervisor sends it to the divide agent, which again sends it back to the supervisor, which gives us the final result. The way it works, again, is that the three specialized agents don't interact with each other; they can only speak to the supervisor. Things go down the chain, the task gets executed, it goes back up, a decision is made about where to go next, and you get this hopping back and forth between the specialized agents and the supervisor. This interaction cost us about 16 total hops, with 10 agent actions and 6 transfers, and about 8,000 input tokens and 700 output tokens. Now let's solve that same problem with the swarm architecture.
problem with the swarm architecture. So
here again I've got the add multiply
divide and again I've created agents
that are um have one tool. So this is
the add agent. It has the add agent but
it can also speak to the multiply agent
and the divide agent. This is the um
multiply agent. It has the multiply tool
but it can also speak to the add agent
and divide agent. And then the divide
agent has the divide tool and it can
speak to the other two agents. Okay. And
then we create the swarm architecture
with the three agents with the default
active agent being the ad agent. So we
are speaking the first agent to to to
speak to is the ad agent and then we
compile it and then we invoke it again
with the exact same uh examples. And so
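A hedged sketch using the langgraph-swarm package; handoff tools give each agent the ability to transfer to the others:

```python
from langgraph_swarm import create_swarm, create_handoff_tool

add_agent = create_agent(
    model,
    tools=[add,
           create_handoff_tool(agent_name="multiply_agent"),
           create_handoff_tool(agent_name="divide_agent")],
    name="add_agent",
    system_prompt="You are an addition expert.",
)
# multiply_agent and divide_agent are defined symmetrically

swarm = create_swarm(
    agents=[add_agent, multiply_agent, divide_agent],
    default_active_agent="add_agent",   # the first agent to receive the query
).compile()

result = swarm.invoke({"messages": [{"role": "user", "content": expression}]})
```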
What we see here is that the add agent receives our query, adds what it needs to add, and then transfers to the multiply agent. The multiply agent does what it needs to do and transfers to the divide agent, and the divide agent solves its part and gives us the output. Because everybody can speak to everybody, this interaction cost us eight hops, with only two transfers, about 5,000 input tokens, and 500 output tokens. So you can see the same problem with the two different architectures and what each would cost.
So which one should you use? Well, a lot of people discourage multi-agent systems. What I showed you in the code is that we created a single agent with three tools, and then three different agents, each with one tool, in a supervisor (hierarchical) architecture and in a swarm architecture. Of course, these are toy examples; we wouldn't really create an agent per tool, but it lets you see the difference. People discourage the use of multi-agent systems because there's a lot of overhead in transfers, in managing memory, and in managing information across systems. The advice, I think, is that if you can get away with a single agent, you should, just because of the overhead of setting up memory and everything else in a multi-agent system. However, if the task gets too complex for a single agent, such that specialized agents might be useful, then you can move to a multi-agent system. Remember that there are real limitations on the context window: if you clog it with too many tools and too many instructions, we do see performance degradation, and so specialized agents might be the better choice. Now, within the multi-agent domain, should you use supervisor or swarm? I've heard different opinions.
Smaller teams with simple tasks prefer swarm, just because of what I showed you: less overhead in transfers up and down. However, I've also heard that as the task gets bigger, the solution space in a swarm is a lot larger than with a supervisor, just because there are so many different paths that can be taken when more transfers are possible. So with complexity, it does seem like a supervisor solution might be easier to debug. But it depends on the size of your task and how you set it up; it's best to experiment with your own example. And again, architectures are changing. We're seeing agents that can spawn other agents, we're seeing examples of agent-as-a-tool, and we're seeing a lot of different patterns emerging. But nothing has stuck yet, so maybe in another 6 months I will revise this.
So let's talk about agent interface protocols, standardization, and interoperability. Agents need to interface with tools: they need to be able to call tools and use their outputs. They need to interface with data sources and databases. They need to be able to talk to a user, and they need to be able to talk to other agents in multi-agent systems. In the examples we've just built, a lot of this is based on the English language: in selecting a tool, the agent relies only on the name of the function and its docstring. So if tools become very similar, it can get confusing.
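As a concrete illustration, here is the kind of tool definition the agent actually sees, a minimal sketch assuming LangChain's @tool decorator; the tool itself is a hypothetical stub. The only signals the model gets for tool selection are the function name, the type hints, and the docstring:

```python
from langchain_core.tools import tool

@tool
def get_exchange_rate(base: str, quote: str) -> float:
    """Return the current exchange rate from `base` currency to `quote` currency.

    The agent never sees this function body; it picks the tool (and fills in
    the arguments) based purely on the name, signature, and this docstring.
    """
    # Hypothetical stub; a real tool would call a rate API here.
    return 1.42
```

If two tools end up with near-identical names and docstrings, the model has almost nothing left to distinguish them by, which is exactly the confusion described above.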
And so what companies are doing is trying to create standardization protocols at the interfaces of agents. For agents using tools and data, Anthropic released the Model Context Protocol (MCP), which is meant to standardize how we use tools and data at the agent interface. Google released the Agent2Agent (A2A) protocol for agent-to-agent communication. And between humans and agents, there's the Agent-User Interaction protocol (AG-UI), which came from a collaboration between CopilotKit, CrewAI, and LangChain. Again, these are also changing, and there may be more out there. I think the one that's really interesting is MCP. Last month, Anthropic donated MCP to the Linux Foundation under a new sub-organization that is also chaperoned by OpenAI. So OpenAI, Anthropic, and the Linux Foundation are going to be incubating MCP, and that's definitely going to make it more popular soon.
The main purpose of all of these is really to ensure smooth handoffs and to ensure interoperability and reuse. When you're building your own agent, you're building your own tools from scratch and setting up your own systems; if you then want to build another system, you'd have to set a lot of that up all over again. If instead we build tools with the same interface, you can think of it as plug-and-play, like a USB or HDMI port: if the interfaces are the same, we can plug different systems together. And today we do have MCP hubs where people publish tools so others can use them, which helps the ecosystem grow a lot faster.
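As a sketch of what that plug-and-play looks like in practice, here is a minimal MCP server assuming the official `mcp` Python SDK and its FastMCP helper; the server name and tool are placeholders. Any MCP-compatible client could discover and call this tool without custom integration code:

```python
from mcp.server.fastmcp import FastMCP

# A named MCP server; clients discover its tools over the protocol.
mcp = FastMCP("calculator")

@mcp.tool()
def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # serve the tool over MCP's standard transport (stdio by default)
```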
Evaluating agents. So how do we evaluate agents? Well, there are several layers to agentic systems. At the heart of it, the brain of the agent is an LLM, a foundation model. On top of that is the agent system: beyond the LLM you have tools, memory, maybe some guardrails you've added, and communication protocols. And on top of that, the agentic system becomes a deployed application: you package it up and put it in front of users. So you have to understand the layers of the onion as you evaluate, because there are evaluations that can be done at each of these levels. From the LLM point of view, you want to know: is the LLM following instructions? Is it capable of doing the task? Is it accurate? Is it hallucinating? Is it consistent? Is it toxic; do I need guardrails? Then there are questions about the agentic system: is there proper decomposition of the task? Is the execution efficient? Is it choosing the correct tools? Is it retrieving the correct information, if it's retrieving information? Is it completing tasks successfully? And so on. At the application level, these are just software systems like any other, so you'd look at overall performance, error rate, latency, scalability, cost efficiency, access and identity, UI/UX, and so on.
In terms of evaluating the output of LLMs or agents, there are three main ways to do it. You can use code-based evals, which means coding up evaluations the same way we do in workflows or any other software. There's LLM-as-a-judge, where you use an LLM to judge the output of another LLM. And there are human evaluators or annotators. There are different ways of doing this, and you can choose whichever you're comfortable with. Of course, code-based evals are going to be a lot cheaper, more consistent, and more repeatable than an LLM judge or a human. Some important questions: is your output quantitative or qualitative? Is it discrete? Do you know what the output should be ahead of time? Is it deterministic? Is there a ground truth you can compare against? Are you cost-sensitive? If you know in advance what your output needs to be, or you have something to compare it to, then I highly recommend a code-based eval, just because of the price tag and the consistency. If not, you may be able to use LLM-as-a-judge or humans, with LLM-as-a-judge likely being cheaper than human evaluators.
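Here is a minimal sketch of a code-based eval for the case where a ground truth exists; the test cases, the stand-in agent, and the tolerance are made up for illustration:

```python
# A tiny code-based eval harness: compare agent outputs against known ground
# truth. Works when the expected answer is deterministic and comparable.

def run_agent(query: str) -> float:
    """Stand-in for your real agent call; returns a numeric answer."""
    return eval(query)  # placeholder only -- never eval untrusted input

test_cases = [
    ("(3 + 5) * 4 / 2", 16.0),
    ("10 / 4", 2.5),
]

TOLERANCE = 1e-6
passed = 0
for query, expected in test_cases:
    actual = run_agent(query)
    ok = abs(actual - expected) <= TOLERANCE
    passed += ok
    print(f"{query!r}: expected {expected}, got {actual}, "
          f"{'PASS' if ok else 'FAIL'}")

print(f"{passed}/{len(test_cases)} passed ({passed / len(test_cases):.0%})")
```

Because it's plain code, this runs in milliseconds, costs nothing per run, and gives the same verdict every time, which is exactly the repeatability argument for code-based evals.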
Now, agent challenges. In terms of models, like I said, the models themselves are evolving, the application space is evolving, and what we're learning over time is changing. One of the challenges with models is output evaluation: these models have a mind of their own, and their output is quite open-ended. Sometimes a single word in a sentence can change the meaning drastically, and evaluating open-ended outputs is not easy. There are still model limitations in terms of what they do or don't understand, their context window, what they're capable of, and their knowledge cutoff. Hallucinations can still be a problem, for example when an agent is deciding the path it needs to take to solve a particular problem. Context management is also an active field of study right now. Debugging can be convoluted because of the layers of the system and the freedom the agent has in devising its own solution. Price estimation for agents can be an issue because of the loop: there's a loop, and it can go around as many times as it needs to figure out the problem, so estimating a price tag can be difficult. There can be compounding error: if the task is really large and the agent takes a wrong turn, it can compound its errors and never reach a proper solution. It can get stuck in loops. We might have integration issues with the tools: we might get errors, or it might not choose the right tool. But a lot of these things are dealt with by the framework; LangChain and others have ways to stop an agent from getting stuck in loops and to handle other issues that might occur.
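One common guard, sketched here in plain Python rather than any particular framework, is simply to cap how many iterations the agent loop can take (frameworks expose similar knobs, such as recursion limits); the step logic below is a hypothetical stand-in:

```python
MAX_STEPS = 10  # hard cap so a confused agent can't loop (and bill you) forever

def agent_step(state: dict) -> dict:
    """Hypothetical stand-in for one think/act/observe iteration."""
    state["steps"] = state.get("steps", 0) + 1
    if state["steps"] >= 3:          # pretend the agent finishes on step 3
        state["done"] = True
    return state

state = {"query": "What is (3 + 5) * 4 / 2?", "done": False}
for step in range(MAX_STEPS):
    state = agent_step(state)
    if state["done"]:
        print(f"finished in {state['steps']} steps")
        break
else:
    # for/else fires only if we never hit `break`: the loop cap was reached.
    raise RuntimeError(f"agent did not converge within {MAX_STEPS} steps")
```

A cap like this also makes the worst-case cost of a run predictable, which helps with the price-estimation problem mentioned above.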
Framework stability, like I mentioned, can also be an issue, because it's an evolving field: models get deprecated, and libraries can change quite drastically from version to version. And one of the bigger challenges is business value. There's a lot of debate about whether these systems are bringing the business value everybody expects of them, but that is yet to be seen.
We saw a lot of issues in 2025. For example, you might have heard that the Replit agent reportedly deleted a production database even after a code freeze. The agent's response, when I read it, was pretty funny; it said something like, "I'm sorry, it was a bad decision, but I panicked," something about anxiety and stress. And I'm like, you're an agent, you don't have anxiety and stress. But that's the thing about reading so much human literature and human language: they now reply the way a human would.
There's a lawsuit against OpenAI from parents who claim that ChatGPT helped their son commit suicide. I think there are now guardrails against that; the last time I asked GPT about something that had to do with suicide, it sent a lot of information about the suicide hotline and other resources, so I think that has been addressed in many models. Air Canada was found liable for its chatbot's bad advice: Air Canada did not want to pay the difference the chatbot had promised, and a tribunal ruled they were liable and had to pay it. And then there are claims that money is being set on fire, with $40 billion of generative AI and agent products not bringing value back. If you're interested in these stories and in understanding more of what's going on, there is an AI Incident Database that tries to keep track of all of this; you can see they're at 1,323 incidents. Some are minor, hallucinations and inaccuracies, and some are major, things that have to do with ethics, legality, and human health and well-being.
All of this is to say that the AI potential is there; I think everybody sees there's a lot of potential these systems can bring, on many different levels. But the technology is still maturing. We're not there yet from a technological point of view, both from the model and technology perspective and from the application space: how do we use these models? Progress is rarely linear. We'll have to dabble with the technology, try different things, hit a few walls, deal with the consequences, and then buffer against the issues these systems have.
And so it's best to use AI as a junior assistant. It's not ready yet; it's still maturing. It can be very powerful and can give you ideas, but treat it as a junior assistant. So: start with read-only access to tools and systems, add human approvals for critical steps, and enable comprehensive logging so you can see the traces of what is going on.
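Here is a minimal, framework-free sketch of those three guardrails together: a tool wrapper that logs every call, lets read-only tools through, and asks a human to approve anything that writes. The tool names and the console-prompt approval are placeholders for whatever approval flow you'd actually use:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent.tools")

READ_ONLY = {"search_docs", "get_balance"}  # hypothetical read-only tool names

def guarded_call(tool_name: str, tool_fn, *args, **kwargs):
    """Log every tool call and require human approval for write actions."""
    log.info("tool=%s args=%s kwargs=%s", tool_name, args, kwargs)
    if tool_name not in READ_ONLY:
        answer = input(f"Agent wants to run {tool_name}{args}. Approve? [y/N] ")
        if answer.strip().lower() != "y":
            log.warning("tool=%s denied by human reviewer", tool_name)
            return "DENIED: a human reviewer blocked this action."
    result = tool_fn(*args, **kwargs)
    log.info("tool=%s result=%r", tool_name, result)
    return result

# Usage: the agent requests a write; a human must say yes before it runs.
def delete_record(record_id: int) -> str:
    return f"record {record_id} deleted"

print(guarded_call("delete_record", delete_record, 42))
```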
Will agents take my job? There are a lot of questions about whether AI will replace humans and at what rate, and of course this is all still unfolding. Maybe it will; maybe not just yet. It depends on what your job is and its complexity. There is a research article that Microsoft Research put out in July 2025 in which they looked at occupations and gave each one an index of how likely it is to be affected by AI. You can see the top 40 occupations with the highest AI applicability score, the ones that might be replaced, and the list seems to skew cognitive and intellectual: proofreaders, editors, mathematicians, data scientists, analysts, web developers. And then there's the list of the bottom 40 occupations, which skews toward physical or manual labor: nursing assistants, dishwashers, roofers, floor sanders and finishers. Now, what does this mean? Would I tell my girls not to be mathematicians and to look into floor sanding? No. Not yet. I don't think so.
I think change is coming, for sure. Every huge technology has changed the job market in different ways, and the only constant is change; jobs have changed in different ways across societal evolution. But it's not clear yet how things will change, only that they will. Do I realistically think AI will replace mathematicians? No, I do not. I think mathematics, at least cutting-edge mathematics, bordering on philosophy, is a bit more complex and is not something that will be replaced, in my own opinion. But I do think that technical writers and analysts and so on are going to see a change in how they work.
At least this seems to align with Moravec's paradox, the idea that what is hard for humans is easy for AI, and vice versa. Humans start to crawl at a few months old, start to walk by their first year, and most can jump by their second or third year of life. The same is true for animals; some animals can walk moments after they're born. Meanwhile, we think of the more intellectual things, philosophy, or games like chess, the more cognitive abilities, as more selective: not everybody can do philosophy or play chess very well, but almost every human can jump. This goes back to evolution: in animals and humans alike, sensorimotor skills are among the oldest things we evolved to do, whereas intellectual and cognitive abilities are a more recent addition. AI, however, seems to be the opposite, in that some of its biggest wins were in cognitive and intellectual pursuits. Some of the first wins were against chess, Atari games, and Go, whereas we still struggle to make robots that can walk across a stage without fumbling.
How does that change things? Does it change how you should think about careers? I don't think so. It's very interesting, but we'll just have to keep an eye on how things are moving.
I do want to talk specifically about software development, because I'm in the field, and it's one of the fields seeing drastic changes. There's a very interesting talk Andrej Karpathy gave at Y Combinator in which he mentions that software development was stable for about 70 years and then saw two very fast changes in the last two decades. We started with software 1.0, writing code, in, let's say, the 1940s, when we became able to program computers: you take a system, set a series of rules, create a function, say, get_sentiment, and write the rules and conditionals. If this, then do that; while this is true, do that. That gives you the result.
Then in the 2010s we got into what's being called software 2.0, a term also coined by Karpathy himself, where you don't code the rules. Instead, you take a model, an algorithm, give it the inputs and the outputs, and it learns the rules. In most frameworks you'd call model.fit to train the model and then model.predict to get the model's output.
And today we're seeing software 3.0, or generative AI, where you use the English language to program a system. With these LLMs, you speak to the model and say, "Give me the sentiment of this paragraph," you give it the paragraph, and it comes back with an answer. So software development has changed from writing the code yourself, to training models that figure out the transformation between input and output, to foundation models you can talk to in natural language.
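Here is a compact sketch of the same sentiment task in all three paradigms; the word lists, the scikit-learn model choice, and the prompt are illustrative assumptions, not Karpathy's own examples:

```python
# Software 1.0: hand-written rules.
def get_sentiment_v1(text: str) -> str:
    positive, negative = {"great", "love", "good"}, {"bad", "hate", "awful"}
    words = text.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# Software 2.0: learn the rules from (input, output) pairs.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts  = ["I love this", "this is awful", "pretty good", "I hate it"]
labels = ["positive", "negative", "positive", "negative"]
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)                   # the model learns the rules
print(model.predict(["what a good day"]))  # -> ['positive']

# Software 3.0: program the system in English.
prompt = "Give me the sentiment (positive/negative/neutral) of this paragraph: ..."
# response = some_llm_client.generate(prompt)  # hypothetical LLM call
```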
Today, if you're a young student learning computer science and coming out into the job market, you should understand that these three paradigms of working now coexist, and you should have an idea of how to use each of them. Again, this is a really interesting talk by Andrej, which you can find here. The sentiment was also echoed in Werner Vogels' re:Invent 2025 talk, "The Renaissance Developer," which goes through how the field is changing and how to deal with the changes.
So, weathering the storm: what should you do to protect yourself against all of the changes that are happening? My first piece of advice is to learn AI. Don't fear it. AI does not have to replace people; it can amplify them. At the end of the day, AI is a tool, and how we use it, as individuals and as a society, will write the future. So really try to learn AI. It's not super complicated, especially generative AI; there are technical aspects to it, but it's being popularized to the point where it's accessible, there's a lot of really good free material out there, and it's worth a try.
The other thing I find important is that fundamentals don't fade. Physics, math, biology, chemistry: these are not going away. The foundations of our world are not going away. How we build systems, how we architect, the fundamentals of every field, those are not going away. We might get a productivity boost through AI, but the foundations will always be important. We will always need databases to store and retrieve information; we're always going to need to understand networking, identity, access, and so on. And what you need to keep in mind when using AI is that the systems are not there yet, so you need to know your fundamentals to be able to direct these systems toward the task.
My third piece of advice is to move up the abstraction ladder. You need to define the problem, design the solution, and own the outcomes; AI is not going to do that. What you have with AI today are really good junior assistants that are really good with syntax. As of early 2026, they do write code, but they tend to write code that can be convoluted and very long. They write very defensively, wrapping everything in try/catch blocks, trying to guard against everything, and the result is far longer than necessary, which creates tech debt in some ways. So you define the problem, you design the solution, you own the outcomes, and you use this assistant with very clear instructions. Then you read the output and make sure it's concise and doing the right thing. For that, you need to understand your foundations and your system.
Think in systems. You need to be able to see the bigger picture: understand your system components, the integration points, what can and cannot be done, and what should and should not be done. The context window today is not large enough to put all of that information, a full code base, into the system and have AI do it; as the context window fills up, we see serious degradation. The way a lot of us have tried to use AI is to dump in the whole code base, and it fixes something here and breaks something there, and you keep going, fixing this and breaking that. So the best way to work with AI is for you to own your outcomes and see the bigger picture. You are managing a junior helper that's good with syntax but still needs a lot of guidance; that's how you should think about it. Be a polymath. This is advice from Werner Vogels' lecture: you need to broaden your base of knowledge. Again, think of yourself as a supervisor of these helpers, so you need to understand what they're doing, learn very fast, and have a broader skill set.
Some niches are more difficult for AI. AI can only know what's in its training data, so it doesn't do cutting edge very well, and it doesn't come up with novel ideas. You have to remember these are token-to-token probabilistic generators: the model takes its compressed view of the whole human knowledge base, uses it to estimate the probability of different tokens, and constructs sentences token by token, which of course can cause all sorts of problems. It's greedy; there's no planning step.
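To make "greedy, token-by-token" concrete, here is a toy sketch of greedy decoding over a made-up bigram table; the probabilities are invented, and real LLMs use a neural network over a huge vocabulary, but the generation loop has the same shape:

```python
# Toy next-token model: P(next | current) as a hand-written bigram table.
P = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"<end>": 1.0},
    "ran": {"<end>": 1.0},
}

def greedy_decode(token: str) -> list[str]:
    """Pick the single most likely next token at every step: no lookahead, no plan."""
    out = [token]
    while token in P:
        token = max(P[token], key=P[token].get)  # greedy argmax over next tokens
        if token == "<end>":
            break
        out.append(token)
    return out

print(greedy_decode("the"))  # -> ['the', 'cat', 'sat']
```

Each choice is locally best, never revisited, and never made with the end of the sentence in mind, which is the core of the "no planning" criticism.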
There are many different ideas about how the next generation of models should work, and Dr. Yann LeCun has been arguing for world models for a long time. He keeps saying that these LLMs are an off-ramp on the highway of AI research. They are impressive, we have to say, in what they're able to generate, and a lot of applications have been built on top of them, but they are still token-to-token probabilistic generators. There's no thought behind them; they don't understand our world, they don't understand physics, they don't understand a lot of things. Many of them are still based on text rather than other modalities like images, which is how we learn. So there's a new class of models that I think we're going to see emerge, and Dr. LeCun left Meta this year to create his own startup focusing on world models, which he calls AMI. I'm super excited to see what happens there. But this is all to say that these token-to-token generators are unlikely to come up with cutting-edge, novel ideas, which is why I'm saying they're not going to replace mathematicians anytime soon. Of course they can do math we already know, but I don't think they're going to come up with new physics and math theories. That's highly unlikely; I'd be surprised if I see that.
And my last piece of advice is to focus on the human element. Like I said, a con of agents and agentic systems, from my perspective, is that they are not human. I would still rather deal with a person to solve my customer complaints. The reason we as a species have taken over the planet, compared with any other species, is our ability to work together collectively. We create this human fabric that learns from each other both horizontally and vertically across generations, and we regulate each other. There's a lot about human nature that is very special. I'm very interested in humans as a species, in what makes us who we are: some humans are angels without wings, and some make questionable decisions, and humanity is very, very interesting. We've evolved together in a special way, and what makes us human is part of our nature. So I would suggest you focus on the human element: build trust with your clients, build connections with your teams, network with people. AI is not going to do that, not yet at least; it is an outsider to the human race. If you want to get ahead in an AI world, focus on the human element.
These are some references that I really like and that I've used. All of the references I've used within the slides are there for you, but these are some I particularly like. I highly respect Dr. Andrew Ng; he's one of the few people I follow, and I have a lot of respect for him. He's one of those people who sits at the border of academic research, industry, and applications, and he has a really cool course called Agentic AI on DeepLearning.AI, which I highly, highly recommend; of all the resources I've seen, it's the one I've liked the most. I like Chip Huyen very much. I have both her books, one on machine learning and one on AI engineering. She writes very simply and down to earth, but she's very comprehensive. I absolutely love her; I have her books in probably every form, print, PDF, and audiobook. She is definitely somebody to check out. I follow Andrej Karpathy as well: a brilliant person, down to earth. There's a lot of hype in AI, a lot of people sowing confusion, a lot of things out there just for the drama; Andrej is not one of those. So Andrew, Chip, and Andrej are people who are really down to earth, who aren't just fanning flames and drama, and Andrej is very good at explaining very complex things. He has his YouTube channel, and he's building an educational institution, or service, which I'm super excited about. There are also some really good courses on Coursera that you might like, and all of the important players in AI have their own academies: LangChain, Anthropic, NVIDIA, and DeepLearning.AI all have free material you can check out. I really like the DeepLearning.AI course catalog; I find it very useful. And I really like Stanford University classes; they put a lot of their technical computer science classes on YouTube, which I find super useful.
And then this is the ML introduction I did last year, if you're interested, and this is the Linux Foundation article on the donation of MCP. And that's it. I really hope this was useful. You can find me on LinkedIn, and the code base and the slides are all online on my GitHub, along with the course from last year. Again, I work for Tech42; we specialize in GenAI and agents. If you want to talk agents with a set of nerds, you can find me and my colleagues at Tech42. And that's it.
Slides and Labs: https://github.com/rdali/ML105_Agents
Profile: https://www.linkedin.com/in/roladali/