You type, let's say, "Create a PostgreSQL database in AWS" into Claude Code or Cursor or whichever agent you use, you hit enter, and boom, it just works. Database created, configured, running. Like magic. But it's not magic. Behind that simple request is an intricate dance between you, an orchestrator called an agent, and a large language model. Most people think the AI is doing everything. They're wrong. The AI cannot touch your files. It cannot run commands. It cannot do anything on its own. Nothing. So how the hell does it work? How does your intent turn into actual results? That's what we are going to break down: the real architecture, the three key players, and why understanding this matters if you're using those tools every single day.

Before we discover how agents work, let me address a related problem. When you're building AI features into your own applications, like semantic search, Q&A, or summarization, you typically end up bolting on fragile services and writing glue code to connect everything: a vector database here, an embedding service there, LLM integration somewhere else. It's a mess. And that's where the sponsor of this video comes in. RavenDB is a NoSQL document database known for staying out of your way in production. No babysitting, no band-aids, no weird 3:00 a.m. alerts. It includes AI features built directly into the database: vector search, embedding generation, GenAI workflows, even AI agent creation, all native to the data layer. You're not adding external services or managing sync jobs. You can issue vector queries and structured filters in the same database call. RavenDB provides integrations with OpenAI, Azure OpenAI, and Ollama, so agents can generate and retrieve embeddings, summarize content, or trigger actions. Check out the link in the description to get started. And now, let's discover how agents work.
Let's start with a simple scenario. Imagine you're using Claude Code, Cursor, or any other coding agent, and you type this intent. What happens when you hit enter? You get a PostgreSQL database in AWS. That's the promise, right? But how the hell does that actually work? What's happening behind the scenes to turn that simple request into reality?

Now, before we dive in, a quick bit of terminology. When people say AI agent, or just agent, they usually mean the whole damn thing, the entire system. I'm going to be more precise here. I will call that complete system an agentic system, and when I say agent, I mean specifically the orchestration layer sitting between you and the LLM. Keep that distinction in mind. It is important.

So let's start by defining the key players. First, the user. That's you. You provide intent, whether it's a task request or a question, you get responses back, and you can clarify when needed. Next, the agent. It sits between you and the LLM, acting as an intermediary. Finally, the LLM, the large language model. This is the brain doing the reasoning and decision making. Think GPT-5, Claude, models like that. It receives context from the agent and generates responses. Those are basic definitions. As we move through increasingly complex architectures, we will expand on what each of those actors actually does.

The simplest possible architecture would have the user, you, sending intent to an agent, which passes that intent to the LLM, which responds back through the same chain. If anyone actually built systems that way, it was a long time ago. It's way too simplistic.
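To make that concrete, here's a minimal sketch of that naive pass-through in Python. Nothing in it is a real SDK; call_llm is just a stand-in for whatever model API the agent would use.

```python
# A deliberately naive "agent": it forwards your intent to the LLM
# and returns whatever comes back. No system prompt, no tools, no loop.

def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM API call (OpenAI, Anthropic, and so on).
    return f"(model response to: {prompt!r})"

def naive_agent(intent: str) -> str:
    # The agent adds nothing here: intent in, response out.
    return call_llm(intent)

print(naive_agent("Create a PostgreSQL database in AWS"))
```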
We need more, much more. So, what's missing? Let's talk about system prompts. A system prompt is the foundational instruction that defines how an LLM behaves. It sets the role, the behavior, the capabilities. Agent developers set these prompts, and they're completely invisible to you as a user. Here's an example: "Always explain your reasoning step by step before providing answers." It's short, but it completely changes how the LLM responds.
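As a rough illustration, in most chat-style LLM APIs the system prompt travels as the first message of the request, right next to your intent and any prior conversation. A minimal sketch, assuming a generic message format rather than any specific vendor's SDK:

```python
# The system prompt is set by the agent developers; you never see it.
SYSTEM_PROMPT = "Always explain your reasoning step by step before providing answers."

def build_context(intent: str, history: list[dict]) -> list[dict]:
    # Context = system prompt + prior conversation + your new intent,
    # all packed into a single request.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        *history,
        {"role": "user", "content": intent},
    ]

context = build_context("Create a PostgreSQL database in AWS", history=[])
print(context)  # this complete picture is what the agent would send to the LLM
```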
Now, with the system prompt in the mix, we get a more sophisticated setup. You send intent to the agent. The agent sends context to the LLM, not just raw content. That context includes the system prompt plus your intent. The LLM generates a response that goes back to the agent, then to you. This architecture, user to agent to LLM with context, is essentially what you get with something like ChatGPT. It's mainly for answering questions, for having conversations, not much more.

So what exactly is context? Context is everything sent to the LLM in a single request. It includes the system prompt, conversation history, your intent, and any other information the LLM needs to generate a response. It's the complete picture the LLM sees.

So that covers the basics, the key players and how they communicate. But this architecture still cannot do much beyond answering questions. To actually accomplish tasks, we need something more.
Even with system prompts and context, agents cannot do real work. They can't. They need tools. Tools are functions the agent can execute on behalf of the LLM, things like reading files, executing bash commands, searching the web, editing code. The LLM requests tools, then the agent executes them, and the results go back into the context.

So how does the LLM know which tools to execute? Huh? What do you think? Tool descriptions are included in the context. The agent sends tool names, parameters, descriptions, everything the LLM needs to know what is available. Then the LLM analyzes your intent, looks at the available tools, and requests specific tools with specific parameters. The agent executes those requests and returns results.
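To give you a feel for it, here's a hedged sketch of what those tool descriptions and a resulting tool request might look like. The shapes are made up for illustration; real agents use their provider's function-calling schema, and the AWS command shown is never actually executed.

```python
# Tool descriptions the agent puts into the context. The exact shape is
# illustrative; real agents use their provider's function-calling schema.
TOOLS = [
    {
        "name": "run_bash",
        "description": "Execute a shell command and return its output.",
        "parameters": {"command": "string"},
    },
    {
        "name": "read_file",
        "description": "Read a file from the local filesystem.",
        "parameters": {"path": "string"},
    },
]

# What a parsed tool request from the LLM might look like:
tool_request = {
    "tool": "run_bash",
    "arguments": {"command": "aws rds create-db-instance --engine postgres ..."},
}

def execute_tool(request: dict) -> str:
    # The agent, not the LLM, is the one running this. Here we only simulate it.
    return f"(output of {request['tool']} called with {request['arguments']})"

print(execute_tool(tool_request))
```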
So with tools in the picture, the roles expand. The user now also reviews and approves actions when needed. That's very important when tools are making actual changes. The agent becomes an intermediary between user, LLM, and tools. It executes tools requested by the LLM. Crucially, it never makes decisions. The agent never makes decisions. It only facilitates communication and execution. And the LLM, well, it now generates responses and requests tool execution, but it never executes tools directly. It only requests them.

So now we have a user sending intent to the agent. The agent sends context, which includes the system prompt, the intent, and tool descriptions, to the LLM. The LLM now knows that tools are available. It just knows that they are there, and it generates a response back to the agent and then to you. But wait, how can the LLM instruct the agent to execute tools if all it does is generate a response that gets sent back to you? If the LLM just responds once, how does it even trigger tool execution?
And the answer to that one is loops. An agent loop is the cycle where the agent sends context to the LLM, the LLM responds with either tool requests or a final answer, the agent executes those tools, the results get added to the context, and the process repeats until the LLM provides the final response.

With loops, the agent's role expands again. It still does everything we mentioned, but now it also manages the agentic loop, plan, act, observe, repeat, and it builds and maintains the context that gets sent to the LLM with each iteration. The LLM's role also gets a crucial clarification: it's stateless. It has no memory between API calls. Every time the agent sends context to the LLM, it's a fresh request. The LLM only knows what's in that context, nothing more.

So, here's the complete flow. You send intent to the agent. The agent sends context to the LLM. That's the system prompt, your intent, and tool descriptions. The LLM can either request tools or provide the final answer. If it requests tools, the agent executes them, gets results, and loops back, sending updated context to the LLM again. This repeats until the LLM has enough information and sends the final answer back to the agent, which sends it to you.

So that's the core architecture. The agent manages the loop. The LLM provides the reasoning. Tools do the work. But there are practical considerations when you scale this up.
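Here's the whole flow condensed into one sketch. Again, call_llm and execute_tool are hypothetical stand-ins, not any particular agent's implementation; the point is the shape of the loop and the fact that the entire context is resent on every iteration because the LLM is stateless.

```python
# Minimal agent loop: plan, act, observe, repeat.
# call_llm and execute_tool are hypothetical stand-ins, not a real SDK.

def call_llm(context: list[dict]) -> dict:
    # A real implementation would send the full context to the model API and
    # parse the reply into either a tool request or a final answer.
    return {"type": "final", "content": "(final answer)"}

def execute_tool(name: str, arguments: dict) -> str:
    return f"(result of {name} with {arguments})"

def run_agent(intent: str, system_prompt: str, tools: list[dict]) -> str:
    context = [
        {"role": "system", "content": system_prompt},
        {"role": "system", "content": f"Available tools: {tools}"},
        {"role": "user", "content": intent},
    ]
    while True:
        # The LLM is stateless: every iteration resends the entire context.
        reply = call_llm(context)
        if reply["type"] == "final":
            return reply["content"]  # done; the agent hands this back to you
        # Otherwise the LLM requested a tool. The agent executes it...
        result = execute_tool(reply["tool"], reply["arguments"])
        # ...and feeds the result back into the context for the next iteration.
        context.append({"role": "tool", "content": result})

print(run_agent(
    "Create a PostgreSQL database in AWS",
    "Always explain your reasoning step by step.",
    tools=[],
))
```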
Agents don't just have built-in tools. They can also integrate external tools through MCP. MCP, or Model Context Protocol, is a standard protocol for connecting external tools to agents in addition to those built-in tools. It allows dynamic tool integration without modifying the agent itself. Think database connectors, API clients, or custom business logic. You can plug those in, and the LLM gets access to them just like it gets access to built-in tools. The architecture stays the same, but now the context the agent sends to the LLM includes descriptions of both built-in tools and MCP tools. The LLM can request either. The agent executes them, gets results, and loops back with updated context.
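Conceptually, the agent just merges MCP tool descriptions with its built-in ones before it builds the context. This sketch uses a made-up discover_mcp_tools helper instead of the actual MCP SDK, purely to show where those tools end up:

```python
# Built-in tools ship with the agent; MCP tools are discovered from external
# servers at runtime. The LLM sees one combined list and cannot tell them apart.

BUILT_IN_TOOLS = [
    {"name": "read_file", "description": "Read a local file."},
    {"name": "run_bash", "description": "Execute a shell command."},
]

def discover_mcp_tools(server: str) -> list[dict]:
    # Hypothetical: a real agent would speak the Model Context Protocol to the
    # server and receive its tool descriptions.
    return [{"name": "query_database", "description": "Run a query against the company database."}]

def all_tools() -> list[dict]:
    return BUILT_IN_TOOLS + discover_mcp_tools("postgres-mcp-server")

print([tool["name"] for tool in all_tools()])
```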
Now, there's a practical limit to all this: context size. Context size is the maximum amount of information that can be sent to an LLM in a single request. It's measured in tokens, the basic unit of text processing. Let's say a token is roughly a word or part of a word. Different models have different context limits, and those limits change over time. When you exceed that limit, the context must be compacted.

So what is context compacting? Well, it's the process of reducing context size when limits are hit. There are several methods. You can summarize conversation turns. You can remove less relevant tool outputs. You can truncate file contents. The key, nevertheless, is to keep the system prompt and recent messages intact. Those are critical. The trade-off is obvious. Reduced context means the LLM has less information to work with, and less information means potentially worse decisions.
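Here's a hedged sketch of one possible compaction strategy: keep the system prompt and the most recent messages intact, and collapse everything in between into a summary. The summarize helper is a stand-in; real agents typically ask the LLM itself to produce that summary.

```python
# One possible compaction strategy: keep the system prompt and the most recent
# messages intact, and replace everything in between with a summary.

def summarize(messages: list[dict]) -> str:
    # Stand-in: real agents typically ask the LLM itself to write this summary.
    return f"(summary of {len(messages)} earlier messages)"

def compact(context: list[dict], keep_recent: int = 5) -> list[dict]:
    system_prompt, rest = context[0], context[1:]
    if len(rest) <= keep_recent:
        return context  # still within limits, nothing to do
    old, recent = rest[:-keep_recent], rest[-keep_recent:]
    return [system_prompt, {"role": "system", "content": summarize(old)}, *recent]

messages = [{"role": "system", "content": "system prompt"}] + [
    {"role": "user", "content": f"message {i}"} for i in range(20)
]
print(len(compact(messages)))  # 1 system prompt + 1 summary + 5 recent = 7
```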
So, let's bring this all together. Agents like Claude Code, Cursor, and others are orchestrators. They sit between your intent, the available tools, and the LLM. That's their role: coordination. You provide the intent, like "Create a PostgreSQL database in AWS," and that's where it starts. The LLM provides the reasoning and instructions. It figures out what needs to happen and tells the agent which tools to execute: read this file, run that command, search for this information. The LLM never touches those tools directly. It just requests them. The agent, on the other hand, executes everything. It runs the tools, it gathers the results, it builds the context, and it sends it back to the LLM. It manages the loop until the LLM has everything it needs to fulfill your intent.

And here's the key thing to understand. The agent is the only dumb actor in this system. It doesn't think, it doesn't decide, it only does what it's told, either by you or by the LLM. It's pure execution, no intelligence. The LLM provides the brains, you provide the intent, and the agent makes it happen. It's a dance between three components: you, the agent, and the LLM, with the agent as the coordinator making it all work. Without the agent, the LLM is just answering questions. With the agent, it can actually do things. Thank you for watching. See you in the next one. Cheers.
You type "Create a PostgreSQL database in AWS" into Claude Code or Cursor, and it just works. But how? Most people think the AI does everything, but that's wrong. The AI can't touch your files or run commands on its own. This video breaks down the real architecture behind AI coding agents, explaining the three key players that make it all work: you (providing intent), the agent (the orchestrator), and the LLM (the reasoning brain). Understanding this matters if you're using these tools every day. We'll walk through increasingly sophisticated architectures, from basic system prompts to the complete agent loop that enables real work. You'll learn how tools get executed, what context really means, how the agent manages the loop between you and the LLM, and why the LLM is stateless. We'll also cover practical considerations like MCP (Model Context Protocol) for integrating external tools and context limits that affect performance. By the end, you'll understand that the agent is actually the only "dumb" actor in the system—it's pure execution with no intelligence. The LLM provides the brains, you provide the intent, and the agent coordinates everything to make it happen. ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: RavenDB 🔗 https://ravendb.net ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ #AIAgents #LLM #HowAIWorks Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join ▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/ai/ai-agent-architecture-explained-llms,-context--tool-execution ▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below). ▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/ ▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox ▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 AI Agents Explained 01:02 RavenDB (sponsor) 02:16 How Do Agents Work? 05:42 How AI Agent Loops Work? 09:36 MCP (Model Context Protocol) & Context Limits 11:38 AI Agents Explained: Key Takeaways