You type, let's say, "Create a PostgreSQL database in AWS" into Claude Code or Cursor or whichever agent you use, you hit enter, and boom, it just works. Database created, configured, running. Like magic. But it's not magic. Behind that simple request is an intricate dance between you, an orchestrator called an agent, and a large language model. Most people think the AI is doing everything. They're wrong. The AI cannot touch your files. It cannot run commands. It cannot do anything on its own. Nothing. So how the hell does it work? How does your intent turn into actual results? That's what we are going to break down: the real architecture, the three key players, and why understanding this matters if you're using those tools every single day.

Before we discover how agents work, let me address a related problem. When you're building AI features into your own applications, like semantic search, Q&A, or summarization, you typically end up bolting on fragile services and writing glue code to connect everything: a vector database here, an embedding service there, LLM integration somewhere else. It's a mess. And that's where the sponsor of this video comes in. RavenDB is a NoSQL document database known for staying out of your way in production. No babysitting, no band-aids, no weird 3:00 a.m. alerts. It includes AI features built directly into the database: vector search, embedding generation, GenAI workflows, even AI agent creation, all native to the data layer. You're not adding external services or managing sync jobs. You can issue vector queries and structured filters in the same database call. RavenDB provides integrations with OpenAI, Azure OpenAI, and Ollama, so agents can generate and retrieve embeddings, summarize content, or trigger actions. Check out the link in the description to get started. And now, let's discover how agents work.
Let's start with a simple scenario. Imagine you're using Claude Code, Cursor, or any other coding agent, and you type this intent. What happens when you hit enter? You get a PostgreSQL database in AWS. That's the promise, right? But how the hell does that actually work? What's happening behind the scenes to turn that simple request into reality?

Now, before we dive in, a quick bit of terminology. When people say AI agent, or just agent, they usually mean the whole damn thing, the entire system. I'm going to be more precise here. I will call that complete system an agentic system, and when I say agent, I mean specifically the orchestration layer sitting between you and the LLM. Keep that distinction in mind. It is important.

So let's start by defining the key players. First, the user. That's you. You provide intent, whether it's a task request or a question, you get responses back, and you can clarify when needed. Next, the agent. It sits between you and the LLM, acting as an intermediary. Finally, the LLM, the large language model. This is the brain doing the reasoning and decision making. Think GPT-5, Claude, models like that. It receives context from the agent and generates responses. Those are basic definitions. As we move through increasingly complex architectures, we will expand on what each of those actors actually does.

The simplest possible architecture would have the user, you, sending intent to an agent, which passes that intent to the LLM, which responds back through the same chain. If anyone actually built systems that way, it was a long time ago. It's way too simplistic.
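To make that concrete, here's a minimal sketch of that naive pass-through in Python. Nothing in it is a real SDK; call_llm is just a stand-in for whatever model API the agent would use.

```python
# A deliberately naive "agent": it forwards your intent to the LLM
# and returns whatever comes back. No system prompt, no tools, no loop.

def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM API call (OpenAI, Anthropic, and so on).
    return f"(model response to: {prompt!r})"

def naive_agent(intent: str) -> str:
    # The agent adds nothing here: intent in, response out.
    return call_llm(intent)

print(naive_agent("Create a PostgreSQL database in AWS"))
```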
We need more, much more. So, what's missing? Let's talk about system prompts. A system prompt is the foundational instruction that defines how an LLM behaves. It sets the role, the behavior, the capabilities. Agent developers set these prompts, and they're completely invisible to you as a user. Here's an example: "Always explain your reasoning step by step before providing answers." It's short, but it completely changes how the LLM responds.
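As a rough illustration, in most chat-style LLM APIs the system prompt travels as the first message of the request, right next to your intent and any prior conversation. A minimal sketch, assuming a generic message format rather than any specific vendor's SDK:

```python
# The system prompt is set by the agent developers; you never see it.
SYSTEM_PROMPT = "Always explain your reasoning step by step before providing answers."

def build_context(intent: str, history: list[dict]) -> list[dict]:
    # Context = system prompt + prior conversation + your new intent,
    # all packed into a single request.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        *history,
        {"role": "user", "content": intent},
    ]

context = build_context("Create a PostgreSQL database in AWS", history=[])
print(context)  # this complete picture is what the agent would send to the LLM
```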
Now, with the system prompt in the mix, we get a more sophisticated setup. You send intent to the agent. The agent sends context to the LLM, not just raw content. That context includes the system prompt plus your intent. The LLM generates a response that goes back to the agent, then to you. This architecture, user to agent to LLM with context, is essentially what you get with something like ChatGPT. It's mainly for answering questions, for having conversations, not much more.

So what exactly is context? Context is everything sent to the LLM in a single request. It includes the system prompt, conversation history, your intent, and any other information the LLM needs to generate a response. It's the complete picture the LLM sees.

So that covers the basics, the key players and how they communicate. But this architecture still cannot do much beyond answering questions. To actually accomplish tasks, we need something more.
Even with system prompts and context, agents cannot do real work. They can't. They need tools. Tools are functions the agent can execute on behalf of the LLM, things like reading files, executing bash commands, searching the web, editing code. The LLM requests tools, then the agent executes them, and the results go back into the context.

So how does the LLM know which tools to execute? Huh? What do you think? Tool descriptions are included in the context. The agent sends tool names, parameters, descriptions, everything the LLM needs to know what is available. Then the LLM analyzes your intent, looks at the available tools, and requests specific tools with specific parameters. The agent executes those requests and returns results.
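To give you a feel for it, here's a hedged sketch of what those tool descriptions and a resulting tool request might look like. The shapes are made up for illustration; real agents use their provider's function-calling schema, and the AWS command shown is never actually executed.

```python
# Tool descriptions the agent puts into the context. The exact shape is
# illustrative; real agents use their provider's function-calling schema.
TOOLS = [
    {
        "name": "run_bash",
        "description": "Execute a shell command and return its output.",
        "parameters": {"command": "string"},
    },
    {
        "name": "read_file",
        "description": "Read a file from the local filesystem.",
        "parameters": {"path": "string"},
    },
]

# What a parsed tool request from the LLM might look like:
tool_request = {
    "tool": "run_bash",
    "arguments": {"command": "aws rds create-db-instance --engine postgres ..."},
}

def execute_tool(request: dict) -> str:
    # The agent, not the LLM, is the one running this. Here we only simulate it.
    return f"(output of {request['tool']} called with {request['arguments']})"

print(execute_tool(tool_request))
```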
So with tools in the picture, the roles expand. The user now also reviews and approves actions when needed. That's very important when tools are making actual changes. The agent becomes an intermediary between user, LLM, and tools. It executes tools requested by the LLM. Crucially, it never makes decisions. The agent never makes decisions. It only facilitates communication and execution. And the LLM, well, it now generates responses and requests tool execution, but it never executes tools directly. It only requests them.

So now we have a user sending intent to the agent. The agent sends context, which includes the system prompt, the intent, and tool descriptions, to the LLM. The LLM now knows that tools are available. It just knows that they are there, and it generates a response back to the agent and then to you. But wait, how can the LLM instruct the agent to execute tools if all it does is generate a response that gets sent back to you? If the LLM just responds once, how does it even trigger tool execution?
And the answer to that one is loops. An agent loop is the cycle where the agent sends context to the LLM, the LLM responds with either tool requests or a final answer, the agent executes those tools, the results get added to the context, and the process repeats until the LLM provides the final response.

With loops, the agent's role expands again. It still does everything we mentioned, but now it also manages the agentic loop, plan, act, observe, repeat, and it builds and maintains the context that gets sent to the LLM with each iteration. The LLM's role also gets a crucial clarification: it's stateless. It has no memory between API calls. Every time the agent sends context to the LLM, it's a fresh request. The LLM only knows what's in that context, nothing more.

So, here's the complete flow. You send intent to the agent. The agent sends context to the LLM. That's the system prompt, your intent, and tool descriptions. The LLM can either request tools or provide the final answer. If it requests tools, the agent executes them, gets results, and loops back, sending updated context to the LLM again. This repeats until the LLM has enough information and sends the final answer back to the agent, which sends it to you.

So that's the core architecture. The agent manages the loop. The LLM provides the reasoning. Tools do the work. But there are practical considerations when you scale this up.
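Here's the whole flow condensed into one sketch. Again, call_llm and execute_tool are hypothetical stand-ins, not any particular agent's implementation; the point is the shape of the loop and the fact that the entire context is resent on every iteration because the LLM is stateless.

```python
# Minimal agent loop: plan, act, observe, repeat.
# call_llm and execute_tool are hypothetical stand-ins, not a real SDK.

def call_llm(context: list[dict]) -> dict:
    # A real implementation would send the full context to the model API and
    # parse the reply into either a tool request or a final answer.
    return {"type": "final", "content": "(final answer)"}

def execute_tool(name: str, arguments: dict) -> str:
    return f"(result of {name} with {arguments})"

def run_agent(intent: str, system_prompt: str, tools: list[dict]) -> str:
    context = [
        {"role": "system", "content": system_prompt},
        {"role": "system", "content": f"Available tools: {tools}"},
        {"role": "user", "content": intent},
    ]
    while True:
        # The LLM is stateless: every iteration resends the entire context.
        reply = call_llm(context)
        if reply["type"] == "final":
            return reply["content"]  # done; the agent hands this back to you
        # Otherwise the LLM requested a tool. The agent executes it...
        result = execute_tool(reply["tool"], reply["arguments"])
        # ...and feeds the result back into the context for the next iteration.
        context.append({"role": "tool", "content": result})

print(run_agent(
    "Create a PostgreSQL database in AWS",
    "Always explain your reasoning step by step.",
    tools=[],
))
```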
Agents don't just have built-in tools. They can also integrate external tools through MCP. MCP, or Model Context Protocol, is a standard protocol for connecting external tools to agents in addition to those built-in tools. It allows dynamic tool integration without modifying the agent itself. Think database connectors, API clients, or custom business logic. You can plug those in, and the LLM gets access to them just like it gets access to built-in tools. The architecture stays the same, but now the context the agent sends to the LLM includes descriptions of both built-in tools and MCP tools. The LLM can request either. The agent executes them, gets results, and loops back with updated context.
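Conceptually, the agent just merges MCP tool descriptions with its built-in ones before it builds the context. This sketch uses a made-up discover_mcp_tools helper instead of the actual MCP SDK, purely to show where those tools end up:

```python
# Built-in tools ship with the agent; MCP tools are discovered from external
# servers at runtime. The LLM sees one combined list and cannot tell them apart.

BUILT_IN_TOOLS = [
    {"name": "read_file", "description": "Read a local file."},
    {"name": "run_bash", "description": "Execute a shell command."},
]

def discover_mcp_tools(server: str) -> list[dict]:
    # Hypothetical: a real agent would speak the Model Context Protocol to the
    # server and receive its tool descriptions.
    return [{"name": "query_database", "description": "Run a query against the company database."}]

def all_tools() -> list[dict]:
    return BUILT_IN_TOOLS + discover_mcp_tools("postgres-mcp-server")

print([tool["name"] for tool in all_tools()])
```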
Now, there's a practical limit to all this: context size. Context size is the maximum amount of information that can be sent to an LLM in a single request. It's measured in tokens, the basic unit of text processing. Let's say a token is roughly a word or part of a word. Different models have different context limits, and those limits change over time. When you exceed that limit, the context must be compacted.

So what is context compacting? Well, it's the process of reducing context size when limits are hit. There are several methods. You can summarize conversation turns. You can remove less relevant tool outputs. You can truncate file contents. The key, nevertheless, is to keep the system prompt and recent messages intact. Those are critical. The trade-off is obvious. Reduced context means the LLM has less information to work with, and less information means potentially worse decisions.
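Here's a hedged sketch of one possible compaction strategy: keep the system prompt and the most recent messages intact, and collapse everything in between into a summary. The summarize helper is a stand-in; real agents typically ask the LLM itself to produce that summary.

```python
# One possible compaction strategy: keep the system prompt and the most recent
# messages intact, and replace everything in between with a summary.

def summarize(messages: list[dict]) -> str:
    # Stand-in: real agents typically ask the LLM itself to write this summary.
    return f"(summary of {len(messages)} earlier messages)"

def compact(context: list[dict], keep_recent: int = 5) -> list[dict]:
    system_prompt, rest = context[0], context[1:]
    if len(rest) <= keep_recent:
        return context  # still within limits, nothing to do
    old, recent = rest[:-keep_recent], rest[-keep_recent:]
    return [system_prompt, {"role": "system", "content": summarize(old)}, *recent]

messages = [{"role": "system", "content": "system prompt"}] + [
    {"role": "user", "content": f"message {i}"} for i in range(20)
]
print(len(compact(messages)))  # 1 system prompt + 1 summary + 5 recent = 7
```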
So, let's bring this all together. Agents like Claude Code, Cursor, and others are orchestrators. They sit between your intent, the available tools, and the LLM. That's their role: coordination. You provide the intent, like "Create a PostgreSQL database in AWS," and that's where it starts. The LLM provides the reasoning and instructions. It figures out what needs to happen and tells the agent which tools to execute: read this file, run that command, search for this information. The LLM never touches those tools directly. It just requests them. The agent, on the other hand, executes everything. It runs the tools, it gathers the results, it builds the context, and it sends it back to the LLM. It manages the loop until the LLM has everything it needs to fulfill your intent.

And here's the key thing to understand. The agent is the only dumb actor in this system. It doesn't think, it doesn't decide, it only does what it's told, either by you or by the LLM. It's pure execution, no intelligence. The LLM provides the brains, you provide the intent, and the agent makes it happen. It's a dance between three components: you, the agent, and the LLM, with the agent as the coordinator making it all work. Without the agent, the LLM is just answering questions. With the agent, it can actually do things. Thank you for watching. See you in the next one. Cheers.
You type "Create a PostgreSQL database in AWS" into Claude Code or Cursor, and it just works. But how? Most people think the AI does everything, but that's wrong. The AI can't touch your files or run commands on its own. This video breaks down the real architecture behind AI coding agents, explaining the three key players that make it all work: you (providing intent), the agent (the orchestrator), and the LLM (the reasoning brain). Understanding this matters if you're using these tools every day. We'll walk through increasingly sophisticated architectures, from basic system prompts to the complete agent loop that enables real work. You'll learn how tools get executed, what context really means, how the agent manages the loop between you and the LLM, and why the LLM is stateless. We'll also cover practical considerations like MCP (Model Context Protocol) for integrating external tools and context limits that affect performance. By the end, you'll understand that the agent is actually the only "dumb" actor in the system—it's pure execution with no intelligence. The LLM provides the brains, you provide the intent, and the agent coordinates everything to make it happen. ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: RavenDB 🔗 https://ravendb.net ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ #AIAgents #LLM #HowAIWorks Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join ▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/ai/ai-agent-architecture-explained-llms,-context--tool-execution ▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below). ▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/ ▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox ▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 AI Agents Explained 01:02 RavenDB (sponsor) 02:16 How Do Agents Work? 05:42 How AI Agent Loops Work? 09:36 MCP (Model Context Protocol) & Context Limits 11:38 AI Agents Explained: Key Takeaways