Hey everyone, thank you for joining us
for the December session of the AI apps
and agent dev days series. My name is
Anna. I'll be your producer for this
session. I'm an event planner for
Reactor joining you from Redmond,
Washington.
Before we start, I do have some quick
housekeeping. Please take a moment to
read our code of conduct.
We seek to provide a respectful
environment for both our audience and
presenters.
While we absolutely encourage engagement
in the chat, we ask that you please be
mindful of your commentary, remain
professional and on topic.
Keep an eye on that chat. We'll be
dropping helpful links and checking for
questions for our presenters to answer
live.
Our session is being recorded. It will
be available to view on demand right
here on the Reactor channel.
With that, I'd love to turn it over to
our speakers for today. Thanks so much
for joining.
>> All right, thank you very much, Anna. Welcome, everyone, to the third episode of the AI Apps and Agents Dev Days series. We've got lots of really cool stuff to look at today, focused on the scaling and orchestration of AI agents. Before we get started: my name is Stephen McCulla, and I'm an AI solutions architect with NVIDIA, which means I get to work with Microsoft on implementing all of the latest and greatest AI technology in Azure. I'm joined today by Anthony Shaw.
>> Hi, everybody. My name is Anthony. It's great to be here again and present some stuff with you this morning, Stephen. I'm calling in from Sydney, Australia, and I lead the Python advocacy team at Microsoft.
>> Thank you, Anthony. So, a bit about this program: Microsoft and NVIDIA are close partners, and they work together to power the next generation of AI through deep integration of our technology into Azure. This webinar series is part of that integration. We want to show users like you how to best leverage all of this amazing technology that's coming out, so definitely tune in to the series and explore all of the great integrations we have to show. Without further ado, let's jump into it.
Today we'll be covering a few different things. The first is a recap of exactly what AI agents are and how they work; that's the core topic of this episode, so we want to go in with a robust, refreshed understanding of it. Next we'll go into Microsoft Agent Framework, which is essentially a tool you can use to build, tune, and orchestrate your agentic workflows on Microsoft tooling and on Azure. After that we'll look at NVIDIA AI Blueprints, which are essentially reference recipes and architectures that you can deploy yourself, or customize and fine-tune for your own purposes; I'll show you a custom example of the AI model distillation blueprint from NVIDIA. And then Anthony is going to take us through some real-world integration examples, showing how you can integrate your agentic workflows into your customer-facing apps as well as background batch processing.
All right. So, first things first: AI agents. Let's have a quick recap. What are they? How do they work? I imagine that if you're attending this webinar, you're at least a little bit familiar with large language models, or LLMs, like ChatGPT, Nemotron, or DeepSeek. But what exactly is an agent? What's the difference between LLMs and agents? You can think of an agent as an LLM plus memory plus reasoning and tools. Many LLMs nowadays have native reasoning and tool-calling support built in, so once we add a memory layer on top, managed by agentic software like Microsoft Agent Framework, we can turn that LLM into an agent. In other words, an agent is essentially an LLM with tool calling, reasoning, and memory. Those three functional capabilities open up a whole world of possibilities and allow us to do amazing things with our LLMs and our agents.
From a high level, what can an agent do? With tool calling, it's essentially calling external code: you can pass in, say, a Python function that updates your LinkedIn, or a JavaScript function that creates a web page for you. With tool calling inside agents, your imagination is really the limit; there are lots of amazing things you can do. And when we think about reasoning, that's essentially the recursive prompting happening inside the LLM that allows it to produce much more intelligent, reasoned outputs. Agents are capable of incredible things once we add in this reasoning and tool-calling capability.
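To make the tool-calling loop concrete, here is a minimal sketch against an OpenAI-compatible chat endpoint. The endpoint URL, the model name, and the `get_weather` function are all stand-ins for illustration, not anything from this session:

```python
import json
from openai import OpenAI

# Hypothetical local endpoint; any OpenAI-compatible server works the same way.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

def get_weather(city: str) -> str:
    """A stand-in tool; a real one would call an external API."""
    return f"Forecast for {city}: 18C, light rain."

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the weather forecast for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "Do I need an umbrella in Sydney?"}]
while True:
    reply = client.chat.completions.create(
        model="my-model", messages=messages, tools=TOOLS
    ).choices[0].message
    if not reply.tool_calls:       # no more tool requests: final answer
        print(reply.content)
        break
    messages.append(reply)         # keep the assistant turn in the history
    for call in reply.tool_calls:  # run each requested tool, return the result
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": get_weather(**args),
        })
```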
When we think about agentic workflows, that's essentially putting multiple agents together to accomplish some specific task that we assign. A common question we get is: why build an agentic workflow and introduce this complexity, instead of having some uber-intelligent single agent, a single model, do all of the work? The reason is specialization. There's really no single AI model out there that is the best at everything; we haven't hit artificial superintelligence just yet. So we need specialized, fine-tuned models to accomplish the individual tasks they were really trained for. For example, some AI models, like Nemotron Parse, are specialized for document analysis. Some models, like GPT-5 or Nemotron, are more like chat agents with coding capabilities. Some models are great for audio processing. Most of us probably have experience with the chat models, but there's a whole world of different LLMs out there meant to accomplish particular tasks. When we use them together, we can get the best of all worlds: the best document processing, the best audio-visual capabilities, and the best chat capabilities, all together in one system.
For example, think about an AI system that might be used by doctors during a medical diagnosis. If the doctor brings in, say, their phone connected to a backend AI system, that system needs the capability to transcribe and understand whatever the doctor and the patient are saying in the diagnosis room. The workflow also needs the ability to look up the patient's past medical history, and to do deep research into the symptoms the patient is listing and the diseases they might be associated with. All of those capabilities are really important for this hypothetical medical agentic workflow, and rather than having one super-intelligent model try to do all of these different things at the same time, we see much better results with an agentic workflow comprising multiple models all working together on the task. So now that we know about agents and agentic workflows, the big question is: how do we run these agents? How do we build these workflows in an effective, efficient way? That's where Anthony is going to jump in and tell us about Microsoft Agent Framework. Take it away.
>> Thanks, Stephen. So, Microsoft Agent Framework is something we released in preview a couple of months ago, and at the Ignite conference just last month there were several workshops showcasing it. Agent Framework supersedes two other projects, one called AutoGen and one called Semantic Kernel. It's designed as an open agent framework that works on the Microsoft stack and on others as well, and I'll cover that in a bit. The main thing is that Agent Framework is a code-based agent builder: you use either Python or C#/.NET to develop these agents. The obvious question is, there are drag-and-drop editors for agents, so why would you write them in code? The main reason is that you get absolute control and the ability to customize everything in the agent. Like Stephen said, an agent is basically the combination of an LLM, a model, with memory, tool calling, and reasoning. But a lot of the time you also want to integrate other things: you've got MCP calls, you've got specialized APIs you want to call into, and you want a lot of control over what comes in and what comes out of each individual agent in your workflow. Agent Framework gives you that ability, because you're writing Python, so you have full control. It's an open-source product as well, MIT licensed, so for any companies concerned about proprietary code or anything like that, it's openly licensed. Another great thing about it, which I'll demonstrate at the end of the slides, is that you can run it and experiment with it locally, even with local models. So if you're running some models locally on your machine, or you want to experiment with something like GitHub Models, which is free, you can do that, and then when you're happy with how the agent works, you can get it running on Microsoft Foundry. The other thing is that it's an open framework: it obviously supports Azure and Foundry, but it actually supports other clouds as well, and I'll demonstrate that later. Next slide.
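For reference, the "hello world" for Agent Framework in Python looks roughly like the sketch below. This follows the preview quick-start samples, so treat the exact import paths and method names as assumptions that may shift before general availability:

```python
import asyncio
from agent_framework.azure import AzureOpenAIChatClient  # pip install agent-framework
from azure.identity import AzureCliCredential

async def main() -> None:
    # create_agent() bundles the model, instructions, and (optionally) tools
    # into a single runnable agent; names here follow the preview samples.
    agent = AzureOpenAIChatClient(credential=AzureCliCredential()).create_agent(
        name="Joker",
        instructions="You are good at telling jokes.",
    )
    result = await agent.run("Tell me a pirate joke.")
    print(result.text)

asyncio.run(main())
```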
So, Agent Framework is one option, but we have several different ways of building agents on our platform. If you have your own framework, or another one you prefer to use, and you just want to run those applications on Azure, we have infrastructure as a service. You can run agents as containers, which is the modern way of doing it and the way I'd recommend, deploying them onto something like Azure Kubernetes Service or Azure Container Apps. So you can basically take any open-source agent orchestration framework and run it on Azure. We have full support for LangChain, for example: if you go to the LangChain documentation, there's a Microsoft section that lists how to integrate LangChain with every Azure service you'd use for agents. Any other tool or framework that supports the models on Azure can be run on Azure too, and the team I work on does a lot of work with those open-source projects to make sure we're well supported in the different frameworks. If you prefer something where the platform does more of the management for you, that's platform as a service: AI Foundry and Agent Service do that. You can do a click-ops-based agent, where you just give it a prompt and a model, create an API endpoint, and call it from there; you don't need to write any code. The PaaS option is Foundry and Agent Service, so you can build agents fully in Foundry without writing any code. You can plug in knowledge sources, implement basic memory patterns, and integrate tools with MCP servers as well. I'd recommend checking that out if you don't have a dev team. And then our fully SaaS option is Copilot Studio. With Copilot Studio, you're mostly building agents for users within your company, or, if you're a vendor that sells that sort of thing to other Office 365 users, you can use Copilot Studio to do that. So you're building an agent used by employees and internal users for specific tools and capabilities. Next slide.
So, Agent Framework is our open-source engine for building and orchestrating intelligent AI agents. Like I mentioned, it's built on open standards. It's designed to be interoperable with different models, different storage for memory, and different backends. We're doing a lot of work to make sure Agent Framework doesn't have any lock-in: you're really building on an open base, and you can integrate it with any model, any cloud, and any backend. We also have a team at Microsoft Research looking into cutting-edge ways of running agents and feeding those insights into Agent Framework, so anyone interested in AI and agentic research will see a lot of papers coming out of Microsoft Research, with some of those ideas and technologies being built into Agent Framework. It's also ready for production, and I'll explain what that means later. Because of the scale of Microsoft, the types of customers we have hold very high expectations for what a production application looks like, so we put a lot of thought into things like security, telemetry, deployment models, and how to make it resilient. Next slide.
Cool. So, for a single agent: an agent has capabilities. It has, I think... sorry, we skipped a slide.
>> Yeah, sorry about that. That was my bad, Anthony. There you go.
>> Okay, cool. So a single agent has a model, memory, and tool capabilities, and often, like Stephen said, you want it to have a single purpose. You're basically drawing boundaries: this agent does this task, and it specializes in that specifically. With Agent Framework, you can build different types of workflows. Where I'd recommend everybody start is a sequential workflow, where one agent hands off to the next. You have a basic agent with a purpose; you give it that purpose in its prompt, you give it only the tools that help it do that one thing, nothing else, and once it completes its task, it gives its output to the next agent (I'll show a minimal sketch of this pattern just below). When you're designing these things, if you think you need a multi-agent workflow, often you actually don't: modern models can handle multiple things at once. You can give them 20 different tools and a fairly complicated list of instructions, and they can call those tools and complete the job as a single agent. If you feel you definitely want more control over what each agent can and can't do, you can put them in a sequence as a sequential workflow. Because agents can take a few seconds, sometimes a few minutes, to complete depending on how big the task is, you often also want that to be concurrent: instead of running as a sequence, you can initialize a workflow, send it to multiple agents that all work on the same thing at the same time, and then consolidate those results back. We call that a fan-out/fan-in pattern, and I'll give you an example of that later. We also have handoff, where you fan out to different agents but they can also send information to each other. The ones after that, the bottom three in this graph, are really more advanced, and I'd recommend only exploring those once you've done the top three.
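Stripped of any framework, the sequential pattern is just this: each single-purpose agent is an instruction-scoped model call, and its output becomes the next agent's input. A minimal sketch, assuming an OpenAI-compatible endpoint and an illustrative model name:

```python
from openai import OpenAI

client = OpenAI()  # or any OpenAI-compatible endpoint via base_url=...

def run_agent(instructions: str, task: str) -> str:
    """One single-purpose 'agent': a model call scoped by its instructions."""
    return client.chat.completions.create(
        model="gpt-5-mini",  # illustrative model name
        messages=[
            {"role": "system", "content": instructions},
            {"role": "user", "content": task},
        ],
    ).choices[0].message.content

report_text = "...raw weekly sales report..."  # placeholder input

# Sequential workflow: each agent's output is the next agent's input.
facts = run_agent("Extract the key facts as bullet points.", report_text)
summary = run_agent("Write a three-sentence executive summary.", facts)
print(summary)
```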
I've met a lot of people who got really excited about the idea of multiple agents all talking to each other and coordinating; in theory it sounds like a brilliant idea, and then when you actually test it in practice, they can spend several minutes running, having conversations with each other, and getting confused, or you can burn a huge amount of tokens in the conversation. So I strongly recommend starting with a sequential workflow. If you think it could be sped up a bit, look at a concurrent one, fan-out/fan-in (sketched below). And really, only if your problem needs something like a group chat or a magentic pattern should you start to explore those, and only once you've explored the other options first. I wouldn't suggest going straight into them, because you really need a lot of knowledge about developing prompts, and about refining the prompts and tools, so the agents don't get stuck in an infinite loop. Next slide.
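The fan-out/fan-in pattern is equally small when sketched with plain asyncio: the same input goes to several specialist agents concurrently, and a final call consolidates their answers. The model name and tasks are illustrative:

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()  # or any OpenAI-compatible endpoint

async def run_agent(instructions: str, task: str) -> str:
    resp = await client.chat.completions.create(
        model="gpt-5-mini",  # illustrative
        messages=[
            {"role": "system", "content": instructions},
            {"role": "user", "content": task},
        ],
    )
    return resp.choices[0].message.content

async def briefing(document: str) -> str:
    # Fan out: three specialists read the same input concurrently.
    risks, numbers, actions = await asyncio.gather(
        run_agent("List the risks mentioned.", document),
        run_agent("Extract every figure and date.", document),
        run_agent("List the action items.", document),
    )
    # Fan in: one consolidator merges the three results.
    return await run_agent(
        "Merge these notes into one short briefing.",
        f"Risks: {risks}\nNumbers: {numbers}\nActions: {actions}",
    )

print(asyncio.run(briefing("...meeting transcript...")))
```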
So, an agent without tools is not really an agent; it's just another chat interface, and you're just talking with a model. The whole point of Agent Framework is that you give the agent capabilities to actually go and do things, or retrieve data to make decisions. What we've done with Agent Framework is enable it to talk to anything that uses one of the open standards: A2A, agent-to-agent, and MCP. And whilst it's not a formal standard, the OpenAI API spec has become the de facto standard interface for most models these days, so we support anything that has an OpenAI-compatible interface. We also support a huge number of different backends for storage and retrieval, for function execution, and even different clouds. Amazon Bedrock is probably one name that jumps out there: yes, it's a competitive platform, but it's something we can integrate with too. Like I said before, we're not designing for lock-in; we want an open framework that works really well on Azure but can also be used on other platforms. Next slide.
Yeah. So, with agents, there's a lot of information you can provide in the context or in the prompt. But ideally, especially with a chat agent, if a user has had a conversation with it and then starts another thread, they don't want to have to remind the agent about things they previously talked about. So what we've built in is the idea of a memory, or conversation state: a process that summarizes a conversation and stores it in a memory pool, and that memory pool can then be retrieved and injected back into the agent for another thread. So if you have a conversation thread with a chat agent where you talk about your plans and what you're working on, and then you start another thread with that agent, it basically has longer-term memory: it has remembered specific facts and summarized previous conversations, so it's aware of decisions made before and you don't have to explain everything all over again. That pattern is an open implementation. The code example we're going to use just stores the memory in memory, in RAM, but you could drop it into Redis, for example, or into a structured SQL database; it's a standard API, so it depends on where you want to put that information. Obviously security is a massive concern there, because if it holds information the user has given it, about their plans, what they're thinking about, whatever it is, you want to store that somewhere secure and only available to that specific user. We offer this as an open spec that you can implement, so if you've got your own proprietary database you can plug it into that, and we have example implementations with SQL databases. Next slide.
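The pattern is easy to sketch: a memory pool is anything with a "remember" and a "recall" operation, keyed by user. This toy in-RAM version is not the framework's actual API, just the shape of the pattern; Redis or SQL would implement the same two methods:

```python
from dataclasses import dataclass, field

@dataclass
class InMemoryMemoryPool:
    """Summarize a finished thread, store it per user, inject it
    into the next thread. Swap the dict for Redis or SQL behind
    the same two methods."""
    _store: dict[str, list[str]] = field(default_factory=dict)

    def remember(self, user_id: str, summary: str) -> None:
        self._store.setdefault(user_id, []).append(summary)

    def recall(self, user_id: str) -> str:
        return "\n".join(self._store.get(user_id, []))

memory = InMemoryMemoryPool()
memory.remember("anthony", "Planning a Q1 launch; prefers JSON output.")

# At the start of a new thread, prepend what we remember about the user.
system_prompt = (
    "You are a helpful assistant.\n"
    "What you remember about this user:\n" + memory.recall("anthony")
)
```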
Cool. And then lastly on Agent Framework: it's designed to be enterprise-ready. When you run a workflow in Agent Framework, you can enable tracing, and all of that tracing is then available either locally, where you can see the full traces in DevUI, or, if you're connected to Application Insights, you can see the full trace for every conversation and every agent. You can see exactly what decisions it made and why, which tools it called with which parameters, how long it took, and whether there were any errors. So you've basically got a full audit log of everything it was doing. That, again, is built on open standards, on OpenTelemetry, so if you have a different tracing provider that supports OpenTelemetry, you can export to it as well. Cool.
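Because it's standard OpenTelemetry, the wiring looks like any other OTel setup. A minimal sketch that prints spans to the console; pointing an OTLP exporter at Application Insights or another backend slots in the same way. The tracer and attribute names are illustrative:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Send spans to the console for local debugging; swap in an OTLP
# exporter to ship the same spans to Application Insights.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("store-agents")

with tracer.start_as_current_span("weekly-insights-workflow") as span:
    span.set_attribute("agent.name", "insight_synthesizer")  # illustrative
    # ... run the agent workflow inside this span ...
```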
So, before we jump in further, I'm going to give you a quick demo of how to build an agent. To do this, I'm going to use an extension for VS Code called AI Toolkit. If you go to the extension marketplace in VS Code and search for 'AI Toolkit', you'll see this one here, AI Toolkit for Visual Studio Code; that's the extension you want to install.
Once you've got that extension, you'll see there's an AI Toolkit menu, and once you open that up, you go to Agent Builder. Agent Builder is a way of getting you started building your first agent. What you do is connect in the different models. In my case, I've connected my GitHub account, so I can run any model that's on GitHub. I've also got it connected to Foundry, so I've got GPT-5 mini running there, and I've got Phi-4 reasoning running on my laptop. You can actually run the models directly on your machine: if you're running Foundry Local, you can run Foundry Local models and just develop the agent locally; you don't need a cloud-provided model. Once you've done that, you give your agent a name and pick a model. It also works with Ollama, so you can connect it to any local model. Then you give it your instructions.
Then you set up any tools. Tools are things the agent connects into; they could be custom tools, a piece of code you've connected it to, or an MCP server. In my case, I've got two MCP servers related to my application, and I've already got them running on my machine. This is my architecture: I've got an API and two MCP servers, one the finance MCP and the other the supplier MCP, and they're connected to proprietary databases. This is all custom code, but these are MCP servers running locally on my machine, and I can design my agent just by giving it some instructions. Then, in the playground, I can give it a prompt, and it will run the agent locally, and if it does decide to call MCP tools, I can see which tool it called, with what parameters, and what the output was. For example, in this one we're analyzing the performance of products in this fictional store: I wanted to understand the top-selling products for a specific store over the last few weeks, and I wanted it to use the MCP tools to do that. Basically, it's going to retrieve some information. In my conversation I said, look up store number one for the last three months, and I want you to use the tools to look at revenue, units sold, and SKU. It's going to use GPT-5 mini to do that, and I can see that the agent has actually called out to my MCP server, fetched some information about the products, and given me back the result as JSON. I could ask for a text summary instead, but in my particular case I wanted this back as structured data. So my agent is actually retrieving information about which are the highest-performing products that I want to use.
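For a sense of what those MCP calls look like from Python, here's a minimal client sketch using the official `mcp` SDK. The `finance_mcp.py` server, the tool name, and its arguments are hypothetical stand-ins for the demo's finance MCP:

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch/attach to a local MCP server over stdio; the script name
    # is hypothetical, point this at any MCP server you have.
    params = StdioServerParameters(command="python", args=["finance_mcp.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])  # what an agent could call
            result = await session.call_tool(
                "top_selling_products",            # hypothetical tool name
                arguments={"store_id": 1, "months": 3},
            )
            print(result.content)

asyncio.run(main())
```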
When I'm happy with the agent in Agent Builder, I can go down to the bottom and click View Code, then pick Microsoft Agent Framework and Python, and it will generate the Python code to run my agent. If I want to turn this into an Agent Framework script, I can use that as a starting point: it has basically generated all the code I'd need, and I can look at it, customize it, and start integrating it into my app. So that's how you'd get started with Agent Framework. You can either jump straight into the code, or, as I'd recommend, get AI Toolkit, connect it to some models, give it some instructions and some tools, and just experiment with the prompts until you're happy with it.
>> Yeah. Back to you, Stephen.
>> All right. Thank you, Anthony. That was awesome. Anthony did a great job showing how we can create these agentic workflows using Microsoft Agent Framework. But if you're anything like me, the question is: how do I get started? Where do I look, and what can I reference if I want to build an agentic workflow, either for personal experiments or for my job? The answer is NVIDIA AI Blueprints. These blueprints are essentially reference architectures and reference workflows that you can deploy yourself. They're all open source, so you can go in, tweak and edit them, and deploy them however you'd like. For example, if I wanted to take the RAG blueprint and plug in my own models, it really wouldn't be difficult to do. I'll show you the GitHub repo in a second, but since it's all open source, it's very easy to customize for your own deployments. The way these blueprints work is that there are three foundational blueprints that NVIDIA has created: the first is the agentic AI-Q blueprint, the second is the RAG blueprint, and the third is the data flywheel blueprint. On top of these, we've created other blueprints that are more specific to particular industries: for example, a healthcare blueprint focused on RAG, or a data flywheel blueprint focused on the financial services industry. If you go to build.nvidia.com/blueprints (I don't know if my screen is too small to see here; I'll zoom in a bit), you'll be able to see all of the different blueprints we have available: for example, AI model distillation for financial data, and the AI observability data flywheel. One that we'll be showing next January, maybe as a bit of a teaser, is the agentic AI-Q blueprint that I mentioned. It's a very powerful blueprint, and I recommend you check out that episode when it comes in January. We have lots of blueprints for all different industries and use cases, so if you're looking to develop and deploy your own agentic workflow for whatever industry you're in, there's a pretty good chance NVIDIA has a blueprint that's either just what you're looking for or pretty close, and since it's open source and really made to be configurable, you can configure it and deploy it yourself. It's very easy to get started. So if I go back to, say, this model distillation for financial data blueprint and I want to view the source code on GitHub, I can just go here, and it takes me to the Jupyter notebook that contains all of the source code associated with this workflow.
This workflow has the entire orchestration that you see here. I'll zoom in a bit, but the whole thing lives in this Jupyter notebook: all of the databases, the models, the NIMs, and the surrounding orchestration software are all built into it. So it's really easy to get set up and going quickly. This particular blueprint is focused on model distillation, which is essentially taking a larger, more intelligent model and using it to fine-tune a smaller model for a particular purpose. Say we have a 50-billion-parameter model and we want to train a 1-billion or 3-billion-parameter model for a very specific task, like summarizing stock data on the public stock market: that's a great example of where you might use this blueprint. The notebook takes you through the entire process of getting up and running; you really just click through to complete all of the code sections, and it does the entire thing for you. It also tells you the requirements for any individual blueprint. For example, for this one you'd need two NVIDIA GPUs, A100, H100, or, if you somehow have a B200, you're a very lucky person. With two of those GPUs, you can go ahead and deploy this workflow. So it's really easy to get up and running with these blueprints, and it's a great way to get your hands dirty and get your foot in the door with agentic orchestration.
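The core of the distillation recipe can be sketched in a few lines: the large teacher model labels a pile of task-specific prompts, and the prompt/answer pairs are written out as fine-tuning data for the small student. The model name, prompts, and JSONL shape here are illustrative; the blueprint's notebook does the real version end to end:

```python
import json
from openai import OpenAI

client = OpenAI()  # endpoint serving the large "teacher" model

prompts = [
    "Summarize AAPL's price movement this week.",  # hypothetical tasks
    "Summarize TSLA's price movement this week.",
]

# Step 1: the teacher labels the task-specific data ...
with open("distill_train.jsonl", "w") as f:
    for prompt in prompts:
        answer = client.chat.completions.create(
            model="llama-3.3-70b-instruct",  # teacher; name illustrative
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content
        # ... step 2: write prompt/answer pairs in a common fine-tuning
        # JSONL shape, then fine-tune the small student model on the file.
        f.write(json.dumps({"messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": answer},
        ]}) + "\n")
```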
So, let's take a look at how these different blueprints work. For example, in the model distillation workflow I mentioned, we're using Llama 3.3 70B, a 70-billion-parameter model, to fine-tune smaller models, everything from a 1-billion-parameter model up to 49 billion parameters. They're being fine-tuned on financial data: the kind of financial news you might find in, say, the Financial Times or Yahoo Finance is what's used to fine-tune the model.
Let's look a bit more under the hood at what's going on inside these blueprints. What you'll see a lot of in them is NVIDIA's own open-source, homegrown LLMs. The latest one, which actually just came out yesterday, is Nemotron 3 Nano, and these all fall under the Nemotron family. Nemotron 3 Nano is the smaller member of the Nemotron 3 family, and it's a very powerful LLM; we've seen really great results in our internal benchmarks, and great results in the two days since its release. We also have Llama Nemotron Super and Llama Nemotron Ultra, going up that chain of size and intelligence, plus models for vision-language tasks, information retrieval, and content safety. Like I mentioned earlier, there's no one uber-intelligent model that can do all of this very well, so Nemotron aims to provide all of the models you'd need, in a fully open-source way, so you can run the agentic workflows you need.
There's still the question: now we have the model, but how do we run it? That's where NVIDIA NIM comes in. A NIM is essentially a Docker container that packages your model together with observability microservices and inference optimization microservices, all built into a single container, and it makes it really easy to deploy and run an AI model. If I go back to my browser and go to catalog.ngc.nvidia.com, that's essentially the home base for a lot of NVIDIA software. If you go to Containers and scroll down, you'll see NVIDIA NIM, and from there you'll see all of the different NIMs we have. For example, there's the Meta Llama 2 70B chat model: a Docker container with the Llama 2 chat model baked inside, optimized, and bundled with those observability microservices. I'll also show you the Nemotron 3 one that just came out. If you want to try these models out for yourself, this is a really great way to get started: you just run docker run with the container address here, and it spins it up for you. The only thing you'll need is an NGC account, so come back to this website, create an account, get an API key, include it in your docker run command, and you'll be able to spin it up very easily.
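Once a NIM container is up, it exposes an OpenAI-compatible API, so calling it from Python is just a base-URL swap. The port and model id below are illustrative; the exact values are listed on each NIM's catalog page:

```python
from openai import OpenAI

# A running NIM serves an OpenAI-compatible API on the port you mapped;
# 8000 and the model id are illustrative, check the NIM's documentation.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used-locally")

resp = client.chat.completions.create(
    model="nvidia/nemotron-3-nano",  # hypothetical model id
    messages=[{"role": "user", "content": "Summarize what a NIM is in one sentence."}],
)
print(resp.choices[0].message.content)
```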
So, lots of great stuff going on. NVIDIA really makes it easy to get started and get up and running with your AI models and agentic workflows. Now that we know how to get set up with our agents, models, and workflows, Anthony is going to show us how to integrate these workflows into real-world, user-facing applications.
>> Yeah, great. Thanks, Stephen. I'm going to lead on from where we finished before: I showed you how to use Agent Builder to experiment with building an agent, and then you can take the code it generated. In my case that's Python, but you can also use C#/.NET. I basically developed a few different agents, designed to provide weekly insights to store managers in this fictional store app that we have. I'll share the code for this application at the end of the presentation as well, if you want to download it and look at how it works. Before, I was building one agent, but in this particular case I've actually built several agents and given them names: one that looks at the weather forecast for the region, one that looks at events, and one, the one we were building before, that looks at top-selling products. So we've got those three different agents. Then, in Agent Framework, we have a workflow builder: we give it a name and a description, and we say what the starting point is, the data collector, which is the one you initially give the instructions to. Then we add the fan-out edges: the data collector goes to weather, events, and top-selling products. And then we have the fan-in, where those results come back: the three agents consolidate into an insight synthesizer (which is fun to say). The synthesizer agent is the one that takes the results from all the other agents and turns them into a single response.
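In code, that workflow looks roughly like the sketch below. The builder and edge methods follow the Agent Framework preview samples, so treat the exact names as assumptions; the five agents are the ones just described, built beforehand:

```python
from agent_framework import WorkflowBuilder  # preview API; names may shift

# data_collector, weather, events, products, and synthesizer are agents
# created earlier (e.g. via create_agent); shown here as placeholders.
workflow = (
    WorkflowBuilder()
    .set_start_executor(data_collector)
    .add_fan_out_edges(data_collector, [weather, events, products])
    .add_fan_in_edges([weather, events, products], synthesizer)
    .build()
)

result = await workflow.run("Weekly insights for store #1")  # in an async context
```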
Once I've built this workflow, I just wrap it in a Python function. Something you can do in Agent Framework that's pretty cool is run this thing called DevUI, which lets you visualize the workflows you've made. The one I was just showing you the code for, you can actually visualize directly in a browser, and you can also run it directly there. So instead of having to write scripts to test it, you can just iterate on it and run it in the browser, and any events that happen while it's running pop up on the right-hand side: you get all the traces, the tool calls, and anything like that shown on the right. So what we basically have with this weekly insights workflow is the ability to run it, see what's happening, and get the results back directly in here as well.
Something else we can do: in the Foundry extension there's a local visualizer, so if you wanted to see this information inside VS Code instead, you can use that feature in the extension as well. What I've done then is taken this code, this workflow object in Python, and we can actually run that workflow from code directly. Sure, we can run it in DevUI and experiment, but I actually want to integrate this into the application, because I think that's a lot more realistic as an example. When we're building this application, and I gave a glimpse of this before and someone in the chat asked for more information: this is Aspire, Aspire 13, which shipped on October 31st and now actually supports Python as well. What I've done is define my application architecture in Aspire, and I'm running it all locally: I've got an API, the API talks to the different MCP servers, and I've got a web front end as well. So my shop is running as an app: I can click on the link to the front end and see the shop, and it's actually running a Python API. I've got the agents and the workflows I defined before in DevUI imported into my application, and if I log in as a manager, I can see it and it can run the insights. So this workflow is now the one that was running. That's not working today; we were migrating some stuff yesterday, which is why it's broken. But I can see all of that information.
And then in Aspire, I can go and get the logs and so on, see what's happening, look at that trace, and understand exactly why it broke. I use Aspire because it's a really nice way of defining an application that has multiple components: I've got the MCP servers, and I've got the API that hosts all my Agent Framework workflows. We've also deployed a production version of this, which is up here. In our app, I've logged in as a store manager, and it has actually run the workflow I was talking about before and given me a nicely integrated output. This is an example of an agent that's not a typical chat agent, where you have to start a conversation and say, please, can you tell me about important things happening this week. The way we designed this one, the weather, the events, and the top-selling products are run for a particular store. In this case we've got the pop-up New York Times Square store, and we've run the workflow and asked (it's fiction, this store doesn't exist, but in the fictional scenario): are there any major events happening in Times Square this week that a store manager might need to be aware of? We'll search the internet and see if there are any parades or anything happening that the manager should know about. Is it going to be really cold? Is it going to be raining? Should we stock up on umbrellas? What else might be happening at that particular store? And then also, what are the top-selling products in that store that the manager should be aware of? So they can see that over the last few weeks, here are the top-selling products: we've sold 63 pullover fleeces.
And the interesting thing is that when I run the workflow, the output is not just text. We can get it to output markdown, that's fine, but then the user actually has to read and understand it, and we'd just be outputting long paragraphs into our interface. What we do instead, and what you can do with Agent Framework when you're building these workflows, is define structured output. This synthesizer agent is something we've defined, and in code we've said: it takes the weather, events, and product analysis, all this information, and it actually outputs a class we've defined called the weekly insights class. That's a model, and for a particular store it holds a summary, the weather, the events, and the stock items, and those are actually lists of objects. So where previously you might build an agent that gives you text output that the user has to read, with Agent Framework we often define these classes, these types, so you get structured information back. When the agent runs, it gives the API back a list of stock items, specific actions you can take, and insights as a list that you can render in the portal. So in the UI here, if I wanted to make this clickable, so you could click on it and it would take you to the product, that's really easy for me to do, and I don't need to write weird code that parses the text and guesses where the links are and things like that.
With Agent Framework, my strong recommendation when you're building these agents is to think about whether you actually need structured output. Where AI agents are really capable is in taking something unstructured, paragraphs of text or inputs from users, and making it structured: they can make decisions on it, make suggestions, summarize information, and then give you that information back in a structured form as well.
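The structured-output idea, sketched with Pydantic; the class and field names are modeled on the demo's description, not copied from its code:

```python
from pydantic import BaseModel

class StockItem(BaseModel):
    sku: str
    name: str
    units_sold: int

class WeeklyInsights(BaseModel):
    """Illustrative shape, modeled on the weekly-insights demo."""
    summary: str
    weather: str
    events: list[str]
    stock_items: list[StockItem]
    suggested_actions: list[str]

# Handing a schema like this to the agent (many chat APIs accept a
# response-format/schema parameter) means the workflow returns typed
# objects the UI can render as links and lists, instead of prose.
```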
The other option we have is this inventory agent, something we've built. I'll show you what that looks like in here. It's this one, a nice simple one: it's basically a single agent that's orchestrated by a second one. Our scenario here is that we've got a number of items in our store catalog that are low on stock, and we wanted to help the store managers figure out how to restock them. So this is one where we can launch this particular agent... let me find it. This particular agent has been given a set of instructions: we've got a summarizer, but we also have one that will go and fetch as much information as it can about the stock using MCP tools. Once it's got that, it will fetch the information (I'm not having much luck today, am I?) and give you back some recommendations on stock priorities and things like that.
So yeah, that's another one we built, and the code for it, again, is in the repo. Like I said before, my suggestion is to start by building a simple agent in Agent Builder. Once you're happy with that, turn it into code by going to the bottom and clicking View Code. Then, once you've got the code snippet in Agent Framework, you can convert it into an actual workflow by creating a workflow builder and deciding which pattern you want from the slides we had before: either fanning out and fanning in, or a sequential workflow. For each of the agents, you then define a set of instructions and the tools it has. Once you've done that, you can visualize it in DevUI (that's the insights one): you can visualize it in here, run it in here, and experiment with it until you're happy, seeing the events, traces, and tool calls on the right-hand side. And once you're happy with all of these things, you can compose it into a container that can be run either in Foundry or on any container-compatible platform. That's a feature of AI Toolkit and the Foundry extension: once you've built a workflow for Agent Framework, you can turn it into a deployable application container and have it running on production infrastructure.
Cool. Back to you, Stephen.
Back to the slides.
>> Yeah, I mean, I think that was it. That was the goal.
>> I think... yeah, there's amazing stuff we've covered, and it was all really interesting, highly technical material. I think the next step for everyone in the audience is to go and try it yourself. Right here I have a QR code that takes you to a Microsoft blog showing how to run an agent yourself using Microsoft Agent Framework. This is probably the best way to get started; it's the "hello world" of Microsoft Agent Framework, and it's really great. It's what I used to get familiar with Agent Framework, so I'll leave it up for a bit so you can go check it out for yourself. There's lots to explore in this AI world. There's been a banner running under the screen to join the NVIDIA Developer Program; yeah, there it is. Definitely join that as well, as there are lots of cool resources through that program. And our next episode, coming up in January, is focused on another of our NVIDIA AI Blueprints: the deep research agentic blueprint. It's really a fully automated RAG system that can handle all different kinds of data, and you can deploy it super easily, so it's definitely something you should check out, whether it's applicable to your job or something you're interested in personally. There's lots of value in the upcoming episode, so stay tuned; lots more is coming through this series in the future. So yeah, thank you. Anything else, Anthony?
>> No, that's all, thanks. Yeah, thanks very much, everyone. I'll drop a link to the code I was demonstrating as well, if you're interested in how it was put together.
>> Awesome. Tony Bologna. Love it.
>> That's my username.
>> Such a good username.
>> Too late to change it.
>> All right, everyone. Thank you so much
and take care.
>> Thanks, everybody.
Thank you all for joining and thanks
again to our speakers.
This session is part of a series. To
register for future shows and watch past
episodes on demand, you can follow the
link on the screen or in the chat.
We're always looking to improve our
sessions and your experience. If you
have any feedback for us, we would love
to hear what you have to say. You can
find that link on the screen or in the
chat and we'll see you at the next one.