Hey everyone, thanks for joining us for the December session of our AI Apps and Agents Dev Day series. My name is Anna. I'll be your producer for this session. I'm an event planner for Reactor, joining you from Redmond, Washington.
Before we start, I do have some quick
housekeeping.
Please take a moment to read our code of
conduct.
We seek to provide a respectful
environment for both our audience and
presenters. While we absolutely
encourage engagement in the chat, we ask
that you please be mindful of your
commentary, remain professional and on
topic. Keep an eye on that chat. We'll
be dropping helpful links and checking
for questions for our presenters to
answer live.
Our session is being recorded. It will
be available to view on demand right
here on the Reactor channel.
With that, I'd love to turn it over to
our presenters for today. Thank you so
much for joining.
>> Thank you, Anna. I really appreciate it. And thank you, everyone, for joining. This is the third episode of our AI Apps and Agents Dev Day series, and we are super excited to show you lots of great stuff about AI scaling, agents, and orchestration. My name is Stephen McCulla, and I'm an AI solutions architect with NVIDIA. I work very closely with Microsoft on integrating all of the latest technology between Microsoft and NVIDIA directly into Azure. And I'm joined here today by Gwen.
>> Hey, Stephen, and hey everyone else, welcome to this series. It's my first time here; thanks so much for the invite. Stephen has been awesome to work with. My name is Gwen. I'm on our Python advocacy team, and I'm excited to show you a bunch of cool agent stuff.
>> Yes, awesome. Lots of great agent stuff to show today; there's really no shortage of great things going on in this world. A bit about the program before we jump in: the partnership between Microsoft and NVIDIA is a very deep, long-lasting one, and this webinar series is a big part of that partnership. We want to show developers and users like you how you can leverage the latest and greatest technology coming out of it, so make sure you tune in to the rest of this series to learn how you can best put all of this great technology to work.

That said, let's jump in. Today is all about scaling and orchestrating your AI agents. We're going to start with a quick touchpoint on AI agents: what they are, how they work, and how you can create them. Then we'll dive into Microsoft Agent Framework, a great tool that helps you build and orchestrate multi-agent workflows that can scale across hundreds of agents and accomplish very complex tasks completely automatically. Then we'll look at NVIDIA AI Blueprints, which you can think of as recipes for complex multi-agent workflows; I'll show an example of one of them. And then we'll go through a couple of demos showing how you can integrate agentic workflows into your application and use them for batch processing and asynchronous workflows.

Lots of great stuff to jump into today, so let's go right into AI agents. Quick review: what exactly is an AI agent, and how does it differ from an LLM? An agent has a couple of capabilities on top of the basic LLM. The first is reasoning and tool calling. Not every LLM can be an agent, but if a model does have reasoning and tool-calling capability, it can be used as one. On top of that, we also introduce long-term and short-term memory. Tools like LangChain or Microsoft Agent Framework help introduce that memory aspect into these LLMs, and that's a big part of what makes an LLM an agent.

Whenever you introduce these capabilities to your LLM and encapsulate it into an agent, you open up a huge world of possibilities, because not only does it become more capable through reasoning, it can also interact with the outside world. For example, if I want to create an agent that is my content creator for, let's say, LinkedIn, I can give it an API tool so it can create some content, maybe a paragraph or an announcement, and automatically call that tool to post onto my LinkedIn page. That's just one possibility. Once you introduce tools that you create with your own code, the world is your oyster; you can do just about anything you put your mind to with your agents.
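The tool-calling loop just described can be sketched framework-agnostically. Everything here is illustrative: the `post_to_linkedin` tool and the stub "model" are hypothetical stand-ins, and a real agent would use an LLM with tool-calling support to decide which tool to invoke and with what arguments.

```python
# Framework-agnostic sketch of an agent's tool-calling loop.
# The "model" below is a stub that always requests the (hypothetical)
# post_to_linkedin tool; a real agent would ask an LLM to decide.

def post_to_linkedin(text: str) -> str:
    """Hypothetical tool: in production this would call the LinkedIn API."""
    return f"posted: {text}"

TOOLS = {"post_to_linkedin": post_to_linkedin}

def stub_model(prompt: str) -> dict:
    # Stand-in for an LLM's tool-call decision.
    return {"tool": "post_to_linkedin", "arguments": {"text": prompt}}

def run_agent(prompt: str) -> str:
    decision = stub_model(prompt)
    tool = TOOLS[decision["tool"]]        # look up the requested tool
    return tool(**decision["arguments"])  # execute it with the model's args

print(run_agent("Announcing our new AI webinar series!"))
# posted: Announcing our new AI webinar series!
```

The key point is that the agent, not your application code, decides when the tool runs; your code only supplies the tool and executes the call.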
So these are very powerful tools that you can use for pretty much any complex task.

Now, whenever we think about agentic workflows, how does that work? Why do we even need an agentic workflow if our agent can already reason, call tools, and interact with the outside world? Why introduce these workflows into our systems and our company? Well, the key word here is specialization.
There isn't a single AI model out there that is the best at everything. Some AI models, like Nemotron Parse, are specialized for document analysis. Some models, like GPT-5, are powerful for coding and chat. And some models are great for audio processing. Now, if you get into fine-tuning, you can get even more specialized: you could take a model that's great at chat and document analysis and fine-tune it for, say, the financial sector or for healthcare. And it's really the idea that multiple specialized models will beat one generalized intelligent model that introduces the need for agentic workflows.

For example, think about an AI system where a doctor brings a device into the diagnosis room while meeting with a patient. That device listens to the whole conversation between the doctor and patient, and in the background an AI agent workflow is, number one, transcribing the audio into text; number two, researching everything it's learning and creating a diagnosis suggestion for the doctor based on what the patient is saying; and number three, creating the appropriate plans for the patient to follow depending on their diagnosis.
In that system, there isn't really one model that can do all of those things the absolute best, right? We probably want fine-tuned, specialized models focused on each of those areas: the transcription from voice to text, the research into medical history and past diagnoses, the look into the patient's own past history, and the creation of an action plan for the patient. That's one example where you might see an agentic workflow, but there are almost endless use cases where you can apply this methodology, and we'll be going over that today. So I'll hand it over to Gwen to talk more about how you can actually build these workflows yourself, customize them, and deploy them using Microsoft Agent Framework. Over to you, Gwen.
>> Awesome. Thank you, Stephen. Yeah, let's talk about Microsoft Agent Framework. For context, you might have used something like AutoGen or Semantic Kernel before, which are frameworks that Microsoft has created in the past. Agent Framework is the path moving forward, so if you're building something new, it's what we recommend you leverage. But to take a step back and discuss why you would want to build agents in code in the first place: as developers, you already have many reasons. You probably know Python, for example, and with Python you already have familiar ground: you can leverage different SDKs, different tools, different data sources, and you can make your agents as custom and as finely controlled as you'd like. That just comes with the flexibility of programming something, right?
Additionally, you can create these things locally. You can leverage local models and the tools from your day-to-day development journey, and you can have them in your CI/CD pipelines, all the things we know from that journey. And there's portability: if we leverage a framework, we're using code, so we can deploy to one cloud, try a different cloud, or deploy locally in a container or on some random computer you have sitting around. Those are awesome advantages of using code to create these agents.

But that doesn't mean you're limited to only using code. You have the same abstractions we have when we think of cloud infrastructure: IaaS, PaaS, SaaS, container as a service, so many options, and we see that same pattern happening with agent development and gen AI apps. With IaaS, the same way you think about cloud infrastructure, you bring your own containers and frameworks, and you're more in control of the infra, but that also means you're in control of the maintenance and all those kinds of things; in this case, you'd have open-source LLMs and frameworks to leverage. For PaaS, platform as a service, we have things like Foundry now, which has an agent service as well. And for SaaS, traditionally you could think of something like Logic Apps or your other software-as-a-service offerings; we actually have Copilot Studio, which lets you create agents without code, which I think is pretty awesome. So there are options for everyone, but today we want to focus on code and Microsoft Agent Framework, and I'll move over here.

Microsoft Agent Framework is in preview. We do have a lot of changes happening, so keep that in mind, but it's something very fun to build with, and it's expected to reach GA sometime early in the new year. It is the open-source engine for building and orchestrating intelligent AI agents. We have integrations with things like OpenTelemetry, the gold standard when it comes to observability. You get all the benefits of the Azure platform: guardrails, security, evaluations, all those types of things, plus integrations with Entra ID. A very cool product; check it out. So, for orchestrating multiple agents, we have a couple of workflow patterns, and I'm actually going to load this here.
Let's go back; my animations are somewhat slow here. Anyway, we have multiple workflow patterns, and these aren't specific to Microsoft Agent Framework. You'll see roughly the same set of workflows across the various frameworks meant for building agents; they might have different names, but the patterns themselves are quite similar.

The easiest, and if you haven't built anything before I recommend you start with it, is sequential: one agent does a task, then the next agent, then the next, and you end with some result. With concurrent, you kick something off, but you have various agents working at the same time. This is ideal when the work isn't dependent on each other; the agents can all do their own thing, and you save on processing time. Handoff would be ideal for something like a customer support experience. The customer sends some kind of message, and the first agent's goal is to triage: it understands, okay, this is a tech support request, and hands it off to the tech support agent; or maybe it's a refund request, so it hands it off to whichever agent is right for that workflow, and they hand off the work until they complete.

Those three cover a good amount of getting started, and they're actually quite complete workflows for the things you most likely want to build. But if you need a little more planning and a little more management, I'd say group chat and Magentic are the way to go. Group chat you can think of as a writers' room: someone pitches an idea or asks for feedback on something, and a bunch of other agents provide feedback, iterations, and edits. It comes back to the reviewer in a collaborative back-and-forth, and the goal is to get to the result. You define at the beginning which type of pattern you want to use for that specific group chat.
There's round-robin and a couple of other options, and it's great for when you're not 100% sure whether it should be sequential or something else. I'll actually show you an example of this one in a little bit. And then Magentic I like to think of as a souped-up version of group chat, because the goal of the Magentic workflow is to plan ahead of time. You give it some kind of task; it turns it into subtasks and outlines a plan: this subtask goes to this agent, that subtask to that agent. The reason there's a little document on this workflow in the diagram is that it keeps updates on the progress: if any of the agents aren't working, or anything relevant to the execution happens, it keeps track of that, and if for some reason something doesn't work, it goes back into planning mode. It's quite flexible and quite robust at figuring out how to get to the end result we need. You can imagine that takes more resources and more time, but it's more robust, so for specific workflows it's ideal. And then the ultimate-complexity workflow is essentially taking all of these workflows and turning them into agents themselves. Each of those little items could be a sequential, a concurrent, or a handoff, and they all interact with each other; it's like workflow inception. Many, many options, but again, if you haven't worked with any of these patterns before, I recommend starting with at least the first two and going from there.
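The first three patterns can be sketched in a few lines of plain Python, with ordinary functions standing in for LLM-backed agents. The names and stand-in "agents" here are purely illustrative, not Agent Framework APIs:

```python
from concurrent.futures import ThreadPoolExecutor

def sequential(agents, task):
    # Each agent's output becomes the next agent's input.
    for agent in agents:
        task = agent(task)
    return task

def concurrent_run(agents, task):
    # Independent agents work on the same task at the same time.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda a: a(task), agents))

def handoff(triage, specialists, task):
    # The triage agent decides which specialist receives the work.
    return specialists[triage(task)](task)

# Stand-in "agents" (a real workflow would call LLMs here):
upper = lambda t: t.upper()
exclaim = lambda t: t + "!"
triage = lambda t: "refund" if "refund" in t else "tech"
specialists = {"refund": lambda t: "refund team: " + t,
               "tech": lambda t: "tech team: " + t}

print(sequential([upper, exclaim], "hello"))          # HELLO!
print(handoff(triage, specialists, "refund please"))  # refund team: refund please
```

The frameworks add the important parts on top (state, retries, observability), but the shapes of the patterns are this simple.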
Now, on top of having all of these workflow options, and like I mentioned before with Entra ID and so on, you can also plug in a lot of different tools and make your workflows quite extensive. You can have MCP servers; we have Cosmos DB, SQL Server, LangChain, and many more coming. So it's not just about creating agents that can interact with LLMs. It's about creating agents that interact with LLMs and are also grounded in, and powered by, different data sources, different tools, and different inputs that come from all of these tools and extensions.
The other key part of making agents work well is having some kind of memory tool or mechanism. In Agent Framework, we have something called an agent thread, so let's work through this example. Say I have a travel-planning application, and a user sends, "I need to book a hotel in New York for a two-night stay." The agent goes and uses the Tripadvisor API, searches for the nearest hotel, creates the message, and sends it back. Then I ask another question. Think of these threads as chats: when you're inside a chatbot, you have one chat thread. So I ask, "What's the daily meal allowance for the business trip?" and this time the agent goes to SharePoint, queries the company travel policy, creates a message, and sends it back. The thread itself holds the in-context memory of what's going on, because the next time I send a message in the same thread, it's helpful to have all of that relevant information we've worked on before. But on top of that, when I create another thread, maybe on a different day or about a slightly different topic, say I want to know about transportation options, we also have the memory mechanism, which keeps track of important things across all of a specific user's threads. That way your user isn't always starting from zero.
It depends on what you want: for short term, you'd rely on these threads, and for long term, on these memory options, and for that you would of course need a database; you wouldn't want to be storing it in something like a plain JSON file.
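The thread-versus-memory split can be illustrated with a minimal sketch. `AgentThread` and `UserMemory` here are simplified stand-ins, not the actual Agent Framework classes, and they use in-memory structures where a real app would use a database:

```python
class AgentThread:
    """One chat thread: holds the in-context message history."""
    def __init__(self):
        self.messages = []

    def add(self, role, text):
        self.messages.append((role, text))

class UserMemory:
    """Long-term memory: facts kept across all of a user's threads."""
    def __init__(self):
        self.facts = {}

    def remember(self, key, value):
        self.facts[key] = value

memory = UserMemory()

thread1 = AgentThread()
thread1.add("user", "Book a hotel in New York for two nights")
memory.remember("preferred_city", "New York")  # worth keeping long term

thread2 = AgentThread()                 # a new thread starts with no history...
print(len(thread2.messages))            # 0
print(memory.facts["preferred_city"])   # ...but per-user memory persists
```

Short-term context lives and dies with the thread; long-term memory survives across threads for the same user.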
Now, I mentioned a bit about having observability options with OpenTelemetry, and if you're already familiar with the Azure platform, you've most likely seen that you can [clears throat] get very rich insights in Application Insights and Azure Monitor. You can see traces, you can see logs, you can use Kusto to query for all of those things. We also have consent policies and human-in-the-loop approval flows, things like that. And we support long-running work; I'm actually running something right now that I started about half an hour ago, and hopefully it's done by the time we get to the demos at the end. And we have policy enforcement thanks to Entra ID, content filters, a bunch of amazing stuff that comes from leveraging the Foundry platform for building these things.

Before we move on, I just want to show you an example of what a really simple workflow looks like. I think sometimes people assume that building agents has to be quite complex, so I'm just going to show you here. This is something we call DevUI.
I'm going to zoom in a little bit here. So this is DevUI; let me move this over here. And just because I get asked this a lot: yes, you can switch it to dark mode if you want to. Anyway, what I'm going to do is just run this, and I'm going to put in a message here: "Tomorrow I am free." All right, I'll send this over. What we have here is a workflow that we can actually watch execute step by step, and we'll see what decisions it makes along the way. Its goal is to take some input and translate it to Spanish. The reason I tried to make the English text somewhat vague is that, between the reviewer, the editor, and the re-reviewer, it can only give us a final output if we're above 95% accuracy. But this one is giving us, let me see, it gave us, okay, that's actually a good one. These models are getting quite good at translating to Spanish. Let's see, let me try some grammar issues: "Microsoft agent framework."
I'm trying to get it to trigger that re-reviewer here. But you can start to [clears throat] see visually how it works, and we also have the events here on the right side, so we can see what happened step by step; obviously we have it in here as well. And I'm really trying to get it to, no, it keeps, oh, here we go. No, it went straight to the final output. All right, you get the point: you can have things run depending on specific conditions. And if we look at the code here, let's go on here.
So it's this single file. There's a bunch of meat and potatoes, but the most important part is at the bottom, where we can take a look at our actual workflow, which is defined in this code. I'm going to make this just a little larger. There we go. We have a workflow builder, and the goal is to start with your executor, which in this case is just our translator. Then we have add-edge calls: each time you branch out, we consider that an edge. So we add an edge from the translator, which sends off to the reviewer, and then here it tells us: if high quality, go to the output; else, go to the editor. That's the switch-case edge group, so it depends on whether we hit above that quality percentage, which is admittedly subjective because we're asking LLMs to judge it, but you get the point. Then after editing, it re-reviews, and if it's high quality we go to the output; otherwise it goes back to the editor until we get a good one. I'm going to try once more to see if I can get it to fail at it.
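Stripped of the framework, the control flow of that translate-review-edit loop looks like the sketch below. The function names, scoring rule, and 95% threshold handling are illustrative stand-ins for the LLM-backed executors in the demo, not Agent Framework APIs:

```python
# Sketch of the conditional workflow: translate -> review ->
# (output if quality >= threshold, else edit and re-review).

def run_workflow(text, threshold=95, max_rounds=3):
    draft = translate(text)
    for _ in range(max_rounds):
        score = review(draft)      # an LLM judges quality in the demo
        if score >= threshold:     # the "switch-case edge": good -> output
            return draft
        draft = edit(draft)        # otherwise -> editor, then re-review
    return draft

# Deterministic stand-ins for the LLM-backed executors:
translate = lambda t: f"[es] {t}"
review = lambda d: 97 if d.endswith("(edited)") else 90
edit = lambda d: d + " (edited)"

print(run_workflow("Hello"))  # [es] Hello (edited)
```

The framework version expresses the same branching as edges in a graph, which is what DevUI visualizes step by step.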
>> The model is just too smart.
>> Yeah, the GPT models are fantastic with Spanish. Maybe I should have picked a different model. Let me just type "what is you"; let's do something like that, some very incorrect English. [laughter]
Yeah, I've worked with a lot of translation ever since GPT-3, because I work a lot with developers who speak Spanish, and ever since probably GPT-4, it's been one of the best model families for working with other languages. Pretty interesting. All right, let's see.
>> A question: how can we use and install DevUI, and is it Python only?
>> Yeah, it's a great question. Agent Framework has C# and Python support now, and it really is just installing a package; I'll show you in a second. But we did get it to run, and I see it's going on run two. It really is just, we'll go over here to pyproject.toml, and here we're using agent-framework and then agent-framework-devui. You don't necessarily have to use DevUI, but it's a great way to get that visual representation of what's going on, and the actual configuration is just down here, nothing too crazy, just this line. We'll drop some links and documentation for you to review, but yes, it's a pretty neat tool. Now, we'll go back here. Okay, cool. You can see it translated the text; the accuracy was 90%, so it needs improvement because it's below 95. Then it goes back, and then it ended. I said "what is you"; I guess I was trying to say "who are you?" So it does get the correct translation in Spanish. Anyway, that was a nice little DevUI-plus-Agent-Framework example. Do you want to talk about NVIDIA AI Blueprints now?
>> Absolutely. Let's do it.
All right. So yeah, like Gwen showed, there's so much capability and flexibility when you build your agent workflow. But admittedly, it can be a bit tricky to know how to get started. Maybe you need some reference architectures, or at least something to help you get on your feet and start building workflows for your company. That's really where NVIDIA AI Blueprints come in. You can think of AI Blueprints as reference workflows that can be pre-trained and customized for specific use cases. If you go to build.nvidia.com and then to /blueprints, you can see all of our blueprints. There are so many different kinds of reference architectures and recipes you can use: we have a multi-LLM NIM blueprint, a blueprint for creating a data flywheel, a blueprint for creating an AI retail shopping assistant. And all of these are available on the NVIDIA AI Blueprints GitHub repo, so you can go there and find the code for all of them; it's completely visible to you. You can come in, download the code, and deploy these blueprints yourself. The only requirement is an NGC account; NGC is NVIDIA GPU Cloud. You just create an account, create an API key, plug it in, and you're ready to go. So it's a great way to get started.

The way the blueprints work is that we have three foundational blueprints: the AIQ agentic AI blueprint, the RAG blueprint, and the data flywheel blueprint. These are the three foundational workflows we build everything else upon. If we go back to build.nvidia.com, you can see that each blueprint is built on one of those three. AI observability for the data flywheel is, of course, on the data flywheel foundational blueprint; streaming data to RAG is on the RAG foundational blueprint. All of these are built on the foundational blueprints and customized for individual use cases. So this is a great way to understand how you can build these agentic workflows for your own use case. We try to make them as approachable and applicable as possible to as many real-world use cases as we can, so there's a high likelihood that whatever agentic workflow you're trying to orchestrate, there's already a reference architecture for it, or for something very similar, on NVIDIA's blueprints site. I strongly encourage you to go check it out and see what you can build.
Now, one of the blueprints I'll be showing you today is the financial model distillation blueprint. What this does is distill, or fine-tune, a smaller model of about 1 billion parameters using financial data. Whenever you're doing model distillation, you're essentially using a larger model to train, or fine-tune, a smaller model. So what we're doing here is using larger models, like Llama 3.1 8B and Llama Nemotron 49B, to fine-tune the smaller model. This one, of course, uses financial data and financial modeling, things like stock price information and financial news, to fine-tune the smaller model, but the reference architecture can be used with many different kinds of data. You could use healthcare data or, say, sports data, and use the same architecture.
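The teacher/student idea at the heart of distillation can be sketched in a few lines. The stub `teacher` function below stands in for a real call to the large reasoning model; in the blueprint, the teacher's answers become the supervised fine-tuning targets for the small student model:

```python
# Simplified sketch of building a distillation dataset: the large
# "teacher" model answers each prompt, and its answers become the
# fine-tuning targets for the ~1B-parameter "student" model.

def teacher(prompt: str) -> str:
    # Stand-in for the large reasoning model used in the blueprint.
    return f"answer({prompt})"

def build_distillation_set(prompts):
    # Each (prompt, teacher answer) pair is one training example
    # for supervised fine-tuning of the student.
    return [{"input": p, "target": teacher(p)} for p in prompts]

data = build_distillation_set(["AAPL outlook?", "Rate impact on bonds?"])
print(len(data), data[0]["target"])  # 2 answer(AAPL outlook?)
```

Swapping the prompt set (healthcare, sports) is what makes the same architecture reusable across domains.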
And again, all of the code to run this is available on the NVIDIA AI Blueprints GitHub. If we go here, to AI model distillation for financial data, we have the entire Jupyter notebook used to create this architecture. If we deploy this onto a virtual machine, which I'll show you in a second, you can run the whole notebook and understand step by step how the model distillation works. So this is a really great way to get your hands dirty, understand how the workflow operates, and implement something yourself.
Back here, it's important to understand the different components working under the hood in these workflows. Of course, we have our vector database, we have our orchestrator, and we have some of our long-term memory data stores here with NeMo. But let's focus on the models and the compute side. These models are based on the NVIDIA Nemotron model family. Nemotron is a suite of models that includes LLMs, VLMs, and safety models, as well as RAG models. So for pretty much any model you're looking for, Nemotron has one you can use for your use case.
For example, one of the newest announcements in the AI world is that just yesterday, NVIDIA released the Nemotron 3 Nano model, a fantastic model that can do agentic work, tool calling, and very intelligent reasoning in a very small footprint. It's a highly intelligent model, top of its class, and brand new. We also have the Nemotron Nano 2 VL vision-language model, which is great for multimodal workflows. So whenever you're building these blueprints, you can plug these different models in where you need them. If you remember what I was talking about earlier, how we have different specialized models for different use cases, and that's why we need this whole agentic orchestration rather than one intelligent model, this is exactly the answer to that. We have models for all of these use cases, and they're all open-source and open-weight, so they're very approachable and easy to get up and running.
Now, when it comes to how we run these models, that's where NVIDIA NIM comes into play. NVIDIA NIM is really the answer to the question of how we handle the complex process of serving these models and tuning and optimizing them for our infrastructure. You can think of a NIM as a Docker container with all of that optimization and tuning baked in, which you can run with a simple docker run command. All of our NIMs are available on NGC. If I go to ngc.nvidia.com, go to the catalog, and search the containers, I can find the NVIDIA NIM section with all of our NIM models. For example, the Llama 3.1 Nemotron Nano VL is available here, and we have a NIM for GPT-OSS 20B. So we make NIMs not just for Nemotron models but also for other open-source models. It's a really great way to get up and running and to leverage these microservices very quickly.
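Once a NIM container is running, it serves an OpenAI-compatible HTTP API, so talking to it is just JSON posted to its chat-completions endpoint. As a rough sketch (the local port and model id below are assumptions for illustration; check the specific NIM's page on NGC for the real values):

```python
import json

# Assumed local NIM endpoint and model id, for illustration only.
NIM_URL = "http://localhost:8000/v1/chat/completions"

def build_nim_request(model: str, prompt: str) -> dict:
    """Build the OpenAI-style chat payload a NIM expects."""
    return {
        "url": NIM_URL,
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

req = build_nim_request("nvidia/nemotron-nano", "Summarize NIM in one line.")
print(req["url"])
# Send with any HTTP client, e.g. urllib.request, or point an
# OpenAI-compatible SDK at base_url="http://localhost:8000/v1".
```

Because the API shape is the OpenAI one, swapping a NIM in behind an existing application is usually just a base-URL change.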
Now that we understand what's going on under the hood, let's go back to this financial distillation workflow. If I clone the repo and open up the workbook, you can see that it walks me through all the steps of the data processing and preparation part of the workflow, as well as how we can create a data flywheel. If you're not familiar, a data flywheel is essentially a continually running process that sharpens your models using newly incoming data, both from the outside world and from real user interaction with the model. So this covers a lot of the core ideas that are important to agentic workflow orchestration.
>> So, as you can see, it includes all of the code you need to get up and running. You can really just run it as a Jupyter notebook, and you can also use it as a reference when you create your own workflow. You can plug in the models you'd like to use: right here, we plug in Llama 3.3 Nemotron Super 49B v1, but we could also plug in a different large reasoning model, because this is the teacher model, the larger one that teaches the smaller student model about this financial data.
So um this really walks you through the
entire process um and helps you get up
and running very quickly. Now we also
have the capability to run these
blueprints on uh Brev. So, if I go back
to
um let's see if I go back to the
blueprints and I find that AI model
distillation for financial data, I can
uh go view the code on GitHub, but I can
also just click and deploy it um via
Brev. And with Brev, it launches the um
workflow on a hosted service in the
cloud. And I can choose which one I want
to host. So this would be Lambda Labs
and I can just deploy the Launchable and
it creates this entire blueprint on
Brev. So it's another really great way
to get up and running. Um you can test
it out for less than $10 an hour.
Really great way to sort of get
accustomed and get your hands dirty uh
with these blueprints.
Um, so lots of great stuff going on here
with the blueprints and I highly
recommend it as a way to um to
understand and build your learning with
these AI workflow orchestration tools.
Um, so I'll hand it back to Gwyn and
she's going to show you a couple of
different examples of how we can use these
workflows for real-world use cases.
>> Yeah. Before we we dive into that, I
want to grab a couple questions.
There's one: will blueprints run on the
Azure platform or the Nvidia platform?
>> Yes. So it it depends. You can run um on
the Azure platform. I recommend you
check out uh the AIQ blueprint. Um, you
can install it via Helm chart.
Um so you can run it on Kubernetes which
means AKS. Um, so and then on the Azure
platform,
um, it's not going to be like hosted and
and serverless where you don't worry
about the infrastructure. Um, if you
want to run it on Azure, you would run
it on something like AKS or an Azure VM.
But if you want something more hosted
and serverless, that's where you would
use something like Brev. So it it sort
of depends on what you're looking to do
there, but short answer is you can run
it on both.
>> Perfect. There's um a few more here. Can
the blueprints run on Nvidia's DGX
Spark?
>> So that depends it depends on the
blueprint because um some of these
models uh like if you want to run a
model on the DGX Spark um number one the
model has to be the right size for the
DGX Spark and it has to have the right
uh like software support for the DGX
Spark. So if the blueprint is using
models that do fit that criteria, then
then yes, absolutely. But I would I
would verify um that the blueprints
you're using are using models that have
that support.
>> Awesome. Yeah, we can get through the
rest uh towards the end. We'll I'll try
my best to leave time at the end uh
after we go through a couple of these
examples. Uh okay, so I want to show you
a couple of things: some integrations,
some background running stuff, and then,
you know, whatever else I can show
you. So to start, let's go to
uh we can just look at the codebase to
start for example. So we have quite an
involved project here. I'll show you
what's of importance. We have a bunch of
agents, right? This is a Python project.
We're using Microsoft Agent Framework
here. And in here we have a stock agent,
a marketing agent, an admin agent, and
an insights agent, and these
are all using different types of
workflows and on top of that we also
have MCP servers we have a finance
server and we have a supplier server. Uh
as you know agents work really well uh
with MCPs. Uh so great to have a couple
of those as well. And then we just have
like some front end well a lot of front
end stuff there too.
But uh we can dive more into that in a
bit but we'll we'll stick to kind of
looking at this. I launched the project
and for those who are C# developers
you're most likely familiar with
something called Aspire uh and now
Aspire also has first class support for
Python projects as well which is why
we're using this tool. And it's in my
opinion one of the best ways to not only
get an overview of like where everything
is running. For example, my agent dev is
running on uh this URL. My finance MCP
is running on here, but we also have
quite rich uh console output here. I
just ran a bunch of my
like background things. So we'll see
like a bunch of things coming in here.
Uh we also have more structured logging
if uh that's helpful. This is quite
helpful when you want to see like
immediate errors versus information at
different levels. And we also have
traces which when we're calling uh you
know LLMs and things like that quite
helpful to to be able to see this as
well. Uh but this isn't an Aspire talk,
but I wanted to show you that. All
right. So what we have here actually is
a like sort of just like a shop right.
I'm going to go and open it up. Uh here
we'll open this up. Right. So we have
this pop-up shop. You can buy a bunch of
products, and we have
the option to log in either as a
customer or as an admin and I think I
have the customer logged in already here
so this is one of the integrations we're
using, what's it called, ChatKit by
OpenAI, which allows you to create these
sort of chat experiences that are
powered by LLMs behind them. Right, so I am
signed in here as a customer. If I just zoom
in here you can see here I'm signed in
as Stacy, right? And then here you get
what you would expect when you sign in
to a customer portal. Just orders,
items total
uh savings in this case. And then I just
opened up the chat and I asked what was
my most recent order and it gives me
information on that. If I wanted to do a
return, I think this is way out of the
return policy. This was in Yeah, this
was like six months ago. So, I'm pretty
sure there's no store out there that
would let me uh do a return, but if we
wanted to, we would have the option to
kick off that type of functionality as
well. And um we have a couple of other
customer ones, but I want to show you
our admin site first. So, let's go in
here.
I can log in as a manager.
And we have different managers for
different stores. You can log in, of
course. But the first integration that
we have here is this weekly insights.
This is actually AI generated. And this
is specific to the store that you are
logging into, right? So when we think of
managing a store, managing product, it's
important to consider not only like top
selling products, but also maybe there's
going to be a lot of snow in the
following week. So it makes a lot of
sense for us to stock up on like winter
boots or heavy jackets or things like
that. Or maybe it's raining, right? So
having that quick glance of information
here and I'll show you how this works in
a second. We also have those top selling
products like I mentioned and then also
local events. I uh logged into the New
York store. So here it's saying several
major outdoor events, including a holiday
festival and a New York New Year's
Eve celebration, are expected to
drive significant foot traffic and
clothing sales in the coming weeks. So
these are all things that are important
to keep in mind when we try to restock
things or you know try to make the most
out of this. Right now let me just show
you how that works a little bit before
we see
uh the the stocking functionalities.
So, I'll start here at the bottom, and
I know this looks like a lot, but I
promise it will make sense. That is
where we define our actual workflow and
sort of the process of how things work,
right? So we have a data collector, we
have a weather analyzer, events
analyzer, top selling products analyzer
and insights synthesizer, right? So if
we look back here again, weather
analyzer is in charge of providing us
this information. Top product analyzer
is the agent in charge of giving us this
information. The events analyzer is the
agent in charge of giving us this
information for the events and then the
summarizer or the one that distills all
of it is in charge of you know giving us
this entire entire thing. Right now if
we head on back here we can see that we
are doing uh we start with a data
collector which just gets information
about what store I'm logged into. Right?
And then we have that fan out. I think
we have, do we have the weekly insights
open here? Yes, let me actually switch
over to it. If we look at our weekly
insights, we see we start with
our data collector to get information on
the store. And when you think
about this project and let's go back to
those different types of workflows that
I mentioned
the top selling products insight isn't
dependent on the events analyzer or the
weather analyzer. So
this is a good use case to use something
where we are running these uh
concurrently right. So we kick
everything off. We give the information
that each one of these executors, each
one of these agents needs and they can
go and run at the same time and I don't
have to wait for like the weathers and
events. They don't necessarily depend on
each other and then my synthesizer will
collect everything. So fan in and then
give us [clears throat] the information.
Right? So if we go back here, that's
exactly what we are are defining here.
We have our fan out: our data
collector fans out to our weather, our
events, our top selling and then fans
back in, collects everything and
generates that insights for us. Right?
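The fan-out/fan-in flow just described can be sketched with plain asyncio rather than the Agent Framework's workflow builder (whose exact API I'm paraphrasing here, so treat the names below as stand-ins): the data collector runs first, the three analyzers run concurrently, and a synthesizer fans the results back in.

```python
import asyncio

async def data_collector(store_id):
    # Single entry point: look up which store we're logged into.
    return {"store": store_id}

async def weather_analyzer(ctx):
    return f"weather insight for {ctx['store']}"

async def events_analyzer(ctx):
    return f"events insight for {ctx['store']}"

async def top_products_analyzer(ctx):
    return f"top products for {ctx['store']}"

async def weekly_insights(store_id):
    ctx = await data_collector(store_id)
    # Fan out: the analyzers don't depend on each other, so run them
    # concurrently instead of waiting on each one in turn.
    results = await asyncio.gather(
        weather_analyzer(ctx),
        events_analyzer(ctx),
        top_products_analyzer(ctx),
    )
    # Fan in: the synthesizer collects everything into one summary.
    return " | ".join(results)
```

Because weather, events, and top products are independent, the total latency is roughly that of the slowest analyzer, not the sum of all three.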
Now, another thing that's very relevant
here is to take a look at how our let's
look at our top selling product. Right?
This is essentially our agent. Well,
actually a little bit more. It's a
little bit more code here, but here what
we're saying is, okay, I want a chat
agent, right? And we're providing here
the instructions: you are a retail
analyst analyzing product performance;
retrieve the top five selling products.
Just relevant information here. And
as a tool we are sending the finance MCP
so instead of us creating the
functionality in here making like adding
all that code we just created a finance
MCP because other functionality other
agents can leverage this as well and
then we're sending it to this chat
agent like hey you have these tools I
need you to get this task done go ahead
and do uh do that for us
And just for that, I'll just pop open
our finance table here. And this is
connecting to a database that will go
ahead and run SQL queries to get the
relevant information that it needs to
answer the question, right? And each
one of these functionalities would end
up being a tool. I'll show you. So
anything that's decorated with a tool
means that that is something that an
agent can leverage to get information to
get results to get answers from. Right?
So for example, get company order
policy.
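The decorator pattern described here can be illustrated with a tiny registry. Note this mimics what an MCP server like the finance server does conceptually; it is not the actual MCP SDK API, and the tool bodies are made-up stand-ins for the real database-backed ones.

```python
# Registry of functions an agent is allowed to call.
TOOLS = {}

def tool(fn):
    """Register a function as an agent-callable tool."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_company_order_policy():
    return "Returns accepted within 30 days of purchase."

@tool
def get_top_selling_products(store, limit=5):
    # A real implementation would run a SQL query against the store database.
    return [f"{store}-product-{i}" for i in range(limit)]
```

Anything decorated with `@tool` lands in the registry, which is how the agent can discover it, decide it's relevant, and invoke it to get answers.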
Uh we also have get supplier contract,
get historical sales data, get top
selling products. Right? Now another
thing to keep in mind here is for
example in our insights, I'm not telling
it specifically which tools to use. I'm
just saying it has the entirety of
finance MCP. And if you are
really good at prompting and you're
quite explicit with what you are
providing as instructions, it will know
which tools to go and pick. Right? And um
that's why we see the relevant
information here on our homepage. Right?
These are all relevant to this. And uh
the last thing oh the last thing that I
think is pretty cool here that I want to
show you before I show you the batch
stuff is we are also leveraging external
tools, right? So, we can use internal
MCP servers that interact with our
information, our data, which is great.
But we can also have a let me take a
look here at our
uh context. Let me look at our weather.
We have a weather analyzer that should
go
uh stock. Oh, no. We're in the wrong
insights. Here we go. Uh we should have
a weather.
All right. I'm just going to search for
it
weather
uh analyzer or is it uh here we go
and here what we are doing is simply
calling an API, we're calling Open-Meteo,
and working with this API was awesome,
so highly recommend it. And the
cool thing here is we are providing a
structured output. That way we can
sort of be strict with what we expect,
instead of having just, you know, random
text or random JSON returned to us. And
yeah, we're just calling an API
here we're providing the proper
parameters that the API expects similar
to just working with normal code and
APIs. Same exact thing here, but once
we get that information from the API, we
send it to an LLM to give us the proper
insights for the weather that we want,
right? That's this part
here. What else do I want to show you? Okay
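As a quick aside, the structured-output idea for the weather step can be sketched like this: instead of letting the model hand back free-form text, the API payload is parsed into a fixed schema before anything reaches the LLM. The payload shape below loosely mirrors an Open-Meteo daily forecast response, but treat the exact field names as assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class DailyForecast:
    date: str
    temp_max_c: float
    precipitation_mm: float

def parse_forecast(payload):
    """Turn a raw JSON payload into typed rows the insights agent can rely on."""
    daily = payload["daily"]
    return [
        DailyForecast(d, t, p)
        for d, t, p in zip(
            daily["time"],
            daily["temperature_2m_max"],
            daily["precipitation_sum"],
        )
    ]
```

With a strict schema like this, the downstream insights prompt gets predictable fields (dates, temperatures, precipitation) rather than whatever shape the raw response happens to take.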
cool. So that's a little bit about how
we would sort of generate this. And then
if we were to log into a different
store, this would look different
depending on that store. And the other
uh really cool functionality that we
have here is we have this if you click
on well we can one click on inventory or
we can use uh this button here that says
generate insights-based analysis, which
will take all those insights. I click on
this and we have an agent that is
specific to inventory to stocking page,
right? And the instructions for the
agent is all those insights that we just
got, right? We've got the weather, we've
got the top selling products, and
essentially what we wanted to do is,
hey, we need to restock product, use
these insights to go and make a smart
decision on what to stock. Okay. Now, I
don't want to kick this off right now
because it's probably not going to be
done by the time, but I did uh
run it before, but I ran it just from uh
if we go back to dashboard,
I ran it from I think there's like an
inventory. Yeah, actually inventory
here. And this is what you would expect
from an inventory dashboard. What's
low? What we can stock up on, right? But
then we also have this launch AI agent
here. And it has some pre-filled
instructions. So that's what I ran here,
right? So you see I kicked it off at
11:32, which was about 20 minutes ago.
And it took about 4ish or so minutes to
go and complete this for us.
And it tells us here we should
restock on the peacoat wool blend
outerwear, uh, a couple of other
things here. And it tells us current
stock and um 10. And I'm assuming this,
what did it tell us here? Um, a key
highlight is the peacoat wool blend,
which is completely out of stock,
indicating a need for immediate
replenishment. Other items, while not
critically low, have limited stock
levels and should also be monitored for
potential restocking to maintain optimal
inventory. Let's try let's just try
running this. I hope it
it uh runs quick, but if it doesn't, I
won't, right? But um while this is
running so in the background I do want
to show you a little bit of the code for
the stocking because this is what's
doing like bulk processing or batch
processing I should say. The key here is
we have this collection.
So the first step is to call the MCP and
get information relevant information
based on uh you know the store and all
that kind of stuff. And instead of
having it go to the MCP server, find
one, then calling an LLM, we're batching
everything into this uh collection here.
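The batching idea can be sketched roughly like this: gather every product that needs review from the MCP/database step first, then make a single LLM call over the whole collection instead of one call per product. Here `llm_complete` is a placeholder for a real chat-completion call, not an actual SDK function.

```python
def llm_complete(prompt):
    # Placeholder for a single chat-completion call to the model endpoint.
    n = len(prompt.splitlines())
    return f"restock plan covering {n} products"

def batched_restock_analysis(products):
    """Build one prompt from all low-stock products: one LLM call total,
    returned alongside the call count to make the saving explicit."""
    lines = [f"{p['name']}: stock={p['stock']}" for p in products]
    prompt = "\n".join(lines)
    return llm_complete(prompt), 1  # one call regardless of product count
```

The trade-off is exactly what's happening in the demo: collecting everything up front takes a few minutes, but you pay for one model round-trip instead of N of them.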
So that way we just make one call to an
LLM versus making various calls, which
is also why it takes uh a few minutes to
go and collect all the information it
needs, all the products that potentially
need to be restocked based on that
insights that I've provided it. And you
know, it kind of runs we can expand
this. Yeah, it kind of runs in the
background there. It looks like it's
kicked off already. And then if we look
at our console, we should probably see
some stuff coming in here too. Uh we'll
stick this to our API. Yeah. So things
are running here. It's going and making
the calls as well. If I look at traces
here. Yeah, we have a couple things
kicked off as well here for our MCP
servers. Uh I hope that actually it
might uh
Okay. Wait, no. Was this the one I
already did, or was this the new one?
Okay, it looks like the new
one. Awesome. Let's see.
So, it says here, uh, well, several
outerwear products are available in
sufficient quantities, which may not
require restocking at this time. And
it's telling us combat boots, work boots,
steel toe. What else? Oh, a sports
jacket, a rain jacket. And is this
different than the other one? Yeah, it's
different than these, right?
So, this one looks more like just
products that need to be restocked
versus this one is more specific to the
weather insights, I'm assuming. And I
guess the loafer slip-ons are there
because of that New Year's Eve event.
Um, but yeah, anyway, so many
things that you can get done with a
variety of workflows. I will also share
their GitHub repo here at some point
because I know people will probably be
asking and I did want to leave a couple
of minutes to just answer what questions
that we had. So if we could just switch
to Q&A, um, that'd be awesome.
Yes, people are asking for it. Let me
find the uh the repo.
Give me one sec.
Unfortunately, I don't have the ability
to type in the chat.
>> Um, otherwise I'd be answering them that
way.
>> No worries.
>> Um, find it. Uh, do we have any other
>> um,
>> there's a couple questions. So, is
I'm not sure how to like bring it up on
the screen, but there's one that says,
"Is it okay to say that these are
customized workflows that we build as
models, and we leverage VM or EC2
instances where we will be executing
these workflows?" Yeah. Um, absolutely. So
you can run these um and I assume you're
talking about um you know just in
general like the blueprints or the agent
framework workflows. Um yeah, you can
run these on a VM, you can run them on
Kubernetes. Um, for the blueprints,
they're sort of like reference
architectures that are meant to be
really flexible and customizable.
So, some of them come prepackaged as
like Helm charts where you can just
install them and run them on Kubernetes,
but if you want to see the source code
and turn it into more of a monolith
where you run it on, let's say, a single
VM, um, that could absolutely work too.
Um, so there's lots of flexibility
there. Um, so yeah, absolutely.
I just uh shared the GitHub repo. I
pasted it in the chat. Perfect. Awesome.
Uh, let's see, any others. "So are we
triggering the events based on breaking
the functions down into small
workflows?" Yeah,
you could you could think of it as each
agent should be specific and then your
workflow could also be somewhat specific
but you could have various tasks, right?
So for example, for generating the
insights, it is like a specific task
like generate insights but inside of
that we have specialized agents that are
going to get the weather, going to get
the top products, going to get the um I
already forgot the oh the event popular
events, right? So it is all
functionality that is quite cohesive.
Versus, if you find things being
completely random, they might make more
sense in a different workflow, or you
might rethink that architecture.
>> I think we're about time. We can do one
more.
>> Okay. Sorry. I'll do one more. I'll fit
it in. Yeah. "Is this an alternate
solution to model tuning in Azure
Foundry, or could we leverage these
solutions in Azure Foundry also?" Yeah,
so good question. Um, so for the model
tuning in Azure Foundry, Microsoft
Foundry now, um, the model tuning, yeah,
it's a great way to like fine-tune your
model and sort of customize it for your
specific workflow. And with that, you
can create the endpoint to interact with
your model. And you can essentially plug
that endpoint into your application. So
let's say that back to that uh financial
you know distillation um endpoint or
sorry financial data distillation
blueprint. Um if I wanted to host all of
the models except one um locally I could
do that. And then that one model that's
fine-tuned and uh hosted in Azure uh
Microsoft Foundry um that could be
hosted there and I just interact with it
through the uh the endpoint that's
exposed from Microsoft Foundry. So that
could absolutely work. Yes, it's a bit
more complex that way. So technically
yes, but I would recommend trying to
keep it as simple as
possible.
>> Awesome. Now that's it. But the good
news is we have other episodes people
can come join ask plenty more questions
there as well. Right, Stephen?
>> Absolutely. Yes, we have two more
sessions in this series. So feel free
to join and um you know ask your
questions there if we couldn't get to
them today.
All right. Well, uh, thank you so much,
Stephen, and thanks everyone for hanging
out here, and I'll see you in the next
one.
>> Thank you, team.
Thank you all for joining and thanks
again to our speakers.
This session is part of a series. To
register for future shows and watch past
episodes on demand, you can follow the
link on the screen or in the chat.
We're always looking to improve our
sessions and your experience. If you
have any feedback for us, we would love
to hear what you have to say. You can
find that link on the screen or in the
chat and we'll see you at the next one.
>> [music]
Explore how to leverage multi-agent systems in your applications to optimize operations, automate recommendations, and enhance customer experience. The solution uses Microsoft Agent Framework, OpenAI ChatKit, and the NVIDIA Nemotron model on Azure AI Foundry to seamlessly connect with store databases, integrate human oversight, and deploy scalable chat agents. This approach enables real-time analytics, predictive insights, and personalized interactions, resulting in improved decision-making, operational efficiency, and a superior user experience for application developers and users. This episode is part of a series. Learn more: https://aka.ms/AIAgentsApps/y-MSFT [eventID:26558]