Hello everyone and thank you for joining
our live session today. My name is
Lorissa. I'm an event planner at Reactor
joining you from Brazil. Before we
start, I have some quick housekeeping.
Please take a moment to read our code of
conduct. We seek to provide a respectful
environment for both our audience and
presenters. While we absolutely
encourage engagement in the chat, we ask
that you please be mindful of your
commentary, remain professional and on
topic.
The session is being recorded and will be available to view on demand on the Reactor YouTube channel within 48 hours.
Keep an eye on the chat. We'll be
dropping helpful links and checking for
questions for our presenters to answer.
I'll now turn it over to our speakers for today. Thank you, and let's welcome Stephen.
All right, hello. Thank you very much, Lorissa, and thank you everyone for joining. We really have lots of great stuff to go over today, so we'll jump right in. My name is Stephen McCulla, and I'm an AI solutions architect with NVIDIA, so I get to work closely with Microsoft on implementing all of the latest and greatest AI technology into Azure. And I'm here today with Gwen.
>> Hey, thank you, Stephen, for the intro, and for the invite to be here and talk about agents. My name is Gwen. I'm on our Python advocacy team here at Microsoft, so we get to work alongside awesome partners like NVIDIA, teaching a bunch of stuff and improving how people can deploy their workloads onto our platform, among other things. Excited to dive right in.
>> Yeah, lots of great stuff we're going to go over. So, a bit about the program before we jump in. If you're new to this program, we cover lots of amazing technologies, focusing on AI agents and integrating them into applications. This AI Apps and Agents Dev Day series is part of a larger partnership between NVIDIA and Microsoft to help users like you become more acquainted and comfortable with all of the amazing tools and technologies coming out from both NVIDIA and Microsoft. We want to show you how you can best leverage those to create all of these amazing AI agents and automations yourself. So, with that being said, let's go see how we can scale and orchestrate agents.
Today we'll be covering a couple of major areas. First, we'll jump into a recap of exactly what AI agents are, how we use them, and how we build them into agentic workflows. Then Gwen is going to show us how we can use Microsoft Agent Framework to orchestrate our agents and have them working in tandem to accomplish really complex, really interesting tasks. I'll show you how NVIDIA's AI Blueprints help you get started by providing a reference architecture and framework for building your own agentic workflows for whatever you're looking to accomplish. Then we'll go into a couple of hands-on demos showing NVIDIA's AI model distillation blueprint, which I think is amazing, and we'll show how to integrate agentic workflows into user-facing applications as well as back-end batch processing use cases. So we're really going to be touching on a lot of cool stuff today.

The first thing to go into is a quick recap. What are AI agents? How do they differ from large language models like ChatGPT, Nemotron, or DeepSeek? What is really the difference between these LLMs and agents?
I imagine if you're attending this webinar, you're at least a little familiar with large language models, and agents are sort of an abstraction on top of that. If you have an LLM, let's say Nemotron or DeepSeek, that has reasoning and tool-calling capabilities, and then you add on agentic software like Microsoft Agent Framework or NVIDIA's NeMo Agent Toolkit, you can use this LLM as an AI agent. What that unlocks is long-term and short-term memory, so your agent can remember past conversations. It also unlocks tool calling, so you can expose different tools to your agent. Whenever you hear the word tool, just think of a chunk of code that executes some particular task. That can be something as simple as turning the lights on or off in your smart home, or something really complex like creating an entire website and deploying it onto AKS.

These tools open up an entire world of possibilities, where your imagination is really the ceiling. Like I mentioned, a tool is essentially a chunk of code, so whatever you can code, you can turn into a tool and expose to your agent. The world is your oyster there. Whenever you have these LLMs wrapped as agents, you can achieve lots of amazing things with the tool calling and the reasoning.
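The idea that a tool is just a chunk of code can be sketched in a few lines. This is a minimal, framework-agnostic illustration, not any particular SDK's API; the function and the JSON call shape are invented for the example. An agent runtime exposes named functions to the model, and when the model emits a tool call, the runtime simply looks the function up and runs it:

```python
import json

# A "tool" is just a function plus a docstring the model can read.
def turn_lights(on: bool) -> str:
    """Turn the smart-home lights on or off."""
    return "lights on" if on else "lights off"

TOOLS = {"turn_lights": turn_lights}

def dispatch(tool_call_json: str) -> str:
    """Execute a tool call the model emitted as JSON, e.g.
    {"name": "turn_lights", "arguments": {"on": true}}."""
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

print(dispatch('{"name": "turn_lights", "arguments": {"on": true}}'))  # lights on
```

Real frameworks like Microsoft Agent Framework or the NeMo Agent Toolkit add the schema generation and the model round-trip around this core loop.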
And the only real limitation is that you might have just a single model that you're using as an agent. This is where we start to look at agentic workflows. If I have an agent that can reason, call tools, and do research, why do I need an agentic workflow? Why do I need to string multiple agents together? Why can't I just have a single, very large, very intelligent LLM that I'm using as an agent that can do everything? The key word here, the key answer, is specialization.
There isn't a single AI model out there that is the best at everything. We haven't reached AGI or superintelligence yet, so there isn't really one model that can do everything best. Some AI models, like Nemotron Parse, are specialized for document analysis. Some, like GPT-5, are more general chat agents that can do chat, code, and image generation. Some models are great for audio translation and processing, and some are great for language translation. So the entire AI ecosystem is very specialized around accomplishing particular tasks. If you were to try to use one model for all of these different tasks, you're probably not going to do as well as you could with an agentic workflow. This workflow lets you get the best of all worlds: you can have the best document processing agent, the best chat agent, and the best translation agent, all working together.
That's a major reason we see a lot of people implement these agentic workflows. For example, let's think about what an agentic workflow might look like in the real world. Consider how a doctor might use an agentic workflow during a medical diagnosis. We can imagine a doctor opening an agentic app on their phone that listens to the conversation between the patient and the doctor, helps guide the doctor toward a particular diagnosis, and creates appropriate action plans for the patients.
In this case, you would need this agentic tool to transcribe the voices of the doctor and patient into text; use that text to reason, research, and find possible diagnoses according to the symptoms the patient is listing; parse the documents that show the patient's past medical history; and create appropriate plans for the patients. So there are lots of different skills used in this hypothetical workflow, and it's very unlikely that a single model, a single agent, could do all of these things very well. That's where you would want to bring in an agentic workflow. You could have a model that's really great at audio-to-text processing, a model that's fine-tuned for medical research, and a model with a RAG database of all of the patient's medical history. Combining all of those capabilities makes your agentic workflow much more powerful than running a single model or a single agent. So obviously this unlocks so much capability and so much creativity whenever you use these agentic workflows. Gwen is actually going to show us how we can build them using Microsoft Agent Framework, so I'll hand it over to Gwen.
>> Awesome. Thank you, Stephen. Yeah, let's talk a little bit about Agent Framework. But first, why would we want to build agents in code? In comparison to other ways of building agents, there are options that are drag-and-drop, or more of a UI where you outline a flow, and those are awesome tools. However, when we build agents in code, we have full control and customization: we're not limited to templates or predefined connectors like those other types of tools. Plus, we can leverage our existing experience with programming languages and the tools that come alongside them to build this new technology. There's also local experimentation: we can use local models and run everything locally to build our agents, testing and experimenting before we need to deploy anything to a cloud or any service.
And then there's portability: we can pick a language and a framework, deploy onto one cloud, and if we want to test some other offering, move it over to another service. So we get a lot of benefits from building agents in code. We also have different ways of building, deploying, and hosting your agents in the cloud, similar to the abstractions we see in cloud infrastructure: infrastructure as a service, platform as a service, and software as a service. Depending on how much you want to control, customize, but also manage your infrastructure, you'll find the correct solution for you there. If you want something SaaS, you can use Copilot Studio to build your agents. For something in the middle, you can use Foundry with the agent service there. And if you want to be in full control of your infra, you can use the IaaS solution: leverage some kind of framework deployed onto a container in the cloud. So there are various options, and again, it depends on how much visibility and control into the underlying infra you want; you have to figure out what that balance is for you.
Now, putting all this together, we have Microsoft Agent Framework, which we are calling the open-source engine for building and orchestrating intelligent AI agents. It's built on open standards and interoperability, with a pipeline for research, and it's open source, so it's community-driven and extensible by design, which is very important. It's also going through lots of changes; it is in public preview at the moment, so do keep that in mind. But we highly encourage you to get hands-on, experiment, build a couple of things, and give us feedback, open an issue, whatever that might be. It's an excellent tool for you to try. Now, when we talk
about multi-agent systems, we have a couple of options when it comes to orchestration. You can think of these as workflows, right? The most basic, and the one I'd recommend you try first if you're just starting out building a multi-agent system, is the sequential one, which is this one right here. It is as straightforward as the name: you just have one agent work after another until the task is completed.
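Conceptually, the sequential pattern is just function composition over a shared context. Here is a tiny, framework-free sketch; the agent names mirror the restocking demo shown later, and real agents would be LLM calls rather than plain functions:

```python
# Sequential orchestration sketch: each "agent" is a callable that
# transforms the running context, and agents run one after the other.

def stock_agent(ctx: str) -> str:
    return ctx + " -> stock levels collected"

def prioritization_agent(ctx: str) -> str:
    return ctx + " -> restock list prioritized"

def summarizer_agent(ctx: str) -> str:
    return ctx + " -> summary written"

def run_sequential(task: str, agents) -> str:
    for agent in agents:       # one agent works after the other...
        task = agent(task)     # ...and each output feeds the next agent
    return task

result = run_sequential("restock request",
                        [stock_agent, prioritization_agent, summarizer_agent])
print(result)
```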
If you have work that is, or can be, done independently (for example, three agents that don't need to wait for each other's results), you can use concurrent. That will save you some processing time.
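The concurrent pattern fans a task out to independent agents and gathers their results. A minimal sketch with plain `asyncio`, standing in for what an orchestrator does internally (the analyzer names are illustrative):

```python
import asyncio

# Three independent analyzers: none depends on another's result.
async def weather_analyzer(data): return f"weather({data})"
async def events_analyzer(data): return f"events({data})"
async def top_products_analyzer(data): return f"top_products({data})"

async def run_concurrent(data):
    # Fan out: run all three at the same time instead of in sequence.
    results = await asyncio.gather(
        weather_analyzer(data),
        events_analyzer(data),
        top_products_analyzer(data),
    )
    # Fan back in: a synthesizer step combines the partial results.
    return " + ".join(results)

print(asyncio.run(run_concurrent("week-42")))
```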
If you need to kick off some type of task with an agent and then, depending on the work that needs to be done, have it handed off to another one, that's the handoff workflow. For example, in customer support the first agent can be the triage: it understands the query the customer has given and sends it off to the tech support agent, the refunds agent, or whichever agent is appropriate. That would be the handoff workflow here.
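The handoff pattern can be sketched as a triage step that returns the specialist to call next. In a real system an LLM makes the routing decision; a keyword check stands in for it here, and all the names are invented for the example:

```python
# Handoff sketch: triage inspects the query and hands it to a specialist.

def tech_support_agent(q: str) -> str:
    return f"tech support handling: {q}"

def refunds_agent(q: str) -> str:
    return f"refunds handling: {q}"

def triage_agent(query: str):
    # An LLM would decide this in practice; keyword routing stands in.
    if "refund" in query.lower():
        return refunds_agent
    return tech_support_agent

def run_handoff(query: str) -> str:
    specialist = triage_agent(query)   # the handoff itself
    return specialist(query)

print(run_handoff("I want a refund for my order"))
```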
Now, a bit more complex. I recommend you go and experiment with those three first. A bit more complex is the group chat workflow. Think of this option like a writers' room: you have a writer who pitches an idea, and then a bunch of other people in the room giving feedback, with a back and forth there. So you have to think of iterations until the task is done. Magentic is sort of a souped-up version of this. The goal of the agent in the middle here is to not only plan but also keep a status, or documentation, on the progress: which agents are working, which are not, and what changes need to be made. It actually starts by planning everything ahead of time, grabbing the task, turning it into subtasks, and outlining that this agent is going to do this and that agent is going to do that. If there's a stall, if an agent isn't working, it goes back and works on its plan, so it'll plan again; its goal is to be robust and flexible. If that's something you need for your system, look into that one. And then there's the workflow process: you can think of each of these little squares as one of the other options. Maybe you have a sequential workflow in here; we're actually wrapping that whole workflow as an agent, so it's an agent that is really a workflow with a bunch of other agents inside, and you can create a bunch of applications with that. But do try out the first couple first, and as you get a better understanding, take a look at the other ones. On top of the
different workflow options, we of course also have tools and extensibility, because agents without access to tools are not that useful. Most likely you're familiar with MCP, the Model Context Protocol, which gives us access to many, many things. There's also agent-to-agent (A2A), OpenAPI, MongoDB; there are a bunch of options there for you. A lot of these work out of the box, which means you don't have to spend a crazy amount of time figuring out how to get them working; a lot of that can be set up for you with simple integrations, so you don't have to start from scratch.
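Since most of these integrations ultimately expose tools to the agent, one common guard rail is worth sketching: gating sensitive tools behind human approval. The helper names here are hypothetical, not Agent Framework's actual API; the point is only the shape of the check:

```python
# Human-in-the-loop gating sketch (hypothetical helpers, not a real SDK):
# tools flagged as sensitive pause for approval before they execute.

REQUIRES_APPROVAL = {"issue_refund"}   # declared in config (e.g. YAML) in practice

def issue_refund(order_id: str) -> str:
    return f"refund issued for {order_id}"

TOOLS = {"issue_refund": issue_refund}

def call_tool(name: str, approver, **kwargs):
    # `approver` stands in for a human prompt: it sees the tool name and
    # arguments and returns True/False.
    if name in REQUIRES_APPROVAL and not approver(name, kwargs):
        return f"tool '{name}' blocked pending approval"
    return TOOLS[name](**kwargs)

# An auto-denying approver, standing in for a real human:
print(call_tool("issue_refund", lambda n, a: False, order_id="A-123"))
```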
That's the most important part: reusing things that already exist. And it's cool because you can declaratively define agents in YAML and then specify which tools require human approval, which is pretty neat. Check those out; there are a bunch of options in terms of tools and extensibility. The other key part:
extensibility. And the other key part,
so it's not just outlining the
workflows, the tools and extensions that
your agents can use, but it's also
memory. very very important. So in this
case here we have an example of a sort
of travel
uh travel website or app or something
like that helps me plan my travel.
Right? So I have a user here who asks, "I need to book a hotel in New York for two stays," and the agent goes and uses the Tripadvisor API to search for the nearest hotel, then returns a message with that information back to our user. Inside this same conversation (this boundary on the outside is a thread), we then ask another question: what's the daily meal allowance for a business trip? The agent leverages its integration with SharePoint to query the company travel policy, creates a message, and returns it. So not only do we have a sort of shorter-term memory within the thread, which serves the conversation I'm actively having (think of any chat interface where you create a conversation: it would be terrible if, message after message, there were no shared context between them), we also have a sort of longer-term memory. This keeps track of things across multiple conversations for the user, and that lets you connect context across various conversations, which improves the user experience and, of course, the results that your agents can produce.
So we have something called an agent thread, which is an abstraction that retains conversation history across turns and sessions; the goal is to ensure the agents have the context they need for these long-running dialogues, so it doesn't feel like I'm starting from scratch as a user every single time. The short-term memory is going to be session-scoped, stored in the thread, and this is valuable for immediate context. For long-term memory, you'll need some kind of database; this is not something you want to store in a CSV or a JSON file (unless it's maybe a demo, and even then probably not). And then you would have integrations with vector databases for similarity search and things like that.
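The two memory layers can be sketched like this; the class and store names are invented for illustration, with a plain dict standing in for the real database or vector store:

```python
# Memory sketch: session-scoped thread history plus a long-term store.

class AgentThread:
    """Short-term, session-scoped memory: the running conversation."""
    def __init__(self):
        self.messages = []
    def add(self, role: str, text: str):
        self.messages.append((role, text))

long_term_store = {}   # stand-in for a real database / vector store

thread = AgentThread()
thread.add("user", "I need to book a hotel in New York")
thread.add("assistant", "Here are the nearest hotels...")
# A fact worth remembering beyond this session:
long_term_store["user-42"] = {"preferred_city": "New York"}

# The next turn in the same thread sees the earlier context...
print(len(thread.messages))                            # 2
# ...and a brand-new conversation can still recall the long-term fact:
print(long_term_store["user-42"]["preferred_city"])    # New York
```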
Now, the other big and important thing is being able to understand what your agents are doing: what works well, what doesn't, which prompts are being sent, what could be changed or improved, and things like that. For this we have a couple of options. We have integration via the OpenTelemetry standard, which is pretty important: say you set up your telemetry one way and deploy to one cloud, then you want to deploy to a different cloud; perhaps the UI where you view your telemetry is a little different, but you can expect the information to be there in the format you expect. It's quite important to leverage these open standards. On top of that, we have integration with Microsoft Entra ID for policy enforcement in enterprise scenarios, where identity and things like that are quite important. Content filters are also available, along with a bunch of other guard rails that come from being able to plug into the Microsoft ecosystem.
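To make the observability idea concrete, here is a minimal stand-in for OpenTelemetry-style spans. The real integration emits standard OTel data that any backend can display; this sketch only records span names and durations:

```python
import time
from contextlib import contextmanager

# Minimal tracing sketch: nested spans record how long each step took.
TRACE = []

@contextmanager
def span(name: str):
    start = time.perf_counter()
    try:
        yield
    finally:
        # A span is recorded when it closes, so inner spans land first.
        TRACE.append((name, time.perf_counter() - start))

with span("translator"):
    with span("llm_call"):
        time.sleep(0.01)   # pretend model latency

for name, seconds in TRACE:
    print(f"{name}: {seconds:.3f}s")
```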
All right, before I move over to Stephen, I do want to show a couple of examples of what these workflows could look like. I have a pretty straightforward example here: a sequential workflow. This is a restocking workflow (I'm going to zoom in right here), shown in a tool called DevUI, which lets us visually see the workflows. We have a stock agent that sends its work to a prioritization agent, which then moves on to a summarizer agent. A very straightforward sequence. Now, in this same project (and we'll see more of this toward the end of the session), we have a couple of other workflows. We'll take a look at this weekly insights workflow, which is concurrent. In this case, one agent kicks everything off by collecting some data, and then we have three agents that run at the same time: a weather analyzer, an events analyzer, and a top-selling-product analyzer. Because they don't depend on each other, they can run at the same time. So we are fanning out onto all of these, and then collecting everything, fanning back in, into an insight synthesizer agent.
Right? So, this can be useful for generating insights for a store based on the weather for the next seven days: I might need to stock up on raincoats if it's raining, or if there's a big parade the events analyzer picked up, we might need to stock up on some type of fan merchandise or sports team merchandise, things like that. I also have this other example here, an example of the group chat workflow. Here I have a translator, and the goal is to give us an output only if the translation it produces is at least 99% accurate. You can see here that a translator kicks everything off, then we have a reviewer agent. If the reviewer agent decides that the accuracy is not above 99%, it kicks it off to the editor to rewrite and improve that translation. Then it goes back to the reviewer to see if it meets the standard, and at some point we get a final output here. The ones in green are the ones that actually ran. I'm actually kind of struggling to find an example that this one can't get; these models are getting so good at translating things. I have this example here, trying with idioms: "The results are nothing to sneeze at. Still, let's not jump the gun." I'm trying to get something you would most likely not translate word by word into a different language. But yeah, these models are getting really good at translating in one shot, which is awesome. It makes all this technology more accessible, makes information and content more accessible, which I think is awesome. But the other cool thing about DevUI, which I can show you here, is that as agents are working, you can see the completed ones in green and the currently running ones in purple. I'll zoom in one more time. There we go. And in this case, it looks like it didn't get it right, so it went from the reviewer to the editor. We can see it moving here: the re-reviewer is running, checking the accuracy. If it had hit above 99 the first time, it would have gone straight to our final output, but it didn't. So let's see. Yes, we see here the first review got 94%, which needs improvement because it's less than 99, and then it gives us the original translation and the current one. I do happen to speak Spanish, so I'm manually reviewing these, and I do think this last one is way better. All right, awesome. So those are a couple of examples of how we can look at these different types of workflows. I find it very helpful to be able to see these things run in a UI, and we can see the code as well. I'll show you this: we'll take a look at this main file here, and I'll make sure to zoom. The meat and potatoes is here as we scroll down, and we have our workflow. This is the code that does that.
And I'll zoom in once more there. We start by kicking it off: the translator goes and does its work. And then each one of these connections is called an edge. This is an edge here, and this is an edge here; we're just defining the edges. In this case, we need a switch-case, because it depends on whether we're getting that 99% quality or not. If the case is high quality, awesome, we can send it to the final output agent. If not, what do we have to do? Well, we have to re-review, and the goal is to get that high-quality output. And just to show how each of these agents is defined, we'll go into the translator here.
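The switch-case edge described here can be sketched in plain Python. These are invented names standing in for the framework's workflow-builder API, with canned scores in place of real model calls:

```python
# Review-loop sketch: translator -> reviewer, with a switch-case edge
# routing to the final output (high quality) or back through the editor.

def translator(text: str) -> dict:
    return {"text": f"[es] {text}", "score": 94}   # first pass: 94% quality

def editor(result: dict) -> dict:
    return {"text": result["text"] + " (edited)", "score": 99}

def is_high_quality(result: dict) -> bool:
    return result["score"] >= 99                   # the switch-case condition

def run_workflow(text: str, max_rounds: int = 5) -> str:
    result = translator(text)                      # edge: start -> translator
    for _ in range(max_rounds):
        if is_high_quality(result):                # case high quality:
            return result["text"]                  #   edge -> final output
        result = editor(result)                    # else: edge -> editor, re-review
    return result["text"]

print(run_workflow("out of the frying pan, into the fire"))
```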
And we see here we have an agent of type OpenAI (we're using OpenAI here), and then it's just Python code: we create the translator, give it its goal, and provide a couple of instructions. If you've worked with models before, you're familiar with system prompts and those types of instructions. Then we just call the agent to go do its work, and we can pass information from agent to agent depending on the context and what we want to showcase. But yeah, there are a bunch of cool things out there to get your hands on and build with Microsoft Agent Framework, and we'll make sure to drop a link to the documentation so you can check that out. But back to you, Stephen. Let's talk about NVIDIA AI Blueprints.
>> Yeah, let's do it. And Gwen, there were a couple of questions in the chat. Could you highlight the relationship between Microsoft Agent Framework and Semantic Kernel?
>> Yep. So, previously we had Semantic Kernel and we also had AutoGen. Moving forward, we have taken the best of both of those worlds and united them as Agent Framework. So if you're building something new now, we encourage you to leverage Agent Framework versus the other options, and Agent Framework is available for C# and Python. I know Semantic Kernel was big in the C# world, with AutoGen being more popular in the Python world, so there's a little bit of everything for everyone.
>> Awesome. Thank you. And I saw there was another question; I'll take this one: some LLMs are better at some jobs, so do you have a reference on what LLM to use for different jobs? That question is a perfect segue into this next part about NVIDIA AI Blueprints. Gwen showed us how we can build these agentic workflows and orchestrate agents together using Agent Framework, and it's obviously super powerful and lets you be really creative. But there's still the question of how to get started, right? What kind of models should we use for different use cases, and what kind of agentic workflows can we build with those models?
NVIDIA AI Blueprints answers those questions. These blueprints are essentially reference workflows that you can use for all sorts of different applications, and all of them are open source and available on build.nvidia.com. If you go to build.nvidia.com and click on Blueprints up here, you can see all of the blueprints available. We have blueprints for AI model distillation, which is one of my favorite topics, as well as blueprints for 3D object generation, data streaming for RAG, and AI observability. These blueprints are all out there, created for users like you to reference, so you can understand how to build these workflows and apply them in the real world. Let's say I go back to the AI model distillation blueprint; I can view it on GitHub to see the actual code behind it. In this case, it runs us through a Jupyter notebook, which helps us understand all of the different code segments that go into it. It walks us through how to prepare the data used for the model distillation, and it lets us specify the different models we want to use. For the teacher model, we would use the Nemotron Super 49B; the student model, I believe, we define a bit further down.
The student model in this case is about a 1-billion-parameter model, which is obviously much smaller and much more lightweight, so you can run it with less hardware, but of course it's a bit less intelligent. So the smaller model is usually not something you would want to use for really in-depth use cases. However, with model distillation, you're increasing the intelligence of this smaller model for a very particular use case. In this example, it's financial data: we're feeding this model data about the stock market, the kind of news you would see in the Financial Times or Bloomberg, and this smaller, less intelligent model becomes a lot more capable on financial data. This Jupyter notebook walks you through how to run the entire thing; I went through it myself, and it is very easy to understand, very plug-and-play. Whenever you go to build.nvidia.com, it also shows you the requirements you would need. In this case, you would need two NVIDIA GPUs: A100, H100, H200, or B200 would all work great.
And once you have those, you can start it up and get it running; it's very easy to get going. That's really the whole point of these blueprints: we want them to be as approachable as possible. So if you have the adequate hardware, I encourage you to check it out and see what's available for you.
The way these blueprints work is that NVIDIA created three foundational blueprints, which you see here. There's one for AI-Q, which is sort of our deep research blueprint. We have a blueprint for RAG, which is probably going to be the most applicable to most of the people here, and to enterprises and workflows in general. And we also have a data flywheel blueprint. All of these blueprints are unique, tailored for very different tasks. Then what NVIDIA has done is take each of these foundational blueprints and build on top of them more industry-focused, more tailored blueprints for particular use cases. For example, the model distillation blueprint I just showed you falls under the data flywheel blueprint: they took that foundational blueprint, built on top of it, customized it, and used different data and different models for this financial use case. We'll see that for all of the blueprints on build.nvidia.com.
It's all focused on particular use cases. The reason we do that is that we want everyone to have a reference blueprint that is either perfectly plug-and-play, so you can go use it and have something working in your environment within, you know, half an hour, or, if it's not perfectly aligned to your use case, at least close enough that you can configure it to match. That's why it's all open source: you can go in, rearrange the code, rewrite the code, plug in whatever you need to get these blueprints geared to exactly what you're trying to accomplish. So it's definitely a fantastic resource, and I encourage you to check it out, especially if you're first getting started with agentic workflows.
So this was the agentic model distillation for financial data. There was a question about which models to use for what, and this diagram shows all of the different models we use in this workflow. Like I mentioned, we have the larger teacher model, which in this diagram is the 3.3 70B, a 70-billion-parameter, sort of medium-size model, and then we have the candidate models, which are the student models: 8 billion, 14 billion, 1 billion, and 49 billion. This workflow walks you through training, or fine-tuning, all of these different models and comparing them using some special benchmarks.
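At its core, distillation trains the student to match the teacher's softened output distribution. The blueprint wires this into real fine-tuning; this sketch only shows the underlying math, with made-up logits standing in for real model outputs:

```python
import math

# Knowledge distillation sketch: the student is trained to minimize the
# KL divergence between the teacher's and its own softened distributions.

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher_logits = [2.0, 1.0, 0.1]   # e.g. from a large teacher model
student_logits = [1.5, 1.2, 0.3]   # e.g. from a ~1B-parameter student

T = 2.0   # temperature > 1 softens both distributions
loss = kl_divergence(softmax(teacher_logits, T), softmax(student_logits, T))
print(f"distillation loss: {loss:.4f}")   # this is what fine-tuning drives toward 0
```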
A question in the chat: can I rent some GPUs in Azure somewhere? Absolutely. There are a bunch of different ways to use GPUs in Azure. The easiest way, in my opinion, is to provision an Azure VM. Anything in the NC series will have GPUs in it: a standard NC24 gives you an A100, an NC40 gives you an H100, and you can multiply on top of that with VMs that have two H100s or two A100s. So that's what I would check out as the easiest way to get started.
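As a sketch, provisioning one of those NC-series VMs with the Azure CLI looks roughly like this. The resource-group and VM names are made up, and the `az` calls are guarded so the snippet is copy-paste safe on machines without the CLI:

```shell
# Hypothetical names; the NC-series size determines the GPU, e.g.
# Standard_NC24ads_A100_v4 (one A100) or Standard_NC40ads_H100_v5 (one H100).
RG="gpu-demo-rg"
SIZE="Standard_NC24ads_A100_v4"

if command -v az >/dev/null 2>&1; then
  az group create --name "$RG" --location eastus
  az vm create \
    --resource-group "$RG" \
    --name gpu-vm-01 \
    --image Ubuntu2204 \
    --size "$SIZE" \
    --generate-ssh-keys
else
  echo "install the Azure CLI (az) to run this sketch"
fi
```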
Another question: are those LLMs open source, or do they just run in the Azure environment? It depends on the models you're using. If you're using any Nemotron model, it's going to be fully open source. If you're using DeepSeek or GPT-OSS, those are open source as well. But if you want to run, say, Claude or GPT-5, that's something you'd have to run through OpenAI's or Anthropic's endpoints. So it depends on how you want to run it; with open source you have a lot more control, and that's a great way to understand how to build these agents on top.
So yeah, NVIDIA Nemotron, what I was mentioning earlier. These are NVIDIA's family of models that cover a ton of different use cases. So again, to that question of how we know which models to use for different use cases, this sort of breaks it down for the Nemotron family. If you want a reasoning or agentic model, that's where we have Nemotron Nano, Nemotron Super, and Nemotron Ultra. The Nemotron 3 Nano actually just came out on Monday, so it is brand new, top of its family. Really fantastic model, and super lightweight too: it's 30 billion parameters. Now, if you are looking for something more multimodal, a vision language model can understand videos and images as well as text. The VL is what you would look for in the name to make sure you're choosing the right model there. And then of course there's information retrieval, which is something you would use in RAG. So for document processing, where you need to take, let's say, scans of documents and pick out the text, you can parse that and put it, let's say, in a RAG database, or process it in a number of different ways. That's where you'd use information retrieval. And content safety, of course, goes without saying: just make sure that your LLM is outputting appropriate responses and not doing anything harmful that we wouldn't want our end users to see.
So that sort of answers the question of which models to use, and then there's another question of how we run these models, right? We have the open-source models like DeepSeek, Nemotron, etc. But how do we run them? The best way to get up and running is to use something called NVIDIA NIM, which is NVIDIA Inference Microservice. Essentially, a NIM is a Docker container that contains the model and the inference engine, so vLLM or TensorRT-LLM or SGLang is baked into the container image. It also comes already baked in with lots of observability tooling and capabilities. So you could think of it as a Docker container that has everything you need to run a model: you just say docker run, and then the container name, and it will spin up your LLM. It makes it really easy to get up and running. The only thing you would need for this is an NGC account. So if you go to ngc.nvidia.com, you can create an account and create an API key, and you plug that in with your docker run command and it gets you up and running very quickly. So if you want to see which NIMs are available, you can go to catalog.ngc.nvidia.com and scroll down to see NVIDIA NIM. We don't only create a NIM for our own models, like our own Nemotron models, but also for tons of different open-source models. So,
if I go here (I've been picking on DeepSeek a lot today), I can find a NIM for DeepSeek R1. So again, this is a Docker container that has everything you need inside it to run DeepSeek R1. Of course, it doesn't have the hardware you need; that comes outside of the container. But Azure, again, would be a great place to secure that hardware and those GPUs. There's DeepSeek V3.1 as well. All of these NIMs make it really easy to get up and running, not just with NVIDIA's models, but also with third-party open-source models.
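Once a NIM container is running, it exposes an OpenAI-compatible HTTP API, so querying it from Python needs only the standard library. A rough sketch, assuming a typical local deployment on the default port; the URL and model name are assumptions, not taken from this demo:

```python
import json
import urllib.request

# Assumed local NIM endpoint; NIMs serve an OpenAI-compatible API.
NIM_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat payload that a NIM endpoint accepts."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def query_nim(payload: dict) -> dict:
    """POST the payload to the locally running NIM container."""
    req = urllib.request.Request(
        NIM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires a running container):
# reply = query_nim(build_chat_request("deepseek-r1", "Summarize NIM in one line."))
```

Swap the model string for whichever NIM you pulled; a hosted endpoint would also need an API key header.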
>> [clears throat]
>> All right. So, now that we know which models to use and how we can get up and running with reference workflows, Gwen is going to show us how we can take these workflows and these models and integrate them into our real-world applications. So, I'll hand it back to you, Gwen.
>> Yep. Thank you, Stephen. Right. So we,
you know, saw on the slides a couple of
options of, you know, workflows we can
leverage, and we saw in the dev UI a more visual representation of how they run. But let's make it
more concrete as to like why you would
even want to use something like this and
where you could actually integrate them
into your applications. So here we have
a sort of website for a retail store.
We're calling it the Zava live popup
shop and we sell premium technical
apparel, right? So obviously at the
front page we have a bunch of products
and we can you know purchase things,
right? I also have the option to to log
in, and I can log in as a customer or a manager, right? And I'll show you first logging in as a customer. You'll see your dashboard of things that
you've purchased. Obviously you can
purchase more things, right? But the
first sort of neat integration here,
we're actually leveraging OpenAI's ChatKit, which allows you to have an agent
and also create this sort of nice chat
UI experience here. And in this case, I
just asked here, what was my most
expensive item purchased? And we see
here that it tells us that we purchased
running athletic shoes at a total amount
of $647.92. And that is because we
purchased nine pairs. We also would have
the option to return it, but it's telling us here this was in June, so we're well beyond that return period. Six months ago; I don't think any store would let you do that. But you can
create these sort of experiences in code
and then you know plug them right into
your application, and have a lot more things that your customers can ask and accomplish thanks to your agent, without having to wait on someone. And you can leave your team of customer representatives to do the stuff that really needs a human in the loop. Now, another option we
have is logging in as a manager. Here
you see I'm logged into the Zava
management side of things and you see
here information that would be relevant
to a manager of a store, right? So we
have top categories by revenue and we
also have these weekly insights here.
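These weekly insights come from a concurrent fan-out/fan-in pattern: several agents run in parallel and a synthesizer combines their results. A minimal sketch in plain asyncio; the agent functions here are stand-ins, not the demo's actual Agent Framework code:

```python
import asyncio

# Stand-in "agents"; in the real app each would call an LLM or an MCP tool.
async def weather_agent() -> str:
    return "rainy midweek, fluctuating temperatures"

async def sales_agent() -> str:
    return "top sellers: coats, sweaters, waterproof footwear"

async def events_agent() -> str:
    return "holiday festival expected to drive foot traffic"

async def synthesize(findings: list[str]) -> str:
    # The real synthesizer would prompt an LLM with the combined context.
    return " | ".join(findings)

async def weekly_insights() -> str:
    # Fan out: run the three agents concurrently.
    findings = await asyncio.gather(weather_agent(), sales_agent(), events_agent())
    # Fan in: hand all results to the synthesizer.
    return await synthesize(list(findings))

print(asyncio.run(weekly_insights()))
```

The concurrency only pays off when the agents are genuinely independent, as they are here: weather, sales, and events don't need each other's output.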
Now remember that flow I showed you earlier... I might still have it open. No, do I not? Oh, this one here. So, remember I showed you this one where we have the data collector, and then we have a concurrent workflow where three individual agents each accomplish a task, and then we fan back into the insight synthesizer. That is actually the work that we see here, right? So, we
have some weekly insights, and here it says AI-generated insights based on weather forecasts, inventory data, and
local events. So again, in this case it says: over the next seven days, expect fluctuating temperatures with a rainy day midweek, so increase stock of winter coats, sweaters, and waterproof footwear, because they will provide warmth and protection during cold and wet conditions. Right. So we have our
weather agent that went and got us that
information. Then we have top-selling products, which gives us a preview of the five bestselling products in the last 21 days. This is specific to this store, and it also gives us an idea of what's moving more in inventory and all that kind of stuff, right? And then we have our events agent that tells us that several major outdoor events, including a holiday festival and a New Year's Eve celebration, are expected to drive significant foot traffic and
clothing sales in the coming week. Great. Now, from here we have all of these insights, and we can actually kick off another agent with this. But before we look into that, I want to show you another integration that we could leverage. If we go to our inventory, I'm going to click here.
Here you see what you would expect in inventory management, right? Total items, what's low in stock, some standard information there. But what we can also do is click on our launch AI agent. What this will do is a real-time inventory analysis, and it'll make sure it's policy aware, and then budget optimization. We can also provide instructions that are specific to the restocking analysis that we want. So it has some default text here that says analyze inventory and
recommend restocking priorities. But if we go back to our dashboard and click on generate insight-based analysis, we can go ahead and send the context that those agents and that workflow created for us. So here we're
saying based on the weather conditions,
local events and current sales
performance, what items should we
restock? And then we just send over the
the weather forecasts and all the other
information that we got there. And then
we can launch that analysis. Now, I ran this already before just to save us a little bit of time. I will show you our restocking workflow. This was just a sequential workflow, one step after the other: the stock agent, the prioritization agent, and the summarizer agent. And if we go back there, we can also see that that work was kicked off here: stock agent, prioritization agent, and then summarizer agent. And if we
scroll down here, we have a bit of an
activity log, just to make sure we understand what's working and how things are going. And this returns restocking recommendations. It tells us 15 items need restocking, select items to reorder, and it gives us a list of them.
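The sequential restocking workflow, stock agent then prioritization agent then summarizer agent, is a pipeline where each step consumes the previous step's output. A stand-in sketch in plain Python; these functions and their data are illustrative, not the demo's Agent Framework code:

```python
# Illustrative sequential pipeline; each "agent" is a plain function here.
def stock_agent(_: None) -> list[dict]:
    # Would query inventory via the finance MCP in the real app.
    return [
        {"item": "peacoat wool blend", "stock": 0},
        {"item": "pullover fleece hoodie", "stock": 3},
        {"item": "leather belt", "stock": 40},
    ]

def prioritization_agent(items: list[dict]) -> list[dict]:
    # Lowest stock first: out-of-stock items become top restocking priorities.
    return sorted(items, key=lambda i: i["stock"])

def summarizer_agent(ranked: list[dict]) -> str:
    top = ranked[0]
    return f"Top priority: {top['item']} (stock: {top['stock']})"

def run_sequential(*steps):
    """Feed each step's output into the next, like the demo's workflow."""
    result = None
    for step in steps:
        result = step(result)
    return result

summary = run_sequential(stock_agent, prioritization_agent, summarizer_agent)
print(summary)  # Top priority: peacoat wool blend (stock: 0)
```

Sequential makes sense here because each agent depends on the one before it, unlike the weekly-insights flow, where the agents could fan out in parallel.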
It should also explain to us why it gave us this. So,
let's see. Notably, the peacoat wool blend is currently out of stock and requires replenishment. Other items with low stock levels include the pullover fleece hoodie and several accessories. And it gives us some more information there. I think I ran this with just the default. So we'll see here: we have the peacoat, we have some warm clothing, I would say, we have a belt there, some shoes, right?
So, I'm going to try to kick this off using those insights that we got for the weekly insights. So, if I click launch AI analysis... oh, come on. Of course, it doesn't want to work right now. Let me take a look at what's going on here.
>> It's the curse of the live demo.
>> It is the curse of the live demo, but I think it might be... go here. No, it didn't want to work. Wait, let me... no. Is it still running? Let me take a look here. We should be running. The good thing is I did run it before, which is how we got that restocking output earlier. Interesting; I wonder what exactly happened. Perhaps some token or something expired. But anyway, you get
the gist. You provide those additional
instructions there. So, I'm pretty sure I ran this with just the generic one that we saw at the beginning, and then it gave us this list here. But what it does is use that weather context: oh, it's raining next week, let's stock up on raincoats, things like that, right? And I guess the last thing I want to show you is observability.
We have here a dashboard. This is Aspire, and this is fantastic because it allows us to see all the services: we have agent dev here, we have our API running, our finance MCP, our front-end application, and our supplier MCP. We also get pretty rich data for all of our logging, right? So it should come in here. And I think this might have kicked off. Oh, there we go. So our restocking agent does things in batches, so it goes and calls our finance MCP, and it makes a lot of calls; that's why our console here is showing a lot of information. We also get structured logs, which is great if you want to see the different levels. Here we have the information level, and in case you get any errors, it's really easy to see them in here as well. And then we also have traces, which are really valuable in general whenever we need telemetry, but especially when we are calling different LLMs and things like that: we can see what gets called and which prompt is sent. For example, our finance MCP is calling our SQLite database here and getting information. And this is all thanks to having those integrations with OpenTelemetry and being able to send things off that way as well. And I think I could show you a little code here. If we look at our
insights MCP
or insights.py, I will show you an example. Let's look for our top product... okay, here. So the cool thing here, and we had spoken about tool extensibility and things like that: we see that we have our top-selling product analyzer, and this is an agent. And
here what we're doing is having it call
the finance MCP to use as its tools.
Right? We're not specifically telling it, oh, you have to use this tool. But MCPs are awesome because they work in a way where it's like: here's a list of all the things I have, you pick what you want. And if you are detailed enough with your instructions and the context you are providing to your agent, it will most likely be able to go and pick the correct tool. Right? So, if we take a
look at our actual MCP server, I'm going
to click into MCP and I'll click into
our finance server because that is what
we're sending to this one or making
available.
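The tool-registration pattern we're about to see boils down to a decorator that adds each function to a registry the agent can browse. A stripped-down, stdlib-only sketch of that shape; the real server uses the MCP SDK's @mcp.tool() decorator and the protocol on top, and the tool name and sales data here are illustrative:

```python
# Minimal stand-in for an MCP-style tool registry (the real MCP SDK
# additionally handles the protocol, schemas, and transport).
TOOLS: dict[str, callable] = {}

def tool(fn):
    """Register a function so the 'server' can list and invoke it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_top_selling_products(limit: int = 5) -> list[str]:
    # The real tool would run a SQL query against the store database.
    sales = {"running shoes": 9, "peacoat": 4, "belt": 2, "hoodie": 7}
    ranked = sorted(sales, key=sales.get, reverse=True)
    return ranked[:limit]

# An agent can see the tool list and pick one by name, much like over MCP.
print(list(TOOLS))                           # ['get_top_selling_products']
print(TOOLS["get_top_selling_products"](2))  # ['running shoes', 'hoodie']
```

This is why detailed docstrings and parameter names matter: the tool list, not hard-coded wiring, is all the agent has to go on when choosing what to call.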
We see that we have a bunch of tools here, and you can tell because we have this MCP tool decorator. This is the get top selling products tool, and in here you have the code that you would expect, right? We're doing some queries and returning some data depending on the parameters that we are providing. So you create your MCP servers, you leverage those extensions and the things you can use, provide them as tools to your agent, and then you can make these agents much more capable, much smarter, and able to accomplish a lot more work. And we have about five minutes left, so we'd love to answer any questions before we end things. But
yeah, once again, get hands-on with Agent Framework, play around with building these multi-agent applications, and yeah, it's quite an exciting time to be a developer.
>> Oh, absolutely. And I actually have a QR code on my screen to take you to a Microsoft blog which will walk you through creating your first agent with Microsoft Agent Framework. So it's a really great way to get up and running and get your hands dirty with Agent Framework. I'll leave it up here for a couple of seconds while we maybe answer a couple of questions, and then I have a QR code for the next episode.
>> Awesome. You were doing a great job of keeping up with these questions, so I think we got through them all. There was one about the slides being available after the presentation. Do you know if that's the case?
>> I believe so. I'm not 100% sure, but if anything, wherever you registered for the series, you should have been sent an email with resources. And I know there will be a follow-up, so we can make that available to people, with a link in there as well. Or just subscribe to the YouTube channel; any updates, we'll share as well.
>> Yeah.
>> And then definitely join us for the next episode. This will go a lot deeper into one of our blueprints, the AI-Q blueprint for deep research. It's a really great way to get a full deep-research RAG system up and running very quickly. It even has its own Helm chart, so you can deploy it into your Kubernetes environment, and it takes maybe 20 minutes to half an hour to get fully up and running. So there's a lot of great stuff coming in this episode, and it's coming up in January, so make sure you tune in for that. And yeah, anything else from your side, Gwen?
>> No, January is a great time to, you know, kick off the year learning something new, so definitely tune in for that episode. Yeah, I really appreciate the people commenting now on great insights. Thanks for your time, thank you all for being here, and make sure to catch the rest of the episodes and dive deeper and learn all this cutting-edge stuff. It's quite cool that it's available via APIs and code; you don't necessarily have to rent out or purchase expensive physical hardware. So, yeah, like you said, Stephen, the world is everyone's oyster with all these things. But hey, it's been great. Thank you for the invite, and I hope everyone has a great rest of their day.
>> Yeah, thank you everyone. Take care.
>> Thank you, Gwen, for the session today, and thank you all for joining us. We are always looking to improve our sessions and our experience here at the Reactor, so if you have any feedback for us, we would love to hear what you have to say. You can find the link to our survey on the screen or in the chat. And we'll see you on the next one.
Explore how to leverage multi-agent systems in your applications to optimize operations, automate recommendations, and enhance customer experience. The solution utilizes Microsoft Agent Framework, OpenAI ChatKit and NVIDIA Nemotron model on Microsoft Foundry to seamlessly connect with store databases, integrate human oversight, and deploy scalable chat agents. This approach enables real-time analytics, predictive insights, and personalized interactions, resulting in improved decision-making, operational efficiency, and a superior user experience for both application developers and users. This episode is a part of a series. Learn more: https://aka.ms/AIAgentsApps/y-MSFT #microsoftreactor #learnconnectbuild [eventID:26559]