Hello everyone and thank you for joining
our live session today. My name is
Lorissa. I'm an event planner at Reactor
joining you from Brazil. Before we
start, I have some quick housekeeping.
Please take a moment to read our code of
conduct. We seek to provide a respectful
environment for both our audience and
presenters. While we absolutely
encourage engagement in the chat, we ask
that you please be mindful of your
commentary, remain professional and on
topic.
The session is being recorded and will be available to view on demand on the Reactor YouTube channel within 48 hours.
Keep an eye on the chat. We'll be
dropping helpful links and checking for
questions for our presenters to answer.
I'll now turn it over to our speakers for today. Thank you, and let's welcome Stephen.
All right, hello. Thank you very much, Lorissa, and thank you everyone for joining. We really have lots of great stuff to go over today, so we'll jump right in. My name is Stephen McCulla, and I'm an AI solutions architect with NVIDIA, so I get to work closely with Microsoft on implementing all of the latest and greatest AI technology into Azure. And I'm here today with Gwen.
>> Hey, thank you, Stephen, for the intro, and for the invite to be here and talk about agents. My name is Gwen. I'm on our Python advocacy team here at Microsoft, so we get to work alongside awesome partners like NVIDIA, teaching a bunch of stuff and improving how people can deploy their workloads onto our platform, among other things. Excited to dive right in.
>> Yeah, lots of great stuff we're going to go over. So, a bit about the program before we jump in. If you're new to this program, we cover lots of amazing technologies, focusing on AI agents and integrating them into applications. This AI Apps and Agents Dev Day series is part of a larger partnership between NVIDIA and Microsoft to help users like you become more acquainted and comfortable with all of the amazing tools and technologies coming out from both NVIDIA and Microsoft. We want to show you how you can best leverage those to create all of these amazing AI agents and automations yourself. So, with that being said, let's go see how we can scale and orchestrate agents.
Today we'll be covering a couple of major areas. First, we'll jump into a recap of exactly what AI agents are, how we use them, and how we build them into agentic workflows. Then Gwen is going to show us how we can use Microsoft Agent Framework to orchestrate our agents and have them working in tandem to accomplish really complex, really interesting tasks. I'll show you how NVIDIA's AI Blueprints help you get started by providing a reference architecture and framework for building your own agentic workflows for whatever you're looking to accomplish. Then we'll go into a couple of hands-on demos showing NVIDIA's AI model distillation blueprint, which I think is amazing, and we'll show how to integrate agentic workflows into user-facing applications as well as back-end batch processing use cases. So we're really going to be touching on a lot of cool stuff today.

The first thing to go into is a quick recap. What are AI agents? How do they differ from large language models like ChatGPT, Nemotron, or DeepSeek? What is really the difference between these LLMs and agents?
I imagine if you're attending this webinar, you're at least a little familiar with large language models, and agents are sort of an abstraction on top of that. If you have an LLM, let's say Nemotron or DeepSeek, that has reasoning and tool-calling capabilities, and then you add on agentic software like Microsoft Agent Framework or NVIDIA's NeMo Agent Toolkit, you can use this LLM as an AI agent. What that unlocks is long-term and short-term memory, so your agent can remember past conversations. It also unlocks tool calling, so you can expose different tools to your agent. Whenever you hear the word tool, just think of a chunk of code that executes some particular task. That can be something as simple as turning the lights on or off in your smart home, or something really complex like creating an entire website and deploying it onto AKS.

These tools open up an entire world of possibilities, where your imagination is really the ceiling. Like I mentioned, a tool is essentially a chunk of code, so whatever you can code, you can turn into a tool and expose to your agent. The world is your oyster there. Whenever you have these LLMs wrapped as agents, you can achieve lots of amazing things with the tool calling and the reasoning.
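The idea that a tool is just a chunk of code can be sketched in a few lines. This is a minimal, framework-agnostic illustration, not any particular SDK's API; the function and the JSON call shape are invented for the example. An agent runtime exposes named functions to the model, and when the model emits a tool call, the runtime simply looks the function up and runs it:

```python
import json

# A "tool" is just a function plus a docstring the model can read.
def turn_lights(on: bool) -> str:
    """Turn the smart-home lights on or off."""
    return "lights on" if on else "lights off"

TOOLS = {"turn_lights": turn_lights}

def dispatch(tool_call_json: str) -> str:
    """Execute a tool call the model emitted as JSON, e.g.
    {"name": "turn_lights", "arguments": {"on": true}}."""
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

print(dispatch('{"name": "turn_lights", "arguments": {"on": true}}'))  # lights on
```

Real frameworks like Microsoft Agent Framework or the NeMo Agent Toolkit add the schema generation and the model round-trip around this core loop.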
And the only real limitation is that you might have just a single model that you're using as an agent. This is where we start to look at agentic workflows. If I have an agent that can reason, call tools, and do research, why do I need an agentic workflow? Why do I need to string multiple agents together? Why can't I just have a single, very large, very intelligent LLM that I'm using as an agent that can do everything? The key word here, the key answer, is specialization.
There isn't a single AI model out there that is the best at everything. We haven't reached AGI or superintelligence yet, so there isn't really one model that can do everything best. Some AI models, like Nemotron Parse, are specialized for document analysis. Some, like GPT-5, are more general chat agents that can do chat, code, and image generation. Some models are great for audio translation and processing, and some are great for language translation. So the entire AI ecosystem is very specialized around accomplishing particular tasks. If you were to try to use one model for all of these different tasks, you're probably not going to do as well as you could with an agentic workflow. This workflow lets you get the best of all worlds: you can have the best document processing agent, the best chat agent, and the best translation agent, all working together.
That's a major reason we see a lot of people implement these agentic workflows. For example, let's think about what an agentic workflow might look like in the real world. Consider how a doctor might use an agentic workflow during a medical diagnosis. We can imagine a doctor opening an agentic app on their phone that listens to the conversation between the patient and the doctor, helps guide the doctor toward a particular diagnosis, and creates appropriate action plans for the patients.
In this case, you would need this agentic tool to transcribe the voices of the doctor and patient into text; use that text to reason, research, and find possible diagnoses according to the symptoms the patient is listing; parse the documents that show the patient's past medical history; and create appropriate plans for the patients. So there are lots of different skills used in this hypothetical workflow, and it's very unlikely that a single model, a single agent, could do all of these things very well. That's where you would want to bring in an agentic workflow. You could have a model that's really great at audio-to-text processing, a model that's fine-tuned for medical research, and a model with a RAG database of all of the patient's medical history. Combining all of those capabilities makes your agentic workflow much more powerful than running a single model or a single agent. So obviously this unlocks so much capability and so much creativity whenever you use these agentic workflows. Gwen is actually going to show us how we can build them using Microsoft Agent Framework, so I'll hand it over to Gwen.
>> Awesome. Thank you, Stephen. Yeah, let's talk a little bit about Agent Framework. But first, why would we want to build agents in code? In comparison to other ways of building agents, there are options that are drag-and-drop, or more of a UI where you outline a flow, and those are awesome tools. However, when we build agents in code, we have full control and customization: we're not limited to templates or predefined connectors like those other types of tools. Plus, we can leverage our existing experience with programming languages and the tools that come alongside them to build this new technology. There's also local experimentation: we can use local models and run everything locally to build our agents, testing and experimenting before we need to deploy anything to a cloud or any service.
And then there's portability: we can pick a language and a framework, deploy onto one cloud, and if we want to test some other offering, move it over to another service. So we get a lot of benefits from building agents in code. We also have different ways of building, deploying, and hosting your agents in the cloud, similar to the abstractions we see in cloud infrastructure: infrastructure as a service, platform as a service, and software as a service. Depending on how much you want to control, customize, but also manage your infrastructure, you'll find the correct solution for you there. If you want something SaaS, you can use Copilot Studio to build your agents. For something in the middle, you can use Foundry with the agent service there. And if you want to be in full control of your infra, you can use the IaaS solution: leverage some kind of framework deployed onto a container in the cloud. So there are various options, and again, it depends on how much visibility and control into the underlying infra you want; you have to figure out what that balance is for you.
Now, putting all this together, we have Microsoft Agent Framework, which we are calling the open-source engine for building and orchestrating intelligent AI agents. It's built on open standards and interoperability, with a pipeline for research, and it's open source, so it's community-driven and extensible by design, which is very important. It's also going through lots of changes; it is in public preview at the moment, so do keep that in mind. But we highly encourage you to get hands-on, experiment, build a couple of things, and give us feedback, open an issue, whatever that might be. It's an excellent tool for you to try. Now, when we talk
about multi-agent systems, we have a couple of options when it comes to orchestration. You can think of these as workflows, right? The most basic, and the one I'd recommend you try first if you're just starting out building a multi-agent system, is the sequential one, which is this one right here. It is as straightforward as the name: you just have one agent work after another until the task is completed.
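Conceptually, the sequential pattern is just function composition over a shared context. Here is a tiny, framework-free sketch; the agent names mirror the restocking demo shown later, and real agents would be LLM calls rather than plain functions:

```python
# Sequential orchestration sketch: each "agent" is a callable that
# transforms the running context, and agents run one after the other.

def stock_agent(ctx: str) -> str:
    return ctx + " -> stock levels collected"

def prioritization_agent(ctx: str) -> str:
    return ctx + " -> restock list prioritized"

def summarizer_agent(ctx: str) -> str:
    return ctx + " -> summary written"

def run_sequential(task: str, agents) -> str:
    for agent in agents:       # one agent works after the other...
        task = agent(task)     # ...and each output feeds the next agent
    return task

result = run_sequential("restock request",
                        [stock_agent, prioritization_agent, summarizer_agent])
print(result)
```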
If you have work that is, or can be, done independently (for example, three agents that don't need to wait for each other's results), you can use concurrent. That will save you some processing time.
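The concurrent pattern fans a task out to independent agents and gathers their results. A minimal sketch with plain `asyncio`, standing in for what an orchestrator does internally (the analyzer names are illustrative):

```python
import asyncio

# Three independent analyzers: none depends on another's result.
async def weather_analyzer(data): return f"weather({data})"
async def events_analyzer(data): return f"events({data})"
async def top_products_analyzer(data): return f"top_products({data})"

async def run_concurrent(data):
    # Fan out: run all three at the same time instead of in sequence.
    results = await asyncio.gather(
        weather_analyzer(data),
        events_analyzer(data),
        top_products_analyzer(data),
    )
    # Fan back in: a synthesizer step combines the partial results.
    return " + ".join(results)

print(asyncio.run(run_concurrent("week-42")))
```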
If you need to kick off some type of task with an agent and then, depending on the work that needs to be done, have it handed off to another one, that's the handoff workflow. For example, in customer support the first agent can be the triage: it understands the query the customer has given and sends it off to the tech support agent, the refunds agent, or whichever agent is appropriate. That would be the handoff workflow here.
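The handoff pattern can be sketched as a triage step that returns the specialist to call next. In a real system an LLM makes the routing decision; a keyword check stands in for it here, and all the names are invented for the example:

```python
# Handoff sketch: triage inspects the query and hands it to a specialist.

def tech_support_agent(q: str) -> str:
    return f"tech support handling: {q}"

def refunds_agent(q: str) -> str:
    return f"refunds handling: {q}"

def triage_agent(query: str):
    # An LLM would decide this in practice; keyword routing stands in.
    if "refund" in query.lower():
        return refunds_agent
    return tech_support_agent

def run_handoff(query: str) -> str:
    specialist = triage_agent(query)   # the handoff itself
    return specialist(query)

print(run_handoff("I want a refund for my order"))
```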
Now, a bit more complex. I recommend you go and experiment with those three first. A bit more complex is the group chat workflow. Think of this option like a writers' room: you have a writer who pitches an idea, and then a bunch of other people in the room giving feedback, with a back and forth there. So you have to think of iterations until the task is done. Magentic is sort of a souped-up version of this. The goal of the agent in the middle here is to not only plan but also keep a status, or documentation, on the progress: which agents are working, which are not, and what changes need to be made. It actually starts by planning everything ahead of time, grabbing the task, turning it into subtasks, and outlining that this agent is going to do this and that agent is going to do that. If there's a stall, if an agent isn't working, it goes back and works on its plan, so it'll plan again; its goal is to be robust and flexible. If that's something you need for your system, look into that one. And then there's the workflow process: you can think of each of these little squares as one of the other options. Maybe you have a sequential workflow in here; we're actually wrapping that whole workflow as an agent, so it's an agent that is really a workflow with a bunch of other agents inside, and you can create a bunch of applications with that. But do try out the first couple first, and as you get a better understanding, take a look at the other ones. On top of the
different workflow options, we of course also have tools and extensibility, because agents without access to tools are not that useful. Most likely you're familiar with MCP, the Model Context Protocol, which gives us access to many, many things. There's also agent-to-agent (A2A), OpenAPI, MongoDB; there are a bunch of options there for you. A lot of these work out of the box, which means you don't have to spend a crazy amount of time figuring out how to get them working; a lot of that can be set up for you with simple integrations, so you don't have to start from scratch.
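Since most of these integrations ultimately expose tools to the agent, one common guard rail is worth sketching: gating sensitive tools behind human approval. The helper names here are hypothetical, not Agent Framework's actual API; the point is only the shape of the check:

```python
# Human-in-the-loop gating sketch (hypothetical helpers, not a real SDK):
# tools flagged as sensitive pause for approval before they execute.

REQUIRES_APPROVAL = {"issue_refund"}   # declared in config (e.g. YAML) in practice

def issue_refund(order_id: str) -> str:
    return f"refund issued for {order_id}"

TOOLS = {"issue_refund": issue_refund}

def call_tool(name: str, approver, **kwargs):
    # `approver` stands in for a human prompt: it sees the tool name and
    # arguments and returns True/False.
    if name in REQUIRES_APPROVAL and not approver(name, kwargs):
        return f"tool '{name}' blocked pending approval"
    return TOOLS[name](**kwargs)

# An auto-denying approver, standing in for a real human:
print(call_tool("issue_refund", lambda n, a: False, order_id="A-123"))
```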
That's the most important part: reusing things that already exist. And it's cool because you can declaratively define agents in YAML and then specify which tools require human approval, which is pretty neat. Check those out; there are a bunch of options in terms of tools and extensibility. The other key part:
extensibility. And the other key part,
so it's not just outlining the
workflows, the tools and extensions that
your agents can use, but it's also
memory. very very important. So in this
case here we have an example of a sort
of travel
uh travel website or app or something
like that helps me plan my travel.
Right? So I have a user here who asks, "I need to book a hotel in New York for two stays," and the agent goes and uses the Tripadvisor API to search for the nearest hotel, then returns a message with that information back to our user. Inside this same conversation (this boundary on the outside is a thread), we then ask another question: what's the daily meal allowance for a business trip? The agent leverages its integration with SharePoint to query the company travel policy, creates a message, and returns it. So not only do we have a sort of shorter-term memory within the thread, which serves the conversation I'm actively having (think of any chat interface where you create a conversation: it would be terrible if, message after message, there were no shared context between them), we also have a sort of longer-term memory. This keeps track of things across multiple conversations for the user, and that lets you connect context across various conversations, which improves the user experience and, of course, the results that your agents can produce.
So we have something called an agent thread, which is an abstraction that retains conversation history across turns and sessions; the goal is to ensure the agents have the context they need for these long-running dialogues, so it doesn't feel like I'm starting from scratch as a user every single time. The short-term memory is going to be session-scoped, stored in the thread, and this is valuable for immediate context. For long-term memory, you'll need some kind of database; this is not something you want to store in a CSV or a JSON file (unless it's maybe a demo, and even then probably not). And then you would have integrations with vector databases for similarity search and things like that.
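The two memory layers can be sketched like this; the class and store names are invented for illustration, with a plain dict standing in for the real database or vector store:

```python
# Memory sketch: session-scoped thread history plus a long-term store.

class AgentThread:
    """Short-term, session-scoped memory: the running conversation."""
    def __init__(self):
        self.messages = []
    def add(self, role: str, text: str):
        self.messages.append((role, text))

long_term_store = {}   # stand-in for a real database / vector store

thread = AgentThread()
thread.add("user", "I need to book a hotel in New York")
thread.add("assistant", "Here are the nearest hotels...")
# A fact worth remembering beyond this session:
long_term_store["user-42"] = {"preferred_city": "New York"}

# The next turn in the same thread sees the earlier context...
print(len(thread.messages))                            # 2
# ...and a brand-new conversation can still recall the long-term fact:
print(long_term_store["user-42"]["preferred_city"])    # New York
```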
Now, the other big and important thing is being able to understand what your agents are doing: what works well, what doesn't, which prompts are being sent, what could be changed or improved, and things like that. For this we have a couple of options. We have integration via the OpenTelemetry standard, which is pretty important: say you set up your telemetry one way and deploy to one cloud, then you want to deploy to a different cloud; perhaps the UI where you view your telemetry is a little different, but you can expect the information to be there in the format you expect. It's quite important to leverage these open standards. On top of that, we have integration with Microsoft Entra ID for policy enforcement in enterprise scenarios, where identity and things like that are quite important. Content filters are also available, along with a bunch of other guard rails that come from being able to plug into the Microsoft ecosystem.
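To make the observability idea concrete, here is a minimal stand-in for OpenTelemetry-style spans. The real integration emits standard OTel data that any backend can display; this sketch only records span names and durations:

```python
import time
from contextlib import contextmanager

# Minimal tracing sketch: nested spans record how long each step took.
TRACE = []

@contextmanager
def span(name: str):
    start = time.perf_counter()
    try:
        yield
    finally:
        # A span is recorded when it closes, so inner spans land first.
        TRACE.append((name, time.perf_counter() - start))

with span("translator"):
    with span("llm_call"):
        time.sleep(0.01)   # pretend model latency

for name, seconds in TRACE:
    print(f"{name}: {seconds:.3f}s")
```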
All right, before I move over to Stephen, I do want to show a couple of examples of what these workflows could look like. I have a pretty straightforward example here: a sequential workflow. This is a restocking workflow (I'm going to zoom in right here), shown in a tool called DevUI, which lets us visually see the workflows. We have a stock agent that sends its work to a prioritization agent, which then moves on to a summarizer agent. A very straightforward sequence. Now, in this same project (and we'll see more of this toward the end of the session), we have a couple of other workflows. We'll take a look at this weekly insights workflow, which is concurrent. In this case, one agent kicks everything off by collecting some data, and then we have three agents that run at the same time: a weather analyzer, an events analyzer, and a top-selling-product analyzer. Because they don't depend on each other, they can run at the same time. So we are fanning out onto all of these, and then collecting everything, fanning back in, into an insight synthesizer agent.
Right? So, this can be useful for generating insights for a store based on the weather for the next seven days: I might need to stock up on raincoats if it's raining, or if there's a big parade the events analyzer picked up, we might need to stock up on some type of fan merchandise or sports team merchandise, things like that. I also have this other example here, an example of the group chat workflow. Here I have a translator, and the goal is to give us an output only if the translation it produces is at least 99% accurate. You can see here that a translator kicks everything off, then we have a reviewer agent. If the reviewer agent decides that the accuracy is not above 99%, it kicks it off to the editor to rewrite and improve that translation. Then it goes back to the reviewer to see if it meets the standard, and at some point we get a final output here. The ones in green are the ones that actually ran. I'm actually kind of struggling to find an example that this one can't get; these models are getting so good at translating things. I have this example here, trying with idioms: "The results are nothing to sneeze at. Still, let's not jump the gun." I'm trying to get something you would most likely not translate word by word into a different language. But yeah, these models are getting really good at translating in one shot, which is awesome. It makes all this technology more accessible, makes information and content more accessible, which I think is awesome. But the other cool thing about DevUI, which I can show you here, is that as agents are working, you can see the completed ones in green and the currently running ones in purple. I'll zoom in one more time. There we go. And in this case, it looks like it didn't get it right, so it went from the reviewer to the editor. We can see it moving here: the re-reviewer is running, checking the accuracy. If it had hit above 99 the first time, it would have gone straight to our final output, but it didn't. So let's see. Yes, we see here the first review got 94%, which needs improvement because it's less than 99, and then it gives us the original translation and the current one. I do happen to speak Spanish, so I'm manually reviewing these, and I do think this last one is way better. All right, awesome. So those are a couple of examples of how we can look at these different types of workflows. I find it very helpful to be able to see these things run in a UI, and we can see the code as well. I'll show you this: we'll take a look at this main file here, and I'll make sure to zoom. The meat and potatoes is here as we scroll down, and we have our workflow. This is the code that does that.
And I'll zoom in once more there. We start by kicking it off: the translator goes and does its work. And then each one of these connections is called an edge. This is an edge here, and this is an edge here; we're just defining the edges. In this case, we need a switch-case, because it depends on whether we're getting that 99% quality or not. If the case is high quality, awesome, we can send it to the final output agent. If not, what do we have to do? Well, we have to re-review, and the goal is to get that high-quality output. And just to show how each of these agents is defined, we'll go into the translator here.
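The switch-case edge described here can be sketched in plain Python. These are invented names standing in for the framework's workflow-builder API, with canned scores in place of real model calls:

```python
# Review-loop sketch: translator -> reviewer, with a switch-case edge
# routing to the final output (high quality) or back through the editor.

def translator(text: str) -> dict:
    return {"text": f"[es] {text}", "score": 94}   # first pass: 94% quality

def editor(result: dict) -> dict:
    return {"text": result["text"] + " (edited)", "score": 99}

def is_high_quality(result: dict) -> bool:
    return result["score"] >= 99                   # the switch-case condition

def run_workflow(text: str, max_rounds: int = 5) -> str:
    result = translator(text)                      # edge: start -> translator
    for _ in range(max_rounds):
        if is_high_quality(result):                # case high quality:
            return result["text"]                  #   edge -> final output
        result = editor(result)                    # else: edge -> editor, re-review
    return result["text"]

print(run_workflow("out of the frying pan, into the fire"))
```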
And we see here we have an agent of type OpenAI (we're using OpenAI here), and then it's just Python code: we create the translator, give it its goal, and provide a couple of instructions. If you've worked with models before, you're familiar with system prompts and those types of instructions. Then we just call the agent to go do its work, and we can pass information from agent to agent depending on the context and what we want to showcase. But yeah, there are a bunch of cool things out there to get your hands on and build with Microsoft Agent Framework, and we'll make sure to drop a link to the documentation so you can check that out. But back to you, Stephen. Let's talk about NVIDIA AI Blueprints.
>> Yeah, let's do it. And Gwen, there were a couple of questions in the chat. Could you highlight the relationship between Microsoft Agent Framework and Semantic Kernel?
>> Yep. So, previously we had Semantic Kernel and we also had AutoGen. Moving forward, we have taken the best of both of those worlds and united them as Agent Framework. So if you're building something new now, we encourage you to leverage Agent Framework versus the other options, and Agent Framework is available for C# and Python. I know Semantic Kernel was big in the C# world, with AutoGen being more popular in the Python world, so there's a little bit of everything for everyone.
>> Awesome. Thank you. And I saw there was another question; I'll take this one: some LLMs are better at some jobs, so do you have a reference on what LLM to use for different jobs? That question is a perfect segue into this next part about NVIDIA AI Blueprints. Gwen showed us how we can build these agentic workflows and orchestrate agents together using Agent Framework, and it's obviously super powerful and lets you be really creative. But there's still the question of how to get started, right? What kind of models should we use for different use cases, and what kind of agentic workflows can we build with those models?
NVIDIA AI Blueprints answers those questions. These blueprints are essentially reference workflows that you can use for all sorts of different applications, and all of them are open source and available on build.nvidia.com. If you go to build.nvidia.com and click on Blueprints up here, you can see all of the blueprints available. We have blueprints for AI model distillation, which is one of my favorite topics, as well as blueprints for 3D object generation, data streaming for RAG, and AI observability. These blueprints are all out there, created for users like you to reference, so you can understand how to build these workflows and apply them in the real world. Let's say I go back to the AI model distillation blueprint; I can view it on GitHub to see the actual code behind it. In this case, it runs us through a Jupyter notebook, which helps us understand all of the different code segments that go into it. It walks us through how to prepare the data used for the model distillation, and it lets us specify the different models we want to use. For the teacher model, we would use the Nemotron Super 49B; the student model, I believe, we define a bit further down.
The student model in this case is about a 1-billion-parameter model, which is obviously much smaller and much more lightweight, so you can run it with less hardware, but of course it's a bit less intelligent. So the smaller model is usually not something you would want to use for really in-depth use cases. However, with model distillation, you're increasing the intelligence of this smaller model for a very particular use case. In this example, it's financial data: we're feeding this model data about the stock market, the kind of news you would see in the Financial Times or Bloomberg, and this smaller, less intelligent model becomes a lot more capable on financial data. This Jupyter notebook walks you through how to run the entire thing; I went through it myself, and it is very easy to understand, very plug-and-play. Whenever you go to build.nvidia.com, it also shows you the requirements you would need. In this case, you would need two NVIDIA GPUs: A100, H100, H200, or B200 would all work great.
And once you have those, you can start it up and get it running; it's very easy to get going. That's really the whole point of these blueprints: we want them to be as approachable as possible. So if you have the adequate hardware, I encourage you to check it out and see what's available for you.
The way these blueprints work is that NVIDIA created three foundational blueprints, which you see here. There's one for AI-Q, which is sort of our deep research blueprint. We have a blueprint for RAG, which is probably going to be the most applicable to most of the people here, and to enterprises and workflows in general. And we also have a data flywheel blueprint. All of these blueprints are unique, tailored for very different tasks. Then what NVIDIA has done is take each of these foundational blueprints and build on top of them more industry-focused, more tailored blueprints for particular use cases. For example, the model distillation blueprint I just showed you falls under the data flywheel blueprint: they took that foundational blueprint, built on top of it, customized it, and used different data and different models for this financial use case. We'll see that for all of the blueprints on build.nvidia.com.
It's all focused on particular use cases. The reason we do that is that we want everyone to have a reference blueprint that is either perfectly plug-and-play, so you can go use it and have something working in your environment within, you know, half an hour, or, if it's not perfectly aligned to your use case, at least close enough that you can configure it to match. That's why it's all open source: you can go in, rearrange the code, rewrite the code, plug in whatever you need to get these blueprints geared to exactly what you're trying to accomplish. So it's definitely a fantastic resource, and I encourage you to check it out, especially if you're first getting started with agentic workflows.
So this was the agentic model distillation for financial data. There was a question about which models to use for what, and this diagram shows all of the different models we use in this workflow. Like I mentioned, we have the larger teacher model, which in this diagram is the 3.3 70B, a 70-billion-parameter, sort of medium-size model, and then we have the candidate models, which are the student models: 8 billion, 14 billion, 1 billion, and 49 billion. This workflow walks you through training, or fine-tuning, all of these different models and comparing them using some special benchmarks.
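At its core, distillation trains the student to match the teacher's softened output distribution. The blueprint wires this into real fine-tuning; this sketch only shows the underlying math, with made-up logits standing in for real model outputs:

```python
import math

# Knowledge distillation sketch: the student is trained to minimize the
# KL divergence between the teacher's and its own softened distributions.

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher_logits = [2.0, 1.0, 0.1]   # e.g. from a large teacher model
student_logits = [1.5, 1.2, 0.3]   # e.g. from a ~1B-parameter student

T = 2.0   # temperature > 1 softens both distributions
loss = kl_divergence(softmax(teacher_logits, T), softmax(student_logits, T))
print(f"distillation loss: {loss:.4f}")   # this is what fine-tuning drives toward 0
```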
A question in the chat: can I rent some GPUs in Azure somewhere? Absolutely. There are a bunch of different ways to use GPUs in Azure. The easiest way, in my opinion, is to provision an Azure VM. Anything in the NC series will have GPUs in it: a standard NC24 gives you an A100, an NC40 gives you an H100, and you can multiply on top of that with VMs that have two H100s or two A100s. So that's what I would check out as the easiest way to get started.
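As a sketch, provisioning one of those NC-series VMs with the Azure CLI looks roughly like this. The resource-group and VM names are made up, and the `az` calls are guarded so the snippet is copy-paste safe on machines without the CLI:

```shell
# Hypothetical names; the NC-series size determines the GPU, e.g.
# Standard_NC24ads_A100_v4 (one A100) or Standard_NC40ads_H100_v5 (one H100).
RG="gpu-demo-rg"
SIZE="Standard_NC24ads_A100_v4"

if command -v az >/dev/null 2>&1; then
  az group create --name "$RG" --location eastus
  az vm create \
    --resource-group "$RG" \
    --name gpu-vm-01 \
    --image Ubuntu2204 \
    --size "$SIZE" \
    --generate-ssh-keys
else
  echo "install the Azure CLI (az) to run this sketch"
fi
```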
Another question: are those LLMs open source, or do they just run in the Azure environment? It depends on the models you're using. If you're using any Nemotron model, it's going to be fully open source. If you're using DeepSeek or GPT-OSS, those are open source as well. But if you want to run, say, Claude or GPT-5, that's something you'd have to run through OpenAI's or Anthropic's endpoints. So it depends on how you want to run it; with open source you have a lot more control, and that's a great way to understand how to build these agents on top.
So yeah, NVIDIA Nemotron, what I was mentioning earlier. These are NVIDIA's family of models that cover a ton of different use cases. So again, to that question of how we know which models to use for different use cases, this sort of breaks it down for the Nemotron family. If you want a reasoning or agentic model, that's where we have Nemotron Nano, Nemotron Super, and Nemotron Ultra. The Nemotron 3 Nano actually just came out on Monday, so it is brand new, top of its family. Really fantastic model, and super lightweight too: it's 30 billion parameters. Now, if you are looking for something more multimodal, a vision language model can understand videos and images as well as text. The VL is what you would look for in the name to make sure you're choosing the right model there. And then of course there's information retrieval, which is something you would use in RAG. So for document processing, where you need to take, let's say, scans of documents and pick out the text, you can parse that and put it, let's say, in a RAG database, or process it in a number of different ways. That's where you'd use information retrieval. And content safety, of course, goes without saying: just make sure that your LLM is outputting appropriate responses and not doing anything harmful that we wouldn't want our end users to see.
So that sort of answers the question of which models to use, and then there's another question of how we run these models, right? We have the open-source models like DeepSeek, Nemotron, etc. But how do we run them? The best way to get up and running is to use something called NVIDIA NIM, which is NVIDIA Inference Microservice. Essentially, a NIM is a Docker container that contains the model and the inference engine, so vLLM or TensorRT-LLM or SGLang is baked into the container image. It also comes already baked in with lots of observability tooling and capabilities. So you could think of it as a Docker container that has everything you need to run a model: you just say docker run, and then the container name, and it will spin up your LLM. It makes it really easy to get up and running. The only thing you would need for this is an NGC account. So if you go to ngc.nvidia.com, you can create an account and create an API key, and you plug that in with your docker run command and it gets you up and running very quickly. So if you want to see which NIMs are available, you can go to catalog.ngc.nvidia.com and scroll down to see NVIDIA NIM. We don't only create a NIM for our own models, like our own Nemotron models, but also for tons of different open-source models. So,
if I go here (I've been picking on DeepSeek a lot today), I can find a NIM for DeepSeek R1. So again, this is a Docker container that has everything you need inside it to run DeepSeek R1. Of course, it doesn't have the hardware you need; that comes outside of the container. But Azure, again, would be a great place to secure that hardware and those GPUs. There's DeepSeek V3.1 as well. All of these NIMs make it really easy to get up and running, not just with NVIDIA's models, but also with third-party open-source models.
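Once a NIM container is running, it exposes an OpenAI-compatible HTTP API, so querying it from Python needs only the standard library. A rough sketch, assuming a typical local deployment on the default port; the URL and model name are assumptions, not taken from this demo:

```python
import json
import urllib.request

# Assumed local NIM endpoint; NIMs serve an OpenAI-compatible API.
NIM_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat payload that a NIM endpoint accepts."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def query_nim(payload: dict) -> dict:
    """POST the payload to the locally running NIM container."""
    req = urllib.request.Request(
        NIM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires a running container):
# reply = query_nim(build_chat_request("deepseek-r1", "Summarize NIM in one line."))
```

Swap the model string for whichever NIM you pulled; a hosted endpoint would also need an API key header.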
>> [clears throat]
>> All right. So, now that we know which models to use and how we can get up and running with reference workflows, Gwen is going to show us how we can take these workflows and these models and integrate them into our real-world applications. So, I'll hand it back to you, Gwen.
>> Yep. Thank you, Stephen. Right. So we,
you know, saw on the slides a couple of
options of, you know, workflows we can
leverage, and we saw in the dev UI a more visual representation of how they run. But let's make it
more concrete as to like why you would
even want to use something like this and
where you could actually integrate them
into your applications. So here we have
a sort of website for a retail store.
We're calling it the Zava live popup
shop and we sell premium technical
apparel, right? So obviously at the
front page we have a bunch of products
and we can you know purchase things,
right? I also have the option to to log
in, and I can log in as a customer or a manager, right? And I'll show you first logging in as a customer. You'll see your dashboard of things that
you've purchased. Obviously you can
purchase more things, right? But the
first sort of neat integration here,
we're actually leveraging OpenAI's ChatKit, which allows you to have an agent
and also create this sort of nice chat
UI experience here. And in this case, I
just asked here, what was my most
expensive item purchased? And we see
here that it tells us that we purchased
running athletic shoes at a total amount
of $647.92. And that is because we
purchased nine pairs. We also would have
the option to return it, but it's telling us here this was in June, so we're well beyond that return period. Six months ago; I don't think any store would let you do that. But you can
create these sort of experiences in code
and then you know plug them right into
your application, and have a lot more things that your customers can ask and accomplish thanks to your agent, without having to wait on someone. And you can leave your team of customer representatives to do the stuff that really needs a human in the loop. Now, another option we
have is logging in as a manager. Here
you see I'm logged into the Zava
management side of things and you see
here information that would be relevant
to a manager of a store, right? So we
have top categories by revenue and we
also have these weekly insights here.
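These weekly insights come from a concurrent fan-out/fan-in pattern: several agents run in parallel and a synthesizer combines their results. A minimal sketch in plain asyncio; the agent functions here are stand-ins, not the demo's actual Agent Framework code:

```python
import asyncio

# Stand-in "agents"; in the real app each would call an LLM or an MCP tool.
async def weather_agent() -> str:
    return "rainy midweek, fluctuating temperatures"

async def sales_agent() -> str:
    return "top sellers: coats, sweaters, waterproof footwear"

async def events_agent() -> str:
    return "holiday festival expected to drive foot traffic"

async def synthesize(findings: list[str]) -> str:
    # The real synthesizer would prompt an LLM with the combined context.
    return " | ".join(findings)

async def weekly_insights() -> str:
    # Fan out: run the three agents concurrently.
    findings = await asyncio.gather(weather_agent(), sales_agent(), events_agent())
    # Fan in: hand all results to the synthesizer.
    return await synthesize(list(findings))

print(asyncio.run(weekly_insights()))
```

The concurrency only pays off when the agents are genuinely independent, as they are here: weather, sales, and events don't need each other's output.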
Now remember that flow I showed you earlier... I might still have it open. No, do I not? Oh, this one here. So, remember I showed you this one where we have the data collector, and then we have a concurrent workflow where three individual agents each accomplish a task, and then we fan back into the insight synthesizer. That is actually the work that we see here, right? So, we
have some weekly insights, and here it says AI-generated insights based on weather forecasts, inventory data, and
local events. So again, in this case it says: over the next seven days, expect fluctuating temperatures with a rainy day midweek, so increase stock of winter coats, sweaters, and waterproof footwear, because they will provide warmth and protection during cold and wet conditions. Right. So we have our
weather agent that went and got us that
information. Then we have top-selling products, which gives us a preview of the five bestselling products in the last 21 days. This is specific to this store, and it also gives us an idea of what's moving more in inventory and all that kind of stuff, right? And then we have our events agent that tells us that several major outdoor events, including a holiday festival and a New Year's Eve celebration, are expected to drive significant foot traffic and
clothing sales in the coming week. Great. Now, from here we have all of these insights, and we can actually kick off another agent with this. But before we look into that, I want to show you another integration that we could leverage. If we go to our inventory, I'm going to click here.
Here you see what you would expect in inventory management, right? Total items, what's low in stock, some standard information there. But what we can also do is click on our launch AI agent. What this will do is a real-time inventory analysis, and it'll make sure it's policy aware, and then budget optimization. We can also provide instructions that are specific to the restocking analysis that we want. So it has some default text here that says analyze inventory and
recommend restocking priorities. But if we go back to our dashboard and click on generate insight-based analysis, we can go ahead and send the context that those agents and that workflow created for us. So here we're
saying based on the weather conditions,
local events and current sales
performance, what items should we
restock? And then we just send over the
the weather forecasts and all the other
information that we got there. And then
we can launch that analysis. Now, I ran this already before just to save us a little bit of time. I will show you our restocking workflow. This was just a sequential workflow, one step after the other: the stock agent, the prioritization agent, and the summarizer agent. And if we go back there, we can also see that that work was kicked off here: stock agent, prioritization agent, and then summarizer agent. And if we
scroll down here, we have a bit of an
activity log, just to make sure we understand what's working and how things are going. And this returns restocking recommendations. It tells us 15 items need restocking, select items to reorder, and it gives us a list of them.
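The sequential restocking workflow, stock agent then prioritization agent then summarizer agent, is a pipeline where each step consumes the previous step's output. A stand-in sketch in plain Python; these functions and their data are illustrative, not the demo's Agent Framework code:

```python
# Illustrative sequential pipeline; each "agent" is a plain function here.
def stock_agent(_: None) -> list[dict]:
    # Would query inventory via the finance MCP in the real app.
    return [
        {"item": "peacoat wool blend", "stock": 0},
        {"item": "pullover fleece hoodie", "stock": 3},
        {"item": "leather belt", "stock": 40},
    ]

def prioritization_agent(items: list[dict]) -> list[dict]:
    # Lowest stock first: out-of-stock items become top restocking priorities.
    return sorted(items, key=lambda i: i["stock"])

def summarizer_agent(ranked: list[dict]) -> str:
    top = ranked[0]
    return f"Top priority: {top['item']} (stock: {top['stock']})"

def run_sequential(*steps):
    """Feed each step's output into the next, like the demo's workflow."""
    result = None
    for step in steps:
        result = step(result)
    return result

summary = run_sequential(stock_agent, prioritization_agent, summarizer_agent)
print(summary)  # Top priority: peacoat wool blend (stock: 0)
```

Sequential makes sense here because each agent depends on the one before it, unlike the weekly-insights flow, where the agents could fan out in parallel.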
It should also explain to us why it gave us this. So,
let's see. Notably, the peacoat wool blend is currently out of stock and requires replenishment. Other items with low stock levels include the pullover fleece hoodie and several accessories. And it gives us some more information there. I think I ran this with just the default. So we'll see here: we have the peacoat, we have some warm clothing, I would say, we have a belt there, some shoes, right?
So, I'm going to try to kick this off using those insights that we got for the weekly insights. So, if I click launch AI analysis... oh, come on. Of course, it doesn't want to work right now. Let me take a look at what's going on here.
>> It's the curse of the live demo.
>> It is the curse of the live demo, but I think it might be... go here. No, it didn't want to work. Wait, let me... no. Is it still running? Let me take a look here. We should be running. The good thing is I did run it before, which is how we got that restocking output earlier. Interesting; I wonder what exactly happened. Perhaps some token or something expired. But anyway, you get
the gist. You provide those additional
instructions there. So, I'm pretty sure I ran this with just the generic one that we saw at the beginning, and then it gave us this list here. But what it does is use that weather context: oh, it's raining next week, let's stock up on raincoats, things like that, right? And I guess the last thing I want to show you is observability.
We have here a dashboard. This is Aspire, and this is fantastic because it allows us to see all the services: we have agent dev here, we have our API running, our finance MCP, our front-end application, and our supplier MCP. We also get pretty rich data for all of our logging, right? So it should come in here. And I think this might have kicked off. Oh, there we go. So our restocking agent does things in batches, so it goes and calls our finance MCP, and it makes a lot of calls; that's why our console here is showing a lot of information. We also get structured logs, which is great if you want to see the different levels. Here we have the information level, and in case you get any errors, it's really easy to see them in here as well. And then we also have traces, which are really valuable in general whenever we need telemetry, but especially when we are calling different LLMs and things like that: we can see what gets called and which prompt is sent. For example, our finance MCP is calling our SQLite database here and getting information. And this is all thanks to having those integrations with OpenTelemetry and being able to send things off that way as well. And I think I could show you a little code here. If we look at our
insights MCP
or insights.py, I will show you an example. Let's look for our top product... okay, here. So the cool thing here, and we had spoken about tool extensibility and things like that: we see that we have our top-selling product analyzer, and this is an agent. And
here what we're doing is having it call
the finance MCP to use as its tools.
Right? We're not specifically telling it, oh, you have to use this tool. But MCPs are awesome because they work in a way where it's like: here's a list of all the things I have, you pick what you want. And if you are detailed enough with your instructions and the context you are providing to your agent, it will most likely be able to go and pick the correct tool. Right? So, if we take a
look at our actual MCP server, I'm going
to click into MCP and I'll click into
our finance server because that is what
we're sending to this one or making
available.
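The tool-registration pattern we're about to see boils down to a decorator that adds each function to a registry the agent can browse. A stripped-down, stdlib-only sketch of that shape; the real server uses the MCP SDK's @mcp.tool() decorator and the protocol on top, and the tool name and sales data here are illustrative:

```python
# Minimal stand-in for an MCP-style tool registry (the real MCP SDK
# additionally handles the protocol, schemas, and transport).
TOOLS: dict[str, callable] = {}

def tool(fn):
    """Register a function so the 'server' can list and invoke it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_top_selling_products(limit: int = 5) -> list[str]:
    # The real tool would run a SQL query against the store database.
    sales = {"running shoes": 9, "peacoat": 4, "belt": 2, "hoodie": 7}
    ranked = sorted(sales, key=sales.get, reverse=True)
    return ranked[:limit]

# An agent can see the tool list and pick one by name, much like over MCP.
print(list(TOOLS))                           # ['get_top_selling_products']
print(TOOLS["get_top_selling_products"](2))  # ['running shoes', 'hoodie']
```

This is why detailed docstrings and parameter names matter: the tool list, not hard-coded wiring, is all the agent has to go on when choosing what to call.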
We see that we have a bunch of tools here, and you can tell because we have this MCP tool decorator. This is the get top selling products tool, and in here you have the code that you would expect, right? We're doing some queries and returning some data depending on the parameters that we are providing. So you create your MCP servers, you leverage those extensions and the things you can use, provide them as tools to your agent, and then you can make these agents much more capable, much smarter, and able to accomplish a lot more work. And we have about five minutes left, so we'd love to answer any questions before we end things. But
yeah, once again, get hands-on with Agent Framework, play around with building these multi-agent applications, and yeah, it's quite an exciting time to be a developer.
>> Oh, absolutely. And I actually have a QR code on my screen to take you to a Microsoft blog which will walk you through creating your first agent with Microsoft Agent Framework. So it's a really great way to get up and running and get your hands dirty with Agent Framework. I'll leave it up here for a couple of seconds while we maybe answer a couple of questions, and then I have a QR code for the next episode.
>> Awesome. You were doing a great job of keeping up with these questions, so I think we got through them all. There was one about the slides being available after the presentation. Do you know if that's the case?
>> I believe so. I'm not 100% sure, but if anything, wherever you registered for the series, you should have been sent an email with resources. And I know there will be a follow-up, so we can make that available to people, with a link in there as well. Or just subscribe to the YouTube channel; any updates, we'll share as well.
>> Yeah.
>> And then definitely join us for the next episode. This will go a lot deeper into one of our blueprints, the AI-Q blueprint for deep research. It's a really great way to get a full deep-research RAG system up and running very quickly. It even has its own Helm chart, so you can deploy it into your Kubernetes environment, and it takes maybe 20 minutes to half an hour to get fully up and running. So there's a lot of great stuff coming in this episode, and it's coming up in January, so make sure you tune in for that. And yeah, anything else from your side, Gwen?
>> No, January is a great time to, you know, kick off the year learning something new, so definitely tune in for that episode. Yeah, I really appreciate the people commenting now on great insights. Thanks for your time, thank you all for being here, and make sure to catch the rest of the episodes and dive deeper and learn all this cutting-edge stuff. It's quite cool that it's available via APIs and code; you don't necessarily have to rent out or purchase expensive physical hardware. So, yeah, like you said, Stephen, the world is everyone's oyster with all these things. But hey, it's been great. Thank you for the invite, and I hope everyone has a great rest of their day.
>> Yeah, thank you everyone. Take care.
>> Thank you, Gwen, for the session today, and thank you all for joining us. We are always looking to improve our sessions and our experience here at the Reactor, so if you have any feedback for us, we would love to hear what you have to say. You can find the link to our survey on the screen or in the chat. And we'll see you on the next one.
Explore how to leverage multi-agent systems in your applications to optimize operations, automate recommendations, and enhance customer experience. The solution utilizes Microsoft Agent Framework, OpenAI ChatKit and NVIDIA Nemotron model on Microsoft Foundry to seamlessly connect with store databases, integrate human oversight, and deploy scalable chat agents. This approach enables real-time analytics, predictive insights, and personalized interactions, resulting in improved decision-making, operational efficiency, and a superior user experience for both application developers and users. This episode is a part of a series. Learn more: https://aka.ms/AIAgentsApps/y-MSFT #microsoftreactor #learnconnectbuild [eventID:26559]