Hey everyone, thanks for joining us for the December session of our AI Apps and Agents Dev Day series. My name is Anna. I'll be your producer for this session. I'm an event planner for Reactor, joining you from Redmond, Washington.
Before we start, I do have some quick
housekeeping.
Please take a moment to read our code of
conduct.
We seek to provide a respectful
environment for both our audience and
presenters. While we absolutely
encourage engagement in the chat, we ask
that you please be mindful of your
commentary, remain professional and on
topic. Keep an eye on that chat. We'll
be dropping helpful links and checking
for questions for our presenters to
answer live.
Our session is being recorded. It will
be available to view on demand right
here on the Reactor channel.
With that, I'd love to turn it over to
our presenters for today. Thank you so
much for joining.
>> Thank you, Anna. I really appreciate it. And thank you, everyone, for joining. This is the third episode of our AI Apps and Agents Dev Day series, and we are super excited to show you lots of great stuff about AI scaling, agents, and orchestration. My name is Stephen McCulla, and I'm an AI solutions architect with NVIDIA. I work very closely with Microsoft on integrating all of the latest technology between Microsoft and NVIDIA directly into Azure. And I'm joined here today by Gwen.
>> Hey, Stephen, and hey everyone else, welcome to this series. It's my first time here; thanks so much for the invite. Stephen has been awesome to work with. My name is Gwen. I'm on our Python advocacy team, and I'm excited to show you a bunch of cool agent stuff.
>> Yes, awesome. Lots of great agent stuff to show today; there's really no shortage of great things going on in this world. A bit about the program before we jump in: the partnership between Microsoft and NVIDIA is a very deep, long-lasting one, and this webinar series is a big part of that partnership. We want to show developers and users like you how you can leverage the latest and greatest technology coming out of it, so make sure you tune in to the rest of this series to learn how you can best put all of this great technology to work.

That said, let's jump in. Today is all about scaling and orchestrating your AI agents. We're going to start with a quick touchpoint on AI agents: what they are, how they work, and how you can create them. Then we'll dive into Microsoft Agent Framework, a great tool that helps you build and orchestrate multi-agent workflows that can scale across hundreds of agents and accomplish very complex tasks completely automatically. Then we'll look at NVIDIA AI Blueprints, which you can think of as recipes for complex multi-agent workflows; I'll show an example of one of them. And then we'll go through a couple of demos showing how you can integrate agentic workflows into your application and use them for batch processing and asynchronous workflows.

Lots of great stuff to jump into today, so let's go right into AI agents. Quick review: what exactly is an AI agent, and how does it differ from an LLM? An agent has a couple of capabilities on top of the basic LLM. The first is reasoning and tool calling. Not every LLM can be an agent, but if a model does have reasoning and tool-calling capability, it can be used as one. On top of that, we also introduce long-term and short-term memory. Tools like LangChain or Microsoft Agent Framework help introduce that memory aspect into these LLMs, and that's a big part of what makes an LLM an agent.

Whenever you introduce these capabilities to your LLM and encapsulate it into an agent, you open up a huge world of possibilities, because not only does it become more capable through reasoning, it can also interact with the outside world. For example, if I want to create an agent that is my content creator for, let's say, LinkedIn, I can give it an API tool so it can create some content, maybe a paragraph or an announcement, and automatically call that tool to post onto my LinkedIn page. That's just one possibility. Once you introduce tools that you create with your own code, the world is your oyster; you can do just about anything you put your mind to with your agents.
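The tool-calling loop just described can be sketched framework-agnostically. Everything here is illustrative: the `post_to_linkedin` tool and the stub "model" are hypothetical stand-ins, and a real agent would use an LLM with tool-calling support to decide which tool to invoke and with what arguments.

```python
# Framework-agnostic sketch of an agent's tool-calling loop.
# The "model" below is a stub that always requests the (hypothetical)
# post_to_linkedin tool; a real agent would ask an LLM to decide.

def post_to_linkedin(text: str) -> str:
    """Hypothetical tool: in production this would call the LinkedIn API."""
    return f"posted: {text}"

TOOLS = {"post_to_linkedin": post_to_linkedin}

def stub_model(prompt: str) -> dict:
    # Stand-in for an LLM's tool-call decision.
    return {"tool": "post_to_linkedin", "arguments": {"text": prompt}}

def run_agent(prompt: str) -> str:
    decision = stub_model(prompt)
    tool = TOOLS[decision["tool"]]        # look up the requested tool
    return tool(**decision["arguments"])  # execute it with the model's args

print(run_agent("Announcing our new AI webinar series!"))
# posted: Announcing our new AI webinar series!
```

The key point is that the agent, not your application code, decides when the tool runs; your code only supplies the tool and executes the call.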
So these are very powerful tools that you can use for pretty much any complex task.

Now, whenever we think about agentic workflows, how does that work? Why do we even need an agentic workflow if our agent can already reason, call tools, and interact with the outside world? Why introduce these workflows into our systems and our company? Well, the key word here is specialization.
There isn't a single AI model out there that is the best at everything. Some AI models, like Nemotron Parse, are specialized for document analysis. Some models, like GPT-5, are powerful for coding and chat. And some models are great for audio processing. Now, if you get into fine-tuning, you can get even more specialized: you could take a model that's great at chat and document analysis and fine-tune it for, say, the financial sector or for healthcare. And it's really the idea that multiple specialized models will beat one generalized intelligent model that introduces the need for agentic workflows.

For example, think about an AI system where a doctor brings a device into the diagnosis room while meeting with a patient. That device listens to the whole conversation between the doctor and patient, and in the background an AI agent workflow is, number one, transcribing the audio into text; number two, researching everything it's learning and creating a diagnosis suggestion for the doctor based on what the patient is saying; and number three, creating the appropriate plans for the patient to follow depending on their diagnosis.
In that system, there isn't really one model that can do all of those things the absolute best, right? We probably want fine-tuned, specialized models focused on each of those areas: the transcription from voice to text, the research into medical history and past diagnoses, the look into the patient's own past history, and the creation of an action plan for the patient. That's one example where you might see an agentic workflow, but there are almost endless use cases where you can apply this methodology, and we'll be going over that today. So I'll hand it over to Gwen to talk more about how you can actually build these workflows yourself, customize them, and deploy them using Microsoft Agent Framework. Over to you, Gwen.
>> Awesome. Thank you, Stephen. Yeah, let's talk about Microsoft Agent Framework. For context, you might have used something like AutoGen or Semantic Kernel before, which are frameworks that Microsoft has created in the past. Agent Framework is the path moving forward, so if you're building something new, it's what we recommend you leverage. But to take a step back and discuss why you would want to build agents in code in the first place: as developers, you already have many reasons. You probably know Python, for example, and with Python you already have familiar ground: you can leverage different SDKs, different tools, different data sources, and you can make your agents as custom and as finely controlled as you'd like. That just comes with the flexibility of programming something, right?
Additionally, you can create these things locally. You can leverage local models and the tools from your day-to-day development journey, and you can have them in your CI/CD pipelines, all the things we know from that journey. And there's portability: if we leverage a framework, we're using code, so we can deploy to one cloud, try a different cloud, or deploy locally in a container or on some random computer you have sitting around. Those are awesome advantages of using code to create these agents.

But that doesn't mean you're limited to only using code. You have the same abstractions we have when we think of cloud infrastructure: IaaS, PaaS, SaaS, container as a service, so many options, and we see that same pattern happening with agent development and gen AI apps. With IaaS, the same way you think about cloud infrastructure, you bring your own containers and frameworks, and you're more in control of the infra, but that also means you're in control of the maintenance and all those kinds of things; in this case, you'd have open-source LLMs and frameworks to leverage. For PaaS, platform as a service, we have things like Foundry now, which has an agent service as well. And for SaaS, traditionally you could think of something like Logic Apps or your other software-as-a-service offerings; we actually have Copilot Studio, which lets you create agents without code, which I think is pretty awesome. So there are options for everyone, but today we want to focus on code and Microsoft Agent Framework, and I'll move over here.

Microsoft Agent Framework is in preview. We do have a lot of changes happening, so keep that in mind, but it's something very fun to build with, and it's expected to reach GA sometime early in the new year. It is the open-source engine for building and orchestrating intelligent AI agents. We have integrations with things like OpenTelemetry, the gold standard when it comes to observability. You get all the benefits of the Azure platform: guardrails, security, evaluations, all those types of things, plus integrations with Entra ID. A very cool product; check it out. So, for orchestrating multiple agents, we have a couple of workflow patterns, and I'm actually going to load this here.
Let's go back; my animations are somewhat slow here. Anyway, we have multiple workflow patterns, and these aren't specific to Microsoft Agent Framework. You'll see roughly the same set of workflows across the various frameworks meant for building agents; they might have different names, but the patterns themselves are quite similar.

The easiest, and if you haven't built anything before I recommend you start with it, is sequential: one agent does a task, then the next agent, then the next, and you end with some result. With concurrent, you kick something off, but you have various agents working at the same time. This is ideal when the work isn't dependent on each other; the agents can all do their own thing, and you save on processing time. Handoff would be ideal for something like a customer support experience. The customer sends some kind of message, and the first agent's goal is to triage: it understands, okay, this is a tech support request, and hands it off to the tech support agent; or maybe it's a refund request, so it hands it off to whichever agent is right for that workflow, and they hand off the work until they complete.

Those three cover a good amount of getting started, and they're actually quite complete workflows for the things you most likely want to build. But if you need a little more planning and a little more management, I'd say group chat and Magentic are the way to go. Group chat you can think of as a writers' room: someone pitches an idea or asks for feedback on something, and a bunch of other agents provide feedback, iterations, and edits. It comes back to the reviewer in a collaborative back-and-forth, and the goal is to get to the result. You define at the beginning which type of pattern you want to use for that specific group chat.
There's round-robin and a couple of other options, and it's great for when you're not 100% sure whether it should be sequential or something else. I'll actually show you an example of this one in a little bit. And then Magentic I like to think of as a souped-up version of group chat, because the goal of the Magentic workflow is to plan ahead of time. You give it some kind of task; it turns it into subtasks and outlines a plan: this subtask goes to this agent, that subtask to that agent. The reason there's a little document on this workflow in the diagram is that it keeps updates on the progress: if any of the agents aren't working, or anything relevant to the execution happens, it keeps track of that, and if for some reason something doesn't work, it goes back into planning mode. It's quite flexible and quite robust at figuring out how to get to the end result we need. You can imagine that takes more resources and more time, but it's more robust, so for specific workflows it's ideal. And then the ultimate-complexity workflow is essentially taking all of these workflows and turning them into agents themselves. Each of those little items could be a sequential, a concurrent, or a handoff, and they all interact with each other; it's like workflow inception. Many, many options, but again, if you haven't worked with any of these patterns before, I recommend starting with at least the first two and going from there.
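The first three patterns can be sketched in a few lines of plain Python, with ordinary functions standing in for LLM-backed agents. The names and stand-in "agents" here are purely illustrative, not Agent Framework APIs:

```python
from concurrent.futures import ThreadPoolExecutor

def sequential(agents, task):
    # Each agent's output becomes the next agent's input.
    for agent in agents:
        task = agent(task)
    return task

def concurrent_run(agents, task):
    # Independent agents work on the same task at the same time.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda a: a(task), agents))

def handoff(triage, specialists, task):
    # The triage agent decides which specialist receives the work.
    return specialists[triage(task)](task)

# Stand-in "agents" (a real workflow would call LLMs here):
upper = lambda t: t.upper()
exclaim = lambda t: t + "!"
triage = lambda t: "refund" if "refund" in t else "tech"
specialists = {"refund": lambda t: "refund team: " + t,
               "tech": lambda t: "tech team: " + t}

print(sequential([upper, exclaim], "hello"))          # HELLO!
print(handoff(triage, specialists, "refund please"))  # refund team: refund please
```

The frameworks add the important parts on top (state, retries, observability), but the shapes of the patterns are this simple.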
Now, on top of having all of these workflow options, and like I mentioned before with Entra ID and so on, you can also plug in a lot of different tools and make your workflows quite extensive. You can have MCP servers; we have Cosmos DB, SQL Server, LangChain, and many more coming. So it's not just about creating agents that can interact with LLMs. It's about creating agents that interact with LLMs and are also grounded in, and powered by, different data sources, different tools, and different inputs that come from all of these tools and extensions.
The other key part of making agents work well is having some kind of memory tool or mechanism. In Agent Framework, we have something called an agent thread, so let's work through this example. Say I have a travel-planning application, and a user sends, "I need to book a hotel in New York for a two-night stay." The agent goes and uses the Tripadvisor API, searches for the nearest hotel, creates the message, and sends it back. Then I ask another question. Think of these threads as chats: when you're inside a chatbot, you have one chat thread. So I ask, "What's the daily meal allowance for the business trip?" and this time the agent goes to SharePoint, queries the company travel policy, creates a message, and sends it back. The thread itself holds the in-context memory of what's going on, because the next time I send a message in the same thread, it's helpful to have all of that relevant information we've worked on before. But on top of that, when I create another thread, maybe on a different day or about a slightly different topic, say I want to know about transportation options, we also have the memory mechanism, which keeps track of important things across all of a specific user's threads. That way your user isn't always starting from zero.
It depends on what you want: for short term, you'd rely on these threads, and for long term, on these memory options, and for that you would of course need a database; you wouldn't want to be storing it in something like a plain JSON file.
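The thread-versus-memory split can be illustrated with a minimal sketch. `AgentThread` and `UserMemory` here are simplified stand-ins, not the actual Agent Framework classes, and they use in-memory structures where a real app would use a database:

```python
class AgentThread:
    """One chat thread: holds the in-context message history."""
    def __init__(self):
        self.messages = []

    def add(self, role, text):
        self.messages.append((role, text))

class UserMemory:
    """Long-term memory: facts kept across all of a user's threads."""
    def __init__(self):
        self.facts = {}

    def remember(self, key, value):
        self.facts[key] = value

memory = UserMemory()

thread1 = AgentThread()
thread1.add("user", "Book a hotel in New York for two nights")
memory.remember("preferred_city", "New York")  # worth keeping long term

thread2 = AgentThread()                 # a new thread starts with no history...
print(len(thread2.messages))            # 0
print(memory.facts["preferred_city"])   # ...but per-user memory persists
```

Short-term context lives and dies with the thread; long-term memory survives across threads for the same user.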
Now, I mentioned a bit about having observability options with OpenTelemetry, and if you're already familiar with the Azure platform, you've most likely seen that you can [clears throat] get very rich insights in Application Insights and Azure Monitor. You can see traces, you can see logs, you can use Kusto to query for all of those things. We also have consent policies and human-in-the-loop approval flows, things like that. And we support long-running work; I'm actually running something right now that I started about half an hour ago, and hopefully it's done by the time we get to the demos at the end. And we have policy enforcement thanks to Entra ID, content filters, a bunch of amazing stuff that comes from leveraging the Foundry platform for building these things.

Before we move on, I just want to show you an example of what a really simple workflow looks like. I think sometimes people assume that building agents has to be quite complex, so I'm just going to show you here. This is something we call DevUI.
I'm going to zoom in a little bit here. So this is DevUI; let me move this over here. And just because I get asked this a lot: yes, you can switch it to dark mode if you want to. Anyway, what I'm going to do is just run this, and I'm going to put in a message here: "Tomorrow I am free." All right, I'll send this over. What we have here is a workflow that we can actually watch execute step by step, and we'll see what decisions it makes along the way. Its goal is to take some input and translate it to Spanish. The reason I tried to make the English text somewhat vague is that, between the reviewer, the editor, and the re-reviewer, it can only give us a final output if we're above 95% accuracy. But this one is giving us, let me see, it gave us, okay, that's actually a good one. These models are getting quite good at translating to Spanish. Let's see, let me try some grammar issues: "Microsoft agent framework."
I'm trying to get it to trigger that re-reviewer here. But you can start to [clears throat] see visually how it works, and we also have the events here on the right side, so we can see what happened step by step; obviously we have it in here as well. And I'm really trying to get it to, no, it keeps, oh, here we go. No, it went straight to the final output. All right, you get the point: you can have things run depending on specific conditions. And if we look at the code here, let's go on here.
So it's this single file. There's a bunch of meat and potatoes, but the most important part is at the bottom, where we can take a look at our actual workflow, which is defined in this code. I'm going to make this just a little larger. There we go. We have a workflow builder, and the goal is to start with your executor, which in this case is just our translator. Then we have add-edge calls: each time you branch out, we consider that an edge. So we add an edge from the translator, which sends off to the reviewer, and then here it tells us: if high quality, go to the output; else, go to the editor. That's the switch-case edge group, so it depends on whether we hit above that quality percentage, which is admittedly subjective because we're asking LLMs to judge it, but you get the point. Then after editing, it re-reviews, and if it's high quality we go to the output; otherwise it goes back to the editor until we get a good one. I'm going to try once more to see if I can get it to fail at it.
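Stripped of the framework, the control flow of that translate-review-edit loop looks like the sketch below. The function names, scoring rule, and 95% threshold handling are illustrative stand-ins for the LLM-backed executors in the demo, not Agent Framework APIs:

```python
# Sketch of the conditional workflow: translate -> review ->
# (output if quality >= threshold, else edit and re-review).

def run_workflow(text, threshold=95, max_rounds=3):
    draft = translate(text)
    for _ in range(max_rounds):
        score = review(draft)      # an LLM judges quality in the demo
        if score >= threshold:     # the "switch-case edge": good -> output
            return draft
        draft = edit(draft)        # otherwise -> editor, then re-review
    return draft

# Deterministic stand-ins for the LLM-backed executors:
translate = lambda t: f"[es] {t}"
review = lambda d: 97 if d.endswith("(edited)") else 90
edit = lambda d: d + " (edited)"

print(run_workflow("Hello"))  # [es] Hello (edited)
```

The framework version expresses the same branching as edges in a graph, which is what DevUI visualizes step by step.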
>> The model is just too smart.
>> Yeah, the GPT models are fantastic with Spanish. Maybe I should have picked a different model. Let me just type "what is you"; let's do something like that, some very incorrect English. [laughter]
Yeah, I've worked with a lot of translation ever since GPT-3, because I work a lot with developers who speak Spanish, and ever since probably GPT-4, it's been one of the best model families for working with other languages. Pretty interesting. All right, let's see.
>> A question: how can we use and install DevUI, and is it Python only?
>> Yeah, it's a great question. Agent Framework has C# and Python support now, and it really is just installing a package; I'll show you in a second. But we did get it to run, and I see it's going on run two. It really is just, we'll go over here to pyproject.toml, and here we're using agent-framework and then agent-framework-devui. You don't necessarily have to use DevUI, but it's a great way to get that visual representation of what's going on, and the actual configuration is just down here, nothing too crazy, just this line. We'll drop some links and documentation for you to review, but yes, it's a pretty neat tool. Now, we'll go back here. Okay, cool. You can see it translated the text; the accuracy was 90%, so it needs improvement because it's below 95. Then it goes back, and then it ended. I said "what is you"; I guess I was trying to say "who are you?" So it does get the correct translation in Spanish. Anyway, that was a nice little DevUI-plus-Agent-Framework example. Do you want to talk about NVIDIA AI Blueprints now?
>> Absolutely. Let's do it.
All right. So yeah, like Gwen showed, there's so much capability and flexibility when you build your agent workflow. But admittedly, it can be a bit tricky to know how to get started. Maybe you need some reference architectures, or at least something to help you get on your feet and start building workflows for your company. That's really where NVIDIA AI Blueprints come in. You can think of AI Blueprints as reference workflows that can be pre-trained and customized for specific use cases. If you go to build.nvidia.com and then to /blueprints, you can see all of our blueprints. There are so many different kinds of reference architectures and recipes you can use: we have a multi-LLM NIM blueprint, a blueprint for creating a data flywheel, a blueprint for creating an AI retail shopping assistant. And all of these are available on the NVIDIA AI Blueprints GitHub repo, so you can go there and find the code for all of them; it's completely visible to you. You can come in, download the code, and deploy these blueprints yourself. The only requirement is an NGC account; NGC is NVIDIA GPU Cloud. You just create an account, create an API key, plug it in, and you're ready to go. So it's a great way to get started.

The way the blueprints work is that we have three foundational blueprints: the AIQ agentic AI blueprint, the RAG blueprint, and the data flywheel blueprint. These are the three foundational workflows we build everything else upon. If we go back to build.nvidia.com, you can see that each blueprint is built on one of those three. AI observability for the data flywheel is, of course, on the data flywheel foundational blueprint; streaming data to RAG is on the RAG foundational blueprint. All of these are built on the foundational blueprints and customized for individual use cases. So this is a great way to understand how you can build these agentic workflows for your own use case. We try to make them as approachable and applicable as possible to as many real-world use cases as we can, so there's a high likelihood that whatever agentic workflow you're trying to orchestrate, there's already a reference architecture for it, or for something very similar, on NVIDIA's blueprints site. I strongly encourage you to go check it out and see what you can build.
Now, one of the blueprints I'll be showing you today is the financial model distillation blueprint. What this does is distill, or fine-tune, a smaller model of about 1 billion parameters using financial data. Whenever you're doing model distillation, you're essentially using a larger model to train, or fine-tune, a smaller model. So what we're doing here is using larger models, like Llama 3.1 8B and Llama Nemotron 49B, to fine-tune the smaller model. This one, of course, uses financial data and financial modeling, things like stock price information and financial news, to fine-tune the smaller model, but the reference architecture can be used with many different kinds of data. You could use healthcare data or, say, sports data, and use the same architecture.
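The teacher/student idea at the heart of distillation can be sketched in a few lines. The stub `teacher` function below stands in for a real call to the large reasoning model; in the blueprint, the teacher's answers become the supervised fine-tuning targets for the small student model:

```python
# Simplified sketch of building a distillation dataset: the large
# "teacher" model answers each prompt, and its answers become the
# fine-tuning targets for the ~1B-parameter "student" model.

def teacher(prompt: str) -> str:
    # Stand-in for the large reasoning model used in the blueprint.
    return f"answer({prompt})"

def build_distillation_set(prompts):
    # Each (prompt, teacher answer) pair is one training example
    # for supervised fine-tuning of the student.
    return [{"input": p, "target": teacher(p)} for p in prompts]

data = build_distillation_set(["AAPL outlook?", "Rate impact on bonds?"])
print(len(data), data[0]["target"])  # 2 answer(AAPL outlook?)
```

Swapping the prompt set (healthcare, sports) is what makes the same architecture reusable across domains.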
And again, all of the code to run this is available on the NVIDIA AI Blueprints GitHub. If we go here, to AI model distillation for financial data, we have the entire Jupyter notebook used to create this architecture. If we deploy this onto a virtual machine, which I'll show you in a second, you can run the whole notebook and understand step by step how the model distillation works. So this is a really great way to get your hands dirty, understand how the workflow operates, and implement something yourself.
Back here, it's important to understand the different components working under the hood in these workflows. Of course, we have our vector database, we have our orchestrator, and we have some of our long-term memory data stores here with NeMo. But let's focus on the models and the compute side. These models are based on the NVIDIA Nemotron model family. Nemotron is a suite of models that includes LLMs, VLMs, and safety models, as well as RAG models. So for pretty much any model you're looking for, Nemotron has one you can use for your use case.
For example, one of the newest announcements in the AI world is that just yesterday, NVIDIA released the Nemotron 3 Nano model, a fantastic model that can do agentic work, tool calling, and very intelligent reasoning in a very small footprint. It's a highly intelligent model, top of its class, and brand new. We also have the Nemotron Nano 2 VL vision-language model, which is great for multimodal workflows. So whenever you're building these blueprints, you can plug these different models in where you need them. If you remember what I was talking about earlier, how we have different specialized models for different use cases, and that's why we need this whole agentic orchestration rather than one intelligent model, this is exactly the answer to that. We have models for all of these use cases, and they're all open-source and open-weight, so they're very approachable and easy to get up and running.
Now, when it comes to how we run these models, that's where NVIDIA NIM comes into play. NVIDIA NIM is really the answer to the question of how we handle the complex process of serving these models and tuning and optimizing them for our infrastructure. You can think of a NIM as a Docker container with all of that optimization and tuning baked in, which you can run with a simple docker run command. All of our NIMs are available on NGC. If I go to ngc.nvidia.com, go to the catalog, and search the containers, I can find the NVIDIA NIM section with all of our NIM models. For example, the Llama 3.1 Nemotron Nano VL is available here, and we have a NIM for GPT-OSS 20B. So we make NIMs not just for Nemotron models but also for other open-source models. It's a really great way to get up and running and to leverage these microservices very quickly.
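Once a NIM container is running, it serves an OpenAI-compatible HTTP API, so talking to it is just JSON posted to its chat-completions endpoint. As a rough sketch (the local port and model id below are assumptions for illustration; check the specific NIM's page on NGC for the real values):

```python
import json

# Assumed local NIM endpoint and model id, for illustration only.
NIM_URL = "http://localhost:8000/v1/chat/completions"

def build_nim_request(model: str, prompt: str) -> dict:
    """Build the OpenAI-style chat payload a NIM expects."""
    return {
        "url": NIM_URL,
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

req = build_nim_request("nvidia/nemotron-nano", "Summarize NIM in one line.")
print(req["url"])
# Send with any HTTP client, e.g. urllib.request, or point an
# OpenAI-compatible SDK at base_url="http://localhost:8000/v1".
```

Because the API shape is the OpenAI one, swapping a NIM in behind an existing application is usually just a base-URL change.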
Now that we understand what's going on under the hood, let's go back to this financial distillation workflow. If I clone the repo and open up the workbook, you can see that it walks me through all the steps of the data processing and preparation part of the workflow, as well as how we can create a data flywheel. If you're not familiar, a data flywheel is essentially a continually running process that sharpens your models using newly incoming data, both from the outside world and from real user interaction with the model. So this covers a lot of the core ideas that are important to agentic workflow orchestration.
>> So, as you can see, it includes all of the code you need to get up and running. You can really just run it as a Jupyter notebook, and you can also use it as a reference when you create your own workflow. You can plug in the models you'd like to use: right here, we plug in Llama 3.3 Nemotron Super 49B v1, but we could also plug in a different large reasoning model, because this is the teacher model, the larger one that teaches the smaller student model about this financial data.
So um this really walks you through the
entire process um and helps you get up
and running very quickly. Now we also
have the capability to run these
blueprints on uh Brev. So, if I go back
to
um let's see if I go back to the
blueprints and I find that AI model
distillation for financial data, I can
uh go view the code on GitHub, but I can
also just click and deploy it um via
Brev. And with Brev, it launches the um
workflow on a hosted service in the
cloud. And I can choose which one I want
to host. So this would be Lambda Labs
and I can just deploy the Launchable and
it creates this entire blueprint on
Brev. So it's another really great way
to get up and running. Um you can test
it out for less than $10 an hour.
Really great way to sort of get
accustomed and get your hands dirty uh
with these blueprints.
Um, so lots of great stuff going on here
with the blueprints and I highly
recommend it as a way to um to
understand and build your learning with
these AI workflow orchestration tools.
Um, so I'll hand it back to Gwyn and
she's going to show you a couple of
different examples of how we can use these
workflows for real-world use cases.
>> Yeah. Before we we dive into that, I
want to grab a couple questions.
There's one: will blueprints run on the
Azure platform or the Nvidia platform?
>> Yes. So it it depends. You can run um on
the Azure platform. I recommend you
check out uh the AIQ blueprint. Um, you
can install it via Helm chart.
Um so you can run it on Kubernetes which
means AKS. Um, so and then on the Azure
platform,
um, it's not going to be like hosted and
and serverless where you don't worry
about the infrastructure. Um, if you
want to run it on Azure, you would run
it on something like AKS or an Azure VM.
But if you want something more hosted
and serverless, that's where you would
use something like Brev. So it it sort
of depends on what you're looking to do
there, but short answer is you can run
it on both.
>> Perfect. There's um a few more here. Can
the blueprints run on Nvidia's DGX
Spark?
>> So that depends it depends on the
blueprint because um some of these
models uh like if you want to run a
model on the DGX Spark um number one the
model has to be the right size for the
DGX Spark and it has to have the right
uh like software support for the DGX
Spark. So if the blueprint is using
models that do fit that criteria, then
then yes, absolutely. But I would I
would verify um that the blueprints
you're using are using models that have
that support.
>> Awesome. Yeah, we can get through the
rest uh towards the end. We'll I'll try
my best to leave time at the end uh
after we go through a couple of these
examples. Uh okay, so I want to show you
a couple of things: some integrations,
some background running stuff, and then,
you know, whatever else I can show
you. So to start, let's go to
uh we can just look at the codebase to
start for example. So we have quite an
involved project here. I'll show you
what's of importance. We have a bunch of
agents, right? This is a Python project.
We're using Microsoft Agent Framework
here. And in here we have a stock agent,
a marketing agent, an admin agent, and
an insights agent, and these
are all using different types of
workflows and on top of that we also
have MCP servers we have a finance
server and we have a supplier server. Uh
as you know agents work really well uh
with MCPs. Uh so great to have a couple
of those as well. And then we just have
like some front end well a lot of front
end stuff there too.
But uh we can dive more into that in a
bit but we'll we'll stick to kind of
looking at this. I launched the project
and for those who are C# developers
you're most likely familiar with
something called Aspire uh and now
Aspire also has first class support for
Python projects as well which is why
we're using this tool. And it's in my
opinion one of the best ways to not only
get an overview of like where everything
is running. For example, my agent dev is
running on uh this URL. My finance MCP
is running on here, but we also have
quite rich uh console output here. I
just ran a bunch of my
like background things. So we'll see
like a bunch of things coming in here.
Uh we also have more structured logging
if uh that's helpful. This is quite
helpful when you want to see like
immediate errors versus information at
different levels. And we also have
traces which when we're calling uh you
know LLMs and things like that quite
helpful to to be able to see this as
well. Uh but this isn't an Aspire talk,
but I wanted to show you that. All
right. So what we have here actually is
a like sort of just like a shop right.
I'm going to go and open it up. Uh here
we'll open this up. Right. So we have
this pop-up shop. You can buy a bunch of
products, and we have
the option to log in either as a
customer or as an admin and I think I
have the customer logged in already here
so this is one of the integrations we're
using, what's it called, ChatKit by
OpenAI, which allows you to create these
sort of chat experiences that are
powered by LLMs behind them. Right, so I am
signed in here as a customer. If I just zoom
in here you can see here I'm signed in
as Stacy, right? And then here you get
what you would expect when you sign in
to a customer portal. Just orders,
items total
uh savings in this case. And then I just
opened up the chat and I asked what was
my most recent order and it gives me
information on that. If I wanted to do a
return, I think this is way out of the
return policy. This was in Yeah, this
was like six months ago. So, I'm pretty
sure there's no store out there that
would let me uh do a return, but if we
wanted to, we would have the option to
kick off that type of functionality as
well. And um we have a couple of other
customer ones, but I want to show you
our admin site first. So, let's go in
here.
I can log in as a manager.
And we have different managers for
different stores. You can log in, of
course. But the first integration that
we have here is this weekly insights.
This is actually AI generated. And this
is specific to the store that you are
logging into, right? So when we think of
managing a store, managing product, it's
important to consider not only like top
selling products, but also maybe there's
going to be a lot of snow in the
following week. So it makes a lot of
sense for us to stock up on like winter
boots or heavy jackets or things like
that. Or maybe it's raining, right? So
having that quick glance of information
here and I'll show you how this works in
a second. We also have those top selling
products like I mentioned and then also
local events. I uh logged into the New
York store. So here it's saying several
major outdoor events, including a holiday
festival and a New York New Year's
Eve celebration, are expected to
drive significant foot traffic and
clothing sales in the coming weeks. So
these are all things that are important
to keep in mind when we try to restock
things or you know try to make the most
out of this. Right now let me just show
you how that works a little bit before
we see
uh the the stocking functionalities.
So, I'll start here at the bottom, and
I know this looks like a lot, but I
promise it will make sense. That is
where we define our actual workflow and
sort of the process of how things work,
right? So we have a data collector, we
have a weather analyzer, events
analyzer, top selling products analyzer
and insights synthesizer, right? So if
we look back here again, weather
analyzer is in charge of providing us
this information. Top product analyzer
is the agent in charge of giving us this
information. The events analyzer is the
agent in charge of giving us this
information for the events and then the
summarizer or the one that distills all
of it is in charge of you know giving us
this entire entire thing. Right now if
we head on back here we can see that we
are doing uh we start with a data
collector which just gets information
about what store I'm logged into. Right?
And then we have that fan out. I think
we have, do we have the weekly insights
open here? Yes, let me actually switch
over to it. If we look at our weekly
insights, we see we start with
our data collector to get information on
the store. And when you think
about this project and let's go back to
those different types of workflows that
I mentioned
the top selling products insight isn't
dependent on the events analyzer or the
weather analyzer. So
this is a good use case to use something
where we are running these uh
concurrently right. So we kick
everything off. We give the information
that each one of these executors, each
one of these agents needs and they can
go and run at the same time and I don't
have to wait for like the weathers and
events. They don't necessarily depend on
each other and then my synthesizer will
collect everything. So fan in and then
give us [clears throat] the information.
Right? So if we go back here, that's
exactly what we are are defining here.
We have our fan out: our data
collector fans out to our weather, our
events, our top selling and then fans
back in, collects everything and
generates that insights for us. Right?
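The fan-out/fan-in flow just described can be sketched with plain asyncio rather than the Agent Framework's workflow builder (whose exact API I'm paraphrasing here, so treat the names below as stand-ins): the data collector runs first, the three analyzers run concurrently, and a synthesizer fans the results back in.

```python
import asyncio

async def data_collector(store_id):
    # Single entry point: look up which store we're logged into.
    return {"store": store_id}

async def weather_analyzer(ctx):
    return f"weather insight for {ctx['store']}"

async def events_analyzer(ctx):
    return f"events insight for {ctx['store']}"

async def top_products_analyzer(ctx):
    return f"top products for {ctx['store']}"

async def weekly_insights(store_id):
    ctx = await data_collector(store_id)
    # Fan out: the analyzers don't depend on each other, so run them
    # concurrently instead of waiting on each one in turn.
    results = await asyncio.gather(
        weather_analyzer(ctx),
        events_analyzer(ctx),
        top_products_analyzer(ctx),
    )
    # Fan in: the synthesizer collects everything into one summary.
    return " | ".join(results)
```

Because weather, events, and top products are independent, the total latency is roughly that of the slowest analyzer, not the sum of all three.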
Now, another thing that's very relevant
here is to take a look at how our let's
look at our top selling product. Right?
This is essentially our agent. Well,
actually a little bit more. It's a
little bit more code here, but here what
we're saying is, okay, I want a chat
agent, right? And we're providing here
the instructions: you are a retail
analyst analyzing product performance;
retrieve the top five selling products.
Just relevant information here. And
as a tool we are sending the finance MCP
so instead of us creating the
functionality in here making like adding
all that code we just created a finance
MCP because other functionality other
agents can leverage this as well and
then we're sending it to this chat
agent like hey you have these tools I
need you to get this task done go ahead
and do uh do that for us
And just for that, I'll just pop open
our finance table here. And this is
connecting to a database that will go
ahead and run SQL queries to get the
relevant information that it needs to
answer the question, right? And each
one of these functionalities would end
up being a tool. I'll show you. So
anything that's decorated with a tool
means that that is something that an
agent can leverage to get information to
get results to get answers from. Right?
So for example, get company order
policy.
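The decorator pattern described here can be illustrated with a tiny registry. Note this mimics what an MCP server like the finance server does conceptually; it is not the actual MCP SDK API, and the tool bodies are made-up stand-ins for the real database-backed ones.

```python
# Registry of functions an agent is allowed to call.
TOOLS = {}

def tool(fn):
    """Register a function as an agent-callable tool."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_company_order_policy():
    return "Returns accepted within 30 days of purchase."

@tool
def get_top_selling_products(store, limit=5):
    # A real implementation would run a SQL query against the store database.
    return [f"{store}-product-{i}" for i in range(limit)]
```

Anything decorated with `@tool` lands in the registry, which is how the agent can discover it, decide it's relevant, and invoke it to get answers.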
Uh we also have get supplier contract,
get historical sales data, get top
selling products. Right? Now another
thing to keep in mind here is for
example in our insights, I'm not telling
it specifically which tools to use. I'm
just saying it has the entirety of
finance MCP. And if you are
really good at prompting and you're
quite explicit with what you are
providing as instructions, it will know
which tools to go and pick. Right? And um
that's why we see the relevant
information here on our homepage. Right?
These are all relevant to this. And uh
the last thing oh the last thing that I
think is pretty cool here that I want to
show you before I show you the batch
stuff is we are also leveraging external
tools, right? So, we can use internal
MCP servers that interact with our
information, our data, which is great.
But we can also have a let me take a
look here at our
uh context. Let me look at our weather.
We have a weather analyzer that should
go
uh stock. Oh, no. We're in the wrong
insights. Here we go. Uh we should have
a weather.
All right. I'm just going to search for
it
weather
uh analyzer or is it uh here we go
and here what we are doing is simply
calling an API, we're calling Open-Meteo,
and working with this API was awesome,
so highly recommend it. And the
cool thing here is we are providing a
structured output. That way we can
sort of be strict with what we expect,
instead of having just, you know, random
text or random JSON returned to us. And
yeah, we're just calling an API
here we're providing the proper
parameters that the API expects similar
to just working with normal code and
APIs. Same exact thing here, but once
we get that information from the API, we
send it to an LLM to give us the proper
insights for the weather that we want,
right? That's this part
here. What else do I want to show you? Okay
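As a quick aside, the structured-output idea for the weather step can be sketched like this: instead of letting the model hand back free-form text, the API payload is parsed into a fixed schema before anything reaches the LLM. The payload shape below loosely mirrors an Open-Meteo daily forecast response, but treat the exact field names as assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class DailyForecast:
    date: str
    temp_max_c: float
    precipitation_mm: float

def parse_forecast(payload):
    """Turn a raw JSON payload into typed rows the insights agent can rely on."""
    daily = payload["daily"]
    return [
        DailyForecast(d, t, p)
        for d, t, p in zip(
            daily["time"],
            daily["temperature_2m_max"],
            daily["precipitation_sum"],
        )
    ]
```

With a strict schema like this, the downstream insights prompt gets predictable fields (dates, temperatures, precipitation) rather than whatever shape the raw response happens to take.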
cool. So that's a little bit about how
we would sort of generate this. And then
if we were to log into a different
store, this would look different
depending on that store. And the other
uh really cool functionality that we
have here is we have this if you click
on well we can one click on inventory or
we can use uh this button here that says
generate insights-based analysis, which
will take all those insights. I click on
this and we have an agent that is
specific to inventory to stocking page,
right? And the instructions for the
agent is all those insights that we just
got, right? We've got the weather, we've
got the top selling products, and
essentially what we wanted to do is,
hey, we need to restock product, use
these insights to go and make a smart
decision on what to stock. Okay. Now, I
don't want to kick this off right now
because it's probably not going to be
done by the time, but I did uh
run it before, but I ran it just from uh
if we go back to dashboard,
I ran it from I think there's like an
inventory. Yeah, actually inventory
here. And this is what you would expect
from an inventory dashboard. What's
low? What we can stock up on, right? But
then we also have this launch AI agent
here. And it has some pre-filled
instructions. So that's what I ran here,
right? So you see I kicked it off at
11:32, which was about 20 minutes ago.
And it took about 4ish or so minutes to
go and complete this for us.
And it tells us here we should
restock on the peacoat wool blend
outerwear, uh, a couple of other
things here. And it tells us current
stock and um 10. And I'm assuming this,
what did it tell us here? Um, a key
highlight is the peacoat wool blend,
which is completely out of stock,
indicating a need for immediate
replenishment. Other items, while not
critically low, have limited stock
levels and should also be monitored for
potential restocking to maintain optimal
inventory. Let's try let's just try
running this. I hope it
it uh runs quick, but if it doesn't, I
won't, right? But um while this is
running so in the background I do want
to show you a little bit of the code for
the stocking because this is what's
doing like bulk processing or batch
processing I should say. The key here is
we have this collection.
So the first step is to call the MCP and
get information relevant information
based on uh you know the store and all
that kind of stuff. And instead of
having it go to the MCP server, find
one, then calling an LLM, we're batching
everything into this uh collection here.
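The batching idea can be sketched roughly like this: gather every product that needs review from the MCP/database step first, then make a single LLM call over the whole collection instead of one call per product. Here `llm_complete` is a placeholder for a real chat-completion call, not an actual SDK function.

```python
def llm_complete(prompt):
    # Placeholder for a single chat-completion call to the model endpoint.
    n = len(prompt.splitlines())
    return f"restock plan covering {n} products"

def batched_restock_analysis(products):
    """Build one prompt from all low-stock products: one LLM call total,
    returned alongside the call count to make the saving explicit."""
    lines = [f"{p['name']}: stock={p['stock']}" for p in products]
    prompt = "\n".join(lines)
    return llm_complete(prompt), 1  # one call regardless of product count
```

The trade-off is exactly what's happening in the demo: collecting everything up front takes a few minutes, but you pay for one model round-trip instead of N of them.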
So that way we just make one call to an
LLM versus making various calls, which
is also why it takes uh a few minutes to
go and collect all the information it
needs, all the products that potentially
need to be restocked based on that
insights that I've provided it. And you
know, it kind of runs we can expand
this. Yeah, it kind of runs in the
background there. It looks like it's
kicked off already. And then if we look
at our console, we should probably see
some stuff coming in here too. Uh we'll
stick this to our API. Yeah. So things
are running here. It's going and making
the calls as well. If I look at traces
here. Yeah, we have a couple things
kicked off as well here for our MCP
servers. Uh I hope that actually it
might uh
Okay. Wait, no. Was this the one I
already did, or was this the new one?
Okay, it looks like the new
one. Awesome. Let's see.
So, it says here, uh, well, several
outerwear products are available in
sufficient quantities, which may not
require restocking at this time. And
it's telling us combat boots, work boots,
steel toe. What else? Oh, a sports
jacket, a rain jacket. And is this
different than the other one? Yeah, it's
different than these, right?
So, this one looks more like just
products that need to be restocked
versus this one is more specific to the
weather insights, I'm assuming. And I
guess the loafer slip-ons are there
because of that New Year's Eve event.
Um, but yeah, anyway, so many
things that you can get done with a
variety of workflows. I will also share
their GitHub repo here at some point
because I know people will probably be
asking and I did want to leave a couple
of minutes to just answer what questions
that we had. So if we could just switch
to Q&A, um, that'd be awesome.
Yes, people are asking for it. Let me
find the uh the repo.
Give me one sec.
Unfortunately, I don't have the ability
to type in the chat.
>> Um, otherwise I'd be answering them that
way.
>> No worries.
>> Um, find it. Uh, do we have any other
>> um,
>> there's a couple questions. So, is
I'm not sure how to like bring it up on
the screen, but there's one that says,
"Is it okay to say that these are
customized workflows that we build as
models, and we leverage VM or EC2
instances where we will be executing
these workflows?" Yeah. Um, absolutely. So
you can run these um and I assume you're
talking about um you know just in
general like the blueprints or the agent
framework workflows. Um yeah, you can
run these on a VM, you can run them on
Kubernetes. Um, for the blueprints,
they're sort of like reference
architectures that are meant to be
really flexible and customizable.
So, some of them come prepackaged as
like Helm charts where you can just
install them and run them on Kubernetes,
but if you want to see the source code
and turn it into more of a monolith
where you run it on, let's say, a single
VM, um, that could absolutely work too.
Um, so there's lots of flexibility
there. Um, so yeah, absolutely.
I just uh shared the GitHub repo. I
pasted it in the chat. Perfect. Awesome.
Uh, let's see, any others. "So are we
triggering the events based on breaking
the functions down into small
workflows?" Yeah,
you could you could think of it as each
agent should be specific and then your
workflow could also be somewhat specific
but you could have various tasks, right?
So for example, for generating the
insights, it is like a specific task
like generate insights but inside of
that we have specialized agents that are
going to get the weather, going to get
the top products, going to get the um I
already forgot the oh the event popular
events, right? So it is all
functionality that is quite cohesive.
Versus, if you find things being
completely random, they might make more
sense in a different workflow, or you
might rethink that architecture.
>> I think we're about time. We can do one
more.
>> Okay. Sorry. I'll do one more. I'll fit
it in. Yeah. "Is this an alternate
solution to model tuning in Azure
Foundry, or could we leverage these
solutions in Azure Foundry also?" Yeah,
so good question. Um, so for the model
tuning in Azure Foundry, Microsoft
Foundry now, um, the model tuning, yeah,
it's a great way to like fine-tune your
model and sort of customize it for your
specific workflow. And with that, you
can create the endpoint to interact with
your model. And you can essentially plug
that endpoint into your application. So
let's say that back to that uh financial
you know distillation um endpoint or
sorry financial data distillation
blueprint. Um if I wanted to host all of
the models except one um locally I could
do that. And then that one model that's
fine-tuned and uh hosted in Azure uh
Microsoft Foundry um that could be
hosted there and I just interact with it
through the uh the endpoint that's
exposed from Microsoft Foundry. So that
could absolutely work. Yes, it's a bit
more complex that way. So technically
yes, but I would recommend trying to
keep it as simple as
possible.
>> Awesome. Now that's it. But the good
news is we have other episodes people
can come join ask plenty more questions
there as well. Right, Stephen?
>> Absolutely. Yes, we have two more
sessions in this series. So feel free
to join and um you know ask your
questions there if we couldn't get to
them today.
All right. Well, uh, thank you so much,
Stephen, and thanks everyone for hanging
out here, and I'll see you in the next
one.
>> Thank you, team.
Thank you all for joining and thanks
again to our speakers.
This session is part of a series. To
register for future shows and watch past
episodes on demand, you can follow the
link on the screen or in the chat.
We're always looking to improve our
sessions and your experience. If you
have any feedback for us, we would love
to hear what you have to say. You can
find that link on the screen or in the
chat and we'll see you at the next one.
>> [music]
Explore how to leverage multi-agent systems in your applications to optimize operations, automate recommendations, and enhance customer experience. The solution uses Microsoft Agent Framework, OpenAI ChatKit, and the NVIDIA Nemotron model on Azure AI Foundry to seamlessly connect with store databases, integrate human oversight, and deploy scalable chat agents. This approach enables real-time analytics, predictive insights, and personalized interactions, resulting in improved decision-making, operational efficiency, and a superior user experience for application developers and users. This episode is part of a series. Learn more: https://aka.ms/AIAgentsApps/y-MSFT [eventID:26558]