All right, how is everybody doing today?
I'm Sebastian and I lead product for Foundry Observability.
I'm looking forward to talking to you about all things
Foundry Observability today.
We all know that agents are non-deterministic, creating reliability and consistency challenges for developers and operators. This is why reliable AI agent development needs observability to elevate performance, quality, and safety; monitor, debug, and remediate issues; and optimize agent performance.
From that perspective, we're excited to announce in public preview observability in Foundry Control Plane, which provides visibility, monitoring, and optimization across the full AI agent lifecycle. It starts with building reliable agents early with out-of-the-box evaluations and tracing for debugging in the agent playground. As you transition to code, you then incorporate these capabilities into your CI/CD workflows. And finally, as you get to production, you get fleet-wide visibility and control.
Now let's get into a high-level overview of our announcements so you can see what's coming before we start with demos.
First, we're excited to pre-announce that our evaluation platform will be generally available shortly after Ignite for models and data sets. We are also introducing several new observability capabilities for agents, including tracing, new evaluation capabilities, production monitoring with alerts, new features that power optimization, and finally new agentic safety risks in our AI red teaming agents.
Now let's dive a little deeper into some of our
key announcements starting with tracing.
Tracing for multi-agent systems is now in public preview with enhanced OTEL semantics that power observability for any agent hosted anywhere, for many of the most popular agent frameworks. As part of our commitment to open standards and seamless interoperability with Microsoft Foundry, we actively partner with the OTEL community to continue to evolve the OTEL standard to enable continuous monitoring, tracing, and debugging that keeps up with industry advancements.
Next, let's highlight some of our key announcements for our
evaluation platform.
As mentioned earlier, evaluations for models and data sets will
be generally available shortly after Ignite.
This includes all evaluators currently in public preview.
We're also announcing public preview of evals for agents, including new risk and safety evaluators and agent-specific evaluators for tools and multi-agent systems. And underpinning all of these capabilities are flexible LLM-as-a-judge and code-based custom evaluators, which you can use to create and run context-specific evaluations for your agents.
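For readers following along in code, here is a minimal sketch of what a code-based custom evaluator can look like: with the azure-ai-evaluation SDK, a custom evaluator is essentially a callable that takes named data fields and returns metric values. The class name and scoring rule below are hypothetical illustrations, not a built-in evaluator.

```python
# Minimal sketch of a code-based custom evaluator (hypothetical example).
# With the azure-ai-evaluation SDK, a custom evaluator can be any callable
# that accepts named data columns and returns a dict of metric values.

class ResponseLengthEvaluator:
    """Scores whether the agent's response stays within a target length."""

    def __init__(self, max_words: int = 200):
        self.max_words = max_words

    def __call__(self, *, response: str, **kwargs) -> dict:
        word_count = len(response.split())
        return {
            "word_count": word_count,
            "within_limit": float(word_count <= self.max_words),
        }


# Usage: call it directly on a single row, or pass it to evaluate()
# alongside built-in evaluators (see the dataset example later).
evaluator = ResponseLengthEvaluator(max_words=150)
print(evaluator(response="Our Trailblazer rain jacket is waterproof and packable."))
```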
Now let's walk through what this looks like in practice
using a weather agent.
I know the weather here in San Francisco can be
unpredictable in the fall.
And I want to make sure that I'm fully prepared and have all of the right gear with me so that I don't get wet and I don't get a sunburn, which might not happen at this time of the year.
To ensure that the agent handles this correctly, we provide an intent resolution evaluator. So if I interact with my weather agent and I ask it about the weather, the intent resolution evaluator helps me understand whether or not the agent has resolved the intent correctly.
Once the agent has figured out the intent, it then
needs to call the right set of tools to provide
a response.
Here we provide several tool call evaluators and operational metrics to measure quality and success.
Finally, as we get to Step 3, we can use
task adherence and other evaluators to measure whether or not
the agent provided a high quality response.
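As a rough sketch of how these checks can be run from code, the snippet below invokes the preview intent resolution and task adherence evaluators from the azure-ai-evaluation package on a single query/response pair; the exact class names and arguments are preview-era and may differ in your installed version, and the judge model configuration values are placeholders.

```python
# Sketch: running agent evaluators on one interaction (preview SDK; names
# and parameters may differ in your installed version of azure-ai-evaluation).
import os
from azure.ai.evaluation import IntentResolutionEvaluator, TaskAdherenceEvaluator

# LLM-as-a-judge evaluators need a judge model deployment (placeholders below).
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": "gpt-4o",  # hypothetical judge deployment name
}

intent_eval = IntentResolutionEvaluator(model_config)
adherence_eval = TaskAdherenceEvaluator(model_config)

query = "What should I wear in San Francisco this fall?"
response = "Expect 55-65°F with fog; bring a light waterproof jacket and layers."

# Each evaluator returns a score plus an explanation of why it was assigned.
print(intent_eval(query=query, response=response))
print(adherence_eval(query=query, response=response))
```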
Now let's get into specific launches and demos.
We're going to start off with the first phase of agent development: building reliable agents early. And we're going to kick things off with a core set of capabilities that enable you to get comprehensive visibility into agent behavior.
So when I'm creating a new agent, you know, I
need to know that it's correct and safe.
In Foundry, we provide several new capabilities that enable you to measure the quality and safety of your agent before you take it to production.
This includes custom evaluators that you can tailor to your
use case, synthetic data set generation so you can get
started with evals within minutes, human evaluations so that you
or your team can evaluate your agent, and automated red
teaming support so you can probe for safety and security
risks.
Now we're going to show these new capabilities in action.
So to do this, I'm going to demo the Zava Outdoors catalog agent. Zava Outdoors is launching a new outdoors line, and this is a support agent that answers questions about the product catalog. As you can see, I've attached an index that contains all of the products in our catalog.
And in order to test this agent, I'm going to
use the evaluation metrics that are provided in the agent
playground to see whether or not it's doing what it's
supposed to do.
In this case, I've selected task adherence and intent resolution
to make sure that the agent is responding as intended
and adhering to the task at hand.
So I'm going to ask the agent, can you help
recommend a jacket for San Francisco in the fall?
And let's see what happens.
So ideally, the agent should respond and give me a
set of jackets that are appropriate for the current time
of the year.
And yes, it did that successfully.
As you can see, there's an evaluation that just ran
and both intent resolution and task adherence passed.
I can click into a debug window and I can
see the full agent execution.
I can see the input and output.
I can see the evaluation score along with a set
of explanations for why the specific scores were picked.
Again, these both passed.
So five out of five for intent resolution, and task adherence passed as well.
So now I have one data point that tells me
how my agent is doing.
But how do I know that my agent is performing
well at scale?
This is where our evaluation platform comes in.
If I want to run an evaluation at scale, I can go into the evaluations page and create a new evaluation. I select the agent that I want to evaluate as a target; in this case, it's the Zava Outdoors catalog agent.
I can either generate synthetic data based on the agent context, which is a great way to get started with an evaluation within minutes, or I can pick an existing data set. In this case, I'm going to pick an existing data set, just in the interest of time, so that you can see what that looks like. I get a preview that shows me all of the queries in my data set and the ground truth response and context. I can select a judge model for evaluation, I can select from pre-suggested evaluators, and finally I can run the evaluation. In the interest of time, I'm not going to do that; I'm just going to show you a completed run so you can see what that looks like.
Here's an example.
And I can see that in this case everything passed, which is a good sign, and I can move forward and take my agent to production.
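For reference, a similar at-scale run can also be kicked off from code rather than the portal. The sketch below uses the azure-ai-evaluation evaluate() function over a JSONL dataset, assuming columns named query, response, and ground_truth; file names, deployment names, and column names are placeholders.

```python
# Sketch: running an evaluation at scale over a JSONL dataset
# (column names, file paths, and deployment names are placeholders).
import os
from azure.ai.evaluation import evaluate, IntentResolutionEvaluator, TaskAdherenceEvaluator

model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": "gpt-4o",  # judge model deployment (placeholder)
}

results = evaluate(
    data="zava_catalog_eval.jsonl",  # one JSON object per line: query, response, ground_truth
    evaluators={
        "intent_resolution": IntentResolutionEvaluator(model_config),
        "task_adherence": TaskAdherenceEvaluator(model_config),
    },
    output_path="zava_catalog_eval_results.json",
)

# Aggregate metrics across the whole dataset, e.g. pass rates per evaluator.
print(results["metrics"])
```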
Now we're going to hop back and move on to the next stage, which is monitoring and optimizing in production. And I'm excited to introduce Sam, my engineering counterpart, to take it away.
Thank you, Sebastian. Hello everyone.
Now that we've seen how to develop reliable agents early
in the life cycle, let's shift our focus to what
happens once those agents are in production.
This is where observability becomes absolutely critical, not just for
uptime, but for continuous monitoring, improvement, and trust in your
AI systems.
Before I start, quick show of hands, how many folks
have actually built an agent and your agent didn't exactly
do what you expected?
OK, so this is going to be very relevant and actually interesting to talk about.
So in production, monitoring and optimization aren't one-time tasks; they're ongoing processes. Because agents are non-deterministic by nature, we need to ensure their behavior can be continuously tamed, as it can change over time, sometimes in unexpected ways.
As you observed, to ensure quality, safety and efficiency, we
need robust tools to observe, evaluate and optimize agents as
they operate.
Our new agent monitoring dashboard in Foundry provides comprehensive insights across multiple dimensions, including continuous evals of production traffic, where we continuously take a sub-sample of queries and run your choice of evaluations on them to give insights into the performance of ongoing requests to your agents. Scheduled evals allow for custom scheduled runs for drift detection and monitoring, and red teaming to probe for vulnerabilities and adherence to policies and to offer insights into the level of protection against attacks. And finally, Azure Monitor-powered eval alerts flag operational issues, with evaluation results tied to traces to simplify debugging.
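To illustrate the continuous-evaluation idea in the abstract, here is a hedged, framework-free sketch of the underlying loop: sample a fraction of recent production traffic, score it with an evaluator of your choice, and raise an alert when the pass rate drops. The helper functions are hypothetical stand-ins, not the managed Foundry feature itself.

```python
# Framework-free sketch of continuous evaluation: sample recent production
# traffic, score it, and flag when quality drops below a threshold.
# fetch_recent_interactions() and score_task_adherence() are hypothetical
# stand-ins for your telemetry query and your chosen evaluator.
import random

SAMPLES_PER_HOUR = 20
ALERT_THRESHOLD = 0.8  # minimum acceptable pass rate


def run_continuous_eval(fetch_recent_interactions, score_task_adherence):
    interactions = fetch_recent_interactions(hours=1)  # e.g. pulled from Log Analytics
    sample = random.sample(interactions, min(SAMPLES_PER_HOUR, len(interactions)))

    passed = sum(
        1
        for item in sample
        if score_task_adherence(query=item["query"], response=item["response"]) >= 3.0
    )
    pass_rate = passed / len(sample) if sample else 1.0

    if pass_rate < ALERT_THRESHOLD:
        # In the managed experience, this is where an Azure Monitor alert fires,
        # with links back to the low-scoring traces for debugging.
        print(f"ALERT: task adherence pass rate {pass_rate:.0%} is below {ALERT_THRESHOLD:.0%}")
    return pass_rate
```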
It's worth noting that as you develop more complex agents, you may leverage multi-agent schemes, and this can have a detrimental impact on the overall execution of the agent, since errors can compound across multiple calls to sub-agents. Thus it becomes critical to be able to debug and trace these agents, pinpoint low-scoring traces, and obtain full visibility into the execution flow of each agent and their respective evaluation scores.
Our observability stack relies on data and telemetry from apps
and agents, as well as the AI platform itself, where
we host models to provide intelligent observation, insights and control.
With tight integration with Azure Monitor, and by being a core component of Foundry, you now have a comprehensive monitoring solution spanning not just your agents and AI platform, but your data services and Azure infrastructure as well.
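As a minimal sketch of how an app or agent feeds telemetry into this stack, the snippet below wires OpenTelemetry tracing to Azure Monitor using the azure-monitor-opentelemetry distro; the connection string, tracer name, and span names are placeholders, and Foundry's own tracing integration may configure much of this for you.

```python
# Sketch: sending app/agent telemetry into Azure Monitor via OpenTelemetry.
# Requires the azure-monitor-opentelemetry package; values are placeholders.
import os
from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry import trace

# One call configures trace, metric, and log exporters for Application Insights.
configure_azure_monitor(
    connection_string=os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"],
)

tracer = trace.get_tracer("zava.catalog.agent")  # hypothetical tracer name

with tracer.start_as_current_span("handle_customer_question") as span:
    span.set_attribute("customer.channel", "web")  # example custom attribute
    # ... call the agent / model here; its spans nest under this one ...
```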
So let's talk about a demo here.
We're going to build on what Sebastian, my partner, introduced on Zava Outdoors. And as you can see, what I'm showing you is our monitor tab, where you see all your operational metrics in one place.
So as you can see, we can see evaluations, scheduled evaluation runs, and any other metrics that we want to dive into.
Specifically, I have the ability to set up continuous evaluations as I discussed, setting the sub-sample, that is, how many runs per hour I want to run. I can do scheduled evaluations based on scheduled runs that run at a specific time, as well as scheduled red teaming runs, and evaluation alerts as well (this seems to be loading). We can set up thresholds for when I want to be notified for my alerts, and this can be quite helpful and handy, letting me proactively see where the issue areas are and jump into them.
So as you can see here, we seem to have
an issue on task adherence.
Let's dive deeper here.
So I can click on view details and I can
see that there's something wrong with the task adherence for
our evaluations.
I can quickly click on the view traces that would
take me to the Traces tab.
Doesn't seem to be loading today.
And through the traces I'm able to eventually see what
are the problem areas with regards to these alerts.
So let's wait for that to load.
And so here I can quickly see my trace runs over time. I can sort them by the evaluation metrics that we discussed and deep dive into those components. We seem to be having a little bit of slowness; this is part of the demo process. But inherently, looking at conversations, I can click on various conversations and deep dive into why something is having issues. And in the interest of time, I will come back to this.
Since this demo is not working, let's go back.
So, to optimize your agents, we have built several useful
utilities to dive deeper into your agents and obtain valuable
insights.
These include the ability to compare evaluations, perform cluster analysis
to quickly pinpoint and group issues by their type and
nature, as well as an agentic chat feature that allows
you to ask an assortment of questions to Microsoft Foundry,
get insights, and even control aspects of your projects, such
as curated model deployments and upgrades.
Let's take a peek at some of these features together.
So here we can see that I have my evaluations, and I can quickly load my evaluations and compare them. In this case, I'm selecting two and I'm actually comparing these runs. I seem to be having connectivity issues again. So when I compare these runs, I'm able to do a run-to-run comparison to see the key metrics I'm falling short on. Specifically, it was task adherence. I can see what the delta was between those two runs and make progress through continuous iteration to improve my agents. Going back to the issue that you all observed: how to tame the beast.
So here we have various tools to do that.
One of them is our cluster analysis tool, for example,
that allows me to hone in on the various key
areas.
And this is a very useful tool, because you have a lot of agent runs, you have a lot of different evals, and you want to be able to hone in on key problem areas.
So in this case, we can cluster those areas and
see that, hey, some of the issues were due to
hallucinated responses.
Hallucinations happen all the time.
So here I can deep dive into unsupported, fabricated issues and look at the details of those, what happened, and what the AI suggestions are. And the AI suggestions help you continuously improve your agent development process; they give you insights into what you need to modify to make your agent adhere to your end goal, whether it be accuracy, adherence to task, relevance, or anything else in specific.
I can also have a conversation with the system.
This is something we're really proud of and it's quite
exciting.
I can click on Ask AI and say, hey, let's go ahead and upgrade our main model from GPT-4.1 to something better. So this not only allows you to ask questions and get analysis on the fly, but also to control facets of Foundry through just conversation. In the interest of time, I will show what that looks like. Essentially, you can see that it's taking me through the process to upgrade the current model.
It's identified the current model; it is indeed GPT-4.1. And in parallel, I can see that it's provided several options as to what my alternatives are. I can even go to the detailed model page and read that model card. But also, I can finally approve this and with one click upgrade my model through a curated process.
Awesome.
Let's go back.
So as a next step, I want to introduce one of our key partners, Abhi. Please welcome Abhi. He's the Vice President of Data and AI Engineering at CarMax. Abhi is a proven leader with a track record of delivering transformative data and AI solutions and driving innovation, and is now focused on shaping the future of intelligent experiences and unlocking new possibilities through emerging AI.
Thank you, Sam.
Hey, good morning, everyone.
I'm excited to be here and share CarMax's AI journey around innovation, our partnership with Microsoft, and responsible use of AI. Let's get into it.
So when we think about AI at CarMax, AI isn't
new to us.
For over a decade, we've been using various AI techniques such as supervised learning, natural language processing, computer vision, and process automation to improve our customer experience and also make our core processes more efficient.
By leveraging our data and AI, we have created a
powerful advantage, one that truly sets us apart in our
industry.
But here's the thing, AI is changing so rapidly, and
today we are in this era of generative and agentic
AI that helps us redefine what is possible.
Through Microsoft Foundry, we've experimented, we've learned, and we've taken three noteworthy use cases from prototype to production. They are: Search, a conversational search that you can use to find cars on CarMax; second, Knowledge Management, which provides prompt, fast, and accurate responses to queries for information; and the third one is Sky, which is what we're going to talk about today.
So what is Sky?
Sky is a virtual assistant.
Its purpose is to personalize and elevate the customer experience by empowering and engaging customers at the right moment, no matter where they are in their shopping journey with us. With that goal, in 2020 we launched the first version, which was powered by natural language processing, and NLP did great at the time. It detected the customer's intent and used pre-programmed, scripted flows to direct the customers.
This approach had its clear advantages.
It was predictable, controllable and reliable.
However, as we looked at customer expectations and how they were evolving, we noticed that our NLP-powered Sky was reaching its limitations. It felt rigid and less intelligent in responding to customer questions. So with generative AI, we saw the opportunity to completely reimagine Sky, making it smarter and more responsive to our customer needs and setting us up for the future.
But here's the thing: while we had worked in and built expertise with generative AI, such a complex generative AI solution or experience had never been deployed to our customer base until now. We partnered with Microsoft to redefine what used automotive retail can do for customer experience, and together we are co-creating the most intelligent, scalable, and personalized AI-powered experience in our industry.
We're truly reimagining how we can seamlessly guide both our
customers and associates through every stage of their journey.
So here's what the updated, 2.0 Sky looks like. You will see in this a customer shopping for a car, who asks Sky, hey, can you help me find similar vehicles, but maybe at a different price point? And Sky does a great job of showcasing multiple different options to the end shopper, the customer.
So let's talk a little bit more about the results, as you will see that they truly speak for themselves. We looked at the results in two parts.
On the experience side, we saw a 10 point improvement
in the net positive feedback score, which is basically telling
us that our customers are more satisfied or are getting
a better experience through the new Sky.
Second is on the efficiency side, we saw an increase
of 25% in containment.
Now, containment alone can be misleading, but when we look at it alongside experience, or customer satisfaction, we feel more confident that customers are getting their questions answered through Sky without the need for escalation. And when there is a need for an escalation, Sky can seamlessly connect customers to our customer experience center for more hands-on support.
So as you look at this and think about how we approached our deployment and scaling, the answer is: we approached it very carefully. And the reason for that is trust.
For us, every Sky interaction matters.
Every response reflects our brand.
We are a trusted brand.
And it's no surprise that our approach to AI is guided by those core values of honesty, integrity, and transparency, which is why we saw a need for a comprehensive framework for responsible AI. And that's what led us to create this evaluation framework for Sky 2.0, which we built using Microsoft Foundry and Log Analytics.
And think about it, this has two parts to it. The first one is what we get out-of-the-box from Microsoft Foundry. And second, we had to build our own custom evaluators to supplement our use cases and our need to measure and evaluate our AI.
So let's talk a little bit more about what we are using and getting out-of-the-box from Azure. We think about it in three parts. The first is runtime guardrails, the second is evaluations such as safety and jailbreak, and the third is AI-specific tracing. When you're using AI, you want to know: what was the input prompt, what was the output, and what tools were called by my LLM. For that AI-specific tracing, we are using a combination of OpenTelemetry as well as Azure Monitor.
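As a hedged sketch of that combination, the snippet below enables Semantic Kernel's experimental GenAI diagnostics and exports the resulting gen_ai spans to Azure Monitor. The environment-variable switches reflect Semantic Kernel's documented experimental settings at the time of writing and may change; deployment and endpoint values are placeholders, and this is not CarMax's actual configuration.

```python
# Sketch: capturing Semantic Kernel gen_ai.* spans in Azure Monitor.
# Environment variables and parameter values below are illustrative and
# may differ between Semantic Kernel releases.
import os

# Enable gen_ai span emission; the *_SENSITIVE flag additionally records
# prompts and completions in telemetry (a privacy trade-off).
os.environ["SEMANTICKERNEL_EXPERIMENTAL_GENAI_ENABLE_OTEL_DIAGNOSTICS"] = "true"
os.environ["SEMANTICKERNEL_EXPERIMENTAL_GENAI_ENABLE_OTEL_DIAGNOSTICS_SENSITIVE"] = "true"

from azure.monitor.opentelemetry import configure_azure_monitor
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion

configure_azure_monitor(
    connection_string=os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"],
)

kernel = Kernel()
kernel.add_service(
    AzureChatCompletion(
        deployment_name="gpt-4o",  # placeholder deployment name
        endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
    )
)
# From here, chat completions and tool calls made through the kernel emit
# OpenTelemetry spans (inputs, outputs, tool invocations) to Azure Monitor.
```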
So with that, let's see, let's look at a demo
of how we are using these tools.
To give an example of how we use Azure Monitor
and Log Analytics, we're going to go back to the
example we showed earlier.
Hey Sky, can you help me find cars similar to
this one, but a little cheaper?
And as you can see, Sky replies with this helpful car carousel and some text.
Let's take a look at what that looks like in
telemetry.
So if we take a look here, all of these gen_ai telemetry traces are actually built in with OpenTelemetry into Semantic Kernel. And this is what we use for monitoring and also occasionally for evaluations.
So you can see here, this is the user message.
Hey, Sky, can you help me find cars similar to
this one?
If we go to the next message here, you can
see the generative agent uses a tool call here.
This is where it's searching for the vehicle.
You can see it's actually looking for that specific vehicle
that the user was talking about.
So it can like figure out the details on that
and then get more information from there.
This is the results of that tool call.
If I click show more here, you can see a massive JSON object of the entire result.
And then once Sky has made a couple more tool calls, as you can see here, it gets back to the response that we were just looking at.
Here are some options.
Now moving on in the conversation, I say: so it gave me three options, what are the differences between two and three?
And you can see here Sky gave me a very helpful bulleted list of the differences between each: the mileage is different, the color is different, things like that.
And that looks very, very similar.
So if we go back here again, this is the
user message and then we get the assistant response.
Now, in this case, it's a one to one message
to response because in this case, Sky already had all
the context it needed with the prior messages, the prior
tool calls to make those comparisons.
Before we had, you know, multiple tool calls.
Here we just had the one.
Great.
So let's talk a little bit more about what we are doing for customization. We got a lot of good out-of-the-box capabilities, but for our use cases, we knew that we had to build some more custom LLM-as-judge evaluators to help us with a few other things that we wanted to measure and monitor, like legal adherence.
So every conversation that Sky would have with our customers,
we want to make sure it's in line with our
legal guardrails.
Same thing with intent detection.
We want to make sure that Sky is truly understanding
the intent of the conversation with the customer.
And there are a few others that are listed over
here.
All of these evaluators we built using the ecosystem in Azure, such as Machine Learning Pipelines, Log Analytics, Azure Monitor, Azure Monitor Workbooks, and Azure DevOps.
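To give a flavor of what a custom LLM-as-judge evaluator like legal adherence might look like, here is a hypothetical sketch that asks a judge model whether a response follows a guideline and returns a status plus a reason. The guideline, prompt wording, and deployment name are illustrative assumptions, not CarMax's actual evaluator.

```python
# Sketch of a custom LLM-as-judge "legal adherence" evaluator.
# The guideline, prompt wording, and deployment name are hypothetical.
import json
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)


def legal_adherence(guideline: str, user_message: str, assistant_response: str) -> dict:
    """Ask a judge model whether the response follows the guideline."""
    judge_prompt = (
        "You are reviewing a virtual assistant's response against a guideline.\n"
        f"Guideline: {guideline}\n"
        f"User: {user_message}\n"
        f"Assistant: {assistant_response}\n"
        'Reply with JSON: {"status": "successful" or "failed", "reason": "..."}'
    )
    completion = client.chat.completions.create(
        model="gpt-4o",  # judge deployment (placeholder)
        messages=[{"role": "user", "content": judge_prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(completion.choices[0].message.content)


print(legal_adherence(
    guideline="Avoid implying that a specific car is safe.",
    user_message="Is this SUV a safe choice for my family?",
    assistant_response="It has these listed features; please review its safety ratings yourself.",
))
```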
So let's look at a demo of how all of
these come to life for us.
We utilize Azure AI Foundry and CI/CD in a couple of key ways. One of the main ways is running daily evaluations. Every single Sky response that a customer sees is emitted as Log Analytics telemetry. And once a day, we have a Python script that runs, collects these, generates a data set, and then runs an evaluation in AI Foundry.
So that's what this pipeline looks like here in Azure
DevOps.
As you can see, it runs every single day and
has been running for a while now.
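A hedged sketch of the general shape of such a daily job is below: query the previous day's responses out of Log Analytics, write them to a JSONL dataset, and submit an evaluation run. The custom table and column names, the chosen evaluator, and the judge configuration are assumptions for illustration, not CarMax's actual script.

```python
# Sketch of a daily evaluation job: pull yesterday's responses from
# Log Analytics, build a JSONL dataset, and run an evaluation.
# Table/column names and evaluator wiring are illustrative assumptions.
import json
import os
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient
from azure.ai.evaluation import evaluate, TaskAdherenceEvaluator

logs = LogsQueryClient(DefaultAzureCredential())

# Hypothetical custom table/columns holding each user query and Sky response.
kql = """
SkyResponses_CL
| project user_query_s, sky_response_s
| take 500
"""
result = logs.query_workspace(
    workspace_id=os.environ["LOG_ANALYTICS_WORKSPACE_ID"],
    query=kql,
    timespan=timedelta(days=1),
)

with open("sky_daily_eval.jsonl", "w") as f:
    for row in result.tables[0].rows:
        f.write(json.dumps({"query": row[0], "response": row[1]}) + "\n")

model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": "gpt-4o",  # judge deployment (placeholder)
}
evaluate(
    data="sky_daily_eval.jsonl",
    evaluators={"task_adherence": TaskAdherenceEvaluator(model_config)},
    output_path="sky_daily_eval_results.json",
)
```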
We move over to AI Foundry.
This is what it looks like there.
These are all of the legal adherence runs within our
evaluations blade here.
And if we click into one of these, you can
see all of this very useful metadata that comes involved
with this.
So we have the actual Sky version.
We have the Git hash of the eval code that's
being run against it, and we also have the data
set name and the evaluator type.
Now for evaluators, we use AI Foundry to store those as well.
There's a lot of built in Microsoft ones.
We've also had to build a few of our own.
And just to dive into the adherence evaluator for a
second, we actually had specific guidelines, legal guidelines we wanted
Sky to adhere to before putting it in front of
a customer.
So an example of that here.
These are three example guidelines, three example Sky responses, and
then what the LLM judge thought.
So, for example, did the assistant avoid implying that a car is safe?
This is what the user sent to Sky.
This is what Sky responded.
And then the LLM judge had to decide if this
response passes this guideline.
And it did.
And the reason the LLM judge gave was that the response does not imply that the car is safe. So it gives it the status of successful.
The other way we utilize Azure AI Foundry and CI/CD is in the Sky deploy pipeline itself.
So after we deploy to QA, we see here we
have this blocking evaluation gate step.
And so every time we deploy Sky, we run our
full suite of evaluations up against it.
And that step looks like this nice little Python table
here with a visual indicator as to whether or not
our scores are passing.
Now, some of these scores fluctuate a little bit run to run. So that's why this is not a fully automated step; a human reviewer has to look at these scores to aid in that process. We also have all of these scores and the expected thresholds saved in a markdown file in GitHub, just for easy access.
And then at this point, a human can decide whether
to approve or reject the Sky release.
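For illustration, a gate step like that can boil down to something as simple as the sketch below: load the latest scores, compare them with the recorded thresholds, and print a pass/review table for the human approver. Metric names, file names, and thresholds are hypothetical.

```python
# Sketch of an evaluation gate step: compare the latest evaluation scores
# with expected thresholds and print a table for the human reviewer.
# Metric names, file names, and threshold values are hypothetical.
import json

EXPECTED_THRESHOLDS = {  # in practice, kept alongside the repo (e.g. a markdown file)
    "legal_adherence": 0.95,
    "intent_detection": 0.90,
    "task_adherence": 0.85,
}

with open("eval_scores.json") as f:  # produced by the evaluation run
    scores = json.load(f)

print(f"{'Metric':<20}{'Score':>8}{'Threshold':>12}  Status")
all_passed = True
for metric, threshold in EXPECTED_THRESHOLDS.items():
    score = scores.get(metric, 0.0)
    ok = score >= threshold
    all_passed = all_passed and ok
    print(f"{metric:<20}{score:>8.2f}{threshold:>12.2f}  {'PASS' if ok else 'REVIEW'}")

# Scores fluctuate run to run, so a low score flags the release for human
# review rather than automatically failing the pipeline.
print("All metrics within thresholds." if all_passed else "Needs human review before release.")
```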
Great.
So let's talk a little bit more about what's next
for us.
So as we think about the future of AI at CarMax, we are going to build more agents, and as you have more agents, you need to improve orchestration of those agents. And this is all in line with how we enable more agentic AI for us.
We talked a lot about the new features and functions from Microsoft Foundry that are coming for observability.
We are excited to explore that and see what we get out-of-the-box and what we need to build as custom evaluators.
And lastly, I do anticipate that as our use cases
expand, we will need to build more evaluators.
This is in line with ensuring that we are still using our AI in a responsible way.
So our journey from traditional AI to generative AI has
fundamentally changed how we serve our customers and the results
speak for themselves.
We are seeing better experiences and improved efficiency, and we now have a foundation for innovation.
But truly, what excites me is that we're just getting
started.
There's so much more for us to do.
And with our robust sets of evaluation, I feel really
confident that we are positioned to deliver even greater value
in the years to come.
And with that, I hope that our CarMax journey has
given you the insights and the confidence that you need
to be successful in your AI journey.
And with that, I would like to now call Sam
and Sebastian back on the stage to wrap us up.
Awesome.
Thank you, Abhi.
That was fantastic.
It's great to see CarMax incorporate observability into every step
of their development life cycle.
It's really fantastic how they've used custom evaluators, customized them to their use cases, and scaled out their Sky experience.
Now we're going to showcase how you can scale agent
fleet management with observability and governance.
Let's get into the final phase of the AI agent
life cycle.
As more and more agents are deployed across your organization,
it's absolutely critical for management and oversight to be centralized.
Our newly announced Foundry Control Plane is your destination to
manage, observe, and govern your agent fleet.
So if you go from one or two agents you
know, to hundreds of agents, that's the place where you
can go and view all of your agents, see what
they're doing, and govern them from that perspective.
We're excited to announce a bunch of new capabilities in
the Foundry Control Plane that are powered by observability, including
our new Agent fleet dashboard that shows key performance metrics
and alerts and assets inventory for all of your agents,
models and tools, and a registration flow that enables you
to register and observe any agent built using many of
the most popular agent frameworks.
So now I'm going to hand it off to Sam
and he's going to demo and show you what it
looks like.
Awesome.
Thank you so much, Sebastian.
OK folks.
So let's talk about operations. As we discussed, our operate tab experience is quite holistic.
It allows you to see your active alerts in one
place.
We can see that for this Zava Outdoors project, for example, we have 83 agents, all operational, in one shot. I can quickly see my estimated cost and trends; specifically, how I am trending week over week or month over month. And in specific, one of the awesome things that we're proud to showcase is that Microsoft Foundry is not just one place where you can see your agents; you can also see non-Foundry agents here as well.
So this is truly exciting.
So I can actually go in and register an agent that was built, say, on Google Cloud with Vertex, or on AWS AgentCore, or anywhere else, and actually register and bring it in here, so that I have one holistic view in my operate tab and can provide protection to all our agents.
So here, in the interest of time, I won't walk through this, but you can basically bring in your agent URL, your endpoint URL, and your OTEL specifications, define it here, and set up your agents.
And once it's set up, it'll basically look like this.
You'll see in your assets view a set of agents.
So I can actually go through various agents that have
been brought in from outside.
So in specific non Foundry agents and be able to
see their traces, be able to deep dive and eventually
have some level of protection and support for them.
So here we can see that there are several agents.
If I even change the source to custom, we can see our agents that have been developed and built on GCP. So mind you, connectivity is a little slow, and then we'll come back, by the way, to the previous example that I was not able to show, and show that.
So these are agents that were built in GCP Vertex or AWS, and I can quickly go and click on one of them here, for example, and see the traces within it. This is the beauty of what we've built. These traces are actually coming from Google Vertex, and I can see that, for example, for this one, there are all of these complexities being handled. I can see what the input and output were and the evaluation scores across them. So inherently we've gone ahead, somebody's built an agent outside, and we're managing it within our operate tab here, which is awesome.
So that uses the OTEL, correct? It uses the OpenTelemetry semantics to bring in those traces and logs and has one place to view them all?
That's right.
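To make that concrete, here is a hedged sketch of what an externally hosted agent might emit: manually created OpenTelemetry spans carrying GenAI semantic-convention attributes, exported to the same Application Insights resource that Foundry reads from. The gen_ai.* attribute set is still evolving in the OpenTelemetry spec, and the connection string, agent, and model names below are placeholders.

```python
# Sketch: an agent hosted outside Foundry emitting spans with GenAI
# semantic-convention attributes so they show up in the same trace views.
# Connection string, model, and agent names are placeholders; some gen_ai.*
# attributes are still stabilizing in the OpenTelemetry spec.
import os
from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry import trace

configure_azure_monitor(
    connection_string=os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"],
)
tracer = trace.get_tracer("external.vertex.agent")

with tracer.start_as_current_span("invoke_agent support-agent") as span:
    span.set_attribute("gen_ai.operation.name", "invoke_agent")
    span.set_attribute("gen_ai.agent.name", "support-agent")
    span.set_attribute("gen_ai.system", "vertex_ai")        # well-known value may differ
    span.set_attribute("gen_ai.request.model", "gemini-1.5-pro")
    # ... run the agent, adding child spans for model and tool calls ...
    span.set_attribute("gen_ai.usage.input_tokens", 512)
    span.set_attribute("gen_ai.usage.output_tokens", 128)
```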
So I actually want to take a minute and go
back and showcase our traces which I was not able
to show prior.
So in this case, as you recall, here in the settings I have the ability to show continuous evaluations and scheduled evaluations. I can set up scheduled runs, whether they run hourly or daily.
I can do scheduled red teaming runs, same thing with
a schedule as well as set up my evaluation alerts.
And so here we can go into a specific set of traces that, due to connectivity, I was not able to show earlier.
And here I can quickly look at, say, a particular
conversation or something that is not doing so well.
So here we can see that for this one, the
task adherence was inherently 0.
Let's try to deep dive and see what's going on
here.
So when I look at these evaluations, we can see here, oh, it says the assistant claimed to have Explorer tents available and provided detailed product information. But this is a clear example of hallucination. How would it be able to answer what to suggest when it doesn't have the right tool call? It's hallucinating, clearly.
And so here we can learn from this and be
able to inspect what was the input and output.
The user says, do you have an Explorer tent? It replies, yes, we have an Explorer tent; it's listed, Explorer, blah, blah, blah. But it didn't actually make a tool call; it's hallucinating.
So you can actually leverage our utilities here to hone
in on key issue areas within the operation of your
agent to make better, more decisive decisions in that element.
OK great.
So we talked about operations, we talked about management of third-party agents, and now we'll go back.
Yeah, that was awesome.
That was a really, really great demo and exciting to
see it all come together in the Foundry control plane.
We have a bunch of related sessions that we encourage you to attend, specifically BRK202, which starts right after this session. That session is going to go even deeper into the Foundry Control Plane, covering compliance, security, and a bunch of the governance features.
We also have several related sessions and labs that incorporate
many of the features that were shown today.
We're excited for you to explore all of these. You can learn more from our documentation; check out what's new in Microsoft Foundry, and we're looking forward to your feedback as you try out the new capabilities.
And with that, we'd like to thank you for attending
the session and we're looking forward to seeing you all
at the Foundry control plane observability booth.
Thank you very much.
Thank you so much and have a great Ignite.
Ready to manage every agent, everywhere? Don't fly blind, get total visibility into your agent fleet. Foundry Control Plane and Observability gives you the dashboards, diagnostics, and optimization tools to run AI with confidence. If you care about uptime and impact, this Foundry session is for you.

To learn more, please check out these resources:
* https://aka.ms/ignite25-plans-ManageGenAILifecycles

Speakers:
* Abhi Bhatt
* Sebastian Kohlmeier
* Sam Naghshineh

Session Information:
This is one of many sessions from the Microsoft Ignite 2025 event. View even more sessions on-demand and learn about Microsoft Ignite at https://ignite.microsoft.com

BRK190 | English (US) | Innovate with Azure AI apps and agents, Microsoft Foundry
Breakout | Advanced (300)
#MSIgnite, #InnovatewithAzureAIappsandagents

Chapters:
0:00 - Details on evaluation platform and new risk and safety evaluators
00:03:04 - Weather agent demo showing intent resolution and evaluation process
00:08:00 - Sam introduces production observability and continuous evaluation tools
00:10:05 - Comprehensive Observability Stack and Integration with Azure
00:15:35 - Introduction of Partner CarMax and VP of Data & AI Engineering, Abhi
00:26:14 - Deployment gate process ensuring each Sky release passes AI evaluations before rollout
00:26:51 - Scores and thresholds stored in GitHub for easy access
00:26:59 - Human decision on approving or rejecting Sky release
00:34:19 - Session wrap-up, related sessions announced, and closing remarks