He gave us a thumbs up.
Thumbs up.
OK.
All right.
We want to welcome you all to our session today
about building agents with knowledge, agentic RAG, and Azure AI
Search.
My name is Matt, I'm a program manager on Azure
Search.
I'm Pamela and I am a Python cloud advocate.
Does it say to clap right now?
All right, let's talk about our agenda for today.
First, we're going to cover the basics of RAG, retrieval-augmented generation.
Then we're going to do a deep dive on knowledge
bases inside Azure AI Search.
We're also going to cover how Foundry and knowledge bases
are connected through Foundry IQ.
And we're going to have time to take some questions.
Let's dive right in.
All right, so I'm going to talk a bit about
RAG.
So how many of you are using RAG today, right?
That's a very good number.
Awesome.
Now just to, you know, get us all on the
same page here, right?
We're all building these applications.
We're starting to build agentic applications and agents, and we're
seeing agents in many different parts of our workflows,
right?
And these agents can be conversational.
That's what a lot of us originally started building.
And now we have more task oriented agents that are
actually taking actions and doing stuff on our behalf.
Now the majority of these agents need domain-specific context
to ground themselves in your organization's data, right, in the
information that matters for you, for your task.
And that is where we need to have very good
retrieval and very good RAG.
So RAG is the name for the technique that LLMs
use in order to use your information.
RAG stands for retrieval augmented generation.
The basic idea is that you get in a question,
use that question to search a search index.
You get back those results and then you send them
to an LLM and say, hey, LLM, here's the results,
here's the original question or intent.
Now please answer based off the results and provide citations.
Right?
I actually have a demo that I can show of
a very basic, the most basic rag application.
Right?
Of course we're asking about Zava.
What is the best Zava paint for bathroom walls?
It says it's the interior semi gloss paint, very important
for washing when your children decide to draw all over
your bathroom, as happens to me a lot.
And if we look at the process here, what
we do in this very basic RAG is we take
that question, we send it to our AI Search index,
we get back results.
These are chunks of documents that we've indexed and you
know, they have file pages and scores.
And then we send those results to an LLM and
say, hey, your job LLM is to answer the question
based off these sources.
And so then we get back that answer with citations.
That's the basic RAG.
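For reference, here is a minimal sketch of that retrieve-then-generate loop in Python; the client setup, index name, field names, and deployment names are illustrative assumptions, not the exact code from the demo.

```python
# Minimal retrieve-then-generate sketch (illustrative only: endpoint, index name,
# field names, and deployment names are assumptions, not the demo's exact code).
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

search_client = SearchClient(
    "https://<service>.search.windows.net", "zava-products", AzureKeyCredential("<search-key>")
)
openai_client = AzureOpenAI(
    azure_endpoint="https://<aoai>.openai.azure.com", api_key="<aoai-key>", api_version="2024-06-01"
)

question = "What is the best Zava paint for bathroom walls?"

# 1. Retrieve: send the question to the search index and collect the top chunks.
results = search_client.search(search_text=question, top=5)
sources = "\n".join(f"[{doc['id']}]: {doc['content']}" for doc in results)

# 2. Generate: ask the LLM to answer only from those sources and cite them.
response = openai_client.chat.completions.create(
    model="<chat-deployment>",
    messages=[
        {"role": "system", "content": "Answer ONLY from the sources below and cite them.\n" + sources},
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)
```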
And that is what, you know, many people started with in this space of RAG.
So let's talk a little bit more about how Azure AI
Search is actually going to help you run advanced RAG
for better generative AI applications.
Azure AI Search provides customers with a feature rich vector
database built on an enterprise ready foundation so you can
focus on growth and scale.
Its integrations offer an end-to-end data management pipeline
for all types of data.
A full-stack RAG solution is available.
Where and how you want to build your retrieval strategy
matters.
Our comprehensive search technology allows you to go beyond vector
search.
We ensure that you have the capabilities you need to
retrieve accurate information for every question, regardless of your use
case or data.
So how many of you are using AI Search already?
Let's see some hands again.
Make sure everyone's listening to the right headphones.
Awesome.
Look at that, all your customers.
Very good.
All right.
So the thing that's fantastic about Azure AI Search is
that it has this state-of-the-art retrieval strategy.
It uses a full hybrid search.
So it's not just using vector search or just using
keyword search.
It is combining vector and keyword search together, merging those
results using reciprocal rank fusion and then re-ranking those
results to get the very best results.
So we're going to start off with talking through this
search stack so you all understand hybrid search.
And then we're going to go on to all the
new agentic search strategies that you can layer on top
of hybrid search to get even more powerful results.
So there's just so much stuff that AI search has
to offer.
But let's start with the basics, right?
So keyword search is the traditional search that we've been
using for decades now, right?
And the idea of keyword search is that we store
an inverted document index, which says, hey, you know, for
this particular term, this is how often we saw it in
the documents, and the general frequency relative to the length
of the document.
So if I'm searching for hose and there's a document
that says hose, hose, hose, hose, hose all over the
place, that's probably a very good document for my, you
know, for my search for hose.
Now, Azure AI Search uses BM25, which is basically the best-in-class
full-text algorithm and does a very good
job at handling keyword search.
Let's see, I've got a little example over here with
keyword search.
So what I've got is I'm using Python because it
is the best language, and I'm setting up the Azure
AI Search client, connecting to my search service and then
doing a search here for 25 foot hose.
OK, and this is searching.
This is, you know, a fake Zava product catalog that's got
names, descriptions, categories, prices, etcetera.
And what you can see is the very top result
is in fact a 25 foot hose.
So you can see here that keyword search did a
good job.
It actually found exactly what we needed, right?
And it did a good job here because 25 matched,
foot matched, hose matched, like, you know, it's kind
of an easy one for it, right?
Now, the issue with keyword search is that it does
much worse with other kinds of queries, right?
So if I'm searching to figure out how do I
water my plants efficiently without waste.
Now when I do keyword search, I can see the
first result is water-based polyurethane.
The second result is water-based wood stain.
You should not water your plants with either of those.
I tried it just didn't work.
So this is an example where keyword search just utterly
fails, right?
Because we asked this query that was much more broad,
ambiguous.
It doesn't have, you know, exact keywords in it,
and, you know, exact keywords are what keyword search does well
with, right?
So this is why people get so excited about vector
search, right?
This has really become big in the last three years
or so when we came up with these new vector
embedding models like the ones from OpenAI, text-embedding-ada,
text-embedding-3, all those great models.
And the idea with vector search is that we take
our information, we turn it into a vector, and that
vector represents the information in this multidimensional space.
And then when we get in a new query, we
convert that into a vector using that same embedding model.
And then we check and see like, hey, for this
vector, which vectors are the closest?
And we go and we see, oh, OK, like for
dog, cat is closest and puppy is closest, right?
And we find which ones are closest.
And the idea of that multidimensional space is to represent
similarity.
And it's been trained off the Internet, seeing
generally what terms, you know, co-occur, you know, show up
together in the Internet corpus, right?
So that's the idea of vector search.
And Azure AI Search has very good support
for vector search, one of the first search engines that
added it.
And you know, you can use it on your documents
and you can also use it if you have massive
vector databases.
So it can actually scale to handle searching across billions
of vectors because it can use this approximation algorithm called
HNSW, which can scale to huge amounts of vectors.
So vector search is very powerful.
So let's see a little demo of that.
So we set up our search client again.
And so now when we perform the vector search, we
have our query.
We're going to turn that into a vector, and we're
turning it into a vector using Azure OpenAI, one
of the text embedding models.
And then when we search, we're going to only pass
in that vector here.
So I'm actually not passing in the text at all.
I'm only passing in the vector.
So I can see what can I get with just
that vector, right?
What is the most semantically similar to that vector?
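Here is a hedged sketch of that vector-only query, reusing a search client like the one above; the embedding deployment name and the "embedding" vector field are assumptions.

```python
# Vector-only search sketch: embed the query, then search with just the vector.
# Reuses a search_client like the earlier snippet; the embedding deployment name
# and the "embedding" vector field are assumptions.
from azure.search.documents.models import VectorizedQuery
from openai import AzureOpenAI

openai_client = AzureOpenAI(
    azure_endpoint="https://<aoai>.openai.azure.com", api_key="<aoai-key>", api_version="2024-06-01"
)

query = "water my plants efficiently without waste"
embedding = openai_client.embeddings.create(model="<embedding-deployment>", input=query).data[0].embedding

vector_query = VectorizedQuery(vector=embedding, k_nearest_neighbors=5, fields="embedding")
# Note search_text=None: we pass only the vector, no keywords at all.
for doc in search_client.search(search_text=None, vector_queries=[vector_query], top=5):
    print(doc["@search.score"], doc["name"])
```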
So here I'm trying, you know, the query
that keyword search struggled with: water plants efficiently without waste.
And this time the results are much better, right?
So the very first result is a self watering planter.
Then we get our 25 foot hose, we get some
tomato plant food, you know, and that's not quite as
good and then some planters.
So maybe, you know, the ones that it found that are
really good are at the very top there.
So there, you know, you're looking at that going like,
wow, vector search was way better for that query.
Can we just use vector search for everything, right?
And you'll hear many people say like, Oh yeah, just
set up a vector search.
Just set up a vector database.
That's all you need.
That is not all you need because vector search can
fail in so many ways.
So we look at this example: 100 foot hose that
won't break.
This is something I've been searching for my entire life.
If anyone can find it please tell me because Zava
doesn't exist.
So I really really need this.
So what we can see with vector search, it's very
interesting.
The top result is actually a 50 foot hose and the
second result is a 75 foot hose.
The 100 foot hose is result #3. And why does
that happen?
Well, in the vector embedding space, it doesn't really
care about numbers. To, like, a vector embedding
model, 50, 75, 100,
they're all kind of the same thing.
They're just, like, a number you put in there.
Like, it doesn't really think that they're semantically that different.
So we don't end up getting 100 foot as being
the top result here, right?
So this is where you can see where vector search
like it did, still did a decent job because it
did find hoses and we do see 100 foot hose
somewhere in there.
But as a user, I might be thinking, wait, if
you have 100 foot hose, why isn't that just number
one right?
So what we're going to do is use the best
of both worlds.
So the first thing we need to do is we're
going to take that search query, we're going to take
that search vector and we're going to use it to
search with both keyword search and vector search.
So we'll get back the results for each of those
and we'll have relative ranks in each result set.
Then we're going to merge them together using this algorithm
called reciprocal rank fusion.
And it sounds super fancy because computer scientists like to
write papers with fancy names, but what it really is,
is just looking at the relative rank and being like,
oh, this was like #3 over there and #5 over
there, and we're just going to kind of, you know, average
those together and see what the rank ends up being,
right?
So it's a good way of like just representing the
relative ranks across both of them.
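As a rough illustration, reciprocal rank fusion can be written in a few lines of Python; the k constant of 60 is a common default from the original RRF paper, not a value stated in the session.

```python
# Reciprocal rank fusion sketch: each document's fused score is the sum of
# 1 / (k + rank) across every result list it appears in. k=60 is a common
# default from the RRF paper, not something stated in the session.
def reciprocal_rank_fusion(result_lists, k=60):
    fused = {}
    for results in result_lists:                      # e.g. [keyword_ids, vector_ids]
        for rank, doc_id in enumerate(results, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)

keyword_ids = ["hose-50ft", "hose-100ft", "hose-25ft"]   # ranks from keyword search
vector_ids = ["hose-50ft", "hose-75ft", "hose-100ft"]    # ranks from vector search
print(reciprocal_rank_fusion([keyword_ids, vector_ids]))
```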
So, you know, we can go and check it out
in our example.
So here, in order to do reciprocal rank fusion, I'm
going to take that search query, I'm going to take
that search vector, and I'm going to pass both of those in.
So here's the search query, here's the vector query, and
then I'm going to look at the results.
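A sketch of that hybrid call, assuming a search client, VectorizedQuery, and a query embedding computed as in the earlier snippets.

```python
# Hybrid search sketch: pass the text query AND the vector query in one call,
# and the service fuses both result sets with reciprocal rank fusion.
# Assumes search_client, VectorizedQuery, and openai_client from the earlier snippets.
query = "100 foot hose that won't break"
embedding = openai_client.embeddings.create(model="<embedding-deployment>", input=query).data[0].embedding

results = search_client.search(
    search_text=query,
    vector_queries=[VectorizedQuery(vector=embedding, k_nearest_neighbors=50, fields="embedding")],
    top=5,
)
for doc in results:
    print(doc["@search.score"], doc["name"])
```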
So this time we still have the 50 foot hose
as #1, but that 100 foot hose did move to
the number two spot, and that's just because of where
it was in the keyword results and the vector results,
which got it high enough to get to #2.
So here you can see that we are getting better
by using RRF, but we can get even better
than that.
So the next step is to bring in the re-ranking model.
This is a particular kind of model.
It is not a large language model.
It is something called a cross encoder model that has
been specifically trained to look at a user query, look
at search results, and then assign scores.
And so this was actually trained with humans where humans
would look at search queries, look at results and say,
hey, I'll give that like a four.
That was really good.
I'll give that a one.
That was really bad, right?
I'll give it a two.
It was kind of meh, right?
So you can actually look at it and be like,
oh, it got a four.
That's a really good result.
Oh, it got a one that's bad result.
We're just going to throw that out, right?
So it's incredibly useful, you know, ranking model to use.
Let's see what happens when we use it for for
this example here.
Now, in order to use it with AI Search, we're
going to pass in our query, we're going to pass
in our vector, and we're going to specify that we
want to use the semantic ranker.
So it's just a couple extra parameters that we throw
in.
And so then when we look at the results, we
can see that the 100 foot hose got up to the
number one spot, right, because the re-ranker model is
actually looking at that original query and going, hey, they're
looking for a 100 foot hose.
This 100 foot hose seems like the best result for
it.
Now, some of the other ones I'm, like, kind of a
little dubious about, but maybe it
just didn't find better results there.
I mean, they're all kind of related to hoses that
won't break.
So I wouldn't, I certainly wouldn't be against buying a
lot of those things.
But that's the point of the re-ranking model.
It was able to hoist the most important result, the one
that really matched that query, to the top.
So if you are, you know, using this hybrid search,
you really want to have that re-ranking model.
The other thing that's super powerful about it is that
we get the re-ranking score.
It's hard to see it here, but
you can see it's, like, ones and twos.
And so actually, for some things, like, oftentimes we use
a threshold of 1.9, and if it's less than 1.9,
we just throw it out.
We just say, hey, that's just not good enough.
We want to have really, really high quality results.
So the other nice thing about that re-ranking
model is that it gives absolute scores, where you can actually
just say, you know, below a certain threshold, we
just think it's not good enough quality.
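A tiny sketch of that thresholding idea, assuming the results from a semantic query have been materialized into a list of documents.

```python
# Thresholding on the re-ranker's absolute score (the talk suggests ~1.9 on
# the 0-4 scale); docs is assumed to be a materialized list of search results.
def filter_by_reranker_score(docs, minimum=1.9):
    return [doc for doc in docs if doc["@search.reranker_score"] >= minimum]
```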
All right, so this is the complete hybrid search flow
that we showed earlier, right?
We're going to do both those kinds of searches, we're
going to merge them together and then we're going to
do that re ranking step.
If you are searching and you want really good search
quality, you need to be using this stack, and Azure
AI Search is a great option for it.
It is also possible to implement on top of a
few other stacks as well if you need
to do it.
But AI search has it built in and it's just
super easy to do with it.
Now, the AI Search team has actually done research in
order to verify why it's so important to have this
stack, in this example here where they looked at lots
of different kinds of queries, right?
We look at keyword queries, we'll get short queries, long
queries, concepts.
There are so many different kinds of queries that are going
to be thrown at your applications, right?
If they're user facing, like, users, right,
say the darndest things, right? If you give a user
a text field, like, oh my God, the stuff they
put in there, right?
So you get, you have to be prepared for all
these kinds of queries.
And so they did all this research to show like,
listen, if you want the best results across all of
those, you really need that entire search stack.
So hopefully I've sold you on hybrid search.
Now, the thing about hybrid search is that AI search
has supported this for what, maybe 2 years now?
It's been a couple more.
Yeah.
OK.
So, like, that's almost old news, but I feel the need to
talk about it because I don't think everybody realizes, like,
how incredible it is.
So you should definitely be using hybrid Search whenever possible.
However, today we're actually here to talk about when hybrid
search isn't enough, when we need to build additional strategies
on top of hybrid search, right?
So here I have some examples of hard queries that
we need to address with more agentic techniques, right?
So the first example here is having multiple questions in
one query, right?
So this one is like, oh, what type of paint
is most suitable for the bathroom?
What's the price range of all these different options?
Like they're asking a lot of things in one query,
and it's not the kind of thing we can answer
with a single search call, right?
We actually have to decompose that question into multiple questions.
A related kind of question is what I call a
chained query.
This one is like explain how to paint my house
most efficiently, then give me a list of the products
that would help me, right?
So in order to answer the second part of the
question, we first need to get the search results for
the first part of the question, right?
And so that implies we have to do some sort
of sequence of search calls.
So that's quite interesting.
And then the third kind is queries requiring external knowledge,
right?
So a lot of people expect your applications to both
be able to search your data, but also just generally
know things about the world, right?
So they need to be able to search the web
in order to answer those questions.
So those are the kind of interesting queries that you're
going to see in many of your applications that the
AI search team has been figuring out a way to
approach.
So let's have Matt talk about that.
Thank you so much, Pamela, for that great introduction to
RAG.
But we're here today to also talk about knowledge bases
inside Azure AI Search because this feature is what's going
to improve on top of those hybrid search strategies using
agentic retrieval.
So when we talk about agentic retrieval, here's specifically what
we mean.
Every knowledge base inside Azure Search has an Agentic Retrieval
engine, which essentially the whole purpose is to define better
context whenever you're trying to use RAG for agents or
any kind of agentic application.
The three core components of this engine are query planning,
knowledge sources, and merged outputs.
The first part of agentic retrieval is using an LLM
to break down a complex conversation into individual queries representing
the basic information needs from that conversation.
Now, part of this query planning process, it goes beyond
just generating the queries.
We're also selecting the knowledge sources which are necessary to
answer these queries.
These knowledge sources represent all the data that your agent
would need to answer questions or perform any relevant tasks
that it needs to do.
The queries that it generated are sent to every selected
knowledge source to gather the relevant documents.
Finally, the results of these queries are merged together and
are used to produce a single synthesized answer with citations.
Now, if the agentic retrieval engine determines that the results
found are not sufficient to answer the queries, it's actually
going to take a second pass.
It's going to repeat that query planning phase taking into
account the results it's already seen.
So this is a very powerful feature that lets you
get the best out of hybrid search and get the
best context for your agents.
So when we talk about knowledge sources, we have two
main different categories.
Indexed knowledge sources represent data which is actually going to
get copied out of some original data repository.
Maybe you've landed a bunch of PDFs inside a
blob container.
Maybe you've got a OneLake lakehouse full of
relevant files.
We're actually going to take those files and we're going
to copy them into an Azure AI Search index so
we can perform hybrid search with re ranking on them.
Remote knowledge sources are a bit different.
Instead of copying the data directly into Azure AI Search,
we're going to create a connection.
Maybe you need to add information from the web and
in private preview, we're happy to announce that you'll be
able to bring any MCP connector as a knowledge source.
Now, SharePoint's a bit special here.
You'll see it's in the middle of this Venn diagram.
We're actually offering two ways to talk to SharePoint.
The first way allows you to copy files out of
a SharePoint site into a search index.
The other way is to directly query SharePoint using an
end user's identity.
So let's let's start off by talking a little bit
more about how this remote SharePoint knowledge actually works.
When remote SharePoint knowledge is queried, we actually are going
to need the end user's identity.
This is the person that's actually interacting directly with the
agent because we need to make sure that the documents
that they have access to are only shown to them.
We should not be showing additional documents.
In this example, you can see an end user in
a sales organization is asking about some executive documents they
don't actually have access to, so we are actually going
to pass that identity on to SharePoint.
They perform the access control and trimming so that the
relevant answer of "I don't know" will eventually be generated.
Note that if you already use Copilot, you're going to
get very similar results here, as we actually use the
same underlying index on SharePoint that Copilot uses.
For indexed SharePoint, it's a bit different because we're actually
taking those files out of SharePoint and creating a copy
inside your Azure AI Search index.
We do this using an existing feature in Azure Search
called Indexers and Skill sets.
The indexer is actually going to be responsible for going
out to the SharePoint site and fetching the files.
The skill set actually takes those files and it chunks
and vectorizes them, which is a very critical step to
allow hybrid search to be successful.
Note that even though we're actually copying the data out
of SharePoint, we're actually going to preserve as much permission
metadata as possible, and you can still use an end
user's identity to filter the results that are coming out
of SharePoint.
So in general, any indexed knowledge source is going to
use the same strategy.
We're going to leverage the same indexer and skill set
integration, and we're actually going to be using what are
called skills, which are basically reusable components that apply AI
enrichment to your documents from your data repository.
This allows you to get the best search results possible.
I want to really highlight a brand new feature we're also
announcing at this conference, which is a better integration with
Content Understanding.
You have two main options when indexing knowledge from outside
data containers like Blob and OneLake.
You could use a built-in, free parsing strategy from
indexers, what we call minimal.
And just like the name suggests, we're going to be
creating a minimal representation of that content,
so it'll be able to be queried.
But if your content has images or embedded tables, you're actually
going to benefit from using the standard strategy, which leverages
a Content Understanding deployment that you bring to create a
significantly richer representation of this content.
In the example you can see on the screen, you're
going to notice that we have a flow chart, and
if you were to use the default minimal strategy, this
content would be completely missing.
Because we're using Content Understanding, we're actually going to convert
it to this figure tag, and the text is actually
going to be OCR'd and made available so that
if you were to use an LLM to reason over
it, it could actually see the underlying text in the
diagram.
Now, when you're using knowledge bases, you're probably using this
in a larger agentic context where you've got a lot
of moving parts and you're probably worried, how can I
control the cost and latency from retrieval from my knowledge
base?
We offer a single control today we call retrieval reasoning
effort.
There are three main levels here.
The first is the minimal effort, which is the cheapest
option for getting information out of agentic retrieval.
Low effort is a more balanced option that allows you
to get good results at higher latency.
And finally, medium effort is the step that's going to
take the most effort to get the most comprehensive results.
Now, Matt, you seem to be missing a high.
That's a great point.
In the future, we hope to extend retrieval reasoning efforts
to offer more advanced capabilities for retrieval.
Maybe a super high?
Maybe, But for now, these are the three options we
have.
So let's start off by talking about what this minimal
effort actually is.
Minimal effort is actually really interesting because it is effectively
a way to use knowledge bases without any LLMs at
all.
You are giving up some advanced features like query planning,
knowledge source selection in order to get lower latency.
If you need results out of your knowledge base fast,
this is definitely the right effort for you.
Now, note that because you need to do the query
planning anyway, you have to actually give us the queries
you want to run.
This is a great fit whenever you want to combine
an agent with a knowledge base.
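As a purely hypothetical sketch of what a minimal-effort retrieval call could look like over REST; the URL path, API version, and payload shape are assumptions for illustration only, not the documented contract.

```python
# Purely hypothetical sketch of a minimal-effort knowledge base retrieval call
# over REST. The URL path, api-version, and payload shape are assumptions made
# for illustration only; check the Azure AI Search docs for the real contract.
import requests

SEARCH_ENDPOINT = "https://<service>.search.windows.net"
KNOWLEDGE_BASE = "zava-knowledge-base"

payload = {
    # Minimal effort skips LLM query planning, so the caller supplies the query,
    # and every configured knowledge source is searched.
    "messages": [
        {"role": "user", "content": "What is the best Zava paint for bathroom walls?"}
    ],
}
response = requests.post(
    f"{SEARCH_ENDPOINT}/knowledgeBases/{KNOWLEDGE_BASE}/retrieve?api-version=<preview-version>",
    headers={"api-key": "<search-key>", "Content-Type": "application/json"},
    json=payload,
)
print(response.json())
```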
Now let's go check out a demo of this minimal
effort.
All right, so here I have the conversational RAG application,
and this time I have agentic retrieval enabled.
I've set that reasoning effort to minimal and I have
included SharePoint.
So it does have the option to search both a
search index and SharePoint.
So this is a knowledge base that has two sources
configured, a search index and a SharePoint source.
And you can see that I'm logged in.
Thank you.
And so this is where I'm logged in.
So that's the SharePoint it would have access to.
So here I've once again asked the question, what is
the best Zava paint for bathroom walls?
And this time I actually get a slightly different answer
because it has access to both the search index and
SharePoint.
And we can see citations here.
And some of these citations are actually files on the
SharePoint.
That's right, SharePoint.
Now, if we look at the process that it used
to get this, we took the user's query and,
in this case, we just directly sent
it to that minimal knowledge base.
We said, hey, here's the user question.
Just, you know, just use it to search all the
sources right?
So with minimal, it always searches every single source you
configured for that knowledge base, right?
So it took that question and it sent it to
the index, it sent it to the SharePoint.
It got back, you know, like 6 results for the
1-2 results for the, the second one.
All of those go through the semantic rancor and get
merged together.
And then, and then we use our own model in
this application in order to answer the question based off
those results.
So this is an example of how you might integrate
minimal into an application.
It is the easiest switch if you're already using, like,
the search function of the search SDK and you want
to start using multiple sources.
The easiest switch is just to bring in minimal and
try that out, and you'll find that you can
kind of just, you know, swap it in there.
Thanks for that demo.
That was great.
Now let's talk more about the low effort option.
Low effort is going to give you access to those
more advanced features from the Agentic Retrieval engine.
Because we're using an LLM, this is the mode that,
instead of just taking individual queries, is going to take
an entire conversation, run it through that query planning process,
and break it down into these decomposed queries.
Now we run this knowledge source selection process as part
of query planning.
So in addition to getting those queries, we're also going
to pick which knowledge sources we're going to use.
Now we send these queries to either remote or indexed
knowledge sources to fetch the relevant documents.
And finally, we have an answer synthesis option, so you
can actually get a complete answer that you could render
directly in your application that includes citations.
Yeah, So this query planning step that does knowledge source
selection, let's talk a little bit more about how exactly
that works.
There are really three key factors in how knowledge sources
are selected.
We actually use that LLM to decide.
So the three main inputs are the name of the
knowledge source.
You have an optional description, so you can give it,
like, say, hey, I have a blob container that's
a knowledge source.
Maybe this blob container contains a bunch of HR documents.
I have OneLake, it contains a bunch of invoices,
and then the web should only be used in certain
circumstances.
So these descriptions are really critical to getting the most
out of knowledge source selection when you choose to use
it.
Finally, you can also provide some custom retrieval instructions which
allow you to basically customize this selection process in natural
language.
It looks very, very similar to a prompt.
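To make those three inputs concrete, here is an illustrative view of how a knowledge base's sources and retrieval instructions might be described; these are plain dictionaries for explanation, not the actual API schema.

```python
# Illustrative only: the three inputs that drive knowledge source selection.
# These are plain dictionaries for explanation, not the actual API schema.
knowledge_sources = [
    {
        "name": "zava-product-index",
        "description": "Zava product catalog with names, descriptions, and prices.",
    },
    {
        "name": "sharepoint-hr-docs",
        "description": "Internal HR and onboarding documents from SharePoint.",
    },
    {
        "name": "web",
        "description": "Public web results; use only for questions about other brands.",
    },
]
retrieval_instructions = (
    "Prefer zava-product-index for any question about Zava products or prices. "
    "Only query the web when the question mentions other brands or public information."
)
```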
Let's do a demo of the low effort.
We're back to our application.
You can see this time we have low effort enabled
and we are including SharePoint, not including web yet.
We'll see that soon.
And this time I'm asking a more complex question.
I'm asking which Zava paint can I use to paint
my bathroom and how much does it cost?
Now remember this has access to two sources, a search
index and a SharePoint.
And what we can see here is that it decided
to only get results from the search index.
I can tell that from the citations.
I can also click on and see this thought process
where we can see it does some query planning.
It breaks down that query into multiple queries.
So it says, oh, OK, I'm going to search for
Zava paints for bathrooms, and it gets back 6 results
there from the search index.
And it's also going to search for Zava paint prices,
and it gets back 10 results.
So those must all be paints.
And here you can see it actually decided not to
search the SharePoint at all.
And actually I agree with that decision because when I
set up the knowledge sources, I told it, hey, listen,
if you need Zava products and prices, just search the
search index.
That's where they are.
That's the only place you need to go.
So it decided that it wasn't worth searching SharePoint, right?
So this is a way that we can like save
some costs where we don't have to search those sources
if we don't need to.
There is actually an option where you can say always
query source.
So if you are in a situation where you do
want it to force query every source, you can do
that.
But it is nice to have this dynamic source selection
because you can save time, you can save money, you
can save tokens, and it can do a good job
deciding.
And so then we get back the results there.
So that is low without web.
Thank you.
So let's talk a little bit more about how this
answer synthesis process actually works.
The main feature that we're offering here, in addition to
kind of pre generating an answer, is allowing you to
customize the style and tone with an additional set of
natural language answer instructions.
So we have a couple examples here to kind of
illustrate how much of an impact you can have on
the style and tone of the generated answer.
The first example is kind of the default, what we
get out of the box, and you're going to notice it's a little
more verbose, which is good.
And that's, generally speaking, a good starting place.
But maybe you want to give it guidance to just,
hey, answer with bullet points only.
And my personal favorite is going to be a
more stylized, poetic answer.
So for that bathroom paint question we had, we can
kind of get a much more poetic answer.
You know, this moisture in the air, if you're
in a bathroom, semi-gloss shines, and so on, here, bathrooms,
47 dollars.
So you can really customize exactly how this answer shows
up. It's even a little bit silly, but it's a really
powerful feature.
So let's talk now about the web knowledge feature because
many times your knowledge sources are going to cover a
lot of internal information your organization would know.
For example, it's like, hey, I've got manuals, I've got
training information.
But a lot of times your agent can benefit
from public, up-to-date information that's available on the
web.
So by adding the Bing Web knowledge source, you can
actually fill in this gap.
You are able to search the entire web or specify
a custom list of domains.
Now note that in order to use this feature, you
do have to opt in to answer synthesis.
That is not optional.
So let's go take a look at a demo.
All right, so here we can see that we're on
low and we are including the web source.
Now this time we're asking what is the best Zava
paint for bathroom walls and how does it compare to
other brand paints.
So it's obviously a question that requires going out to
the web to find out about other brands' paints.
And so when we look at the answer here, we
can see that there's in fact a lot of websites
that are cited.
We've got fixer.com, we've got the ultimate paint brand comparison
from Perfect Touch PTP, right?
So it's gone out and found all these additional web
sources.
And we can look at the process here and we
can see that it decomposed that query into multiple queries.
So it searched for best Zava paint, and then it
searched for a comparison of Zava bathroom paint to other
brand paints.
It searched the web for both of those as well,
right?
So it takes both those two decomposed queries and sends
them to both the search index and the web, gets
back lots of results, and then merges those together and
uses the answer synthesis in order to come up with
a result.
That answer, that synthesized answer still has citations.
So you can see I can still make everything clickable,
the stuff in my BLOB, the stuff on the web,
they can reference everything to find out if they actually
trust fixer.com, right?
Which is important, right?
That's the whole point of these RAG applications, is to
give users a way to get accurate information that they
can back up with citations.
Thanks for that great demo.
All right, now let's finish up the reasoning efforts here
with medium effort.
Medium effort is the one that actually adds this optional
iterative retrieval step.
This means that if the agentic retrieval engine determines that
results retrieved from the initial search aren't sufficient to answer
the question, we're actually going to do a second pass
of query planning and retrieval to try to get a
better answer.
So we have a big problem here.
In order to know if we have to do the
second iteration, we have to actually decide how to do
that.
We have actually introduced a new model for the first
time into the Agentic Retrieval engine.
It's only accessible on the medium retrieval reasoning effort mode.
We call this model Semantic Classifier.
It performs really two key tasks that enable this confident
iteration.
The first is to decide, hey, is there enough information
in the results of each query to actually answer the
underlying question.
In addition to that, we also want to be sure
we found at least one highly relevant document to answer
the question in these queries.
Now, if we don't meet these conditions, we're actually going
to go and do that second iteration because we want
to try to get the best results.
So basically this is allowing us to confidently iterate rather
than just iterating all the time, which can be a
big challenge in agentic RAG.
Now, when the second iteration is performed, we're not just
saying, hey, try again.
We're actually passing additional context to that query planning that
wasn't present in the first iteration.
We're actually going to use the documents that were retrieved
from the first retrieval and the original queries so that
we can better formulate a more intelligent second pass that's
taking into account the results from the first.
So let's go take another look at a demo of
how this works.
All right, so here we have medium enabled and we've
got web, we've got SharePoint and we've got our hardest
question.
So it says explain how to paint my house most
efficiently.
Then give me a list of the Zava products and
prices for each supply.
So let's see what it decided to do.
So it came up with a query, most efficient
way to paint a house, which, that'd be fun.
Then Zava products and prices for house painting supplies.
And this first query actually didn't get any results from
the search index.
It did get a lot of results from the web
when it searched for, you know, efficient way to paint
a house. It didn't get any for Zava products because Zava
is a made-up brand.
So it's a little hard to search the web for
it, but hopefully all of you are working for companies
that exist and it's going to be easier to find
you on the web.
So then it looked at those results and said, OK,
it, you know, it found a lot of results that
could answer the question about painting a house most efficiently.
But it realized it couldn't yet answer the question about
Zava products.
So it decided it needed to do a second iteration.
And in that second iteration it comes up with a
new set of queries and source selection that will help
it get a comprehensive answer.
So then the second iteration, this time it searches the
search index again, it comes up with a different query
and it gets 4 results right?
It also did a bunch of other searches.
It got more specific.
It was, like, looking for drop cloths, caulk, like, prep
materials.
Unfortunately, Zava doesn't have drop cloths, but if it did,
it would find them, right?
So it actually gets very clever in that second iteration
and comes up with some really good queries based off
that first iteration to get much more focused results.
So it can really help in getting much more comprehensive
answers to these complex questions.
And there we go.
So all of the examples that I have been showing
are from an open source repo.
And in this repo, we added the agentic retrieval feature
to it just this Monday.
It's been a very exciting week for us.
And so any of you who want to get started
with RAG, a conversational RAG in your domain, definitely check
out that repo.
We've had thousands of developers deploy it.
It's got tons of features, multimodal, data access, cloud ingestion,
just all the different features that people are wanting out
of a conversational RAG solution.
So certainly check it out.
It can be a great starting point and great inspiration
for all of you to see how we tackle common
issues in these sorts of applications.
All right, that's awesome.
So now let's switch gears a bit and let's talk
about how knowledge bases fit in with Foundry and Azure AI Search.
Knowledge bases are going to give us reusable topic centric
collections to actually ground our agents.
Now with Foundry IQ we're actually able to take those
knowledge bases using MCP to give our agents a unified
knowledge layer.
The result is going to be it's much simpler to
build agents instead of stitching a bunch of separate data
retrieval tools together to get the same results you could
get from a single knowledge base.
Now the magic question is how exactly is Foundry IQ
going to enable us to use knowledge bases to ground
agents?
So we are actually going to be using delegation the
same way that you might be familiar with the MCP
protocol to use external services in your agents.
You're able to use MCP to connect to your knowledge
base.
The agent is actually going to play the role of
the query planner and the answer synthesizer here.
As input, we're going to take a bunch of separate intents or
queries, and the output is just going to contain the
merged results.
So let's check out a demo of Foundry IQ.
You want to do this one?
Yeah. So what I'm showing you here is
the new experience inside Foundry.
It's a kind of unified agent builder.
So I'm able, in a single place, to see all
my tools, knowledge, data, evaluations.
This is a demo of the agent playground.
So once I've built an agent, I want to try
it out, customize instructions.
I'm able to use this UX to actually see the impact
my changes make.
So I'm using the same exact knowledge base that Pamela
was showing in her demo, but this time I'm using
it through an agent rather than through that deployed application.
So when I ask the same question, what's the best
Zava paint for bathroom walls, you're going to see
that I reach out to the knowledge base using MCP.
I use this knowledge base retrieval tool, which I have
approved, and I end up getting an answer that
looks exactly the same as what I got in the
application.
So this is kind of a great way to lift
and shift your knowledge bases so you can actually use
them inside your Foundry agents.
Yeah.
So that about wraps up our presentation today.
I want to put up a call to action.
You can take it, feel free to take a picture of this slide.
Again, we invite you all to sign up for the
private preview of MCP knowledge sources, and we'd also like
to open up the floor.
Any questions you'd like to share, please?
Do we have a mic?
Oh, they could probably go there.
Yeah, there's a mic.
And all right, if you do have a question, there
is a mic right there.
You can come up and ask it.
Don't be shy.
We've got 4 minutes left to answer questions.
There we go.
I have no idea how this works.
No, I think you can hear us.
So up to you.
I have I have a question with regards to the
knowledge sources that you can add in the knowledge base.
I was looking at it from the Foundry portal.
I don't know like on the Azure AI search side,
can you add as a knowledge source only a specific
index from like an Azure AI search resource or you
need to add the full search resource?
That's a great question.
So it depends.
With indexed knowledge sources, what you're actually going to do
under the hood is create an indexer and a skillset,
so data from an outside source is brought into
an automatically created search index.
With remote knowledge sources, you don't ingest any data at
all.
You directly connect to an external source of information and
query it at retrieval time.
Yeah.
And with those remote knowledge sources, we get this question
a lot.
Like with SharePoint, if you do need to filter the
SharePoint, you can use a filter expression. You could pass
in, like, a site ID if you want to restrict
it to a particular part of your SharePoint, authors, etcetera.
And similarly with the web, you can specify domain filters
to just, you know, limit it to websites that aren't sketchy,
right?
Small list, but but you've got these abilities that you
can that you can filter down those remote searches as
well.
We also have a survey slide too.
Yeah, we do also have this QR code for you
to give feedback about the session, and you can scan
that QR code in order to fill it out.
We have a couple more minutes if anyone does want
to go to the mic.
Of course, we'll also be here after the session to
answer any questions that you don't want to ask in
front of everyone.
And again, we thank you all so much for joining
us.
I know it's pretty late in the day, so thank
you very much for attending our session.
Thank you for joining us at Microsoft Ignite silent stages.
Please leave your headset at your seat as you exit.
Thank you for your cooperation.
Can I ask a question?
Oh, of course, yeah.
This is like magic.
You've kind of taken what I wanted to build and
commoditized it.
So I think you're like... this is...
Can you go closer to the mic?
Yeah, yeah.
So this is kind of you've commoditized exactly what I
think we need to do to provide agents and knowledge
bases.
I wonder, is there any worth in storing
graphs and relationships of separate entities, kind of more connections,
the connections between certain entities?
Yeah.
Does that have value in the knowledge base or is
that something that you confer anyway?
Yeah.
So I can tell you that graphs are something we
are very interested in as well.
We don't have a built in graph, but if you
can join the MCP private preview, if you have a
graph database, you are able to add that as a
knowledge source.
Awesome.
Thank you.
Of course.
Thank you.
So what kind of chunking strategies do you recommend when
you use this from the portal?
We don't get a choice of the chunking strategies.
So we offer a built-in chunking strategy, which
has some defaults.
So there's two options.
The first option is you can do the chunking totally
yourself, push the data into the search index and complete
control.
The second option is to use an indexer with a
custom skillset, and we actually have a built-in
skill called the split skill, and you can customize the chunking
strategy to some degree on that skill, or you can
define a custom skill that does the chunking completely the
way you want it.
Actually, in the repo we added support for custom skillsets,
and we use our own custom chunking strategy with the
built-in indexers that way.
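For reference, here is a hedged sketch of customizing the built-in split skill from the Python SDK; the chunk sizes and field mappings are illustrative values, not recommendations.

```python
# Hedged sketch of customizing the built-in split skill; chunk sizes and field
# mappings here are illustrative values, not recommendations.
from azure.search.documents.indexes.models import (
    InputFieldMappingEntry,
    OutputFieldMappingEntry,
    SplitSkill,
)

split_skill = SplitSkill(
    text_split_mode="pages",        # chunk by approximate character-count "pages"
    maximum_page_length=2000,       # max characters per chunk
    page_overlap_length=500,        # overlap between adjacent chunks
    inputs=[InputFieldMappingEntry(name="text", source="/document/content")],
    outputs=[OutputFieldMappingEntry(name="textItems", target_name="chunks")],
)
```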
Start building your next agent with the latest knowledge features from Azure AI Search. In this session, we will demo how to connect your agentic retrieval engine to new knowledge sources like SharePoint, web and blob. We will also walk through new controls available to improve your RAG performance, across query planning, retrieval and answer generation. Join this code-focused breakout for samples and step-by-step guidance on connecting knowledge to your next agent. Delivered in a silent stage breakout.

To learn more, please check out these resources:
* https://aka.ms/ignite25-plans-agenticsolutions

Speakers:
* Pamela Fox
* Matthew Gotteiner

Session Information:
This is one of many sessions from the Microsoft Ignite 2025 event. View even more sessions on-demand and learn about Microsoft Ignite at https://ignite.microsoft.com

BRK193 | English (US) | Innovate with Azure AI apps and agents, Microsoft Foundry
Breakout | Expert (400)
#MSIgnite, #InnovatewithAzureAIappsandagents

Chapters:
0:00 - Introduction and Session Overview
00:10:34 - Hybrid Search and Reciprocal Rank Fusion Explanation
00:12:45 - Applying Semantic Ranker in AI Search for Improved Results
00:14:10 - Overview of the Complete Hybrid Search Flow
00:18:49 - Indexed vs Remote Knowledge Sources and SharePoint Integration
00:27:21 - Knowledge Source Selection and Query Planning Explained
00:31:08 - Using Web Knowledge Sources with Bing Integration
00:34:41 - Second iteration enhances query with additional context from first pass
00:35:09 - Demo: Medium mode retrieval solving complex multi-query example