He gave us a thumbs up.
Thumbs up.
OK.
All right.
We want to welcome you all to our session today
about building agents with knowledge, agentic RAG, and Azure AI
Search.
My name is Matt, I'm a program manager on Azure
Search.
I'm Pamela and I am a Python cloud advocate.
Does it say to clap right now?
All right, let's talk about our agenda for today.
First, we're going to cover the basics of RAG, retrieval-augmented generation.
Then we're going to do a deep dive on knowledge
bases inside Azure AI Search.
We're also going to cover how Foundry and knowledge bases
are connected through Foundry IQ.
And we're going to have time to take some questions.
Let's dive right in.
All right, so I'm going to talk a bit about
RAG.
So how many of you are using RAG today, right?
That's a very good number.
Awesome.
Now just to, you know, get us all on the
same page here, right?
We're all building these applications.
We're starting to build agentic applications and agents, and we're
seeing agents in many different parts of our workflows,
right?
And these agents can be conversational.
That's what a lot of us originally started building.
And now we have more task oriented agents that are
actually taking actions and doing stuff on our behalf.
Now the majority of these agents need domain-specific context
to ground themselves in your organization's data, right, in the
information that matters for you, for your task.
And that is where we need to have very good
retrieval and very good RAG.
So RAG is the name for the technique that LLMs
use in order to use your information.
RAG stands for retrieval augmented generation.
The basic idea is that you get in a question,
use that question to search a search index.
You get back those results and then you send them
to an LLM and say, hey, LLM, here's the results,
here's the original question or intent.
Now please answer based off the results and provide citations.
Right?
I actually have a demo that I can show of
a very basic, the most basic rag application.
Right?
Of course we're asking about Zava.
What is the best Zava paint for bathroom walls?
It says it's the interior semi gloss paint, very important
for washing when your children decide to draw all over
your bathroom, as happens to me a lot.
And if we look at the process here, what
we do in this very basic RAG is we take
that question, we send it to our AI Search index,
we get back results.
These are chunks of documents that we've indexed and you
know, they have file pages and scores.
And then we send those results to an LLM and
say, hey, your job LLM is to answer the question
based off these sources.
And so then we get back that answer with citations.
That's the basic RAG.
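For reference, here is a minimal sketch of that retrieve-then-generate loop in Python; the client setup, index name, field names, and deployment names are illustrative assumptions, not the exact code from the demo.

```python
# Minimal retrieve-then-generate sketch (illustrative only: endpoint, index name,
# field names, and deployment names are assumptions, not the demo's exact code).
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

search_client = SearchClient(
    "https://<service>.search.windows.net", "zava-products", AzureKeyCredential("<search-key>")
)
openai_client = AzureOpenAI(
    azure_endpoint="https://<aoai>.openai.azure.com", api_key="<aoai-key>", api_version="2024-06-01"
)

question = "What is the best Zava paint for bathroom walls?"

# 1. Retrieve: send the question to the search index and collect the top chunks.
results = search_client.search(search_text=question, top=5)
sources = "\n".join(f"[{doc['id']}]: {doc['content']}" for doc in results)

# 2. Generate: ask the LLM to answer only from those sources and cite them.
response = openai_client.chat.completions.create(
    model="<chat-deployment>",
    messages=[
        {"role": "system", "content": "Answer ONLY from the sources below and cite them.\n" + sources},
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)
```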
And that is what, you know, many people started with in this space of RAG.
So let's talk a little bit more about how Azure AI
Search is actually going to help you run advanced RAG
for better generative AI applications.
Azure AI Search provides customers with a feature rich vector
database built on an enterprise ready foundation so you can
focus on growth and scale.
Its integrations offer an end-to-end data management pipeline
for all types of data.
A full-stack RAG solution is available.
Where and how you want to build your retrieval strategy
matters.
Our comprehensive search technology allows you to go beyond vector
search.
We ensure that you have the capabilities you need to
retrieve accurate information for every question, regardless of your use
case or data.
So how many of you are using AI Search already?
Let's see some hands again.
Make sure everyone's listening to the right headphones.
Awesome.
Look at that, all your customers.
Very good.
All right.
So the thing that's fantastic about Azure AI Search is
that it has this state-of-the-art retrieval strategy.
It uses a full hybrid search.
So it's not just using vector search or just using
keyword search.
It is combining vector and keyword search together, merging those
results using reciprocal rank fusion and then re-ranking those
results to get the very best results.
So we're going to start off with talking through this
search stack so you all understand hybrid search.
And then we're going to go on to all the
new agentic search strategies that you can layer on top
of hybrid search to get even more powerful results.
So there's just so much stuff that AI search has
to offer.
But let's start with the basics, right?
So keyword search is the traditional search that we've been
using for decades now, right?
And the idea of keyword search is that we store
an inverted document index, which says, hey, you know, for
this particular term, this is how often we saw it in
the documents, and the general frequency relative to the length
of the document.
So if I'm searching for hose and there's a document
that says hose, hose, hose, hose, hose all over the
place, that's probably a very good document for my, you
know, for my search for hose.
Now, Azure AI Search uses BM25, which is basically the best-in-class
full-text algorithm and does a very good
job at handling keyword search.
Let's see, I've got a little example over here with
keyword search.
So what I've got is I'm using Python because it
is the best language, and I'm setting up the Azure
AI Search client, connecting to my search service and then
doing a search here for 25 foot hose.
OK, and this is searching.
This is, you know, a fake Zava product catalog that's got
names, descriptions, categories, prices, etcetera.
And what you can see is the very top result
is in fact a 25 foot hose.
So you can see here that keyword search did a
good job.
It actually found exactly what we needed, right?
And it did a good job here because 25 matched,
foot matched, hose matched, like, you know, it's kind
of an easy one for it, right?
Now, the issue with keyword search is that it does
much worse with other kinds of queries, right?
So if I'm searching to figure out how do I
water my plants efficiently without waste.
Now when I do keyword search, I can see the
first result is water-based polyurethane.
The second result is water-based wood stain.
You should not water your plants with either of those.
I tried it just didn't work.
So this is an example where keyword search just utterly
fails, right?
Because we asked this query that was much more broad,
ambiguous.
It doesn't have, you know, exact keywords in it,
and, you know, exact keywords are what keyword search does well
with, right?
So this is why people get so excited about vector
search, right?
This has really become big in the last three years
or so when we came up with these new vector
embedding models like the ones from OpenAI, text-embedding-ada,
text-embedding-3, all those great models.
And the idea with vector search is that we take
our information, we turn it into a vector, and that
vector represents the information in this multidimensional space.
And then when we get in a new query, we
convert that into a vector using that same embedding model.
And then we check and see like, hey, for this
vector, which vectors are the closest?
And we go and we see, oh, OK, like for
dog, cat is closest and puppy is closest, right?
And we find which ones are closest.
And the idea of that multidimensional space is to represent
similarity.
And it's been trained off the Internet, seeing
generally what terms, you know, co-occur, you know, show up
together in the Internet corpus, right?
So that's the idea of vector search.
And Azure AI Search has very good support
for vector search, one of the first search engines that
added it.
And you know, you can use it on your documents
and you can also use it if you have massive
vector databases.
So it can actually scale to handle searching across billions
of vectors because it can use this approximation algorithm called
HNSW, which can scale to huge amounts of vectors.
So vector search is very powerful.
So let's see a little demo of that.
So we set up our search client again.
And so now when we perform the vector search, we
have our query.
We're going to turn that into a vector, and we're
turning it into a vector using Azure OpenAI, one
of the text embedding models.
And then when we search, we're going to only pass
in that vector here.
So I'm actually not passing in the text at all.
I'm only passing in the vector.
So I can see what can I get with just
that vector, right?
What is the most semantically similar to that vector?
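Here is a hedged sketch of that vector-only query, reusing a search client like the one above; the embedding deployment name and the "embedding" vector field are assumptions.

```python
# Vector-only search sketch: embed the query, then search with just the vector.
# Reuses a search_client like the earlier snippet; the embedding deployment name
# and the "embedding" vector field are assumptions.
from azure.search.documents.models import VectorizedQuery
from openai import AzureOpenAI

openai_client = AzureOpenAI(
    azure_endpoint="https://<aoai>.openai.azure.com", api_key="<aoai-key>", api_version="2024-06-01"
)

query = "water my plants efficiently without waste"
embedding = openai_client.embeddings.create(model="<embedding-deployment>", input=query).data[0].embedding

vector_query = VectorizedQuery(vector=embedding, k_nearest_neighbors=5, fields="embedding")
# Note search_text=None: we pass only the vector, no keywords at all.
for doc in search_client.search(search_text=None, vector_queries=[vector_query], top=5):
    print(doc["@search.score"], doc["name"])
```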
So here I'm trying, you know, the query
that keyword search struggled with: water plants efficiently without waste.
And this time the results are much better, right?
So the very first result is a self watering planter.
Then we get our 25 foot hose, we get some
tomato plant food, you know, and that's not quite as
good and then some planters.
So maybe, you know, the ones that it found that are
really good are at the very top there.
So there, you know, you're looking at that going like,
wow, vector search was way better for that query.
Can we just use vector search for everything, right?
And you'll hear many people say like, Oh yeah, just
set up a vector search.
Just set up a vector database.
That's all you need.
That is not all you need because vector search can
fail in so many ways.
So we look at this example: 100 foot hose that
won't break.
This is something I've been searching for my entire life.
If anyone can find it please tell me because Zava
doesn't exist.
So I really really need this.
So what we can see with vector search, it's very
interesting.
The top result is actually a 50 foot hose and the
second result is a 75 foot hose.
The 100 foot hose is result #3. And why does
that happen?
Well, in the vector embedding space, it doesn't really
care about numbers. To, like, a vector embedding
model, 50, 75, 100,
they're all kind of the same thing.
They're just, like, a number you put in there.
Like, it doesn't really think that they're semantically that different.
So we don't end up getting 100 foot as being
the top result here, right?
So this is where you can see where vector search
like it did, still did a decent job because it
did find hoses and we do see 100 foot hose
somewhere in there.
But as a user, I might be thinking, wait, if
you have 100 foot hose, why isn't that just number
one right?
So what we're going to do is use the best
of both worlds.
So the first thing we need to do is we're
going to take that search query, we're going to take
that search vector and we're going to use it to
search with both keyword search and vector search.
So we'll get back the results for each of those
and we'll have relative ranks in each result set.
Then we're going to merge them together using this algorithm
called reciprocal rank fusion.
And it sounds super fancy because computer scientists like to
write papers with fancy names, but what it really is,
is just looking at the relative rank and being like,
oh, this was like #3 over there and #5 over
there, and we're just going to kind of, you know, average
those together and see what the rank ends up being,
right?
So it's a good way of like just representing the
relative ranks across both of them.
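As a rough illustration, reciprocal rank fusion can be written in a few lines of Python; the k constant of 60 is a common default from the original RRF paper, not a value stated in the session.

```python
# Reciprocal rank fusion sketch: each document's fused score is the sum of
# 1 / (k + rank) across every result list it appears in. k=60 is a common
# default from the RRF paper, not something stated in the session.
def reciprocal_rank_fusion(result_lists, k=60):
    fused = {}
    for results in result_lists:                      # e.g. [keyword_ids, vector_ids]
        for rank, doc_id in enumerate(results, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)

keyword_ids = ["hose-50ft", "hose-100ft", "hose-25ft"]   # ranks from keyword search
vector_ids = ["hose-50ft", "hose-75ft", "hose-100ft"]    # ranks from vector search
print(reciprocal_rank_fusion([keyword_ids, vector_ids]))
```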
So, you know, we can go and check it out
in our example.
So here, in order to do reciprocal rank fusion, I'm
going to take that search query, I'm going to take
that search vector, and I'm going to pass both of those in.
So here's the search query, here's the vector query, and
then I'm going to look at the results.
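A sketch of that hybrid call, assuming a search client, VectorizedQuery, and a query embedding computed as in the earlier snippets.

```python
# Hybrid search sketch: pass the text query AND the vector query in one call,
# and the service fuses both result sets with reciprocal rank fusion.
# Assumes search_client, VectorizedQuery, and openai_client from the earlier snippets.
query = "100 foot hose that won't break"
embedding = openai_client.embeddings.create(model="<embedding-deployment>", input=query).data[0].embedding

results = search_client.search(
    search_text=query,
    vector_queries=[VectorizedQuery(vector=embedding, k_nearest_neighbors=50, fields="embedding")],
    top=5,
)
for doc in results:
    print(doc["@search.score"], doc["name"])
```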
So this time we still have the 50 foot hose
as #1, but that 100 foot hose did move to
the number two spot, and that's just because of where
it was in the keyword results and the vector results,
which got it high enough to get to #2.
So here you can see that we are getting better
by using RRF, but we can get even better
than that.
So the next step is to bring in the re-ranking model.
This is a particular kind of model.
It is not a large language model.
It is something called a cross encoder model that has
been specifically trained to look at a user query, look
at search results, and then assign scores.
And so this was actually trained with humans where humans
would look at search queries, look at results and say,
hey, I'll give that like a four.
That was really good.
I'll give that a one.
That was really bad, right?
I'll give it a two.
It was kind of meh, right?
So you can actually look at it and be like,
oh, it got a four.
That's a really good result.
Oh, it got a one that's bad result.
We're just going to throw that out, right?
So it's incredibly useful, you know, ranking model to use.
Let's see what happens when we use it for for
this example here.
Now, in order to use it with AI Search, we're
going to pass in our query, we're going to pass
in our vector, and we're going to specify that we
want to use the semantic ranker.
So it's just a couple extra parameters that we throw
in.
And so then when we look at the results, we
can see that the 100 foot hose got up to the
number one spot, right, because the re-ranker model is
actually looking at that original query and going, hey, they're
looking for a 100 foot hose.
This 100 foot hose seems like the best result for
it.
Now, some of the other ones I'm, like, kind of a
little dubious about, but maybe it
just didn't find better results there.
I mean, they're all kind of related to hoses that
won't break.
So I wouldn't, I certainly wouldn't be against buying a
lot of those things.
But that's the point of the re-ranking model.
It was able to hoist the most important result, the one
that really matched that query, to the top.
So if you are, you know, using this hybrid search,
you really want to have that re-ranking model.
The other thing that's super powerful about it is that
we get the re-ranking score.
It's hard to see it here, but
you can see it's, like, ones and twos.
And so actually, for some things, like, oftentimes we use
a threshold of 1.9, and if it's less than 1.9,
we just throw it out.
We just say, hey, that's just not good enough.
We want to have really, really high quality results.
So the other nice thing about that re-ranking
model is that it gives absolute scores, where you can actually
just say, you know, below a certain threshold, we
just think it's not good enough quality.
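A tiny sketch of that thresholding idea, assuming the results from a semantic query have been materialized into a list of documents.

```python
# Thresholding on the re-ranker's absolute score (the talk suggests ~1.9 on
# the 0-4 scale); docs is assumed to be a materialized list of search results.
def filter_by_reranker_score(docs, minimum=1.9):
    return [doc for doc in docs if doc["@search.reranker_score"] >= minimum]
```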
All right, so this is the complete hybrid search flow
that we showed earlier, right?
We're going to do both those kinds of searches, we're
going to merge them together and then we're going to
do that re ranking step.
If you are searching and you want really good search
quality, you need to be using this stack, and Azure
AI Search is a great option for it.
It is also possible to implement on top of a
few other stacks as well if you need
to do it.
But AI search has it built in and it's just
super easy to do with it.
Now, the AI Search team has actually done research in
order to verify why it's so important to have this
stack, in this example here where they looked at lots
of different kinds of queries, right?
We look at keyword queries, we'll get short queries, long
queries, concepts.
There are so many different kinds of queries that are going
to be thrown at your applications, right?
If they're user facing, like, users, right,
say the darndest things, right? If you give a user
a text field, like, oh my God, the stuff they
put in there, right?
So you get, you have to be prepared for all
these kinds of queries.
And so they did all this research to show like,
listen, if you want the best results across all of
those, you really need that entire search stack.
So hopefully I've sold you on hybrid search.
Now, the thing about hybrid search is that AI search
has supported this for what, maybe 2 years now?
It's been a couple more.
Yeah.
OK.
So, like, that's almost old news, but I feel the need to
talk about it because I don't think everybody realizes, like,
how incredible it is.
So you should definitely be using hybrid Search whenever possible.
However, today we're actually here to talk about when hybrid
search isn't enough, when we need to build additional strategies
on top of hybrid search, right?
So here I have some examples of hard queries that
we need to address with more agentic techniques, right?
So the first example here is having multiple questions in
one query, right?
So this one is like, oh, what type of paint
is most suitable for the bathroom?
What's the price range of all these different options?
Like they're asking a lot of things in one query,
and it's not the kind of thing we can answer
with a single search call, right?
We actually have to decompose that question into multiple questions.
A related kind of question is what I call a
chained query.
This one is like explain how to paint my house
most efficiently, then give me a list of the products
that would help me, right?
So in order to answer the second part of the
question, we first need to get the search results for
the first part of the question, right?
And so that implies we have to do some sort
of sequence of search calls.
So that's quite interesting.
And then the third kind is queries requiring external knowledge,
right?
So a lot of people expect your applications to both
be able to search your data, but also just generally
know things about the world, right?
So they need to be able to search the web
in order to answer those questions.
So those are the kind of interesting queries that you're
going to see in many of your applications that the
AI search team has been figuring out a way to
approach.
So let's have Matt talk about that.
Thank you so much, Pamela, for that great introduction to
RAG.
But we're here today to also talk about knowledge bases
inside Azure AI Search because this feature is what's going
to improve on top of those hybrid search strategies using
agentic retrieval.
So when we talk about agentic retrieval, here's specifically what
we mean.
Every knowledge base inside Azure Search has an Agentic Retrieval
engine, which essentially the whole purpose is to define better
context whenever you're trying to use RAG for agents or
any kind of agentic application.
The three core components of this engine are query planning,
knowledge sources, and merged outputs.
The first part of agentic retrieval is using an LLM
to break down a complex conversation into individual queries representing
the basic information needs from that conversation.
Now, part of this query planning process, it goes beyond
just generating the queries.
We're also selecting the knowledge sources which are necessary to
answer these queries.
These knowledge sources represent all the data that your agent
would need to answer questions or perform any relevant tasks
that it needs to do.
The queries that it generated are sent to every selected
knowledge source to gather the relevant documents.
Finally, the results of these queries are merged together and
are used to produce a single synthesized answer with citations.
Now, if the agentic retrieval engine determines that the results
found are not sufficient to answer the queries, it's actually
going to take a second pass.
It's going to repeat that query planning phase taking into
account the results it's already seen.
So this is a very powerful feature that lets you
get the best out of hybrid search and get the
best context for your agents.
So when we talk about knowledge sources, we have two
main different categories.
Indexed knowledge sources represent data which is actually going to
get copied out of some original data repository.
Maybe you've landed a bunch of PDFs inside a
blob container.
Maybe you've got a OneLake lakehouse full of
relevant files.
We're actually going to take those files and we're going
to copy them into an Azure AI Search index so
we can perform hybrid search with re ranking on them.
Remote knowledge sources are a bit different.
Instead of copying the data directly into Azure AI Search,
we're going to create a connection.
Maybe you need to add information from the web and
in private preview, we're happy to announce that you'll be
able to bring any MCP connector as a knowledge source.
Now, SharePoint's a bit special here.
You'll see it's in the middle of this Venn diagram.
We're actually offering two ways to talk to SharePoint.
The first way allows you to copy files out of
a SharePoint site into a search index.
The other way is to directly query SharePoint using an
end user's identity.
So let's let's start off by talking a little bit
more about how this remote SharePoint knowledge actually works.
When remote SharePoint knowledge is queried, we actually are going
to need the end user's identity.
This is the person that's actually interacting directly with the
agent because we need to make sure that the documents
that they have access to are only shown to them.
We should not be showing additional documents.
In this example, you can see an end user in
a sales organization is asking about some executive documents they
don't actually have access to, so we are actually going
to pass that identity on to SharePoint.
They perform the access control and trimming so that the
relevant answer of "I don't know" will eventually be generated.
Note that if you already use Copilot, you're going to
get very similar results here, as we actually use the
same underlying index on SharePoint that Copilot uses.
For indexed SharePoint, it's a bit different because we're actually
taking those files out of SharePoint and creating a copy
inside your Azure AI Search index.
We do this using an existing feature in Azure Search
called Indexers and Skill sets.
The indexer is actually going to be responsible for going
out to the SharePoint site and fetching the files.
The skill set actually takes those files and it chunks
and vectorizes them, which is a very critical step to
allow hybrid search to be successful.
Note that even though we're actually copying the data out
of SharePoint, we're actually going to preserve as much permission
metadata as possible, and you can still use an end
user's identity to filter the results that are coming out
of SharePoint.
So in general, any indexed knowledge source is going to
use the same strategy.
We're going to leverage the same indexer and skill set
integration, and we're actually going to be using what are
called skills, which are basically reusable components that apply AI
enrichment to your documents from your data repository.
This allows you to get the best search results possible.
I want to really highlight a brand new feature we're also
announcing at this conference, which is a better integration with
Content Understanding.
You have two main options when indexing knowledge from outside
data containers like Blob and OneLake.
You could use a built-in, free parsing strategy from
indexers, what we call minimal.
And just like the name suggests, we're going to be
creating a minimal representation of that content,
so it'll be able to be queried.
But if your content has images or embedded tables, you're actually
going to benefit from using the standard strategy, which leverages
a Content Understanding deployment that you bring to create a
significantly richer representation of this content.
In the example you can see on the screen, you're
going to notice that we have a flow chart, and
if you were to use the default minimal strategy, this
content would be completely missing.
Because we're using Content Understanding, we're actually going to convert
it to this figure tag, and the text is actually
going to be OCR'd and made available so that
if you were to use an LLM to reason over
it, it could actually see the underlying text in the
diagram.
Now, when you're using knowledge bases, you're probably using this
in a larger agentic context where you've got a lot
of moving parts and you're probably worried, how can I
control the cost and latency from retrieval from my knowledge
base?
We offer a single control today we call retrieval reasoning
effort.
There are three main levels here.
The first is the minimal effort, which is the cheapest
option for getting information out of agentic retrieval.
Low effort is a more balanced option that allows you
to get good results at higher latency.
And finally, medium effort is the step that's going to
take the most effort to get the most comprehensive results.
Now, Matt, you seem to be missing a high.
That's a great point.
In the future, we hope to extend retrieval reasoning efforts
to offer more advanced capabilities for retrieval.
Maybe a super high?
Maybe, But for now, these are the three options we
have.
So let's start off by talking about what this minimal
effort actually is.
Minimal effort is actually really interesting because it is effectively
a way to use knowledge bases without any LLMs at
all.
You are giving up some advanced features like query planning,
knowledge source selection in order to get lower latency.
If you need results out of your knowledge base fast,
this is definitely the right effort for you.
Now, note that because you need to do the query
planning anyway, you have to actually give us the queries
you want to run.
This is a great fit whenever you want to combine
an agent with a knowledge base.
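As a purely hypothetical sketch of what a minimal-effort retrieval call could look like over REST; the URL path, API version, and payload shape are assumptions for illustration only, not the documented contract.

```python
# Purely hypothetical sketch of a minimal-effort knowledge base retrieval call
# over REST. The URL path, api-version, and payload shape are assumptions made
# for illustration only; check the Azure AI Search docs for the real contract.
import requests

SEARCH_ENDPOINT = "https://<service>.search.windows.net"
KNOWLEDGE_BASE = "zava-knowledge-base"

payload = {
    # Minimal effort skips LLM query planning, so the caller supplies the query,
    # and every configured knowledge source is searched.
    "messages": [
        {"role": "user", "content": "What is the best Zava paint for bathroom walls?"}
    ],
}
response = requests.post(
    f"{SEARCH_ENDPOINT}/knowledgeBases/{KNOWLEDGE_BASE}/retrieve?api-version=<preview-version>",
    headers={"api-key": "<search-key>", "Content-Type": "application/json"},
    json=payload,
)
print(response.json())
```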
Now let's go check out a demo of this minimal
effort.
All right, so here I have the conversational RAG application,
and this time I have agentic retrieval enabled.
I've set that reasoning effort to minimal and I have
included SharePoint.
So it does have the option to search both a
search index and SharePoint.
So this is a knowledge base that has two sources
configured, a search index and a SharePoint source.
And you can see that I'm logged in.
Thank you.
And so this is where I'm logged in.
So that's the SharePoint it would have access to.
So here I've once again asked the question, what is
the best Zava paint for bathroom walls?
And this time I actually get a slightly different answer
because it has access to both the search index and
SharePoint.
And we can see citations here.
And some of these citations are actually files on the
SharePoint.
That's right, SharePoint.
Now, if we look at the process that it used
to get this, we took the user's query and,
in this case, we just directly sent
it to that minimal knowledge base.
We said, hey, here's the user question.
Just, you know, just use it to search all the
sources right?
So with minimal, it always searches every single source you
configured for that knowledge base, right?
So it took that question and it sent it to
the index, it sent it to the SharePoint.
It got back, you know, like 6 results for the
1-2 results for the, the second one.
All of those go through the semantic rancor and get
merged together.
And then, and then we use our own model in
this application in order to answer the question based off
those results.
So this is an example of how you might integrate
minimal into an application.
It is the easiest switch if you're already using, like,
the search function of the search SDK and you want
to start using multiple sources.
The easiest switch is just to bring in minimal and
try that out, and you'll find that you can
kind of just, you know, swap it in there.
Thanks for that demo.
That was great.
Now let's talk more about the low effort option.
Low effort is going to give you access to those
more advanced features from the Agentic Retrieval engine.
Because we're using an LLM, this is the mode that,
instead of just taking individual queries, is going to take
an entire conversation, run it through that query planning process,
and break it down into these decomposed queries.
Now we run this knowledge source selection process as part
of query planning.
So in addition to getting those queries, we're also going
to pick which knowledge sources we're going to use.
Now we send these queries to either remote or indexed
knowledge sources to fetch the relevant documents.
And finally, we have an answer synthesis option, so you
can actually get a complete answer that you could render
directly in your application that includes citations.
Yeah, So this query planning step that does knowledge source
selection, let's talk a little bit more about how exactly
that works.
There are really three key factors in how knowledge sources
are selected.
We actually use that LLM to decide.
So the three main inputs are the name of the
knowledge source.
You have an optional description, so you can give it,
like, say, hey, I have a blob container that's
a knowledge source.
Maybe this blob container contains a bunch of HR documents.
I have OneLake, it contains a bunch of invoices,
and then the web should only be used in certain
circumstances.
So these descriptions are really critical to getting the most
out of knowledge source selection when you choose to use
it.
Finally, you can also provide some custom retrieval instructions which
allow you to basically customize this selection process in natural
language.
It looks very, very similar to a prompt.
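To make those three inputs concrete, here is an illustrative view of how a knowledge base's sources and retrieval instructions might be described; these are plain dictionaries for explanation, not the actual API schema.

```python
# Illustrative only: the three inputs that drive knowledge source selection.
# These are plain dictionaries for explanation, not the actual API schema.
knowledge_sources = [
    {
        "name": "zava-product-index",
        "description": "Zava product catalog with names, descriptions, and prices.",
    },
    {
        "name": "sharepoint-hr-docs",
        "description": "Internal HR and onboarding documents from SharePoint.",
    },
    {
        "name": "web",
        "description": "Public web results; use only for questions about other brands.",
    },
]
retrieval_instructions = (
    "Prefer zava-product-index for any question about Zava products or prices. "
    "Only query the web when the question mentions other brands or public information."
)
```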
Let's do a demo of the low effort.
We're back to our application.
You can see this time we have low effort enabled
and we are including SharePoint, not including web yet.
We'll see that soon.
And this time I'm asking a more complex question.
I'm asking which Zava paint can I use to paint
my bathroom and how much does it cost?
Now remember this has access to two sources, a search
index and a SharePoint.
And what we can see here is that it decided
to only get results from the search index.
I can tell that from the citations.
I can also click on and see this thought process
where we can see it does some query planning.
It breaks down that query into multiple queries.
So it says, oh, OK, I'm going to search for
Zava paints for bathrooms, and it gets back 6 results
there from the search index.
And it's also going to search for Zava paint prices,
and it gets back 10 results.
So those must all be paints.
And here you can see it actually decided not to
search the SharePoint at all.
And actually I agree with that decision because when I
set up the knowledge sources, I told it, hey, listen,
if you need Zava products and prices, just search the
search index.
That's where they are.
That's the only place you need to go.
So it decided that it wasn't worth searching SharePoint, right?
So this is a way that we can like save
some costs where we don't have to search those sources
if we don't need to.
There is actually an option where you can say always
query source.
So if you are in a situation where you do
want it to force query every source, you can do
that.
But it is nice to have this dynamic source selection
because you can save time, you can save money, you
can save tokens, and it can do a good job
deciding.
And so then we get back the results there.
So that is low without web.
Thank you.
So let's talk a little bit more about how this
answer synthesis process actually works.
The main feature that we're offering here, in addition to
kind of pre generating an answer, is allowing you to
customize the style and tone with an additional set of
natural language answer instructions.
So we have a couple examples here to kind of
illustrate how much of an impact you can have on
the style and tone of the generated answer.
The first example is kind of the default, what we
get out of the box, and you're going to notice it's a little
more verbose, which is good.
And that's, generally speaking, a good starting place.
But maybe you want to give it guidance to just,
hey, answer with bullet points only.
And my personal favorite is going to be a
more stylized, poetic answer.
So for that bathroom paint question we had, we can
kind of get a much more poetic answer.
You know, this moisture in the air, if you're
in a bathroom, semi-gloss shines, and so on, here, bathrooms,
47 dollars.
So you can really customize exactly how this answer shows
up. It's even a little bit silly, but it's a really
powerful feature.
So let's talk now about the web knowledge feature because
many times your knowledge sources are going to cover a
lot of internal information your organization would know.
For example, it's like, hey, I've got manuals, I've got
training information.
But a lot of times your agent can benefit
from public, up-to-date information that's available on the
web.
So by adding the Bing Web knowledge source, you can
actually fill in this gap.
You are able to search the entire web or specify
a custom list of domains.
Now note that in order to use this feature, you
do have to opt in to answer synthesis.
That is not optional.
So let's go take a look at a demo.
All right, so here we can see that we're on
low and we are including the web source.
Now this time we're asking what is the best Zava
paint for bathroom walls and how does it compare to
other brand paints.
So it's obviously a question that requires going out to
the web to find out about other brands' paints.
And so when we look at the answer here, we
can see that there's in fact a lot of websites
that are cited.
We've got fixer.com, we've got the ultimate paint brand comparison
from Perfect Touch PTP, right?
So it's gone out and found all these additional web
sources.
And we can look at the process here and we
can see that it decomposed that query into multiple queries.
So it searched for best Zava paint, and then it
searched for a comparison of Zava bathroom paint to other
brand paints.
It searched the web for both of those as well,
right?
So it takes both those two decomposed queries and sends
them to both the search index and the web, gets
back lots of results, and then merges those together and
uses the answer synthesis in order to come up with
a result.
That answer, that synthesized answer still has citations.
So you can see I can still make everything clickable,
the stuff in my BLOB, the stuff on the web,
they can reference everything to find out if they actually
trust fixer.com, right?
Which is important, right?
That's the whole point of these RAG applications, is to
give users a way to get accurate information that they
can back up with citations.
Thanks for that great demo.
All right, now let's finish up the reasoning efforts here
with medium effort.
Medium effort is the one that actually adds this optional
iterative retrieval step.
This means that if the agentic retrieval engine determines that
results retrieved from the initial search aren't sufficient to answer
the question, we're actually going to do a second pass
of query planning and retrieval to try to get a
better answer.
So we have a big problem here.
In order to know if we have to do the
second iteration, we have to actually decide how to do
that.
We have actually introduced a new model for the first
time into the Agentic Retrieval engine.
It's only accessible on the medium retrieval reasoning effort mode.
We call this model Semantic Classifier.
It performs really two key tasks that enable this confident
iteration.
The first is to decide, hey, is there enough information
in the results of each query to actually answer the
underlying question.
In addition to that, we also want to be sure
we found at least one highly relevant document to answer
the question in these queries.
Now, if we don't meet these conditions, we're actually going
to go and do that second iteration because we want
to try to get the best results.
So basically this is allowing us to confidently iterate rather
than just iterating all the time, which can be a
big challenge in agentic RAG.
Now, when the second iteration is performed, we're not just
saying, hey, try again.
We're actually passing additional context to that query planning that
wasn't present in the first iteration.
We're actually going to use the documents that were retrieved
from the first retrieval and the original queries so that
we can better formulate a more intelligent second pass that's
taking into account the results from the first.
So let's go take another look at a demo of
how this works.
All right, so here we have medium enabled and we've
got web, we've got SharePoint and we've got our hardest
question.
So it says explain how to paint my house most
efficiently.
Then give me a list of the Zava products and
prices for each supply.
So let's see what it decided to do.
So it came up with a query, most efficient
way to paint a house, which, that'd be fun.
Then Zava products and prices for house painting supplies.
And this first query actually didn't get any results from
the search index.
It did get a lot of results from the web
when it searched for, you know, efficient way to paint
a house. It didn't get any for Zava products because Zava
is a made-up brand.
So it's a little hard to search the web for
it, but hopefully all of you are working for companies
that exist and it's going to be easier to find
you on the web.
So then it looked at those results and said, OK,
it, you know, it found a lot of results that
could answer the question about painting a house most efficiently.
But it realized it couldn't yet answer the question about
Zava products.
So it decided it needed to do a second iteration.
And in that second iteration it comes up with a
new set of queries and source selection that will help
it get a comprehensive answer.
So then the second iteration, this time it searches the
search index again, it comes up with a different query
and it gets 4 results right?
It also did a bunch of other searches.
It got more specific.
It was, like, looking for drop cloths, caulk, like, prep
materials.
Unfortunately, Zava doesn't have drop cloths, but if it did,
it would find them, right?
So it actually gets very clever in that second iteration
and comes up with some really good queries based off
that first iteration to get much more focused results.
So it can really help in getting much more comprehensive
answers to these complex questions.
And there we go.
So all of the examples that I have been showing
are from an open source repo.
And in this repo, we added the agentic retrieval feature
to it just this Monday.
It's been a very exciting week for us.
And so any of you who want to get started
with RAG, a conversational RAG in your domain, definitely check
out that repo.
We've had thousands of developers deploy it.
It's got tons of features, multimodal, data access, cloud ingestion,
just all the different features that people are wanting out
of a conversational RAG solution.
So certainly check it out.
It can be a great starting point and great inspiration
for all of you to see how we tackle common
issues in these sorts of applications.
All right, that's awesome.
So now let's switch gears a bit and let's talk
about how knowledge bases fit in with Foundry and Azure AI Search.
Knowledge bases are going to give us reusable topic centric
collections to actually ground our agents.
Now with Foundry IQ we're actually able to take those
knowledge bases using MCP to give our agents a unified
knowledge layer.
The result is going to be it's much simpler to
build agents instead of stitching a bunch of separate data
retrieval tools together to get the same results you could
get from a single knowledge base.
Now the magic question is how exactly is Foundry IQ
going to enable us to use knowledge bases to ground
agents?
So we are actually going to be using delegation the
same way that you might be familiar with the MCP
protocol to use external services in your agents.
You're able to use MCP to connect to your knowledge
base.
The agent is actually going to play the role of
the query planner and the answer synthesizer here.
As input, we're going to take a bunch of separate intents or
queries, and the output is just going to contain the
merged results.
So let's check out a demo of Foundry IQ.
You want to do this one?
Yeah. So what I'm showing you here is
the new experience inside Foundry.
It's a kind of unified agent builder.
So I'm able, in a single place, to see all
my tools, knowledge, data, evaluations.
This is a demo of the agent playground.
So once I've built an agent, I want to try
it out, customize instructions.
I'm able to use this UX to actually see the impact
my changes make.
So I'm using the same exact knowledge base that Pamela
was showing in her demo, but this time I'm using
it through an agent rather than through that deployed application.
So when I ask the same question, what's the best
Zava paint for bathroom walls, you're going to see
that I reach out to the knowledge base using MCP.
I use this knowledge base retrieval tool, which I have
approved, and I end up getting an answer that
looks exactly the same as what I got in the
application.
So this is kind of a great way to lift
and shift your knowledge bases so you can actually use
them inside your Foundry agents.
Yeah.
So that about wraps up our presentation today.
I want to put up a call to action.
You can take it, feel free to take a picture of this slide.
Again, we invite you all to sign up for the
private preview of MCP knowledge sources, and we'd also like
to open up the floor.
Any questions you'd like to share, please?
Do we have a mic?
Oh, they could probably go there.
Yeah, there's a mic.
And all right, if you do have a question, there
is a mic right there.
You can come up and ask it.
Don't be shy.
We've got 4 minutes left to answer questions.
There we go.
I have no idea how this works.
No, I think you can hear us.
So up to you.
I have I have a question with regards to the
knowledge sources that you can add in the knowledge base.
I was looking at it from the Foundry portal.
I don't know like on the Azure AI search side,
can you add as a knowledge source only a specific
index from like an Azure AI search resource or you
need to add the full search resource?
That's a great question.
So it depends.
With indexed knowledge sources, what you're actually going to do
under the hood is create an indexer and a skillset,
so data from an outside source is brought into
an automatically created search index.
With remote knowledge sources, you don't ingest any data at
all.
You directly connect to an external source of information and
query it at retrieval time.
Yeah.
And with those remote knowledge sources, we get this question
a lot.
Like with SharePoint, if you do need to filter the
SharePoint, you can use a filter expression. You could pass
in, like, a site ID if you want to restrict
it to a particular part of your SharePoint, authors, etcetera.
And similarly with the web, you can specify domain filters
to just, you know, limit it to websites that aren't sketchy,
right?
Small list, but but you've got these abilities that you
can that you can filter down those remote searches as
well.
We also have a survey slide too.
Yeah, we do also have this QR code for you
to give feedback about the session, and you can scan
that QR code in order to fill it out.
We have a couple more minutes if anyone does want
to go to the mic.
Of course, we'll also be here after the session to
answer any questions that you don't want to ask in
front of everyone.
And again, we thank you all so much for joining
us.
I know it's pretty late in the day, so thank
you very much for attending our session.
Thank you for joining us at Microsoft Ignite silent stages.
Please leave your headset at your seat as you exit.
Thank you for your cooperation.
Can I ask a question?
Oh, of course, yeah.
This is like magic.
You've kind of taken what I wanted to build and
commoditized it.
So I think you're like... this is...
Can you go closer to the mic?
Yeah, yeah.
So this is kind of you've commoditized exactly what I
think we need to do to provide agents and knowledge
bases.
I wonder, is there any worth in storing
graphs and relationships of separate entities, kind of more connections,
the connections between certain entities?
Yeah.
Does that have value in the knowledge base or is
that something that you confer anyway?
Yeah.
So I can tell you that graphs are something we
are very interested in as well.
We don't have a built in graph, but if you
can join the MCP private preview, if you have a
graph database, you are able to add that as a
knowledge source.
Awesome.
Thank you.
Of course.
Thank you.
So what kind of chunking strategies do you recommend when
you use this from the portal?
We don't get a choice of the chunking strategies.
So we offer a built-in chunking strategy, which
has some defaults.
So there's two options.
The first option is you can do the chunking totally
yourself, push the data into the search index and complete
control.
The second option is to use an indexer with a
custom skillset, and we actually have a built-in
skill called the split skill, and you can customize the chunking
strategy to some degree on that skill, or you can
define a custom skill that does the chunking completely the
way you want it.
Actually, in the repo we added support for custom skillsets,
and we use our own custom chunking strategy with the
built-in indexers that way.
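For reference, here is a hedged sketch of customizing the built-in split skill from the Python SDK; the chunk sizes and field mappings are illustrative values, not recommendations.

```python
# Hedged sketch of customizing the built-in split skill; chunk sizes and field
# mappings here are illustrative values, not recommendations.
from azure.search.documents.indexes.models import (
    InputFieldMappingEntry,
    OutputFieldMappingEntry,
    SplitSkill,
)

split_skill = SplitSkill(
    text_split_mode="pages",        # chunk by approximate character-count "pages"
    maximum_page_length=2000,       # max characters per chunk
    page_overlap_length=500,        # overlap between adjacent chunks
    inputs=[InputFieldMappingEntry(name="text", source="/document/content")],
    outputs=[OutputFieldMappingEntry(name="textItems", target_name="chunks")],
)
```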
Start building your next agent with the latest knowledge features from Azure AI Search. In this session, we will demo how to connect your agentic retrieval engine to new knowledge sources like SharePoint, web and blob. We will also walk through new controls available to improve your RAG performance, across query planning, retrieval and answer generation. Join this code-focused breakout for samples and step-by-step guidance on connecting knowledge to your next agent. Delivered in a silent stage breakout.

To learn more, please check out these resources:
* https://aka.ms/ignite25-plans-agenticsolutions

Speakers:
* Pamela Fox
* Matthew Gotteiner

Session Information:
This is one of many sessions from the Microsoft Ignite 2025 event. View even more sessions on-demand and learn about Microsoft Ignite at https://ignite.microsoft.com

BRK193 | English (US) | Innovate with Azure AI apps and agents, Microsoft Foundry
Breakout | Expert (400)
#MSIgnite, #InnovatewithAzureAIappsandagents

Chapters:
0:00 - Introduction and Session Overview
00:10:34 - Hybrid Search and Reciprocal Rank Fusion Explanation
00:12:45 - Applying Semantic Ranker in AI Search for Improved Results
00:14:10 - Overview of the Complete Hybrid Search Flow
00:18:49 - Indexed vs Remote Knowledge Sources and SharePoint Integration
00:27:21 - Knowledge Source Selection and Query Planning Explained
00:31:08 - Using Web Knowledge Sources with Bing Integration
00:34:41 - Second iteration enhances query with additional context from first pass
00:35:09 - Demo: Medium mode retrieval solving complex multi-query example