Today's topic is building enterprise agents using Azure AI Search. We're going to explore how AI Search supports the agentic mode: agentic retrieval, the agentic pipeline, and so on. We've been hearing about agents for the last year, but we're going to deep-dive into one particular technology, Azure AI Search. My name is Udaiappa Ramachandran; I go by Udai for short. I work for a company called Akumina Inc. as CTO/CSO, and I'm also a Microsoft MVP. This is my website if you want to check my community profiles, and there's my LinkedIn if you want to connect; if we're not connected already, please reach out to me. Those are the meetups: Nashville UG and Dev Boston.
So today's agenda is Azure AI Search, specifically the features that have become available over the last five months. The first feature we'll look at is the query pipeline, also called agentic retrieval. We'll deep-dive into what agentic retrieval is, how you configure it, what the benefits are, and so on. The second feature is the data pipeline. The main point of the data pipeline is how you can crack a complex document into multiple different slices: a slice can be an image, text, a graph, whatever you call it, the multimodal notion, and how you get all of that ingested. The third thing is enterprise-grade security. By default, if you're running your application in Azure, you can take advantage of nearly all the services in a secure way, but here we're going to explore data partitioning: making sure the grounding data is trimmed to the user, so what they can see is what they get through the grounding. Next we'll look at MCP and AI Search for development; it's not specifically about AI Search, but about how MCP can help you navigate your Azure services and get assistance. The final feature is sensitivity labels. This is not part of AI Search itself; it's part of Microsoft Purview governance. We'll talk a little bit about what it is; it's still a preview feature. And finally, the demo. The demo we're going to see today is .NET, but it's based on the azure-ai-search-multimodal-sample, which is a completely Python codebase; I took that Python codebase, converted it to .NET, and added some features to it. You can take a look at it here, what you can configure and all those things; we'll talk more about it as we progress through the topics.
Before we dive into agentic RAG, let's talk about the difference between RAG and agentic RAG, and why we need agentic RAG at all. Until last year we all used to hear the term RAG, retrieval-augmented generation, and starting this year everybody talks about agents; "agentic" is the big marketing buzzword today. Everybody is moving toward agents, and anything people discuss in AI they discuss in the notion of agentic. But it's a good idea to understand what RAG is versus agentic RAG, and what problems agentic RAG solves that RAG cannot.
RAG comes into the picture when knowledge is not available as part of the trained model. The example everybody uses: you ask about your company policy, say your company vacation policy, also known as PTO. In that context, it's your internal data; the data lives with your employer, somewhere in a company database, but it is not available to the model you're going to use. So you query your data and send the results as part of the prompt. When you ask "what is my PTO policy?", you query your database, then send the prompt to the LLM along with the grounding source: the knowledge you pulled from your internal database. The LLM takes all of that, drafts a new response, and sends it back to you: "here is your company policy." That's RAG.
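Here is a minimal sketch of that one-shot RAG flow, just to make the shape concrete. The search_policies() helper, the endpoint, and the deployment name are placeholders of my own, not the session's actual configuration:

```python
# Minimal one-shot RAG sketch (helper and endpoint/deployment names are placeholders).
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<key>",
    api_version="2024-06-01",
)

def search_policies(question: str) -> str:
    # Hypothetical: query your internal store (vector/keyword search) for passages.
    return "PTO policy: employees accrue 15 days per year ..."  # canned grounding

question = "What is my PTO policy?"
grounding = search_policies(question)               # 1. retrieve internal data

response = client.chat.completions.create(          # 2. send prompt + grounding
    model="<chat-deployment>",
    messages=[
        {"role": "system", "content": "Answer only from the provided sources."},
        {"role": "user", "content": f"Sources:\n{grounding}\n\nQuestion: {question}"},
    ],
)
print(response.choices[0].message.content)          # 3. LLM drafts the final answer
```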
What is agentic RAG? Agentic RAG does the same thing, but it does much more. In classic RAG, retrieval is always single-shot: you ask a question, it calls one of the plugins or tools. Those are the same thing; "plugin" is the terminology used in the .NET stack and "tools" in the Python stack, but everybody is converging on calling them tools. So the model looks at your prompt, may invoke a tool to get the knowledge (the grounding source, you call it), and then gives you the response. An agent, though, can work iteratively, based on the complexity of your prompt. You might ask: "I have an event tomorrow; what's the weather going to be, and based on that, what do you suggest?" That's a complex request: it needs to find the weather, find your event, and based on both suggest what you can do, and you might also ask it to take actions, like moving the event to a different place or date. So the agent takes your prompt and creates a plan, and that plan can be an iterative process: it gets data, re-plans the next step based on that data, and so on until the entire objective is met. That's the iterative part. Classic RAG is one shot: one call, get the answer, send the response, done. Agentic retrieval splits your prompt into multiple subqueries; for a simple prompt it won't make any difference, but for a complex prompt the difference is huge. It takes the prompt and creates a plan by calling the LLM: it sends the prompt along with the descriptions of all your tools, also known as their metadata, the LLM returns the plan, and then the agent takes control and executes in a loop until the goal is achieved.
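To illustrate that plan-and-execute loop, here is a toy sketch. The planner is faked with a scripted plan so it runs standalone; in a real agent each planning step would be a chat-completion call that sees the tool metadata:

```python
# Toy plan-and-execute loop; the "LLM planner" is a scripted plan so this runs as-is.
def get_weather(city: str) -> str:
    return f"Sunny in {city} tomorrow"            # stub tool

def get_event() -> str:
    return "Team picnic, Boston Common, 2pm"      # stub tool

TOOLS = {"get_weather": get_weather, "get_event": get_event}

SCRIPTED_PLAN = [  # stands in for successive LLM planning calls
    {"type": "tool", "tool": "get_event", "args": {}},
    {"type": "tool", "tool": "get_weather", "args": {"city": "Boston"}},
    {"type": "final"},
]

def agentic_answer(prompt: str) -> str:
    history = [("user", prompt)]
    for step in SCRIPTED_PLAN:                    # iterate until the goal is met
        if step["type"] == "final":
            return f"Based on {history[1:]}: the event can go ahead."
        result = TOOLS[step["tool"]](**step["args"])   # execute one tool call
        history.append(("tool", result))          # feed the result back into planning

print(agentic_answer("I have an event tomorrow; what's the weather, any suggestions?"))
```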
Reasoning: RAG is a single step, while agentic is multi-step, because it goes back and forth. Latency: RAG is fast because it's one shot; agentic RAG is slower, depending on the prompt you asked and the number of grounding knowledge sources it has to interact with. Sometimes you ask for the policy for such-and-such department, such-and-such title, such-and-such date: now it needs to split that into multiple small subqueries, run each against the data, join everything, and return a response. Not a long time, but it adds some latency to complete all the iterative steps. In a simple case you won't see any difference; in a complex case you will always see some slowness. Complexity: RAG is one shot, so it's simple; agentic RAG is complex, based on the nature of your prompt and how many data sources or tools you configure to work with your agent. Use case: RAG is question-and-answer; you ask a question, it returns an answer. Agentic RAG is multi-hop, and you can ask it to take actions. For example: find such-and-such person for such-and-such problem, then email or call that person. You can pipe those kinds of automations into the actions the agent takes on your prompt. Accuracy: since RAG is one shot, it's always going to be moderate, and it may hallucinate sometimes; the agent, because it's an iterative multi-step process, can push accuracy as high as possible.
So you can have an agent, and when that agent needs knowledge, as in my example, "what is my company policy?", the policy lives in documents in some database: you vectorize those documents and put them in a search index. Before agentic retrieval, if you asked "find the PTO policy document", it would go to the index, vectorize your incoming query, look in the vectorized column, and return the document; just a vectorized search. If you need an additional filter, you can add one in hybrid search mode; if you don't use hybrid, you go purely with the vector. It gives the top refined results, and that's all you do: take those results and send them back to the LLM. You're still going to use the LLM, but where this breaks down is when you send a complex query to the search engine. The search service now has an agent, called a knowledge agent: the query is handed to the knowledge agent, and instead of running one shot, the knowledge agent can slice it into multiple queries. I'll show you an example.
Here it is: "Find all company policy pages that mention hybrid work arrangements, summarize key points, and include related HR guidelines from the last 12 months." Given this complex query, one LLM call is made to build a plan, and this all happens inside the Azure AI Search knowledge agent. The plan divides it into three subqueries: instead of running one full query, it runs one query for hybrid work policies, a second for HR guidelines on hybrid work (with "last 12 months" acting as a filter), and a third relating recent hybrid policy updates. So: one LLM call to create the plan, three search calls, and one rerank. Those three calls may return different result counts: one gives 30, one gives 40, another 30, say 100 results in total. But the reranked result set cannot exceed 50, so out of 100 results it reranks and returns the top 50. And we're not going to send all top 50 to the LLM; that's a lot of data, and you'd lose a lot of money. We might take the top five, or top 10 at most. So it runs the subqueries, merges the results of all three, returns up to the top 50, and we send maybe the top three to ten onward, because you don't want to pay for all those tokens. This is best for complex multi-part questions, as we saw before.
So here is the architecture of how it works. You have the conversation history, and you're interacting with your agent, the chat-completion side. That completion talks to the search service's agentic retrieval (it's in beta right now), the AI Search knowledge agent, which is what slices your query into multiple subqueries, runs them all, merges the results, caps them at the top 50, and sends back the number of results you asked for: ask for the top 10, you get the top 10; ask for the top five, you get the top five; the max it can return is 50. The agent receives that data and sends it to the LLM to create the final draft of the response, which then goes back to the user. That's the flow in a nutshell; you can look at these articles, everything is linked at the end of the slides.
We just discussed this: a complex query comes in, with filters like date and department; a plan is created by the agent, runs by itself, and returns the data. All right, let's look at it in a practical way. The easiest way to start is the Python agentic-retrieval quickstart. We'll start from there, and then I'll run the application I built, which, as I said before, you can download; it's a working application. You just need a little setup: complete the app settings. If you have Entra ID authentication enabled, you create the app ID and tenant ID, register this as your redirect URL, and supply the client secret to get the OpenID token. You do the same thing on the server side: go to the API, open the settings, and complete them; there's the search service endpoint (you can leave it, and I'll explain how the application runs as we move forward), the document service, and so on; you'll get to those as we see the demo. That's the full application, and we will run it, but before that it's a good idea to understand the fragments of how it works. So I configured a Python notebook. The notebook is the easy way to understand, because you can run one step at a time: you can start from here and run step by step. This notebook walks you through how you create and use the agentic retriever. What it does: it creates an "earth at night" search index and then creates the agent; as I said, the search service supports the agent, so the agent is created alongside your search index, and then you use the agent's ranked results and so on. Let's go to the browser and look at it.
In the Azure portal, I already created a search service; in your case, if you're exploring, you'll have to create one. In my case, agentic retrieval works only in certain regions, so if you're going to explore tonight or later, I'd suggest you create it in North Central US; if you create it there, it's guaranteed all the features are enabled. I already created some search indexes; don't worry about those. Let's go back and run the sample step by step. This sample, as explained here, is going to create a search index called earth_at_night, which we don't have yet. In the settings we've provided all the details, such as the Azure OpenAI endpoint, because it's going to use an agent; that doesn't mean the agent has its own LLM connection. You still have to provide the LLM connection; that LLM comes from here, so you have to give the endpoint. That also means if you run a lot of queries, you're going to pay a lot of money too; it's not giving you a free LLM. The service provides the agent, but we need to tell it which LLM to use, and it has to be a supported model, which in other words means Azure OpenAI models, and you provide that to search. I don't have any access key here, because it's going to use a managed identity (service principal) to connect to the services.
Okay, now let's run the sample. We've done all the prerequisites: we created the service, we enabled the semantic ranker, and we have the Azure OpenAI resource with a supported model. There are some limitations on which models are supported; it will tell you while creating the agent if a model isn't, but GPT-4o mini and GPT-4.1 are supported. If you use a managed identity, you have to grant that identity the Search Service Contributor, Search Index Data Contributor, and Search Index Data Reader permissions; I already granted those, so my demo is good to go. Then you take the sample.env and rename it to .env. Again, our full app runs everything on .NET, but you can download this and run it in a Python notebook. First, install the packages: the azure-search-documents preview (11.7.0-beta.4), azure-identity, openai, and python-dotenv. You see the tick, it's completed; these packages are installed. The second step loads the environment variables; that's done too (I know it still says spinning, but it is done). The third step is to create the index.
going to see is create an index right?
So we said that we you know it's we
going to create the index which is not
exist already. Um we saw the name it's
not there. So we're going to run this
script to create index.
Um so if you look at this script over
here again this is you know pretty
simple right? So as a such documents
index um you're just creating the index
with the with the embedding enabled. If
you have done before you know what it is
but otherwise it has to create the
embedding vector or go to the index and
explore how it's been mapped. Okay you
have to you have to create that you know
model and then what is the embedding
field and vector search field what is
the algorithm you're going to use and
all those things we have to define it
and then create it. All it does is it
creates a U text with embedding field
enabled. Okay. So you can store the
vectors in the embedding field. Uh the
embedding or vectors are
interchangeable. Some people call
embeddings, other people call vectors
because it's the array of float numbers,
you know, vectors. But when it comes to
the API level, they most likely use the
O embeddings,
right? We enable the semantic search.
That's how you know what is a semantic
search mean here. You know for example
if you want to look for a restroom you
know some some country they may call a
bathroom other country they may call a
washroom but the document itself can say
the restrooms right if it is semantic it
knows a restroom is washroom restroom is
a bathroom and so on um so that's why we
need to create the semantic search so
everything is going to be semantic even
if you if you're looking for apple you
make a one mistake in the keyword search
it will not find it a pl is apple but
you know you put a you know miss or miss
e or any character you mess it you'll
still find it because of the systematic
search nature. Okay. And then index
client and create the index. So now if
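For reference, here is roughly what that index-creation step looks like, following the shape of the agentic-retrieval quickstart. The field and profile names mirror that sample; the vectorizer parameter names are from the preview SDK and may differ between betas, so verify against the version you install:

```python
# Index with a vector field + semantic config, quickstart-style (preview SDK;
# verify class/parameter names against your azure-search-documents beta).
from azure.identity import DefaultAzureCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex, SearchField, SearchFieldDataType,
    VectorSearch, HnswAlgorithmConfiguration, VectorSearchProfile,
    AzureOpenAIVectorizer, AzureOpenAIVectorizerParameters,
    SemanticSearch, SemanticConfiguration, SemanticPrioritizedFields, SemanticField,
)

index = SearchIndex(
    name="earth_at_night",
    fields=[
        SearchField(name="id", type=SearchFieldDataType.String, key=True),
        SearchField(name="page_chunk", type=SearchFieldDataType.String, searchable=True),
        SearchField(
            name="page_embedding_text_3_large",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            vector_search_dimensions=3072,            # text-embedding-3-large
            vector_search_profile_name="hnsw_text_3_large",
        ),
        SearchField(name="page_number", type=SearchFieldDataType.Int32, filterable=True),
    ],
    vector_search=VectorSearch(
        algorithms=[HnswAlgorithmConfiguration(name="alg")],
        profiles=[VectorSearchProfile(
            name="hnsw_text_3_large",
            algorithm_configuration_name="alg",
            vectorizer_name="openai_vectorizer",
        )],
        vectorizers=[AzureOpenAIVectorizer(
            vectorizer_name="openai_vectorizer",
            parameters=AzureOpenAIVectorizerParameters(
                resource_url="https://<aoai>.openai.azure.com",
                deployment_name="text-embedding-3-large",
                model_name="text-embedding-3-large",
            ),
        )],
    ),
    semantic_search=SemanticSearch(configurations=[SemanticConfiguration(
        name="semantic_config",
        prioritized_fields=SemanticPrioritizedFields(
            content_fields=[SemanticField(field_name="page_chunk")]
        ),
    )]),
)

SearchIndexClient("https://<search>.search.windows.net",
                  DefaultAzureCredential()).create_or_update_index(index)
```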
Now if you go back to our service and refresh, the earth_at_night index is there. Look at the index and you can see these fields were created, and you can look at the semantic configuration: page_chunk is the content field; that's where all the text goes. The vector profile uses Azure OpenAI text-embedding-3-large, and the page embedding field is where we're going to store all the vectors. We didn't do much: we just created the index, configured the semantic configuration and the vector profile, and we have four fields. Now we'll explore how to upload a document and see how the document gets sliced.
Okay, now we're going to upload some documents. To upload, the sample takes the documents from here, takes the content of each document, and uploads it. They run it, and the documents are uploaded. If you go back to our index in Search Explorer and look, there are documents, with page numbers like 5, 4, 9, 8. When we need grounding from vectors, we have to slice the document. Your document may have two pages, and a two-page document is okay, but if you have a 100-page document, you have to slice it: decide the number of lines you take per chunk and the number of overlap lines. Say you take 10 lines with an overlap of two: each chunk includes two lines from the previous chunk and then takes the next 10 lines, and so on. That algorithm is the critical thing to get right: if you design it correctly, your semantic lookup is going to be accurate; if you don't, your semantic lookup may be a little weird and might not return the results you wanted. That's the most important part. It's what the Azure Document Intelligence service does, but there are also a lot of open-source tools that can help you crack the document into multiple parts; then you take the parts, convert them to vectors, and ingest them.
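Here is a minimal illustration of fixed-size chunking with line overlap, just to make the slicing idea concrete; real pipelines use smarter, layout-aware splitters such as Document Intelligence:

```python
# Minimal fixed-size chunking with line overlap (illustration only).
def chunk_lines(lines: list[str], size: int = 10, overlap: int = 2) -> list[list[str]]:
    chunks, step = [], size - overlap
    for start in range(0, len(lines), step):
        chunks.append(lines[start : start + size])  # carries `overlap` lines forward
        if start + size >= len(lines):
            break
    return chunks

doc = [f"line {i}" for i in range(1, 25)]
for c in chunk_lines(doc):
    print(c[0], "...", c[-1])   # each chunk starts (size - overlap) lines after the last
```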
If you look here, the portal isn't displaying the vectors because they're too big, so they're hidden. If you go in and select the vector field, then go back to Search Explorer and search again, there's the vector: it was generated and uploaded. So when you search for some word, it gets converted to a vector and matched with a vector lookup; in our case, though, we're going to use the agentic lookup, so whatever you send, the agent takes care of everything, sending it to the search index and collecting the values. We'll come back to that. So far we've only uploaded the document; we haven't done much else. We can search for something, maybe the word "forward"; we do have "forward" in there, so let's look for it. That search looks at page_chunk and returns the matching value, but we're still not doing a vectorized or semantic search. In other words, if I search with a typo it may return nothing, or return something irrelevant, as you can see: type the word correctly and you see one set of results; with a typo you don't. That's exactly what semantic search fixes. All right, continue on.
Now we're going to create an agent inside the search service. This is the new feature, and it's a cool one; I'm using it for my project and it's amazing. So let's run it and see whether it creates the agent. It did, in 0.3 seconds. If you look at the code: you take the Azure OpenAI vectorizer parameters (where the model lives), the agent model, and the knowledge agent's target index, which is our current index, and then you create it. That's it: create-or-update the knowledge agent against the index we already have. The agent gets its name and so on, but there is no way to see it in the portal UI yet; that's coming, but right now you can't. It just says "earth-search-agent is created or updated successfully." You'll see how to delete it at the end, but there's no UI for that either: if you try to delete the index now, it will say no, you have to delete the agent first, yet there is no UI to delete the agent.
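The creation step looks roughly like this in the preview Python SDK. The class names follow the agentic-retrieval quickstart and may shift between betas; the endpoint and deployment values are placeholders:

```python
# Knowledge agent creation, quickstart-style (preview SDK; names may change).
from azure.identity import DefaultAzureCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    KnowledgeAgent,
    KnowledgeAgentAzureOpenAIModel,
    KnowledgeAgentTargetIndex,
    AzureOpenAIVectorizerParameters,
)

agent = KnowledgeAgent(
    name="earth-search-agent",
    models=[KnowledgeAgentAzureOpenAIModel(
        azure_open_ai_parameters=AzureOpenAIVectorizerParameters(
            resource_url="https://<aoai>.openai.azure.com",
            deployment_name="gpt-4o-mini",       # must be a supported model
            model_name="gpt-4o-mini",
        )
    )],
    target_indexes=[KnowledgeAgentTargetIndex(
        index_name="earth_at_night",
        default_reranker_threshold=2.5,           # drop weakly reranked chunks
    )],
)

index_client = SearchIndexClient(
    endpoint="https://<search>.search.windows.net",
    credential=DefaultAzureCredential(),          # managed identity / az login
)
index_client.create_or_update_agent(agent)
```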
We'll continue. Now that we've created the agent, let's set up some messages and interact with it. The first thing you do is the system prompt: "you are a Q&A agent," plus whatever else you want it to do. That's the system instruction, also called the meta prompt, and it's the most important step: you have to state the role of the agent. Then comes agentic retrieval to fetch results. When you want to retrieve, you send your question with the role "user": the agent's role is set up, and now you, as the user, want to query something (in this sample, something like "why do suburbs display larger lit areas at night?", or whatever the user wants to ask). Then we call the agent's retrieve, and whatever result comes back is appended with the role "assistant": the user asks, the system performs the retrieval, and the response comes back here as the assistant message. So when I run it:
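Here is the shape of that retrieval round-trip, modeled on the quickstart. Everything here is preview SDK surface area, so treat the class names and the response access pattern as assumptions to verify against your installed beta; the question text is a paraphrase of the sample's:

```python
# Agentic retrieval round-trip, quickstart-style (preview SDK; verify names).
from azure.identity import DefaultAzureCredential
from azure.search.documents.agent import KnowledgeAgentRetrievalClient
from azure.search.documents.agent.models import (
    KnowledgeAgentRetrievalRequest,
    KnowledgeAgentMessage,
    KnowledgeAgentMessageTextContent,
    KnowledgeAgentIndexParams,
)

messages = [
    {"role": "system", "content": "You are a Q&A agent over the earth_at_night index."},
    {"role": "user", "content": "Why do suburbs display larger lit areas at night?"},
]

agent_client = KnowledgeAgentRetrievalClient(
    endpoint="https://<search>.search.windows.net",
    agent_name="earth-search-agent",
    credential=DefaultAzureCredential(),
)

result = agent_client.retrieve(
    retrieval_request=KnowledgeAgentRetrievalRequest(
        messages=[
            KnowledgeAgentMessage(
                role=m["role"],
                content=[KnowledgeAgentMessageTextContent(text=m["content"])],
            )
            for m in messages if m["role"] != "system"   # system prompt stays client-side
        ],
        target_index_params=[KnowledgeAgentIndexParams(
            index_name="earth_at_night", reranker_threshold=2.5,
        )],
    )
)

# Append the grounding the agent produced as the assistant turn.
messages.append({"role": "assistant", "content": result.response[0].content[0].text})
```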
This step can take up to 10 seconds, because now it sends the data to the search query layer, the search query hands it to the knowledge agent, the knowledge agent talks to the LLM to split the query, and all the cool things we talked about happen. Now it has the result as an assistant message. How can we see it? Run this cell and there's the result, and you can keep going. If you want to print the result a different way, you can look deeper: there are 11 retrieval results, the input tokens are 1,390, the output tokens are 458, and you can keep inspecting what's being returned; it's a scrollable element you can expand and read.
So far we sent a query to the search agent, the search agent created a plan, executed it, and returned the result. But that's not enough, right? That's only the retrieval part. Now you have the best grounded knowledge possible, better than before, because your query was split into a multi-step plan, executed in parallel, re-aggregated, grouped, and returned as top-ranked rows. But that's just the grounding.
These are the results. Now, how does the end user get the polished output? We need to send that to the LLM. Before, the flow was: the user asks something, you ask the LLM for the user's intent, the LLM gives you the intent, you run some query yourself without an agent, and you return the data. Now you send the query as-is to search, search hands it to the knowledge agent, the agent creates a multi-query plan, executes it, finds the best results possible, and returns them to you. That's the agent's job, and the AI Search agent is not drafting the final response. For that we call Azure OpenAI, or whatever model you want to use: take the agent's response, create the LLM connection, and use the chat completions API (you could also use an agent here). The search agent's output is fed back to the LLM, and the LLM returns the final output. Then you can continue the conversation, with the history attached, going back and forth.
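That last hop, drafting the final answer from the grounding, is a plain chat-completion call. A minimal sketch, assuming the `messages` list built up above and placeholder endpoint/deployment values:

```python
# Feed the agent's grounding back to the LLM to draft the polished answer.
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)
client = AzureOpenAI(
    azure_endpoint="https://<aoai>.openai.azure.com",
    azure_ad_token_provider=token_provider,       # keyless, via managed identity
    api_version="2024-06-01",
)

completion = client.chat.completions.create(
    model="gpt-4o-mini",                          # your chat deployment name
    messages=messages,                            # history + grounding from retrieve()
)
answer = completion.choices[0].message.content
messages.append({"role": "assistant", "content": answer})  # keep history for next turn
print(answer)
```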
You can ask whatever question you like: "How do I find lava at night?" It's still working on it... okay, it found it; the value is there, we just have to print it here. Yes, it's a light source at night, so you get what it says, and then you continue on and it displays the result. So far we've consumed 1,822 input tokens, and the output tokens are 129. If you need to, you keep going, generating answers turn after turn.
The role is very important: the user role goes in, and you get the answer back. That's how it works. But the most important thing to understand is that every time you ask something, you have to make sure you're not exceeding the token threshold, because since we've been interacting back and forth, you're also sending the history as part of the prompt. You ask a question and get a response; then you send the question and the response back to the LLM along with your next question, and all of that goes to the LLM each turn. So over time the token count is going to be huge. For input tokens, divide by 1,000: in this pricing example, every thousand input tokens costs $0.02 (2 cents), and every thousand output tokens costs $0.004, a fifth of that. You can find the exact numbers in the pricing calculator, but you have to be very careful what you send and what you ask: if you send a huge history, you're going to pay more for the input tokens.
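A quick back-of-the-envelope check using the per-1K rates quoted above (illustrative rates; check the Azure pricing calculator for your model and region):

```python
# Cost per turn at the quoted per-1K-token rates (illustrative only).
INPUT_PER_1K, OUTPUT_PER_1K = 0.02, 0.004

def turn_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1000 * INPUT_PER_1K + output_tokens / 1000 * OUTPUT_PER_1K

# The two turns from the demo: 1390 in / 458 out, then 1822 in / 129 out.
print(f"${turn_cost(1390, 458):.4f}")   # ≈ $0.0296
print(f"${turn_cost(1822, 129):.4f}")   # ≈ $0.0370
```

Notice the second turn costs more even with a shorter answer, because the growing history inflates the input tokens.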
Okay, so we got the answer. Finally, the cleanup. You can't delete the search index without deleting the search agent first, so the first cleanup step deletes the search agent, and then you delete the search index, and you're rid of both. Again, this is available online; it's published in Python, but you can port it to .NET and run it the same way. If you go back and search the indexes, our index is gone.
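For reference, the cleanup calls are roughly these two lines; delete_agent is preview SDK surface, so verify the method name against your beta:

```python
# Cleanup order matters: the agent must be deleted before the index it targets.
from azure.identity import DefaultAzureCredential
from azure.search.documents.indexes import SearchIndexClient

index_client = SearchIndexClient("https://<search>.search.windows.net",
                                 DefaultAzureCredential())
index_client.delete_agent("earth-search-agent")   # knowledge agent first
index_client.delete_index("earth_at_night")       # then the index
```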
Now that we have a complete understanding of how it works, why don't we look at our application? We have two endpoints: AISearch.Web is just a front-end MVC application, for understanding purposes, and AISearch is the complete back end. If I sign in (as I said, you have to set up the sign-in parameters), I can start asking questions; I have one document over there. I can ask a question against the search index: it goes to AI Search as a normal search, nothing to do with agentic, finds some results that may not be useful, and returns them. If I go to the chat interface and ask the same question, I have more options: use knowledge agent, require security trimming, and enable text-to-speech. Turn that last one on and, instead of reading the answer yourself (say you're on mobile in driving mode), it reads it to you as the response comes in; it starts automatically and you can pause it. You can use the knowledge agent or not; for now let's disable it and see. This is going to use the index without a knowledge agent, which means it uses plain chat completion. With chat completion it didn't go into those code blocks here; I set a breakpoint, so next time we come in we can see what it's doing. So, it's thinking...
It found the result. This is a very simple document I uploaded, and the value comes from the "technology change past present future" PDF. If you look at the steps it went through: the grounding user message (the question we posted), then the question is grounded and it got the result; in this very simple case there's only one document, so it comes right back. Grounding results received, preparing the LLM message: we send the prompt with what we received in step two back to the LLM, and the LLM returns the response, which is the response you see.
But now let's turn on the knowledge agent, clear the chat, put in the exact same message, and send it. If you look at the request, use-knowledge-agent is true, so I'll run it. I have two parameters here: one uses the real AI Search knowledge agent, and one simulates what a knowledge agent would do, because we want to understand what's going on under the hood. Let's run it. It's thinking... we got output, we got the document. If you look at the steps here, again for illustrative purposes only: the knowledge agent pattern, the agent configuration, the agent message; then we run the query, run the retrieval, then the response generation, and so on. But what if we clear this chat and ask the actual Azure AI Search agent? We set a breakpoint, and here we are: we keep the knowledge option on but enable the AI Search agent parameter, which was coming through as false, and we run it.
It may be the same result, or a slightly different one, but the concept is that this time it's using the real search engine agent. Once you have more data, more parameters, more filtering, it really makes sense; this is a very simple example. In this UI you can also create a knowledge agent: give it a name, say "dev-boston-demo", and provide a description. The description is very important ("demo index for the Dev Boston talk", something like that), because it's where MCP will look to understand what the index can be used for and to guide you. So anytime you see a description field on an index you plan to use with MCP, it's important to provide valid instructions. Hit create and it creates the agent: there it is, dev-boston-demo. If you go back to the indexes and refresh, you'll see it, and you can look at the other parameters; I'll come back and explain those later. This also creates the agent body.
Once you have the index, you can upload documents. We have something called the knowledge-text index, so we upload the document to it: browse for the document and upload. It's all working, and you can take a look at it, but let's move on to the next concept. Back to the PowerPoint. Any questions so far? I don't see any, so I'll move on.
The next one is the data pipeline. Real business documents are not as simple as a resume you write or a content blog you find on the internet. Complex content might contain graphs, charts, words, embedded video or audio, or images; that's real business data. And the data can come from any source: OneDrive, Azure Storage, SharePoint, a database, any file system, any cloud storage, any online software-as-a-service system, you name it. The pipeline handles multiple media types. It turns images into text: if your document contains an image, it can extract the image on its own into a separate file, so when you search for something it can give you a reference to the image. You have a document, the document contains an image, and the image is pulled out into its own document as part of the slicing process. It keeps the document layout intact and breaks long text into smaller usable parts; that's the slicing and overlapping. Say a page has 20 lines: you may take the 20 lines, but you overlap, say, four lines into the next chunk, and so on. You can also use Logic Apps to automate ingestion from connectors, workflows, Power Apps, or any software-as-a-service you want to bring data from: use a Logic App to connect to your Salesforce, your Workday, or ServiceNow and bring the data over. Everything can be set up directly in the Azure portal.
So those are the data pipeline advantages, but let's look at how the data pipeline can be used in our demo. Once you have AI Search, go to the overview: there's "Import data", which lets you import data into an existing index and index it automatically if you have the indexer and data source configured correctly. But we're going to look at "Import and vectorize data". You can bring data from Azure Blob Storage (these are the supported data sources) or from Azure Data Lake Storage. The advantage of bringing data from Data Lake Storage is that it also provides out-of-the-box ACL import, meaning if any file in the data lake has security group policies enabled (for example, "this document can only be seen by such-and-such group"), it will honor that: when you query for grounding, it checks who you are, and if you do not have access to a document, you are not going to see it in the grounding results. That's the out-of-the-box advantage of Azure Data Lake Storage Gen2. You can also bring your own permissions, but we'll talk about that in a later slide.
For now, you can do a simple RAG or a multimodal RAG setup. I'm going to select my storage account and my container (I have some documents over there) and go next. The first step takes slightly longer because it needs to validate everything. You can use the default AI extraction, but in this particular demo we're going to use Azure AI Document Intelligence. Once you pick your own service, you have to provide the credentials to access it. What does that mean? We also have to pay for it: we picked the Document Intelligence service, so we have to create it, and any consumption is our responsibility. You can use an API key to make the connection, but a system-assigned identity is the best model for connecting the systems securely, so you don't need to worry; also, some people disable key-based auth (it's called local auth), and then an API key won't work. So managed identity is the right approach. Go to next.
Now we can do image vectorization along with text vectorization: we could use image or multimodal embeddings, but for some reason that isn't working today, so we'll select this option. Now it asks which Azure OpenAI service to use, because as I said before, we have to give it the LLM, and we also have to give it the embedding model. The embedding model is where you generate the vectors; the chat LLM is where your grounding knowledge feeds into drafting the response and creating the plan. So we need to provide both. This one is the LLM: it's asking for a model deployment, so I'll pick GPT-4o, again with a system-assigned identity. Next we need text vectorization: select the Azure OpenAI service, and the model is text-embedding-3-large. It used to be text-ada-002, but now the two popular embedding models are text-embedding-3-large and text-embedding-3-small, based on your needs; if you really have requirements dealing with multiple languages and true multimodality, it's recommended to go with text-embedding-3-large. I'll enable the system identity and acknowledge the notice, because there's an additional cost and you'd better read it as well. Click next.
Now, the storage account: the image output location. What does that mean? As I said before, your document may have an image embedded in it. The Document Intelligence service takes the document, extracts the image on its own, and also extracts the text from the image, creating another document: an image document. So you have the main document and the image document. If the thing you're searching for comes from the image document, it displays the image document; if it comes from the main document, it displays the main document. That's why we have to give an output location. We'll just say "output"; it's already created, so I'm reusing it, and I won't worry about that parameter.
Then there's indexing: how often you want to index the data. For this demo we'll just say once, but set it based on your needs. You need to enable the semantic ranker. You can also add more fields; some fields are predefined, which is one of the cons of this approach, because you have to use the field names they chose (if you do it programmatically, you can overcome that with your own field names). But you can add fields from the data source: we selected blob storage from the data lake, and it shows all the metadata fields you can add and map in, and if you have custom metadata fields they'll show up here as well. So when you need an extra field, say a friendly title or display name, you create the metadata on the blob, it brings that metadata in here, and you can create the field. Let's delete this one, cancel, and next.
We'll just call it "dev-boston" so you know what it is, and create it. I have five documents uploaded, but once it slices and vectorizes everything, it could look like 50 documents. Start indexing. If we search for something, we don't have anything yet, so let's check whether it really got created. If I go back to my data sources, the data source looks correct; my indexer is still working on it. There are three parts here: the index, which you query; the indexer, a transfer scheduler that takes the data and brings it in; and the data source, your ultimate source of truth. It's still running... okay, it's done. Now if you go back to the index and search here, you may see some data now.
And we see the data. If you're annoyed by seeing the content embedding, you can turn it off. These are 3,072-dimension vectors; people used to use 1,536-dimension vectors, and I've seen a lot of people use 1,536, but recently I've noticed everybody using 3,072, which means it creates that many dimensions for the data we provide. Since it's too much data to look at, we remove it from retrieval. Nobody is going to retrieve the floats; they're meaningless to the end user, though very meaningful for querying the data. When end users see floating-point numbers they get no insight; it's just numbers. So we remove it from the retrievable set: it stays searchable, and you can always add retrievability back. Now look at the count: it's about 50, and we only uploaded five documents. I can share the repo, and the repo contains only the five documents, but once it slices and does all that processing, it becomes about 50 documents.
Now, the data pipeline also creates something called a skillset. Remember, along the way we provided the Document Intelligence service credentials, the embedding model, and the chat model; you can see all of that here in the skillset that was created. In the skillset it says: take the documents from here, here's the field data, here's the step analyzing the image, here's the Azure OpenAI connection and which model to use for creating the embeddings of the sliced documents, and so on. You can look at the skillset and see how it's doing its work, including putting the normalized images into the output folder. So this is how it works.
What happens is the indexer runs the skillset as a step. When the indexer runs, it invokes the skillset: if you look at the indexer definition (go back to the indexers, pull up dev-boston, and open Edit JSON), it has a pointer to the skillset, a pointer to the data source, and a pointer to the target index. So the indexer is the orchestrator that uses the skillset and the data source to populate the index, and the skillset holds all the rules for how you want to slice and enrich. You can do it the skillset way, which is the most effective, letting the service handle it, or you can slice manually: in the notebook demo earlier we manually took the document, sliced it, created the embeddings (you can step through the code), and put them in the index. In this case, the skillset does all that work for us.
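That orchestrator relationship is easy to see in code: the indexer itself is tiny, just three pointers. A sketch with the demo's names (adjust the resource names to yours):

```python
# The indexer as orchestrator: it points at a data source, a skillset, and an index.
from azure.identity import DefaultAzureCredential
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import SearchIndexer

indexer = SearchIndexer(
    name="dev-boston-indexer",
    data_source_name="dev-boston-datasource",  # ultimate source of truth
    skillset_name="dev-boston-skillset",       # cracking, image extraction, embeddings
    target_index_name="dev-boston",            # the index you query
)

SearchIndexerClient(
    endpoint="https://<search>.search.windows.net",
    credential=DefaultAzureCredential(),
).create_or_update_indexer(indexer)
```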
All right, let's move on to the next topic. Where are we right now? We've just covered the data pipeline.
Next is document-level access. We talked about how Azure Data Lake supports the Entra ID permission model. That's one way, but not all documents come from the data lake: you may have documents from your database, from somewhere else, from a flat file system, from anywhere. The bottom line is they get loaded into the index, because that's how you're going to query them. Once you start loading the index and you want to apply grounding security, how do you do that? There are two ways. One is the pull model, where AI Search handles it for certain data sources, in this case Azure Data Lake. The other is the push model, meaning you create the permission fields yourself, put the values in, and then query against those fields. We're going to look at the push model today: how we push the data and how we query it back. The pull model is mainly for predefined data sources like ADLS.
All right, let's get back to this page. I'm going to go back to the overview, to my resource. We talked about some storage, right? If you look at the storage browser, in the container we have the five documents, and it created the image library, which we call the document output. When we go back to the container, this is the output container, and it created some images. If you look at one, you'll probably see what I mean: it's a logo, nothing much, but the idea is that it sliced the document. This is another document, and this is the image inside that document, extracted and put here. If you're searching for something like "redirect to URI", it might point exactly to this image.
Now we come to the permissions model. If I go back to the search service and to the knowledge-text index we have, and look at the fields, there are fields for user IDs and group IDs: the user IDs field holds a list of user IDs, and the group IDs field a list of group IDs. They're arrays of strings, so there's a limit to how much you can store, depending on how you load them. If you push directly into the index, you can upload as many entries as you want, but some people prefer to use blob metadata. We talked about Data Lake Storage: there, permissions are stored as metadata, and there's a limit on how much the metadata can grow. Bottom line: if you use metadata to store the group IDs and user IDs as part of the blob you're uploading (say a document called "company policy" where you set who can access it), there's a restriction; it cannot exceed 8 KB, and that includes all the metadata. That means only about 50 group IDs or 50 user IDs, since they're GUIDs, so that approach isn't recommended at scale. A better way may be to upload the document first, then run a database call, gather everything, and inject it into the index in that call; that allows you to insert more data. But you still have to come to a conclusion about how you organize this. You don't want thousands of user IDs or thousands of group IDs per document; instead, define a finite number of groups: maybe you say "I'll allow 50 groups" and organize everything into those 50. That's more than enough; often three groups is enough, though maybe you go to 50 or 100. I have seen customers with 100,000 groups (I don't know how they got there; I've seen a couple with 200,000, groups after groups after groups), but you don't want to put all of those in the index; that's going to be a disaster. Design your system around how many groups you want to allow. Based on that, you either set the groups as part of the metadata, in which case the indexing process takes care of indexing them, or you manually ingest that data into the index.
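Once the group IDs are in the index, security trimming on the push model is just an OData filter on the query. A sketch, where the group_ids field name and index name are illustrative (match them to your own schema); the search.in filter function is standard Azure AI Search syntax:

```python
# Security trimming on the push model: filter results by the caller's groups.
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient

search_client = SearchClient(
    endpoint="https://<search>.search.windows.net",
    index_name="knowledge-text",                  # illustrative index name
    credential=DefaultAzureCredential(),
)

caller_groups = ["<group-guid-1>", "<group-guid-2>"]  # resolved from the user's token
group_filter = "group_ids/any(g: search.in(g, '{0}'))".format(",".join(caller_groups))

results = search_client.search(
    search_text="company policy",
    filter=group_filter,            # only docs tagged with the caller's groups come back
)
for doc in results:
    print(doc)
```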
To run this demo, I go back to Entra ID. We're looking at the index, and we only have one document; we want to verify whether we get a result or not. We go back to the chat and put in the message we had. This is a document, and my ID is my Microsoft ID. So I go back to this index and to the users, and I pull one; that could be this one. Then I go to the groups: I'm part of the 851431 group, so let me see whether I have data indexed for it. Go back to the search index (the knowledge index fields... sorry, Search Explorer) and search. I don't have that group added. Let's take a different user, with 0251 as a group ID; let's take this user here. That's my account, so let's browse the site using my account: I take this URL, go back, and I'm allowed. Now let's browse as this other user and see whether they can see the data. I sign in as that user, go to the chat box, take the same query, and paste it here. (I'm skipping the step-through, but if you're a .NET developer, you'll get it.) This user still finds it, and they shouldn't be... oh, maybe they belong to another group; let's go remove this person. But you get my answer; we don't have to get stuck here, though it has me wondering whether they really do belong to that group. Okay, let's move on.
The next thing is MCP, Azure MCP. This has nothing to do with AI Search itself; the overall concept is how we can use Azure MCP to discover the services provisioned in your subscription. You use an IDE: Cursor, Visual Studio Code, you name it, your favorite, any IDE that supports MCP integration. In this case we're going to use VS Code and see how you can ask questions about your Azure services. So I'm going to open Visual Studio Code and ask in agent mode.
To enable MCP, you first have to enable the Azure login: you go here, log in, and make sure your subscription is there. And at any point, if you're stuck, you can select agent mode and just ask. For example, if you're not sure how to enable the Azure MCP connection to see these resources, you can simply ask, "can you tell me the steps to enable it?", and it will tell you: go left or right based on where you are, click these icons, then sign in and complete the process. It walks you through like a tutor, and then you can see it. So you can start asking the right questions and it can help you. If your prompt is good, your AI is good; if your prompt is not good, your AI is not good. In this case I already have everything connected, so I'm going to ask: "Can you suggest an index that can be used to find the user-to-group mapping for the security filter?"
I just asked a very, very raw question: "can you tell me?" As I said before, this has nothing to do with AI Search itself, but it gives me a response. If you look at it, all I'm looking for is an index name for the user-to-group mapping to support the security filter, and sometimes it goes off and starts designing one instead. So I can ask a different way: "I'm looking for an index name in my search service," and tell it which search service we're using, demo one. Now it's asking me to approve the tool call, with an option to auto-approve; I'll approve just this one for now, though you can also enable auto-approval for the session. You get the concept: sometimes it finds it faster, other times it just takes a while. I approved so I don't have to answer every time; it's a demo machine, I'm fine, but you have to be very careful with auto-approve, because you're running in agent mode: if you ask "can you delete my C:\ files," it's going to delete them. So be very careful what you auto-approve. Okay, now it needs the resource group name, so it's not getting there directly; sometimes it gets it right in one click, other times it kind of loops, but it finds it eventually. You get my answer; let's go on to the last one.
The last one is sensitivity labels. I don't have a demo for this because it's a preview; I signed up, but for some reason I didn't get access. Again, it's not part of AI Search; it's part of Microsoft Purview information protection and its sensitivity labeling capabilities. Sensitivity labels in Azure let you classify and protect sensitive information, ensuring compliance with organizational policies and regulatory requirements. If you've already used Purview, you probably know: it tells you the sensitivity of the document somebody is looking at, the classification, all those things. If you haven't used it, it may not seem very useful, but a lot of enterprises use it to make sure sensitive information stays secure, and that classification is very important in a large enterprise. I think that's what's available there; you can take a look. This one is additional material you can find; it's a blog, and there are a lot of posts out there. I will put the links in my slides (they're already in the slide share), and I will also post these in a blog post tonight.
Unlock the next generation of enterprise AI with Azure AI Search. In this session, Udaiappa Ramachandran (Udai) — CTO/CSO at Akumina Inc. and Microsoft MVP — walks through how to build agentic, enterprise-grade retrieval systems using Azure AI Search. Learn how query pipelines, RAG and agentic retrieval, and data pipelines come together to deliver grounded, secure, and intelligent answers across multimodal data sources.

We'll explore:
- Query Pipeline: Search to agentic retrieval architecture
- Data Pipeline: Logic Apps + Azure AI Search for ad-hoc chunking and multimodal ingestion
- Security: Enterprise-grade access control with Entra ID, sensitivity labels, and encrypted indexes
- Azure MCP Integration: Context-sharing for AI agents
- Live Demo: The azure-ai-search-multimodal-sample showcasing RAG and policy-based search

By the end, you'll understand how to move from flat search to intelligent, secure, context-aware enterprise agents — all within your Azure environment.