Today's topic is building enterprise agents using Azure AI Search. We're going to explore how AI Search supports the agentic mode: agentic retrieval, the agentic pipeline, and so on. We've been hearing about agents for the last year, but we're going to deep-dive into one particular technology, Azure AI Search. My name is Udaiappa Ramachandran; I go by Udai for short. I work for a company called Akumina Inc. as CTO/CSO, and I'm also a Microsoft MVP. This is my website if you want to check my community profiles, and there's my LinkedIn if you want to connect; if we're not connected already, please reach out to me. Those are the meetups: Nashville UG and Dev Boston.
So today's agenda is Azure AI Search, specifically the features that have become available over the last five months. The first feature we'll look at is the query pipeline, also called agentic retrieval. We'll deep-dive into what agentic retrieval is, how you configure it, what the benefits are, and so on. The second feature is the data pipeline. The main point of the data pipeline is how you can crack a complex document into multiple different slices: a slice can be an image, text, a graph, whatever you call it, the multimodal notion, and how you get all of that ingested. The third thing is enterprise-grade security. By default, if you're running your application in Azure, you can take advantage of nearly all the services in a secure way, but here we're going to explore data partitioning: making sure the grounding data is trimmed to the user, so what they can see is what they get through the grounding. Next we'll look at MCP and AI Search for development; it's not specifically about AI Search, but about how MCP can help you navigate your Azure services and get assistance. The final feature is sensitivity labels. This is not part of AI Search itself; it's part of Microsoft Purview governance. We'll talk a little bit about what it is; it's still a preview feature. And finally, the demo. The demo we're going to see today is .NET, but it's based on the azure-ai-search-multimodal-sample, which is a completely Python codebase; I took that Python codebase, converted it to .NET, and added some features to it. You can take a look at it here, what you can configure and all those things; we'll talk more about it as we progress through the topics.
Before we dive into agentic RAG, let's talk about the difference between RAG and agentic RAG, and why we need agentic RAG at all. Until last year we all used to hear the term RAG, retrieval-augmented generation, and starting this year everybody talks about agents; "agentic" is the big marketing buzzword today. Everybody is moving toward agents, and anything people discuss in AI they discuss in the notion of agentic. But it's a good idea to understand what RAG is versus agentic RAG, and what problems agentic RAG solves that RAG cannot.
RAG comes into the picture when knowledge is not available as part of the trained model. The example everybody uses: you ask about your company policy, say your company vacation policy, also known as PTO. In that context, it's your internal data; the data lives with your employer, somewhere in a company database, but it is not available to the model you're going to use. So you query your data and send the results as part of the prompt. When you ask "what is my PTO policy?", you query your database, then send the prompt to the LLM along with the grounding source: the knowledge you pulled from your internal database. The LLM takes all of that, drafts a new response, and sends it back to you: "here is your company policy." That's RAG.
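Here is a minimal sketch of that one-shot RAG flow, just to make the shape concrete. The search_policies() helper, the endpoint, and the deployment name are placeholders of my own, not the session's actual configuration:

```python
# Minimal one-shot RAG sketch (helper and endpoint/deployment names are placeholders).
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<key>",
    api_version="2024-06-01",
)

def search_policies(question: str) -> str:
    # Hypothetical: query your internal store (vector/keyword search) for passages.
    return "PTO policy: employees accrue 15 days per year ..."  # canned grounding

question = "What is my PTO policy?"
grounding = search_policies(question)               # 1. retrieve internal data

response = client.chat.completions.create(          # 2. send prompt + grounding
    model="<chat-deployment>",
    messages=[
        {"role": "system", "content": "Answer only from the provided sources."},
        {"role": "user", "content": f"Sources:\n{grounding}\n\nQuestion: {question}"},
    ],
)
print(response.choices[0].message.content)          # 3. LLM drafts the final answer
```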
What is agentic RAG? Agentic RAG does the same thing, but it does much more. In classic RAG, retrieval is always single-shot: you ask a question, it calls one of the plugins or tools. Those are the same thing; "plugin" is the terminology used in the .NET stack and "tools" in the Python stack, but everybody is converging on calling them tools. So the model looks at your prompt, may invoke a tool to get the knowledge (the grounding source, you call it), and then gives you the response. An agent, though, can work iteratively, based on the complexity of your prompt. You might ask: "I have an event tomorrow; what's the weather going to be, and based on that, what do you suggest?" That's a complex request: it needs to find the weather, find your event, and based on both suggest what you can do, and you might also ask it to take actions, like moving the event to a different place or date. So the agent takes your prompt and creates a plan, and that plan can be an iterative process: it gets data, re-plans the next step based on that data, and so on until the entire objective is met. That's the iterative part. Classic RAG is one shot: one call, get the answer, send the response, done. Agentic retrieval splits your prompt into multiple subqueries; for a simple prompt it won't make any difference, but for a complex prompt the difference is huge. It takes the prompt and creates a plan by calling the LLM: it sends the prompt along with the descriptions of all your tools, also known as their metadata, the LLM returns the plan, and then the agent takes control and executes in a loop until the goal is achieved.
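To illustrate that plan-and-execute loop, here is a toy sketch. The planner is faked with a scripted plan so it runs standalone; in a real agent each planning step would be a chat-completion call that sees the tool metadata:

```python
# Toy plan-and-execute loop; the "LLM planner" is a scripted plan so this runs as-is.
def get_weather(city: str) -> str:
    return f"Sunny in {city} tomorrow"            # stub tool

def get_event() -> str:
    return "Team picnic, Boston Common, 2pm"      # stub tool

TOOLS = {"get_weather": get_weather, "get_event": get_event}

SCRIPTED_PLAN = [  # stands in for successive LLM planning calls
    {"type": "tool", "tool": "get_event", "args": {}},
    {"type": "tool", "tool": "get_weather", "args": {"city": "Boston"}},
    {"type": "final"},
]

def agentic_answer(prompt: str) -> str:
    history = [("user", prompt)]
    for step in SCRIPTED_PLAN:                    # iterate until the goal is met
        if step["type"] == "final":
            return f"Based on {history[1:]}: the event can go ahead."
        result = TOOLS[step["tool"]](**step["args"])   # execute one tool call
        history.append(("tool", result))          # feed the result back into planning

print(agentic_answer("I have an event tomorrow; what's the weather, any suggestions?"))
```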
Reasoning: RAG is a single step, while agentic is multi-step, because it goes back and forth. Latency: RAG is fast because it's one shot; agentic RAG is slower, depending on the prompt you asked and the number of grounding knowledge sources it has to interact with. Sometimes you ask for the policy for such-and-such department, such-and-such title, such-and-such date: now it needs to split that into multiple small subqueries, run each against the data, join everything, and return a response. Not a long time, but it adds some latency to complete all the iterative steps. In a simple case you won't see any difference; in a complex case you will always see some slowness. Complexity: RAG is one shot, so it's simple; agentic RAG is complex, based on the nature of your prompt and how many data sources or tools you configure to work with your agent. Use case: RAG is question-and-answer; you ask a question, it returns an answer. Agentic RAG is multi-hop, and you can ask it to take actions. For example: find such-and-such person for such-and-such problem, then email or call that person. You can pipe those kinds of automations into the actions the agent takes on your prompt. Accuracy: since RAG is one shot, it's always going to be moderate, and it may hallucinate sometimes; the agent, because it's an iterative multi-step process, can push accuracy as high as possible.
So you can have an agent, and when that agent needs knowledge, as in my example, "what is my company policy?", the policy lives in documents in some database: you vectorize those documents and put them in a search index. Before agentic retrieval, if you asked "find the PTO policy document", it would go to the index, vectorize your incoming query, look in the vectorized column, and return the document; just a vectorized search. If you need an additional filter, you can add one in hybrid search mode; if you don't use hybrid, you go purely with the vector. It gives the top refined results, and that's all you do: take those results and send them back to the LLM. You're still going to use the LLM, but where this breaks down is when you send a complex query to the search engine. The search service now has an agent, called a knowledge agent: the query is handed to the knowledge agent, and instead of running one shot, the knowledge agent can slice it into multiple queries. I'll show you an example.
Here it is: "Find all company policy pages that mention hybrid work arrangements, summarize key points, and include related HR guidelines from the last 12 months." Given this complex query, one LLM call is made to build a plan, and this all happens inside the Azure AI Search knowledge agent. The plan divides it into three subqueries: instead of running one full query, it runs one query for hybrid work policies, a second for HR guidelines on hybrid work (with "last 12 months" acting as a filter), and a third relating recent hybrid policy updates. So: one LLM call to create the plan, three search calls, and one rerank. Those three calls may return different result counts: one gives 30, one gives 40, another 30, say 100 results in total. But the reranked result set cannot exceed 50, so out of 100 results it reranks and returns the top 50. And we're not going to send all top 50 to the LLM; that's a lot of data, and you'd lose a lot of money. We might take the top five, or top 10 at most. So it runs the subqueries, merges the results of all three, returns up to the top 50, and we send maybe the top three to ten onward, because you don't want to pay for all those tokens. This is best for complex multi-part questions, as we saw before.
So here is the architecture of how it works. You have the conversation history, and you're interacting with your agent, the chat-completion side. That completion talks to the search service's agentic retrieval (it's in beta right now), the AI Search knowledge agent, which is what slices your query into multiple subqueries, runs them all, merges the results, caps them at the top 50, and sends back the number of results you asked for: ask for the top 10, you get the top 10; ask for the top five, you get the top five; the max it can return is 50. The agent receives that data and sends it to the LLM to create the final draft of the response, which then goes back to the user. That's the flow in a nutshell; you can look at these articles, everything is linked at the end of the slides.
We just discussed this: a complex query comes in, with filters like date and department; a plan is created by the agent, runs by itself, and returns the data. All right, let's look at it in a practical way. The easiest way to start is the Python agentic-retrieval quickstart. We'll start from there, and then I'll run the application I built, which, as I said before, you can download; it's a working application. You just need a little setup: complete the app settings. If you have Entra ID authentication enabled, you create the app ID and tenant ID, register this as your redirect URL, and supply the client secret to get the OpenID token. You do the same thing on the server side: go to the API, open the settings, and complete them; there's the search service endpoint (you can leave it, and I'll explain how the application runs as we move forward), the document service, and so on; you'll get to those as we see the demo. That's the full application, and we will run it, but before that it's a good idea to understand the fragments of how it works. So I configured a Python notebook. The notebook is the easy way to understand, because you can run one step at a time: you can start from here and run step by step. This notebook walks you through how you create and use the agentic retriever. What it does: it creates an "earth at night" search index and then creates the agent; as I said, the search service supports the agent, so the agent is created alongside your search index, and then you use the agent's ranked results and so on. Let's go to the browser and look at it.
In the Azure portal, I already created a search service; in your case, if you're exploring, you'll have to create one. In my case, agentic retrieval works only in certain regions, so if you're going to explore tonight or later, I'd suggest you create it in North Central US; if you create it there, it's guaranteed all the features are enabled. I already created some search indexes; don't worry about those. Let's go back and run the sample step by step. This sample, as explained here, is going to create a search index called earth_at_night, which we don't have yet. In the settings we've provided all the details, such as the Azure OpenAI endpoint, because it's going to use an agent; that doesn't mean the agent has its own LLM connection. You still have to provide the LLM connection; that LLM comes from here, so you have to give the endpoint. That also means if you run a lot of queries, you're going to pay a lot of money too; it's not giving you a free LLM. The service provides the agent, but we need to tell it which LLM to use, and it has to be a supported model, which in other words means Azure OpenAI models, and you provide that to search. I don't have any access key here, because it's going to use a managed identity (service principal) to connect to the services.
Okay, now let's run the sample. We've done all the prerequisites: we created the service, we enabled the semantic ranker, and we have the Azure OpenAI resource with a supported model. There are some limitations on which models are supported; it will tell you while creating the agent if a model isn't, but GPT-4o mini and GPT-4.1 are supported. If you use a managed identity, you have to grant that identity the Search Service Contributor, Search Index Data Contributor, and Search Index Data Reader permissions; I already granted those, so my demo is good to go. Then you take the sample.env and rename it to .env. Again, our full app runs everything on .NET, but you can download this and run it in a Python notebook. First, install the packages: the azure-search-documents preview (11.7.0-beta.4), azure-identity, openai, and python-dotenv. You see the tick, it's completed; these packages are installed. The second step loads the environment variables; that's done too (I know it still says spinning, but it is done). The third step is to create the index.
going to see is create an index right?
So we said that we you know it's we
going to create the index which is not
exist already. Um we saw the name it's
not there. So we're going to run this
script to create index.
Um so if you look at this script over
here again this is you know pretty
simple right? So as a such documents
index um you're just creating the index
with the with the embedding enabled. If
you have done before you know what it is
but otherwise it has to create the
embedding vector or go to the index and
explore how it's been mapped. Okay you
have to you have to create that you know
model and then what is the embedding
field and vector search field what is
the algorithm you're going to use and
all those things we have to define it
and then create it. All it does is it
creates a U text with embedding field
enabled. Okay. So you can store the
vectors in the embedding field. Uh the
embedding or vectors are
interchangeable. Some people call
embeddings, other people call vectors
because it's the array of float numbers,
you know, vectors. But when it comes to
the API level, they most likely use the
O embeddings,
right? We enable the semantic search.
That's how you know what is a semantic
search mean here. You know for example
if you want to look for a restroom you
know some some country they may call a
bathroom other country they may call a
washroom but the document itself can say
the restrooms right if it is semantic it
knows a restroom is washroom restroom is
a bathroom and so on um so that's why we
need to create the semantic search so
everything is going to be semantic even
if you if you're looking for apple you
make a one mistake in the keyword search
it will not find it a pl is apple but
you know you put a you know miss or miss
e or any character you mess it you'll
still find it because of the systematic
search nature. Okay. And then index
client and create the index. So now if
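For reference, here is roughly what that index-creation step looks like, following the shape of the agentic-retrieval quickstart. The field and profile names mirror that sample; the vectorizer parameter names are from the preview SDK and may differ between betas, so verify against the version you install:

```python
# Index with a vector field + semantic config, quickstart-style (preview SDK;
# verify class/parameter names against your azure-search-documents beta).
from azure.identity import DefaultAzureCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex, SearchField, SearchFieldDataType,
    VectorSearch, HnswAlgorithmConfiguration, VectorSearchProfile,
    AzureOpenAIVectorizer, AzureOpenAIVectorizerParameters,
    SemanticSearch, SemanticConfiguration, SemanticPrioritizedFields, SemanticField,
)

index = SearchIndex(
    name="earth_at_night",
    fields=[
        SearchField(name="id", type=SearchFieldDataType.String, key=True),
        SearchField(name="page_chunk", type=SearchFieldDataType.String, searchable=True),
        SearchField(
            name="page_embedding_text_3_large",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            vector_search_dimensions=3072,            # text-embedding-3-large
            vector_search_profile_name="hnsw_text_3_large",
        ),
        SearchField(name="page_number", type=SearchFieldDataType.Int32, filterable=True),
    ],
    vector_search=VectorSearch(
        algorithms=[HnswAlgorithmConfiguration(name="alg")],
        profiles=[VectorSearchProfile(
            name="hnsw_text_3_large",
            algorithm_configuration_name="alg",
            vectorizer_name="openai_vectorizer",
        )],
        vectorizers=[AzureOpenAIVectorizer(
            vectorizer_name="openai_vectorizer",
            parameters=AzureOpenAIVectorizerParameters(
                resource_url="https://<aoai>.openai.azure.com",
                deployment_name="text-embedding-3-large",
                model_name="text-embedding-3-large",
            ),
        )],
    ),
    semantic_search=SemanticSearch(configurations=[SemanticConfiguration(
        name="semantic_config",
        prioritized_fields=SemanticPrioritizedFields(
            content_fields=[SemanticField(field_name="page_chunk")]
        ),
    )]),
)

SearchIndexClient("https://<search>.search.windows.net",
                  DefaultAzureCredential()).create_or_update_index(index)
```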
Now if you go back to our service and refresh, the earth_at_night index is there. Look at the index and you can see these fields were created, and you can look at the semantic configuration: page_chunk is the content field; that's where all the text goes. The vector profile uses Azure OpenAI text-embedding-3-large, and the page embedding field is where we're going to store all the vectors. We didn't do much: we just created the index, configured the semantic configuration and the vector profile, and we have four fields. Now we'll explore how to upload a document and see how the document gets sliced.
Okay, now we're going to upload some documents. To upload, the sample takes the documents from here, takes the content of each document, and uploads it. They run it, and the documents are uploaded. If you go back to our index in Search Explorer and look, there are documents, with page numbers like 5, 4, 9, 8. When we need grounding from vectors, we have to slice the document. Your document may have two pages, and a two-page document is okay, but if you have a 100-page document, you have to slice it: decide the number of lines you take per chunk and the number of overlap lines. Say you take 10 lines with an overlap of two: each chunk includes two lines from the previous chunk and then takes the next 10 lines, and so on. That algorithm is the critical thing to get right: if you design it correctly, your semantic lookup is going to be accurate; if you don't, your semantic lookup may be a little weird and might not return the results you wanted. That's the most important part. It's what the Azure Document Intelligence service does, but there are also a lot of open-source tools that can help you crack the document into multiple parts; then you take the parts, convert them to vectors, and ingest them.
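Here is a minimal illustration of fixed-size chunking with line overlap, just to make the slicing idea concrete; real pipelines use smarter, layout-aware splitters such as Document Intelligence:

```python
# Minimal fixed-size chunking with line overlap (illustration only).
def chunk_lines(lines: list[str], size: int = 10, overlap: int = 2) -> list[list[str]]:
    chunks, step = [], size - overlap
    for start in range(0, len(lines), step):
        chunks.append(lines[start : start + size])  # carries `overlap` lines forward
        if start + size >= len(lines):
            break
    return chunks

doc = [f"line {i}" for i in range(1, 25)]
for c in chunk_lines(doc):
    print(c[0], "...", c[-1])   # each chunk starts (size - overlap) lines after the last
```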
If you look here, the portal isn't displaying the vectors because they're too big, so they're hidden. If you go in and select the vector field, then go back to Search Explorer and search again, there's the vector: it was generated and uploaded. So when you search for some word, it gets converted to a vector and matched with a vector lookup; in our case, though, we're going to use the agentic lookup, so whatever you send, the agent takes care of everything, sending it to the search index and collecting the values. We'll come back to that. So far we've only uploaded the document; we haven't done much else. We can search for something, maybe the word "forward"; we do have "forward" in there, so let's look for it. That search looks at page_chunk and returns the matching value, but we're still not doing a vectorized or semantic search. In other words, if I search with a typo it may return nothing, or return something irrelevant, as you can see: type the word correctly and you see one set of results; with a typo you don't. That's exactly what semantic search fixes. All right, continue on.
Now we're going to create an agent inside the search service. This is the new feature, and it's a cool one; I'm using it for my project and it's amazing. So let's run it and see whether it creates the agent. It did, in 0.3 seconds. If you look at the code: you take the Azure OpenAI vectorizer parameters (where the model lives), the agent model, and the knowledge agent's target index, which is our current index, and then you create it. That's it: create-or-update the knowledge agent against the index we already have. The agent gets its name and so on, but there is no way to see it in the portal UI yet; that's coming, but right now you can't. It just says "earth-search-agent is created or updated successfully." You'll see how to delete it at the end, but there's no UI for that either: if you try to delete the index now, it will say no, you have to delete the agent first, yet there is no UI to delete the agent.
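The creation step looks roughly like this in the preview Python SDK. The class names follow the agentic-retrieval quickstart and may shift between betas; the endpoint and deployment values are placeholders:

```python
# Knowledge agent creation, quickstart-style (preview SDK; names may change).
from azure.identity import DefaultAzureCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    KnowledgeAgent,
    KnowledgeAgentAzureOpenAIModel,
    KnowledgeAgentTargetIndex,
    AzureOpenAIVectorizerParameters,
)

agent = KnowledgeAgent(
    name="earth-search-agent",
    models=[KnowledgeAgentAzureOpenAIModel(
        azure_open_ai_parameters=AzureOpenAIVectorizerParameters(
            resource_url="https://<aoai>.openai.azure.com",
            deployment_name="gpt-4o-mini",       # must be a supported model
            model_name="gpt-4o-mini",
        )
    )],
    target_indexes=[KnowledgeAgentTargetIndex(
        index_name="earth_at_night",
        default_reranker_threshold=2.5,           # drop weakly reranked chunks
    )],
)

index_client = SearchIndexClient(
    endpoint="https://<search>.search.windows.net",
    credential=DefaultAzureCredential(),          # managed identity / az login
)
index_client.create_or_update_agent(agent)
```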
We'll continue. Now that we've created the agent, let's set up some messages and interact with it. The first thing you do is the system prompt: "you are a Q&A agent," plus whatever else you want it to do. That's the system instruction, also called the meta prompt, and it's the most important step: you have to state the role of the agent. Then comes agentic retrieval to fetch results. When you want to retrieve, you send your question with the role "user": the agent's role is set up, and now you, as the user, want to query something (in this sample, something like "why do suburbs display larger lit areas at night?", or whatever the user wants to ask). Then we call the agent's retrieve, and whatever result comes back is appended with the role "assistant": the user asks, the system performs the retrieval, and the response comes back here as the assistant message. So when I run it:
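Here is the shape of that retrieval round-trip, modeled on the quickstart. Everything here is preview SDK surface area, so treat the class names and the response access pattern as assumptions to verify against your installed beta; the question text is a paraphrase of the sample's:

```python
# Agentic retrieval round-trip, quickstart-style (preview SDK; verify names).
from azure.identity import DefaultAzureCredential
from azure.search.documents.agent import KnowledgeAgentRetrievalClient
from azure.search.documents.agent.models import (
    KnowledgeAgentRetrievalRequest,
    KnowledgeAgentMessage,
    KnowledgeAgentMessageTextContent,
    KnowledgeAgentIndexParams,
)

messages = [
    {"role": "system", "content": "You are a Q&A agent over the earth_at_night index."},
    {"role": "user", "content": "Why do suburbs display larger lit areas at night?"},
]

agent_client = KnowledgeAgentRetrievalClient(
    endpoint="https://<search>.search.windows.net",
    agent_name="earth-search-agent",
    credential=DefaultAzureCredential(),
)

result = agent_client.retrieve(
    retrieval_request=KnowledgeAgentRetrievalRequest(
        messages=[
            KnowledgeAgentMessage(
                role=m["role"],
                content=[KnowledgeAgentMessageTextContent(text=m["content"])],
            )
            for m in messages if m["role"] != "system"   # system prompt stays client-side
        ],
        target_index_params=[KnowledgeAgentIndexParams(
            index_name="earth_at_night", reranker_threshold=2.5,
        )],
    )
)

# Append the grounding the agent produced as the assistant turn.
messages.append({"role": "assistant", "content": result.response[0].content[0].text})
```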
This step can take up to 10 seconds, because now it sends the data to the search query layer, the search query hands it to the knowledge agent, the knowledge agent talks to the LLM to split the query, and all the cool things we talked about happen. Now it has the result as an assistant message. How can we see it? Run this cell and there's the result, and you can keep going. If you want to print the result a different way, you can look deeper: there are 11 retrieval results, the input tokens are 1,390, the output tokens are 458, and you can keep inspecting what's being returned; it's a scrollable element you can expand and read.
So far we sent a query to the search agent, the search agent created a plan, executed it, and returned the result. But that's not enough, right? That's only the retrieval part. Now you have the best grounded knowledge possible, better than before, because your query was split into a multi-step plan, executed in parallel, re-aggregated, grouped, and returned as top-ranked rows. But that's just the grounding.
These are the results. Now, how does the end user get the polished output? We need to send that to the LLM. Before, the flow was: the user asks something, you ask the LLM for the user's intent, the LLM gives you the intent, you run some query yourself without an agent, and you return the data. Now you send the query as-is to search, search hands it to the knowledge agent, the agent creates a multi-query plan, executes it, finds the best results possible, and returns them to you. That's the agent's job, and the AI Search agent is not drafting the final response. For that we call Azure OpenAI, or whatever model you want to use: take the agent's response, create the LLM connection, and use the chat completions API (you could also use an agent here). The search agent's output is fed back to the LLM, and the LLM returns the final output. Then you can continue the conversation, with the history attached, going back and forth.
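That last hop, drafting the final answer from the grounding, is a plain chat-completion call. A minimal sketch, assuming the `messages` list built up above and placeholder endpoint/deployment values:

```python
# Feed the agent's grounding back to the LLM to draft the polished answer.
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)
client = AzureOpenAI(
    azure_endpoint="https://<aoai>.openai.azure.com",
    azure_ad_token_provider=token_provider,       # keyless, via managed identity
    api_version="2024-06-01",
)

completion = client.chat.completions.create(
    model="gpt-4o-mini",                          # your chat deployment name
    messages=messages,                            # history + grounding from retrieve()
)
answer = completion.choices[0].message.content
messages.append({"role": "assistant", "content": answer})  # keep history for next turn
print(answer)
```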
You can ask whatever question you like: "How do I find lava at night?" It's still working on it... okay, it found it; the value is there, we just have to print it here. Yes, it's a light source at night, so you get what it says, and then you continue on and it displays the result. So far we've consumed 1,822 input tokens, and the output tokens are 129. If you need to, you keep going, generating answers turn after turn.
The role is very important: the user role goes in, and you get the answer back. That's how it works. But the most important thing to understand is that every time you ask something, you have to make sure you're not exceeding the token threshold, because since we've been interacting back and forth, you're also sending the history as part of the prompt. You ask a question and get a response; then you send the question and the response back to the LLM along with your next question, and all of that goes to the LLM each turn. So over time the token count is going to be huge. For input tokens, divide by 1,000: in this pricing example, every thousand input tokens costs $0.02 (2 cents), and every thousand output tokens costs $0.004, a fifth of that. You can find the exact numbers in the pricing calculator, but you have to be very careful what you send and what you ask: if you send a huge history, you're going to pay more for the input tokens.
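A quick back-of-the-envelope check using the per-1K rates quoted above (illustrative rates; check the Azure pricing calculator for your model and region):

```python
# Cost per turn at the quoted per-1K-token rates (illustrative only).
INPUT_PER_1K, OUTPUT_PER_1K = 0.02, 0.004

def turn_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1000 * INPUT_PER_1K + output_tokens / 1000 * OUTPUT_PER_1K

# The two turns from the demo: 1390 in / 458 out, then 1822 in / 129 out.
print(f"${turn_cost(1390, 458):.4f}")   # ≈ $0.0296
print(f"${turn_cost(1822, 129):.4f}")   # ≈ $0.0370
```

Notice the second turn costs more even with a shorter answer, because the growing history inflates the input tokens.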
Okay, so we got the answer. Finally, the cleanup. You can't delete the search index without deleting the search agent first, so the first cleanup step deletes the search agent, and then you delete the search index, and you're rid of both. Again, this is available online; it's published in Python, but you can port it to .NET and run it the same way. If you go back and search the indexes, our index is gone.
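For reference, the cleanup calls are roughly these two lines; delete_agent is preview SDK surface, so verify the method name against your beta:

```python
# Cleanup order matters: the agent must be deleted before the index it targets.
from azure.identity import DefaultAzureCredential
from azure.search.documents.indexes import SearchIndexClient

index_client = SearchIndexClient("https://<search>.search.windows.net",
                                 DefaultAzureCredential())
index_client.delete_agent("earth-search-agent")   # knowledge agent first
index_client.delete_index("earth_at_night")       # then the index
```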
Now that we have a complete understanding of how it works, why don't we look at our application? We have two endpoints: AISearch.Web is just a front-end MVC application, for understanding purposes, and AISearch is the complete back end. If I sign in (as I said, you have to set up the sign-in parameters), I can start asking questions; I have one document over there. I can ask a question against the search index: it goes to AI Search as a normal search, nothing to do with agentic, finds some results that may not be useful, and returns them. If I go to the chat interface and ask the same question, I have more options: use knowledge agent, require security trimming, and enable text-to-speech. Turn that last one on and, instead of reading the answer yourself (say you're on mobile in driving mode), it reads it to you as the response comes in; it starts automatically and you can pause it. You can use the knowledge agent or not; for now let's disable it and see. This is going to use the index without a knowledge agent, which means it uses plain chat completion. With chat completion it didn't go into those code blocks here; I set a breakpoint, so next time we come in we can see what it's doing. So, it's thinking...
It found the result. This is a very simple document I uploaded, and the value comes from the "technology change past present future" PDF. If you look at the steps it went through: the grounding user message (the question we posted), then the question is grounded and it got the result; in this very simple case there's only one document, so it comes right back. Grounding results received, preparing the LLM message: we send the prompt with what we received in step two back to the LLM, and the LLM returns the response, which is the response you see.
But now let's turn on the knowledge agent, clear the chat, put in the exact same message, and send it. If you look at the request, use-knowledge-agent is true, so I'll run it. I have two parameters here: one uses the real AI Search knowledge agent, and one simulates what a knowledge agent would do, because we want to understand what's going on under the hood. Let's run it. It's thinking... we got output, we got the document. If you look at the steps here, again for illustrative purposes only: the knowledge agent pattern, the agent configuration, the agent message; then we run the query, run the retrieval, then the response generation, and so on. But what if we clear this chat and ask the actual Azure AI Search agent? We set a breakpoint, and here we are: we keep the knowledge option on but enable the AI Search agent parameter, which was coming through as false, and we run it.
It may be the same result, or a slightly different one, but the concept is that this time it's using the real search engine agent. Once you have more data, more parameters, more filtering, it really makes sense; this is a very simple example. In this UI you can also create a knowledge agent: give it a name, say "dev-boston-demo", and provide a description. The description is very important ("demo index for the Dev Boston talk", something like that), because it's where MCP will look to understand what the index can be used for and to guide you. So anytime you see a description field on an index you plan to use with MCP, it's important to provide valid instructions. Hit create and it creates the agent: there it is, dev-boston-demo. If you go back to the indexes and refresh, you'll see it, and you can look at the other parameters; I'll come back and explain those later. This also creates the agent body.
Once you have the index, you can upload documents. We have something called the knowledge-text index, so we upload the document to it: browse for the document and upload. It's all working, and you can take a look at it, but let's move on to the next concept. Back to the PowerPoint. Any questions so far? I don't see any, so I'll move on.
The next one is the data pipeline. Real business documents are not as simple as a resume you write or a content blog you find on the internet. Complex content might contain graphs, charts, words, embedded video or audio, or images; that's real business data. And the data can come from any source: OneDrive, Azure Storage, SharePoint, a database, any file system, any cloud storage, any online software-as-a-service system, you name it. The pipeline handles multiple media types. It turns images into text: if your document contains an image, it can extract the image on its own into a separate file, so when you search for something it can give you a reference to the image. You have a document, the document contains an image, and the image is pulled out into its own document as part of the slicing process. It keeps the document layout intact and breaks long text into smaller usable parts; that's the slicing and overlapping. Say a page has 20 lines: you may take the 20 lines, but you overlap, say, four lines into the next chunk, and so on. You can also use Logic Apps to automate ingestion from connectors, workflows, Power Apps, or any software-as-a-service you want to bring data from: use a Logic App to connect to your Salesforce, your Workday, or ServiceNow and bring the data over. Everything can be set up directly in the Azure portal.
So those are the data pipeline advantages, but let's look at how the data pipeline can be used in our demo. Once you have AI Search, go to the overview: there's "Import data", which lets you import data into an existing index and index it automatically if you have the indexer and data source configured correctly. But we're going to look at "Import and vectorize data". You can bring data from Azure Blob Storage (these are the supported data sources) or from Azure Data Lake Storage. The advantage of bringing data from Data Lake Storage is that it also provides out-of-the-box ACL import, meaning if any file in the data lake has security group policies enabled (for example, "this document can only be seen by such-and-such group"), it will honor that: when you query for grounding, it checks who you are, and if you do not have access to a document, you are not going to see it in the grounding results. That's the out-of-the-box advantage of Azure Data Lake Storage Gen2. You can also bring your own permissions, but we'll talk about that in a later slide.
For now, you can do a simple RAG or a multimodal RAG setup. I'm going to select my storage account and my container (I have some documents over there) and go next. The first step takes slightly longer because it needs to validate everything. You can use the default AI extraction, but in this particular demo we're going to use Azure AI Document Intelligence. Once you pick your own service, you have to provide the credentials to access it. What does that mean? We also have to pay for it: we picked the Document Intelligence service, so we have to create it, and any consumption is our responsibility. You can use an API key to make the connection, but a system-assigned identity is the best model for connecting the systems securely, so you don't need to worry; also, some people disable key-based auth (it's called local auth), and then an API key won't work. So managed identity is the right approach. Go to next.
Now we can do image vectorization along with text vectorization: we could use image or multimodal embeddings, but for some reason that isn't working today, so we'll select this option. Now it asks which Azure OpenAI service to use, because as I said before, we have to give it the LLM, and we also have to give it the embedding model. The embedding model is where you generate the vectors; the chat LLM is where your grounding knowledge feeds into drafting the response and creating the plan. So we need to provide both. This one is the LLM: it's asking for a model deployment, so I'll pick GPT-4o, again with a system-assigned identity. Next we need text vectorization: select the Azure OpenAI service, and the model is text-embedding-3-large. It used to be text-ada-002, but now the two popular embedding models are text-embedding-3-large and text-embedding-3-small, based on your needs; if you really have requirements dealing with multiple languages and true multimodality, it's recommended to go with text-embedding-3-large. I'll enable the system identity and acknowledge the notice, because there's an additional cost and you'd better read it as well. Click next.
Now, the storage account: the image output location. What does that mean? As I said before, your document may have an image embedded in it. The Document Intelligence service takes the document, extracts the image on its own, and also extracts the text from the image, creating another document: an image document. So you have the main document and the image document. If the thing you're searching for comes from the image document, it displays the image document; if it comes from the main document, it displays the main document. That's why we have to give an output location. We'll just say "output"; it's already created, so I'm reusing it, and I won't worry about that parameter.
Then there's indexing: how often you want to index the data. For this demo we'll just say once, but set it based on your needs. You need to enable the semantic ranker. You can also add more fields; some fields are predefined, which is one of the cons of this approach, because you have to use the field names they chose (if you do it programmatically, you can overcome that with your own field names). But you can add fields from the data source: we selected blob storage from the data lake, and it shows all the metadata fields you can add and map in, and if you have custom metadata fields they'll show up here as well. So when you need an extra field, say a friendly title or display name, you create the metadata on the blob, it brings that metadata in here, and you can create the field. Let's delete this one, cancel, and next.
We'll just call it "dev-boston" so you know what it is, and create it. I have five documents uploaded, but once it slices and vectorizes everything, it could look like 50 documents. Start indexing. If we search for something, we don't have anything yet, so let's check whether it really got created. If I go back to my data sources, the data source looks correct; my indexer is still working on it. There are three parts here: the index, which you query; the indexer, a transfer scheduler that takes the data and brings it in; and the data source, your ultimate source of truth. It's still running... okay, it's done. Now if you go back to the index and search here, you may see some data now.
And we see the data. If you're annoyed by seeing the content embedding, you can turn it off. These are 3,072-dimension vectors; people used to use 1,536-dimension vectors, and I've seen a lot of people use 1,536, but recently I've noticed everybody using 3,072, which means it creates that many dimensions for the data we provide. Since it's too much data to look at, we remove it from retrieval. Nobody is going to retrieve the floats; they're meaningless to the end user, though very meaningful for querying the data. When end users see floating-point numbers they get no insight; it's just numbers. So we remove it from the retrievable set: it stays searchable, and you can always add retrievability back. Now look at the count: it's about 50, and we only uploaded five documents. I can share the repo, and the repo contains only the five documents, but once it slices and does all that processing, it becomes about 50 documents.
Now, the data pipeline also creates something called a skillset. Remember, along the way we provided the Document Intelligence service credentials, the embedding model, and the chat model; you can see all of that here in the skillset that was created. In the skillset it says: take the documents from here, here's the field data, here's the step analyzing the image, here's the Azure OpenAI connection and which model to use for creating the embeddings of the sliced documents, and so on. You can look at the skillset and see how it's doing its work, including putting the normalized images into the output folder. So this is how it works.
What happens is the indexer runs the skillset as a step. When the indexer runs, it invokes the skillset: if you look at the indexer definition (go back to the indexers, pull up dev-boston, and open Edit JSON), it has a pointer to the skillset, a pointer to the data source, and a pointer to the target index. So the indexer is the orchestrator that uses the skillset and the data source to populate the index, and the skillset holds all the rules for how you want to slice and enrich. You can do it the skillset way, which is the most effective, letting the service handle it, or you can slice manually: in the notebook demo earlier we manually took the document, sliced it, created the embeddings (you can step through the code), and put them in the index. In this case, the skillset does all that work for us.
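That orchestrator relationship is easy to see in code: the indexer itself is tiny, just three pointers. A sketch with the demo's names (adjust the resource names to yours):

```python
# The indexer as orchestrator: it points at a data source, a skillset, and an index.
from azure.identity import DefaultAzureCredential
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import SearchIndexer

indexer = SearchIndexer(
    name="dev-boston-indexer",
    data_source_name="dev-boston-datasource",  # ultimate source of truth
    skillset_name="dev-boston-skillset",       # cracking, image extraction, embeddings
    target_index_name="dev-boston",            # the index you query
)

SearchIndexerClient(
    endpoint="https://<search>.search.windows.net",
    credential=DefaultAzureCredential(),
).create_or_update_indexer(indexer)
```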
All right, let's move on to the next topic. Where are we right now? We've just covered the data pipeline.
Next is document-level access. We talked about how Azure Data Lake supports the Entra ID permission model. That's one way, but not all documents come from the data lake: you may have documents from your database, from somewhere else, from a flat file system, from anywhere. The bottom line is they get loaded into the index, because that's how you're going to query them. Once you start loading the index and you want to apply grounding security, how do you do that? There are two ways. One is the pull model, where AI Search handles it for certain data sources, in this case Azure Data Lake. The other is the push model, meaning you create the permission fields yourself, put the values in, and then query against those fields. We're going to look at the push model today: how we push the data and how we query it back. The pull model is mainly for predefined data sources like ADLS.
All right, let's get back to this page. I'm going to go back to the overview, to my resource. We talked about some storage, right? If you look at the storage browser, in the container we have the five documents, and it created the image library, which we call the document output. When we go back to the container, this is the output container, and it created some images. If you look at one, you'll probably see what I mean: it's a logo, nothing much, but the idea is that it sliced the document. This is another document, and this is the image inside that document, extracted and put here. If you're searching for something like "redirect to URI", it might point exactly to this image.
Now we come to the permissions model. If I go back to the search service and to the knowledge-text index we have, and look at the fields, there are fields for user IDs and group IDs: the user IDs field holds a list of user IDs, and the group IDs field a list of group IDs. They're arrays of strings, so there's a limit to how much you can store, depending on how you load them. If you push directly into the index, you can upload as many entries as you want, but some people prefer to use blob metadata. We talked about Data Lake Storage: there, permissions are stored as metadata, and there's a limit on how much the metadata can grow. Bottom line: if you use metadata to store the group IDs and user IDs as part of the blob you're uploading (say a document called "company policy" where you set who can access it), there's a restriction; it cannot exceed 8 KB, and that includes all the metadata. That means only about 50 group IDs or 50 user IDs, since they're GUIDs, so that approach isn't recommended at scale. A better way may be to upload the document first, then run a database call, gather everything, and inject it into the index in that call; that allows you to insert more data. But you still have to come to a conclusion about how you organize this. You don't want thousands of user IDs or thousands of group IDs per document; instead, define a finite number of groups: maybe you say "I'll allow 50 groups" and organize everything into those 50. That's more than enough; often three groups is enough, though maybe you go to 50 or 100. I have seen customers with 100,000 groups (I don't know how they got there; I've seen a couple with 200,000, groups after groups after groups), but you don't want to put all of those in the index; that's going to be a disaster. Design your system around how many groups you want to allow. Based on that, you either set the groups as part of the metadata, in which case the indexing process takes care of indexing them, or you manually ingest that data into the index.
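Once the group IDs are in the index, security trimming on the push model is just an OData filter on the query. A sketch, where the group_ids field name and index name are illustrative (match them to your own schema); the search.in filter function is standard Azure AI Search syntax:

```python
# Security trimming on the push model: filter results by the caller's groups.
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient

search_client = SearchClient(
    endpoint="https://<search>.search.windows.net",
    index_name="knowledge-text",                  # illustrative index name
    credential=DefaultAzureCredential(),
)

caller_groups = ["<group-guid-1>", "<group-guid-2>"]  # resolved from the user's token
group_filter = "group_ids/any(g: search.in(g, '{0}'))".format(",".join(caller_groups))

results = search_client.search(
    search_text="company policy",
    filter=group_filter,            # only docs tagged with the caller's groups come back
)
for doc in results:
    print(doc)
```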
To run this demo, I go back to Entra ID. We're looking at the index, and we only have one document; we want to verify whether we get a result or not. We go back to the chat and put in the message we had. This is a document, and my ID is my Microsoft ID. So I go back to this index and to the users, and I pull one; that could be this one. Then I go to the groups: I'm part of the 851431 group, so let me see whether I have data indexed for it. Go back to the search index (the knowledge index fields... sorry, Search Explorer) and search. I don't have that group added. Let's take a different user, with 0251 as a group ID; let's take this user here. That's my account, so let's browse the site using my account: I take this URL, go back, and I'm allowed. Now let's browse as this other user and see whether they can see the data. I sign in as that user, go to the chat box, take the same query, and paste it here. (I'm skipping the step-through, but if you're a .NET developer, you'll get it.) This user still finds it, and they shouldn't be... oh, maybe they belong to another group; let's go remove this person. But you get my answer; we don't have to get stuck here, though it has me wondering whether they really do belong to that group. Okay, let's move on.
The next thing is MCP, Azure MCP. This has nothing to do with AI Search itself; the overall concept is how we can use Azure MCP to discover the services provisioned in your subscription. You use an IDE: Cursor, Visual Studio Code, you name it, your favorite, any IDE that supports MCP integration. In this case we're going to use VS Code and see how you can ask questions about your Azure services. So I'm going to open Visual Studio Code and ask in agent mode.
To enable MCP, you first have to enable the Azure login: you go here, log in, and make sure your subscription is there. And at any point, if you're stuck, you can select agent mode and just ask. For example, if you're not sure how to enable the Azure MCP connection to see these resources, you can simply ask, "can you tell me the steps to enable it?", and it will tell you: go left or right based on where you are, click these icons, then sign in and complete the process. It walks you through like a tutor, and then you can see it. So you can start asking the right questions and it can help you. If your prompt is good, your AI is good; if your prompt is not good, your AI is not good. In this case I already have everything connected, so I'm going to ask: "Can you suggest an index that can be used to find the user-to-group mapping for the security filter?"
I just asked a very, very raw question: "can you tell me?" As I said before, this has nothing to do with AI Search itself, but it gives me a response. If you look at it, all I'm looking for is an index name for the user-to-group mapping to support the security filter, and sometimes it goes off and starts designing one instead. So I can ask a different way: "I'm looking for an index name in my search service," and tell it which search service we're using, demo one. Now it's asking me to approve the tool call, with an option to auto-approve; I'll approve just this one for now, though you can also enable auto-approval for the session. You get the concept: sometimes it finds it faster, other times it just takes a while. I approved so I don't have to answer every time; it's a demo machine, I'm fine, but you have to be very careful with auto-approve, because you're running in agent mode: if you ask "can you delete my C:\ files," it's going to delete them. So be very careful what you auto-approve. Okay, now it needs the resource group name, so it's not getting there directly; sometimes it gets it right in one click, other times it kind of loops, but it finds it eventually. You get my answer; let's go on to the last one.
The last one is sensitivity labels. I don't have a demo for this because it's a preview; I signed up, but for some reason I didn't get access. Again, it's not part of AI Search; it's part of Microsoft Purview information protection and its sensitivity labeling capabilities. Sensitivity labels in Azure let you classify and protect sensitive information, ensuring compliance with organizational policies and regulatory requirements. If you've already used Purview, you probably know: it tells you the sensitivity of the document somebody is looking at, the classification, all those things. If you haven't used it, it may not seem very useful, but a lot of enterprises use it to make sure sensitive information stays secure, and that classification is very important in a large enterprise. I think that's what's available there; you can take a look. This one is additional material you can find; it's a blog, and there are a lot of posts out there. I will put the links in my slides (they're already in the slide share), and I will also post these in a blog post tonight.
Unlock the next generation of enterprise AI with Azure AI Search. In this session, Udaiappa Ramachandran (Udai) — CTO/CSO at Akumina Inc. and Microsoft MVP — walks through how to build agentic, enterprise-grade retrieval systems using Azure AI Search. Learn how query pipelines, RAG and agentic retrieval, and data pipelines come together to deliver grounded, secure, and intelligent answers across multimodal data sources.

We'll explore:
- Query Pipeline: Search to agentic retrieval architecture
- Data Pipeline: Logic Apps + Azure AI Search for ad-hoc chunking and multimodal ingestion
- Security: Enterprise-grade access control with Entra ID, sensitivity labels, and encrypted indexes
- Azure MCP Integration: Context-sharing for AI agents
- Live Demo: The azure-ai-search-multimodal-sample showcasing RAG and policy-based search

By the end, you'll understand how to move from flat search to intelligent, secure, context-aware enterprise agents — all within your Azure environment.