Welcome to the fifth community event hosted by Multicloud For You. Hello everyone, a very warm welcome to all of you. I'm Sushi Mistra, your host for today's event, along with my co-host Lavik. Today's session is all about building production-ready AI apps with Llama, Hugging Face, and Python, presented by Dr. Abhishek Mishra. Before we begin, let's take a moment to look at who we are and what we stand for.
>> First, we have to understand what we do and who we are. Talking about Fifth: "Fifth" stands for the Fifth Industrial Revolution, and Fifth is a standalone learning community and a platform where you can learn, grow, and polish your skills. As you can see, we started five years back, in 2019. Then we grew and were recognized by big names. Then we expanded our horizons, and in 2022 we converted from a standalone learning community into a platform, and since then we have created events, given mentorship, and much more. Talking about Fifth's features: for users, the platform helps people create different types of profiles to market themselves better. You can create your business card; talking about business cards, you can create your digital visiting card. You can go to fifth.com and create your digital v-card, and very soon we are going to launch an NFC card as well, so check it out on fifth.com. Back to the slide: you can create your CV, company page, business page, and mentor profile. Talking about professionals, you can promote your business and products as well. And for organizations, you can create a community, give mentorship, create blogs, run quizzes, host conference events, post jobs, create speaker profiles, and many more things.
And talking about Multicloud For You: Multicloud For You is a community-focused tech firm building data platforms and custom products, with consulting and engineering. We work on generative AI, blockchain, data products, data platforms, and cloud. Talking about domains, we have worked in retail, banking and finance, energy, healthcare, and IT, and we have supported many clients across the globe: America, Latin America, Europe, the Middle East, India, and Oceania. And we are not just talking here; our numbers justify our growth: we have 31,555 members registered on Multicloud For You, we have distributed 8,880 certificates across the globe, and we have 413 speakers, 23 courses, and 275 sessions.
We also run a reward and certification program, and we give certificates to our top speakers. We also have a partnership with Alibaba Cloud Academy, so you can go to fifth.com, enroll in any course you are interested in, and you will get the Alibaba Cloud certificate. By the end of this session, you will get a link in your Teams chat; by clicking on that link you can take the quiz, and after completing the quiz you will get the certificate from us.
Talking about our speaker, Dr. Abhishek Mishra: Dr. Mishra is a seasoned cloud architect with over 20 years of industry experience driving digital transformation through cloud and AI, and an early adopter of Microsoft Azure since 2013. He has led enterprise-grade cloud initiatives worldwide. Dr. Mishra is also TOGAF 9.2 certified, a Microsoft Certified Azure Solutions Architect Expert, and an AI engineer. So it's time to call our speaker, Dr. Abhishek Mishra. Dr. Mishra, over to you.
>> Thank you, Fifth, for this wonderful opportunity to speak again at this wonderful forum. I love speaking at Fifth.
Today I bring you a very exciting, very much futuristic topic: how to build intelligent apps using Meta Llama via Hugging Face API inference. When I talk about Hugging Face, it's an open-source hub where you can find a lot of models. You can download those models locally and work on them. And the best part is that you can also work via API inference: if you don't want to download and run the models locally, that's absolutely fine; you can use the API inference route, just as we do for ChatGPT, OpenAI, Azure OpenAI, or any of the popular PaaS services. I would say this is the best way to experiment with and try out models before you adopt them in any project. Normally, when you are building a GenAI application, or you are a GenAI architect, the very first thing you do is select the right model for your use case, and this is the playground where you can try these models and get a flavor of them. There is a lot more to it; as I progress through the session, I will be able to show you. Anyway, thanks for the wonderful introduction. I will skip the "about me" slide and focus directly on the topics on my agenda.
Today we will start with the basics of AI, ML, and DL. Sorry for the acronyms; I would have loved to expand them, but somehow I ended up writing them as acronyms. AI is artificial intelligence, ML is machine learning, and DL is deep learning. The reason I'm not jumping directly into the topic is that I first want to build some fundamentals. There may be audience members who are very new to the AI world or the generative AI world, and it's important for them to understand the evolution before I jump into Hugging Face.
Let's start with the very first thing: artificial intelligence. But before I define AI, can someone from the group tell me when the first AI application was built, or when AI started? Any guesses?
Okay, let me tell you. It all started in the 1950s. AI is actually a very wide domain. In the 1950s we had rule-based systems: based on rules we built our AI applications, lots of if-this-then-that logic. Slowly, AI graduated into a subset called machine learning, a subset of AI where we brought in concepts like probability and statistics. We started talking about models that could work on statistics and probabilities, and then came the concepts of supervised and unsupervised learning. In supervised learning we label the data, and we know the features. When I say features: any AI or ML-based program works on feature identification. We need to identify the features. For example, suppose you are surveying income spent across various age categories: you need to identify people by age, gender, and so on; these are the features. Then you label the data, and your supervised learning runs on the labelled data. Then there is semi-supervised learning, where part of the dataset is labeled and part of it is not. Then you have reinforcement learning, which works on rewards and penalties: if the model does not predict the right thing, you penalize it, and when it does the right thing, you reward it. You also had unsupervised learning, which uses no labels at all, but the features are still the key, so you need to identify them. That was the era of machine learning, based on statistics, with supervised, unsupervised, and reinforcement learning.
Then came the era of deep learning, where applications started thinking like human brains; they mimic the human brain. What is a human brain? It is a combination of lots of nodes called neurons, and those neurons talk to each other and take decisions. Deep learning models work the same way: they have a bunch of neurons, or nodes. I'll just draw some nodes here as an example.
These nodes communicate with each other, and they also communicate among themselves. I'm just showing five or six here, but there can be millions of nodes that communicate with one another and take decisions. And there is something called a weight associated with each connection; these are the learning parameters. As a node processes some information, that information is multiplied by the weight, and the result of that multiplication determines whether the next node should activate or not. If the network finds that a node should have been activated but the weight was unable to activate it, it goes back, does a backward propagation, and adjusts the weight so that the next node activates. This is how things move back and forth: nodes get activated, and we get the results. The network has an input layer and an output layer, with a lot of hidden layers in between; these hidden layers are the bunches of neurons. This is how a neural network works.
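To make the weighted-sum-and-activation idea concrete, here is a minimal Python sketch; this is my illustration of the concept, not anything from the session's code, and the numbers are arbitrary.

```python
# One layer of a neural network as a weighted sum followed by an activation.
import numpy as np

def sigmoid(z):
    # Squashes the weighted sum into (0, 1); values near 1 mean "activate".
    return 1.0 / (1.0 + np.exp(-z))

# Three input nodes feeding two hidden nodes (weights chosen arbitrarily).
inputs = np.array([0.5, 0.1, 0.9])
weights = np.array([[0.2, -0.4, 0.7],
                    [0.6,  0.1, -0.3]])
bias = np.array([0.1, -0.2])

# Each hidden node multiplies the inputs by its weights and sums them up;
# training (backpropagation) is what adjusts `weights` when the output is wrong.
activations = sigmoid(weights @ inputs + bias)
print(activations)  # roughly [0.69, 0.46]
```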
That was all fine with neural networks, but then we had something called a recurrent neural network (RNN), which processes information sequentially. That means when I say "I have a dog at my house", it first processes "I", then "have", then "a", then "dog", then "at my house". Each piece of information flows sequentially through the nodes and is processed one by one. That adds delay, and it also brings in a problem: the model needs to remember the context of longer strings, of what happened long before. To overcome these complexities, something called the transformer came into existence. Transformers are clever enough to use something called self-attention, where they process each of these words in parallel. When I say "I am a boy", or "I have a dog at my home", every word of the sentence is processed in parallel instead of sequentially as in a recurrent neural network. That is how transformers came about, and transformers are what you see inside ChatGPT, in the BERT models, and in Google Gemini, helping you with text classification and, more than that, text generation. These text generation mechanisms came from these transformers.
So with transformers we started generating tons and tons of text, and it was all working well. Then things evolved into a new area, where we needed to generate videos and images. The researchers came back with different kinds of models; maybe you have heard about deepfakes. Deepfake applications are based on something called GAN models, generative adversarial network models. What they do is use two sets of neural networks. One neural network creates or generates the images. The second neural network tries to show that whatever the first network generates is fake. The first network then takes the feedback and tries to improve, and again the second network tries to prove it fake. As a result of the competition between the two networks, you finally end up with very good image quality, and the model is able to fake an image really well.
That is how the generative adversarial network works. There is also one more kind of image generation model, called a diffusion-based model. What it does is: if you give these models an image, they keep adding noise to it, and once a lot of noise has been added, the image gets distorted. Then, from this distorted image, the generative AI model gets very creative about removing the noise and transforming the image into lots of possibilities. It's just like if I give you a block of marble and ask you to carve it: you will use your creativity to shape that marble into the right statue. You get the raw block, and from the raw block you craft things. In the same way, we add noise to the image to make it a raw block, so that imagination can be applied and images can be created by the algorithms.
So this is how generative AI has come into existence today, and this is the whole journey of how things have evolved. With this background, let me take you back to my slides. I know I was a bit fast, but don't worry if you haven't fully caught the concepts I wanted to communicate; it's absolutely fine, because it's not needed to follow what I'm going to demo today. Still, knowing these models and this evolution is important, because as an architect, when you design your generative AI application, you need to know which model to choose and what is happening in the background. All this information will help you.
So let's get started with the prerequisites I just briefed you on. Artificial intelligence is a field where machines start thinking like humans, and there are various applications; needless to say, all of us are aware of natural language processing, vision, robotics, recommendation engines, and lots more. The key idea is enabling machines to mimic human intelligence. I just spoke about machine learning: it's a subset of AI that uses probability and statistics to do supervised, unsupervised, and reinforcement learning. Reinforcement learning works on a reward-and-penalty system: if the model doesn't predict something right, it is penalized; if it predicts right, it is rewarded. If you don't label the data, it is unsupervised. We also have semi-supervised learning, where part of the data is labeled and part is not. A few examples are spam detection and recommendation systems.
Hey, just one minute, let me verify something. Okay, sorry, apologies; I had turned off the video feed. I have turned it on.
Okay, so I have already explained what deep learning is. With deep learning we try to think like a human brain: a brain is a bunch of neurons, so in the same way we build nodes in our models, so that each node is a neuron and can think like one. We have millions of those neurons mimicking the human brain, and this is the point where generative AI started evolving. One of the most important features to highlight is feature extraction: in machine learning you provide the features, but here the model is intelligent enough to unlock the features for you.
Next, I have already spoken about neural networks and how they work: they have three kinds of layers, the input, hidden, and output layers. Each layer is basically a bunch of the neurons or nodes we saw earlier, and the data flows to and fro. As I told you, each input is multiplied by weights and an output is generated. If the output is sufficient, there is something called an activation function through which the next neuron in the queue gets activated. If it is not, and it should have been activated, then feedback goes back to the previous input, the weight is adjusted, and the data flows again into the next node. This is how the data goes back and forth, making sure that the neurons keep activating and your input gets processed.
Then I spoke about the recurrent neural network, which handles information sequentially. The key problem I told you about is keeping the context: if the sentence is long, or spans paragraphs, the model struggles, or needs a lot of memory, to keep the context. That is why we came up with these large language models built on transformers.
These transformers don't just understand natural language; they do much beyond that. If you worked on ML applications back in 2016 or 2017, which is when I started on AI and ML with Azure Machine Learning Studio, then for NLP we had the concept of n-grams. For example, take "I am a boy": a unigram is a single word, like "I"; a bigram is two words, like "I am"; and more generally, a combination of several words is an n-gram. Using those grams, predictions happened. For example, if "I am a" is one n-gram, then after "I am a" the model will predict "boy". Or if you are using unigrams, say with "I love eating": based on the single word "eating", lots of candidate next words get predicted. That is how NLP used to work in those days, if you worked with NLTK, the Natural Language Toolkit. But LLMs don't work that way.
They use transformers. Let me see, yes, I have a transformer slide. Transformers have an encoder-decoder mechanism, and they use something called a self-attention mechanism: when I say "I am a boy", all those words are processed in parallel, and they are positionally encoded. It's not just about predicting; the model understands the context, basically doing a semantic search. It understands the context by doing positional encoding: say in "I am a boy", "a" is assigned a position in the sentence and "boy" is assigned a position in the sentence, so, for example, "boy" is relatively near to "a" and far from "I". Based on that information it does the prediction, and this makes sure that the context is preserved, the real context of the sentence. For example, take sarcasm: someone posts, "I had a great flight, but I lost my baggage."
flight but I lost my baggage.
Thanks for the great flight I lost my
baggage. If you give this to flight, a
natural language processing will not
work here. NLP normal NLP will say that
the sentiment is bad. But if you're
using a transformer, so it has the
capability of uh understanding it the
right way that uh um the it's not a
positive uh sentence. It's a negative
one and uh it should not be that way. So
that is only that can only happen
because of uh this context handling
features using the positional encoding.
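As a quick illustration (my sketch, not the session's code), you can try that sentence with the transformers sentiment pipeline; the default checkpoint it downloads is just whatever the library ships, so treat the outcome as indicative:

```python
# A quick sketch, assuming the transformers package is installed; the
# pipeline pulls a default sentiment checkpoint, not a model from the session.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")
print(sentiment("Thanks for the great flight, I lost my baggage."))
# A keyword-based approach would lean positive because of "great";
# a transformer weighs the whole context, so it has a better chance
# of scoring the sarcasm as negative.
```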
Okay, lots of theory. Now let's get to Hugging Face. I will take you to Hugging Face and show you a few things there.
Okay. Before I get into the Hugging Face site, let me tell you what Hugging Face is. Hugging Face is an open-source hub. You'll find a lot of datasets, and you'll find models: open-source models, or, as they're called when the weights are open, open-weight models. You can download these models. I spoke about weights in neural networks, right? When data flows through a neural network, it gets amplified by those weights, and if the activation function doesn't activate the next node, the weight is adjusted through a feedback mechanism. You can download these models and actually adjust those weights; the weights are exposed to you, which is why they are called open-weight models. You can also build your own application and publish it there, and you can use URLs for inferring those models. It's a community-driven ecosystem; I'll show it to you. So before I get into the Meta Llama part, let me take you to the Hugging Face interface.
This is Hugging Face. I have already created a user ID; you can create one and log in to Hugging Face, and here you can see various models listed. You can use these models in two ways. First, you can download a model and run it locally: there is a package called transformers, so you can use that transformers Python package to bring these models down to your laptop; many are lightweight models, and you can run them on your laptop.
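For example, a minimal local run with the transformers package might look like this; distilgpt2 is just an assumed lightweight checkpoint for illustration, not a model used in this session:

```python
# Minimal local-inference sketch: the first run downloads the weights,
# after which the model runs entirely on your laptop.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
result = generator("A healthy breakfast idea is", max_new_tokens=30)
print(result[0]["generated_text"])
```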
If not, you can also do it the way we work with Azure OpenAI or any OpenAI application, by using the URLs and the keys; I'll show you how to generate the keys to work with them.
It's also community-driven: you can write and post your own blogs on Hugging Face, and you can interact with other fellow researchers and developers who are uploading these models. And if you need a dataset, say you want one to build your own generative AI model, or you want to use it for something else, you can download it from here. There are lots of community-driven datasets to be found, and they're really useful if you're planning to build your own model or use them somewhere.
Then the best part is Spaces, where developers build their own applications and host them here. Suppose you want to use someone's model: you can come here, check that model out, and use it. For example, I'll go to this Qwen Image space; I'll just go to image generation and take this Qwen Image one; no, not ControlNet, Qwen Image. Yeah, it's a simple one.
Okay, I have already logged in to do the inference. So maybe I can say something like: generate a happy cat for me, drinking Coca-Cola, under the sun. Okay, let me submit it. It takes a while to process, so just hold on. Meanwhile, while that's happening, I'll show you one more: there is a Qwen image application that someone has built. I like this animal, the capybara, so let's try this one. Okay, I think there is some issue; let's see. Maybe because I'm trying to do two things simultaneously, I got that error. But that's fine, let it generate.
Oh, okay. So it generated this capybara for me, and I can see it's a good image generator. So what I can do is go and download that model and use it. The other one, I think, is still generating; it will take some time, because these models normally take time, so I'll just close it. So the way you can use this platform is: you can build your own models, host them, also build some applications, and demonstrate those applications here; the community can see them and consume them. And when I talk about models, there are plenty of models you can consume. Okay, now let me go back to my presentation, to Meta Llama.
Okay, Meta Llama is developed by Facebook AI Research. It's an open LLM, for sure; I have already explained what an open LLM is, and it can definitely be used for research as well as commercial use. There are variants like Llama 2 and Llama 3, with fine-tuned versions; there are different versions of it, just as we have GPT-2 and GPT-3. In the same way, Llama has also evolved. The main use cases are chatbots, summarization, and coding; text-based work is what these models are intended for, and they do it well.
And why is the inference API important for Hugging Face? If you're using Hugging Face, one way is definitely to pip install the transformers Python package, download the model through it, and work on it locally; the models are lightweight and they will run. But suppose you want to use a model through the API: that's the easier route, and nowadays it's all about PaaS. Normally, when we go to Azure OpenAI, we just grab the URL and the tokens and start using it. With that kind of flexibility, I feel the Hugging Face inference APIs are a great tool: I need not download anything; I can infer directly. But to do that inference, there are a couple of things you need to do; I'll show you for Meta Llama.
Okay, maybe I can search for one of the models. You need to agree to the license and provide some information. It's very easy: you go to the Models area, search for Meta Llama or whichever model you want to use, agree to the license, provide the necessary information, and submit it. Once you have submitted, they normally take about half a day to approve it. Once they have approved it, you can start using the model. But there is one more thing you need to do.
There is something called an access token. For today's application I have already created one. You need to create a new token here, copy it, keep it with you, and use it. Also mind the kind of permissions the token needs: I definitely don't need write permission, but I happened to grant it while creating this one.
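Once the token exists, using it looks roughly like this; a sketch assuming the huggingface_hub package and an HF_TOKEN environment variable (the model ID matches the Llama variant used later in the demo, and the gated license must already be approved):

```python
import os
from huggingface_hub import InferenceClient

# Read the access token from an environment variable; never hard-code it.
client = InferenceClient(token=os.environ["HF_TOKEN"])

response = client.chat_completion(
    model="meta-llama/Llama-3.2-3B-Instruct",  # gated model: license approval needed first
    messages=[{"role": "user", "content": "Suggest one vegetarian snack."}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```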
So, how do you build intelligent apps using Hugging Face? You need to choose the model first, the variant of Llama. Then connect via API or SDK: as I explained, either you download the model and run it locally, or you go through the inference API. Then you integrate your logic and you're good to go. But the primary steps are: choose the Llama variant on the Hugging Face UI, agree to the license, and once you have agreed to the license, wait for them to approve it; that is the most crucial stage, or else these models won't run. Then go ahead and create your key, the key you'll be using.
Okay. So with this theory and a few practical demos on Hugging Face, let's now move to the more complex and most interesting part of this session, where we'll build a Python Flask application, do Llama inferencing through the Hugging Face API (we are taking the API route; I'm not going to download any model and run it locally on my laptop), and then deploy it to Azure. With this, let me go ahead. Are there any queries? Let me just check. Okay.
I have opened my VS Code, and what I'm going to do now is not demonstrate like a usual developer, the way we used to develop all these days; instead, we will work intelligently with GitHub Copilot. Trust me, I'm not lying: it's a fact that I built this entire application using vibe coding, intentionally, to demonstrate it to you, and I'll show you the prompts I gave to build this application. This entire application has been vibe-coded. See, I would give prompts like "do a redeployment to Azure"; I can't show you the whole history, but the entire application is vibe-coded, and I'll also run it now through vibe coding. But before I run it, let me explain a few things.
This file, diet_plan_generator.py, is the one that actually does the inference. Instead of me explaining everything line by line, let me have GitHub Copilot explain it.
Okay.
Now it will go through each of the files
and uh
it will tell me
and this entire application I have built
it in uh say a record uh 3 to 4 hours
that also I used to take breaks and all
like it's all through I'm not a Python
expert but still I was able to build
this uh Python
Uh, okay. Let it uh generate the
descriptions.
Okay.
See
Okay, now let me walk you through it. I could have put all of this in slides, but I wanted to demo it and show you the power of the GenAI tooling we have. Overall, this is the architecture of the entire application. The user gives input to the Flask app. Flask is a package; there are two key packages used here, one is flask and the other is venv. venv is used for virtual environment creation, and Flask is something that helps you build Python web applications. Suppose you want to build a website using Python: you can use Flask. You could also use Django for that, but Flask is simpler and easier; if you're just starting on Python web applications, Flask is the place to start. So the Flask app calls the diet plan generator, which is this diet_plan_generator.py. The diet plan generator calls the Hugging Face API, which runs the Llama 3.2 3B Instruct model, and then the response is processed and the formatted output is produced. That is the whole flow: the Python application calls diet_plan_generator.py, and from diet_plan_generator.py the Llama model is called through the Hugging Face API. Then this is the configuration where the model is specified; you can find it in config.py, which holds all the prompt templates and the model config.
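A hypothetical config.py along those lines might look like this; the names and values are my illustration, not the repo's actual contents:

```python
# Model settings used by the diet plan generator (illustrative values).
MODEL_CONFIG = {
    "model_id": "meta-llama/Llama-3.2-3B-Instruct",
    "max_tokens": 800,
    "temperature": 0.7,
}

# Prompt template filled in with the user's form fields before each call.
PROMPT_TEMPLATE = (
    "Create a daily diet plan for a {age}-year-old weighing {weight} kg "
    "with BMI {bmi:.1f}, who prefers {cuisine} cuisine. "
    "Format the answer as breakfast, lunch, and dinner sections."
)
```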
Then there is some description of the model characteristics, about Llama, which I'm not going to get into. We are following a dual API approach: the inference client SDK is preferred for Llama, with a direct REST API fallback, so two call paths are exposed. You understand that we have an SDK way of doing things, and internally the SDKs call the REST APIs; if the SDK doesn't work, we fall back to the REST API. That is what is done here.
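The SDK-first, REST-fallback pattern can be sketched like this; the endpoint URL, model ID, and helper name are assumptions for illustration, not the repo's exact code:

```python
import os
import requests
from huggingface_hub import InferenceClient

MODEL_ID = "meta-llama/Llama-3.2-3B-Instruct"
TOKEN = os.environ["HF_TOKEN"]

def generate(prompt: str) -> str:
    try:
        # Preferred path: the SDK, which calls the REST API under the hood.
        client = InferenceClient(token=TOKEN)
        out = client.chat_completion(
            model=MODEL_ID,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=500,
        )
        return out.choices[0].message.content
    except Exception:
        # Fallback: hit the serverless inference endpoint directly over REST.
        resp = requests.post(
            f"https://api-inference.huggingface.co/models/{MODEL_ID}",
            headers={"Authorization": f"Bearer {TOKEN}"},
            json={"inputs": prompt},
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()[0]["generated_text"]
```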
Then, prompt engineering. Before I get into this, let me say something about prompt engineering; I had it in mind during the introduction but forgot. Prompt engineering is the core of any generative AI application. You can do single-shot prompting, where you write your prompt as a plain query, like "What is the weather in Singapore today?" or "What is the weather in Hong Kong today?". Or you can do few-shot prompting, where you give examples: "What is the weather today in Singapore? Give me the response in this format", and in the example you say "it was sunny during the day, it rained in the afternoon, it was sunny again towards evening, and the night was moderately chilly"; then the response will come back in that format. There can also be a chain of prompts, where you ask the GenAI something, it gives you a response, and based on that you ask again; when you have a complex task, you break the entire task down into multiple prompts and get it done. In the same way, we have done that kind of prompting here.
If you see here, this is the prompt I have built: create a daily diet plan for the given age, weight, BMI, and cuisine.
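In a few-shot style, such a prompt might be assembled like this; the wording is my illustration, not the repo's actual template:

```python
prompt = (
    "Create a one-day diet plan for: age 43, weight 90 kg, "
    "BMI 26.9, cuisine Indian, vegetarian.\n\n"
    "Example of the expected format:\n"
    "Breakfast: oats porridge with fruit\n"
    "Lunch: dal, brown rice, mixed-vegetable curry\n"
    "Dinner: paneer salad with roti\n\n"
    "Now produce the plan:"
)
```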
Then some formatting has been applied so that Llama can understand it. After that come BMI-based personalizations: how we categorize a person based on the BMI value. Then there is a message parsing step, where we convert the chat messages into the prompt tokens that Llama expects. Then there is a retry and error-handling mechanism. Then we do some content filtering; that part is actually for the fallback. If the model is down, all of this runs as the fallback, and the application tries to give you a standard response instead of surfacing an error. That is what is written here.
Now, this is the entire algorithm flow; I'm not getting deep into it. I'll give you the GitHub link, and you can explore it yourself. You can see the entire explanation was generated by the AI. I had also generated some explanations beforehand to save time; one moment, let me go to that prompt.
Okay. So I have already given a prompt: "Can you explain the Flask application to me?" That Flask application, of course, is the one calling this diet plan generator app. You can see what it says; it has very beautifully explained the Flask application too: the application structure and the templates. Normally, when you build any website, we build templates, like index and result, where we embed the data; the user profile is kept here. Then there is diet_plan_generator.py, where the actual AI integration happens, the crux of everything, and the configuration settings are kept here. Then, here we are initializing the Flask application.
Then you see the forms are created here: age, weight, height, nationality, food habits, diseases, API token; we put all these fields here, and you will find them all when I run this. We have also done some form validations, and these are the routes that are defined. I'll show you; let me take you to the Flask application, app.py.
Okay, so you'll find all of that there. See, this is the UI that has been built, with all the validations. And these are the routes. If you have worked on web APIs, you will understand this: when we say abc/index, just as in Angular and other front-end applications, we have these routes, and abc/index takes me to the index page. We have defined those routes; you can see a route for /generate, so when a request goes to the generate page, it generates a diet plan for me. All these routes are defined in app.py.
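A minimal sketch of that routing pattern (simplified; the real app.py also does form validation and error handling):

```python
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route("/")
def index():
    # Serves the form page (the index template).
    return render_template("index.html")

@app.route("/generate", methods=["POST"])
def generate():
    # Reads the submitted form fields and would hand them to the
    # diet plan generator (call omitted in this sketch).
    age = request.form["age"]
    weight = request.form["weight"]
    return render_template("result.html", age=age, weight=weight)

if __name__ == "__main__":
    app.run(debug=True)
```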
And you can see Copilot has explained all of these things to me as well.
Okay. Then I have also generated an explanation of the deployment to Azure. How does the Azure deployment work? I'll take you there.
So: "Can you explain the Azure web app deployment process to me?" You can see it has very beautifully summarized the deployment steps. It says we have two Git repos. One is where the code is checked in; I will share this GitHub repo with you. The other is used for hosting: if you see this scm.azurewebsites.net address, that is your Azure website, or if you are familiar with Azure Web Apps, it is basically the Kudu URL. When you're deploying a Python application, you can use that SCM site: there is a Git repo that comes by default with your Azure Web App, so you can push your packages to the Kudu Git of your Azure Web App, and it will get deployed there. That is what is mentioned here: Azure Git is the direct Git deployment through Kudu; when you push your packages there, your application starts running. Then you have the source code repo; that is the repository I'll be sharing with you so you can take a look. Then these are the required files, it says. Then this is how Azure starts the app up: the code for startup.py.
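A hypothetical startup.py of the kind Azure App Service can be pointed at might be as small as this; the actual file in the repo may differ:

```python
# Expose the Flask app for the Azure startup command (e.g. gunicorn).
import os
from app import app  # the Flask instance defined in app.py

if __name__ == "__main__":
    # App Service on Linux commonly provides the port via the PORT env var.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8000)))
```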
Then a code change is made and committed to both places: one is the source code repo on GitHub, and it is also pushed to the Git repo on the Azure Web App through the Kudu URL; this is the code for that. That push actually triggers the Azure deployment. Then the Azure build process gets invoked, and these are the build steps listed; you can go through them, I'm just skipping past them in the interest of time. Then the environment variables and everything else are documented here.
Now, I have already pushed the packages to the Azure Web App, but let me first run this application locally. I will say: run the application. You can see GitHub Copilot, which has helped me generate all of this. I'll just click allow. Okay, the application has started; let me just run it locally.
Okay. So maybe I'll give my age; I'll spoof it, I don't want to reveal it, so let's say 43. I'll give my weight as, say, 90, then height as 72. I am an Indian. I eat both veg and non-veg; I could select both, but let me be vegetarian today, even though I love eating non-veg. Obesity and a few other criteria I'm just typing in; they're not my health conditions, it's just for demo purposes. And let me generate the diet plan.
Okay. So this is my personalized diet plan: my age is 43, weight 90, height 72. The BMI calculation marks me as overweight, thanks to it. And then this is the meal plan it has generated.
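As a sanity check on that "overweight" label (assuming the height field is in inches and the weight in kilograms):

```python
weight_kg = 90
height_m = 72 * 0.0254            # 72 inches ≈ 1.83 m
bmi = weight_kg / height_m ** 2   # ≈ 26.9
print(round(bmi, 1))              # 25-30 is the "overweight" band, matching the app
```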
Okay, now let me do one thing. Let me type in the same values: age 43, weight 90, height 72. But now let me say that I'm Italian; or, let's have some fun, I'll call myself Japanese, and a vegetarian. Let's see what it does. Here I'm really just testing the Llama model, seeing what it generates; I'm also curious to know.
Okay. See, it gave me some nice Japanese foods; the model has generated all of this. Now let me do one thing: I'll go back, stop the application running here, and say, "Can you redeploy the application to Azure?" I have already deployed it once; that's why I'm giving this prompt.
Anjini sir, do I have another 5 or 10 minutes to spend, or do I have a hard stop?
>> Well, it was meant to be half an hour, but anyway.
>> Okay, sorry.
>> No issues, you can spend 5 minutes more.
>> Okay.
>> Great. And yeah, we have also implemented Meta Llama, and there are a lot of use cases we are running on the same platform.
>> Oh, cool.
>> Definitely, yes.
>> Nice; then we are in sync, my session is in sync with you guys.
>> Yes, yes.
>> I'll speed up; I'm almost done, I just need to do this deployment, then I'll show the Azure part. I actually just wanted to give you a feel of how
>> yeah, since this is
>> the new-age developer should work.
>> Yeah, many people here are from my team as well, and not all are from the same domain, so this has been a great session for them. Nice to learn.
Okay. Now I am committing it, so that when I do this, it gets pushed to the SCM, the Git that sits in the Kudu of your Azure Web App.
While it's completing the deployment, let me show you a few more things. All these documents you see, README.md and the rest: I haven't hand-written any of it; it's all generated. But when you're generating code, one piece of advice: peer review it. Don't vibe code blindly; peer review it. The reason is that many of these are open models. If you have your own instance of GitHub Copilot, that's a separate story; you train your own model, so it will pick up from your organization's code. But if you're working in an organization, you may have multiple customers, and each of those customers' code can get into a clash: the model could generate customer A's code in customer B's application. So peer review the output and make sure you are not leaking any customer information and not violating any licensing agreements when you're doing this. That is one of the most important things you need to take care of.
Okay, I think the deployment is complete; it's just summarizing the conversation history. Then I will open the website and show it to you. Okay, it's taking a while. Okay, thank you. Now it has opened the application as well; everything it has done, it has summarized, so you can go ahead, check all of it, and also review the code. So what I'll do is say: open the application running on Azure, just from here.
So it will open it for you. See; I'll just open it outside, one moment, there, it opened. Okay.
Okay. So, since I've already demoed this application to you, you should be able to do it yourselves. With this, yeah, sorry, I have come to the end of my presentation. Apologies, my bad: I thought it was a one-hour session, so I brought in a lot of content.
>> Yes, really, really nice; especially the way you got the code checked into Azure and everything through GitHub Copilot was fantastic, and I think many people would love it.
>> Yep.
Open to questions, if there are any.
>> Yeah, any questions, please raise your hand or just put them in the chat. And there is a session quiz; team, can you send the link for the certification?
>> And also, this session is recorded, right? So we will all get the link for the session.
>> Yes, and there's a certification link as well, where you can take a quick quiz, and then there are some goodies. We don't actually see a link here yet; it isn't in the chat that has been sent.
>> Okay.
>> Subscription?
>> Yeah, there is a question on subscription. You can create a free subscription, but if you want to use it in a professional setting, or rather for professional development purposes, you can definitely refer to this pricing model. What I'm running is the community version; you can see the pricing here, but I'm running it from a community perspective. Whatever application I have hosted here, I did it just for this session, and I'll be bringing down the web app; I'm not monetizing this application.
>> Yeah. And the second thing is, as he said, you can also download Llama to your local server, if it has the capacity, and then set up your APIs to get it connected to wherever you like. Maybe you'll have to write a couple of layers of wrappers in different technologies, whatever your team prefers: Golang, .NET, Java, whatever, to utilize it in a real-world scenario. But yeah, Hugging Face is great if you just need to test and start with it at a very low cost.
>> Maybe I will propose one more session with Anjini sir on how to use the transformers package and get the same application running locally with those packages.
>> Yeah, definitely much needed.
Okay, so let's get back to the team here. Thank you so much, Dr. Abhishek Mishra, for a wonderful session, and big thanks to our audience who connected with us. I hope this session was very productive and informative for all of you. Keep learning, keep growing, and stay connected with the Fifth community for upcoming events. Thank you.