Welcome to the fifth community event hosted by Multicloud For You. Hello everyone, a very warm welcome to all of you. I'm Sushi Mistra, your host for today's event, along with my co-host Lavik. Today's session is all about building production-ready AI apps with Llama, Hugging Face, and Python, presented by Dr. Abhishek Mishra. Before we begin, let's take a moment to look at who we are and what we stand for.
>> First, we have to understand what we do and who we are. Talking about Fifth: "Fifth" stands for the Fifth Industrial Revolution, and Fifth is a standalone learning community and a platform where you can learn, grow, and polish your skills. As you can see, we started five years back, in 2019. Then we grew and were recognized by big names. Then we expanded our horizons, and in 2022 we converted from a standalone learning community into a platform, and since then we have created events, given mentorship, and much more. Talking about Fifth's features: for users, the platform helps people create different types of profiles to market themselves better. You can create your business card; talking about business cards, you can create your digital visiting card. You can go to fifth.com and create your digital v-card, and very soon we are going to launch an NFC card as well, so check it out on fifth.com. Back to the slide: you can create your CV, company page, business page, and mentor profile. Talking about professionals, you can promote your business and products as well. And for organizations, you can create a community, give mentorship, create blogs, run quizzes, host conference events, post jobs, create speaker profiles, and many more things.
And talking about Multicloud For You: Multicloud For You is a community-focused tech firm building data platforms and custom products, with consulting and engineering. We work on generative AI, blockchain, data products, data platforms, and cloud. Talking about domains, we have worked in retail, banking and finance, energy, healthcare, and IT, and we have supported many clients across the globe: America, Latin America, Europe, the Middle East, India, and Oceania. And we are not just talking here; our numbers justify our growth: we have 31,555 members registered on Multicloud For You, we have distributed 8,880 certificates across the globe, and we have 413 speakers, 23 courses, and 275 sessions.
We also run a reward and certification program, and we give certificates to our top speakers. We also have a partnership with Alibaba Cloud Academy, so you can go to fifth.com, enroll in any course you are interested in, and you will get the Alibaba Cloud certificate. By the end of this session, you will get a link in your Teams chat; by clicking on that link you can take the quiz, and after completing the quiz you will get the certificate from us.
Talking about our speaker, Dr. Abhishek Mishra: Dr. Mishra is a seasoned cloud architect with over 20 years of industry experience driving digital transformation through cloud and AI, and an early adopter of Microsoft Azure since 2013. He has led enterprise-grade cloud initiatives worldwide. Dr. Mishra is also TOGAF 9.2 certified, a Microsoft Certified Azure Solutions Architect Expert, and an AI engineer. So it's time to call our speaker, Dr. Abhishek Mishra. Dr. Mishra, over to you.
>> Thank you, Fifth, for this wonderful opportunity to speak again at this wonderful forum. I love speaking at Fifth.
Today I bring you a very exciting, very much futuristic topic: how to build intelligent apps using Meta Llama via Hugging Face API inference. When I talk about Hugging Face, it's an open-source hub where you can find a lot of models. You can download those models locally and work on them. And the best part is that you can also work via API inference: if you don't want to download and run the models locally, that's absolutely fine; you can use the API inference route, just as we do for ChatGPT, OpenAI, Azure OpenAI, or any of the popular PaaS services. I would say this is the best way to experiment with and try out models before you adopt them in any project. Normally, when you are building a GenAI application, or you are a GenAI architect, the very first thing you do is select the right model for your use case, and this is the playground where you can try these models and get a flavor of them. There is a lot more to it; as I progress through the session, I will be able to show you. Anyway, thanks for the wonderful introduction. I will skip the "about me" slide and focus directly on the topics on my agenda.
Today we will start with the basics of AI, ML, and DL. Sorry for the acronyms; I would have loved to expand them, but somehow I ended up writing them as acronyms. AI is artificial intelligence, ML is machine learning, and DL is deep learning. The reason I'm not jumping directly into the topic is that I first want to build some fundamentals. There may be audience members who are very new to the AI world or the generative AI world, and it's important for them to understand the evolution before I jump into Hugging Face.
Let's start with the very first thing: artificial intelligence. But before I define AI, can someone from the group tell me when the first AI application was built, or when AI started? Any guesses?
Okay, let me tell you. It all started in the 1950s. AI is actually a very wide domain. In the 1950s we had rule-based systems: based on rules we built our AI applications, lots of if-this-then-that logic. Slowly, AI graduated into a subset called machine learning, a subset of AI where we brought in concepts like probability and statistics. We started talking about models that could work on statistics and probabilities, and then came the concepts of supervised and unsupervised learning. In supervised learning we label the data, and we know the features. When I say features: any AI or ML-based program works on feature identification. We need to identify the features. For example, suppose you are surveying income spent across various age categories: you need to identify people by age, gender, and so on; these are the features. Then you label the data, and your supervised learning runs on the labelled data. Then there is semi-supervised learning, where part of the dataset is labeled and part of it is not. Then you have reinforcement learning, which works on rewards and penalties: if the model does not predict the right thing, you penalize it, and when it does the right thing, you reward it. You also had unsupervised learning, which uses no labels at all, but the features are still the key, so you need to identify them. That was the era of machine learning, based on statistics, with supervised, unsupervised, and reinforcement learning.
Then came the era of deep learning, where applications started thinking like human brains; they mimic the human brain. What is a human brain? It is a combination of lots of nodes called neurons, and those neurons talk to each other and take decisions. Deep learning models work the same way: they have a bunch of neurons, or nodes. I'll just draw some nodes here as an example.
These nodes communicate with each other, and they also communicate among themselves. I'm just showing five or six here, but there can be millions of nodes that communicate with one another and take decisions. And there is something called a weight associated with each connection; these are the learning parameters. As a node processes some information, that information is multiplied by the weight, and the result of that multiplication determines whether the next node should activate or not. If the network finds that a node should have been activated but the weight was unable to activate it, it goes back, does a backward propagation, and adjusts the weight so that the next node activates. This is how things move back and forth: nodes get activated, and we get the results. The network has an input layer and an output layer, with a lot of hidden layers in between; these hidden layers are the bunches of neurons. This is how a neural network works.
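To make the weighted-sum-and-activation idea concrete, here is a minimal Python sketch; this is my illustration of the concept, not anything from the session's code, and the numbers are arbitrary.

```python
# One layer of a neural network as a weighted sum followed by an activation.
import numpy as np

def sigmoid(z):
    # Squashes the weighted sum into (0, 1); values near 1 mean "activate".
    return 1.0 / (1.0 + np.exp(-z))

# Three input nodes feeding two hidden nodes (weights chosen arbitrarily).
inputs = np.array([0.5, 0.1, 0.9])
weights = np.array([[0.2, -0.4, 0.7],
                    [0.6,  0.1, -0.3]])
bias = np.array([0.1, -0.2])

# Each hidden node multiplies the inputs by its weights and sums them up;
# training (backpropagation) is what adjusts `weights` when the output is wrong.
activations = sigmoid(weights @ inputs + bias)
print(activations)  # roughly [0.69, 0.46]
```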
That was all fine with neural networks, but then we had something called a recurrent neural network (RNN), which processes information sequentially. That means when I say "I have a dog at my house", it first processes "I", then "have", then "a", then "dog", then "at my house". Each piece of information flows sequentially through the nodes and is processed one by one. That adds delay, and it also brings in a problem: the model needs to remember the context of longer strings, of what happened long before. To overcome these complexities, something called the transformer came into existence. Transformers are clever enough to use something called self-attention, where they process each of these words in parallel. When I say "I am a boy", or "I have a dog at my home", every word of the sentence is processed in parallel instead of sequentially as in a recurrent neural network. That is how transformers came about, and transformers are what you see inside ChatGPT, in the BERT models, and in Google Gemini, helping you with text classification and, more than that, text generation. These text generation mechanisms came from these transformers.
So with transformers we started generating tons and tons of text, and it was all working well. Then things evolved into a new area, where we needed to generate videos and images. The researchers came back with different kinds of models; maybe you have heard about deepfakes. Deepfake applications are based on something called GAN models, generative adversarial network models. What they do is use two sets of neural networks. One neural network creates or generates the images. The second neural network tries to show that whatever the first network generates is fake. The first network then takes the feedback and tries to improve, and again the second network tries to prove it fake. As a result of the competition between the two networks, you finally end up with very good image quality, and the model is able to fake an image really well.
That is how the generative adversarial network works. There is also one more kind of image generation model, called a diffusion-based model. What it does is: if you give these models an image, they keep adding noise to it, and once a lot of noise has been added, the image gets distorted. Then, from this distorted image, the generative AI model gets very creative about removing the noise and transforming the image into lots of possibilities. It's just like if I give you a block of marble and ask you to carve it: you will use your creativity to shape that marble into the right statue. You get the raw block, and from the raw block you craft things. In the same way, we add noise to the image to make it a raw block, so that imagination can be applied and images can be created by the algorithms.
So this is how generative AI has come into existence today, and this is the whole journey of how things have evolved. With this background, let me take you back to my slides. I know I was a bit fast, but don't worry if you haven't fully caught the concepts I wanted to communicate; it's absolutely fine, because it's not needed to follow what I'm going to demo today. Still, knowing these models and this evolution is important, because as an architect, when you design your generative AI application, you need to know which model to choose and what is happening in the background. All this information will help you.
So let's get started with the prerequisites I just briefed you on. Artificial intelligence is a field where machines start thinking like humans, and there are various applications; needless to say, all of us are aware of natural language processing, vision, robotics, recommendation engines, and lots more. The key idea is enabling machines to mimic human intelligence. I just spoke about machine learning: it's a subset of AI that uses probability and statistics to do supervised, unsupervised, and reinforcement learning. Reinforcement learning works on a reward-and-penalty system: if the model doesn't predict something right, it is penalized; if it predicts right, it is rewarded. If you don't label the data, it is unsupervised. We also have semi-supervised learning, where part of the data is labeled and part is not. A few examples are spam detection and recommendation systems.
Hey, just one minute, let me verify something. Okay, sorry, apologies; I had turned off the video feed. I have turned it on.
Okay, so I have already explained what deep learning is. With deep learning we try to think like a human brain: a brain is a bunch of neurons, so in the same way we build nodes in our models, so that each node is a neuron and can think like one. We have millions of those neurons mimicking the human brain, and this is the point where generative AI started evolving. One of the most important features to highlight is feature extraction: in machine learning you provide the features, but here the model is intelligent enough to unlock the features for you.
Next, I have already spoken about neural networks and how they work: they have three kinds of layers, the input, hidden, and output layers. Each layer is basically a bunch of the neurons or nodes we saw earlier, and the data flows to and fro. As I told you, each input is multiplied by weights and an output is generated. If the output is sufficient, there is something called an activation function through which the next neuron in the queue gets activated. If it is not, and it should have been activated, then feedback goes back to the previous input, the weight is adjusted, and the data flows again into the next node. This is how the data goes back and forth, making sure that the neurons keep activating and your input gets processed.
Then I spoke about the recurrent neural network, which handles information sequentially. The key problem I told you about is keeping the context: if the sentence is long, or spans paragraphs, the model struggles, or needs a lot of memory, to keep the context. That is why we came up with these large language models built on transformers.
These transformers don't just understand natural language; they do much beyond that. If you worked on ML applications back in 2016 or 2017, which is when I started on AI and ML with Azure Machine Learning Studio, then for NLP we had the concept of n-grams. For example, take "I am a boy": a unigram is a single word, like "I"; a bigram is two words, like "I am"; and more generally, a combination of several words is an n-gram. Using those grams, predictions happened. For example, if "I am a" is one n-gram, then after "I am a" the model will predict "boy". Or if you are using unigrams, say with "I love eating": based on the single word "eating", lots of candidate next words get predicted. That is how NLP used to work in those days, if you worked with NLTK, the Natural Language Toolkit. But LLMs don't work that way.
They use transformers. Let me see, yes, I have a transformer slide. Transformers have an encoder-decoder mechanism, and they use something called a self-attention mechanism: when I say "I am a boy", all those words are processed in parallel, and they are positionally encoded. It's not just about predicting; the model understands the context, basically doing a semantic search. It understands the context by doing positional encoding: say in "I am a boy", "a" is assigned a position in the sentence and "boy" is assigned a position in the sentence, so, for example, "boy" is relatively near to "a" and far from "I". Based on that information it does the prediction, and this makes sure that the context is preserved, the real context of the sentence. For example, take sarcasm: someone posts, "I had a great flight, but I lost my baggage."
flight but I lost my baggage.
Thanks for the great flight I lost my
baggage. If you give this to flight, a
natural language processing will not
work here. NLP normal NLP will say that
the sentiment is bad. But if you're
using a transformer, so it has the
capability of uh understanding it the
right way that uh um the it's not a
positive uh sentence. It's a negative
one and uh it should not be that way. So
that is only that can only happen
because of uh this context handling
features using the positional encoding.
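As a quick illustration (my sketch, not the session's code), you can try that sentence with the transformers sentiment pipeline; the default checkpoint it downloads is just whatever the library ships, so treat the outcome as indicative:

```python
# A quick sketch, assuming the transformers package is installed; the
# pipeline pulls a default sentiment checkpoint, not a model from the session.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")
print(sentiment("Thanks for the great flight, I lost my baggage."))
# A keyword-based approach would lean positive because of "great";
# a transformer weighs the whole context, so it has a better chance
# of scoring the sarcasm as negative.
```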
Okay, lots of theory. Now let's get to Hugging Face. I will take you to Hugging Face and show you a few things there.
Okay. Before I get into the Hugging Face site, let me tell you what Hugging Face is. Hugging Face is an open-source hub. You'll find a lot of datasets, and you'll find models: open-source models, or, as they're called when the weights are open, open-weight models. You can download these models. I spoke about weights in neural networks, right? When data flows through a neural network, it gets amplified by those weights, and if the activation function doesn't activate the next node, the weight is adjusted through a feedback mechanism. You can download these models and actually adjust those weights; the weights are exposed to you, which is why they are called open-weight models. You can also build your own application and publish it there, and you can use URLs for inferring those models. It's a community-driven ecosystem; I'll show it to you. So before I get into the Meta Llama part, let me take you to the Hugging Face interface.
This is Hugging Face. I have already created a user ID; you can create one and log in to Hugging Face, and here you can see various models listed. You can use these models in two ways. First, you can download a model and run it locally: there is a package called transformers, so you can use that transformers Python package to bring these models down to your laptop; many are lightweight models, and you can run them on your laptop.
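For example, a minimal local run with the transformers package might look like this; distilgpt2 is just an assumed lightweight checkpoint for illustration, not a model used in this session:

```python
# Minimal local-inference sketch: the first run downloads the weights,
# after which the model runs entirely on your laptop.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
result = generator("A healthy breakfast idea is", max_new_tokens=30)
print(result[0]["generated_text"])
```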
If not, you can also do it the way we work with Azure OpenAI or any OpenAI application, by using the URLs and the keys; I'll show you how to generate the keys to work with them.
It's also community-driven: you can write and post your own blogs on Hugging Face, and you can interact with other fellow researchers and developers who are uploading these models. And if you need a dataset, say you want one to build your own generative AI model, or you want to use it for something else, you can download it from here. There are lots of community-driven datasets to be found, and they're really useful if you're planning to build your own model or use them somewhere.
Then the best part is Spaces, where developers build their own applications and host them here. Suppose you want to use someone's model: you can come here, check that model out, and use it. For example, I'll go to this Qwen Image space; I'll just go to image generation and take this Qwen Image one; no, not ControlNet, Qwen Image. Yeah, it's a simple one.
Okay, I have already logged in to do the inference. So maybe I can say something like: generate a happy cat for me, drinking Coca-Cola, under the sun. Okay, let me submit it. It takes a while to process, so just hold on. Meanwhile, while that's happening, I'll show you one more: there is a Qwen image application that someone has built. I like this animal, the capybara, so let's try this one. Okay, I think there is some issue; let's see. Maybe because I'm trying to do two things simultaneously, I got that error. But that's fine, let it generate.
Oh, okay. So it generated this capybara for me, and I can see it's a good image generator. So what I can do is go and download that model and use it. The other one, I think, is still generating; it will take some time, because these models normally take time, so I'll just close it. So the way you can use this platform is: you can build your own models, host them, also build some applications, and demonstrate those applications here; the community can see them and consume them. And when I talk about models, there are plenty of models you can consume. Okay, now let me go back to my presentation, to Meta Llama.
Okay, Meta Llama is developed by Facebook AI Research. It's an open LLM, for sure; I have already explained what an open LLM is, and it can definitely be used for research as well as commercial use. There are variants like Llama 2 and Llama 3, with fine-tuned versions; there are different versions of it, just as we have GPT-2 and GPT-3. In the same way, Llama has also evolved. The main use cases are chatbots, summarization, and coding; text-based work is what these models are intended for, and they do it well.
And why is the inference API important for Hugging Face? If you're using Hugging Face, one way is definitely to pip install the transformers Python package, download the model through it, and work on it locally; the models are lightweight and they will run. But suppose you want to use a model through the API: that's the easier route, and nowadays it's all about PaaS. Normally, when we go to Azure OpenAI, we just grab the URL and the tokens and start using it. With that kind of flexibility, I feel the Hugging Face inference APIs are a great tool: I need not download anything; I can infer directly. But to do that inference, there are a couple of things you need to do; I'll show you for Meta Llama.
Okay, maybe I can search for one of the models. You need to agree to the license and provide some information. It's very easy: you go to the Models area, search for Meta Llama or whichever model you want to use, agree to the license, provide the necessary information, and submit it. Once you have submitted, they normally take about half a day to approve it. Once they have approved it, you can start using the model. But there is one more thing you need to do.
There is something called an access token. For today's application I have already created one. You need to create a new token here, copy it, keep it with you, and use it. Also mind the kind of permissions the token needs: I definitely don't need write permission, but I happened to grant it while creating this one.
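Once the token exists, using it looks roughly like this; a sketch assuming the huggingface_hub package and an HF_TOKEN environment variable (the model ID matches the Llama variant used later in the demo, and the gated license must already be approved):

```python
import os
from huggingface_hub import InferenceClient

# Read the access token from an environment variable; never hard-code it.
client = InferenceClient(token=os.environ["HF_TOKEN"])

response = client.chat_completion(
    model="meta-llama/Llama-3.2-3B-Instruct",  # gated model: license approval needed first
    messages=[{"role": "user", "content": "Suggest one vegetarian snack."}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```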
So, how do you build intelligent apps using Hugging Face? You need to choose the model first, the variant of Llama. Then connect via API or SDK: as I explained, either you download the model and run it locally, or you go through the inference API. Then you integrate your logic and you're good to go. But the primary steps are: choose the Llama variant on the Hugging Face UI, agree to the license, and once you have agreed to the license, wait for them to approve it; that is the most crucial stage, or else these models won't run. Then go ahead and create your key, the key you'll be using.
Okay. So with this theory and a few practical demos on Hugging Face, let's now move to the more complex and most interesting part of this session, where we'll build a Python Flask application, do Llama inferencing through the Hugging Face API (we are taking the API route; I'm not going to download any model and run it locally on my laptop), and then deploy it to Azure. With this, let me go ahead. Are there any queries? Let me just check. Okay.
I have opened my VS Code, and what I'm going to do now is not demonstrate like a usual developer, the way we used to develop all these days; instead, we will work intelligently with GitHub Copilot. Trust me, I'm not lying: it's a fact that I built this entire application using vibe coding, intentionally, to demonstrate it to you, and I'll show you the prompts I gave to build this application. This entire application has been vibe-coded. See, I would give prompts like "do a redeployment to Azure"; I can't show you the whole history, but the entire application is vibe-coded, and I'll also run it now through vibe coding. But before I run it, let me explain a few things.
This file, diet_plan_generator.py, is the one that actually does the inference. Instead of me explaining everything line by line, let me have GitHub Copilot explain it.
Okay.
Now it will go through each of the files
and uh
it will tell me
and this entire application I have built
it in uh say a record uh 3 to 4 hours
that also I used to take breaks and all
like it's all through I'm not a Python
expert but still I was able to build
this uh Python
Uh, okay. Let it uh generate the
descriptions.
Okay.
See
Okay, now let me walk you through it. I could have put all of this in slides, but I wanted to demo it and show you the power of the GenAI tooling we have. Overall, this is the architecture of the entire application. The user gives input to the Flask app. Flask is a package; there are two key packages used here, one is flask and the other is venv. venv is used for virtual environment creation, and Flask is something that helps you build Python web applications. Suppose you want to build a website using Python: you can use Flask. You could also use Django for that, but Flask is simpler and easier; if you're just starting on Python web applications, Flask is the place to start. So the Flask app calls the diet plan generator, which is this diet_plan_generator.py. The diet plan generator calls the Hugging Face API, which runs the Llama 3.2 3B Instruct model, and then the response is processed and the formatted output is produced. That is the whole flow: the Python application calls diet_plan_generator.py, and from diet_plan_generator.py the Llama model is called through the Hugging Face API. Then this is the configuration where the model is specified; you can find it in config.py, which holds all the prompt templates and the model config.
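A hypothetical config.py along those lines might look like this; the names and values are my illustration, not the repo's actual contents:

```python
# Model settings used by the diet plan generator (illustrative values).
MODEL_CONFIG = {
    "model_id": "meta-llama/Llama-3.2-3B-Instruct",
    "max_tokens": 800,
    "temperature": 0.7,
}

# Prompt template filled in with the user's form fields before each call.
PROMPT_TEMPLATE = (
    "Create a daily diet plan for a {age}-year-old weighing {weight} kg "
    "with BMI {bmi:.1f}, who prefers {cuisine} cuisine. "
    "Format the answer as breakfast, lunch, and dinner sections."
)
```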
Then there is some description of the model characteristics, about Llama, which I'm not going to get into. We are following a dual API approach: the inference client SDK is preferred for Llama, with a direct REST API fallback, so two call paths are exposed. You understand that we have an SDK way of doing things, and internally the SDKs call the REST APIs; if the SDK doesn't work, we fall back to the REST API. That is what is done here.
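The SDK-first, REST-fallback pattern can be sketched like this; the endpoint URL, model ID, and helper name are assumptions for illustration, not the repo's exact code:

```python
import os
import requests
from huggingface_hub import InferenceClient

MODEL_ID = "meta-llama/Llama-3.2-3B-Instruct"
TOKEN = os.environ["HF_TOKEN"]

def generate(prompt: str) -> str:
    try:
        # Preferred path: the SDK, which calls the REST API under the hood.
        client = InferenceClient(token=TOKEN)
        out = client.chat_completion(
            model=MODEL_ID,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=500,
        )
        return out.choices[0].message.content
    except Exception:
        # Fallback: hit the serverless inference endpoint directly over REST.
        resp = requests.post(
            f"https://api-inference.huggingface.co/models/{MODEL_ID}",
            headers={"Authorization": f"Bearer {TOKEN}"},
            json={"inputs": prompt},
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()[0]["generated_text"]
```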
Then, prompt engineering. Before I get into this, let me say something about prompt engineering; I had it in mind during the introduction but forgot. Prompt engineering is the core of any generative AI application. You can do single-shot prompting, where you write your prompt as a plain query, like "What is the weather in Singapore today?" or "What is the weather in Hong Kong today?". Or you can do few-shot prompting, where you give examples: "What is the weather today in Singapore? Give me the response in this format", and in the example you say "it was sunny during the day, it rained in the afternoon, it was sunny again towards evening, and the night was moderately chilly"; then the response will come back in that format. There can also be a chain of prompts, where you ask the GenAI something, it gives you a response, and based on that you ask again; when you have a complex task, you break the entire task down into multiple prompts and get it done. In the same way, we have done that kind of prompting here.
If you see here, this is the prompt I have built: create a daily diet plan for the given age, weight, BMI, and cuisine.
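In a few-shot style, such a prompt might be assembled like this; the wording is my illustration, not the repo's actual template:

```python
prompt = (
    "Create a one-day diet plan for: age 43, weight 90 kg, "
    "BMI 26.9, cuisine Indian, vegetarian.\n\n"
    "Example of the expected format:\n"
    "Breakfast: oats porridge with fruit\n"
    "Lunch: dal, brown rice, mixed-vegetable curry\n"
    "Dinner: paneer salad with roti\n\n"
    "Now produce the plan:"
)
```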
Then some formatting has been applied so that Llama can understand it. After that come BMI-based personalizations: how we categorize a person based on the BMI value. Then there is a message parsing step, where we convert the chat messages into the prompt tokens that Llama expects. Then there is a retry and error-handling mechanism. Then we do some content filtering; that part is actually for the fallback. If the model is down, all of this runs as the fallback, and the application tries to give you a standard response instead of surfacing an error. That is what is written here.
Now, this is the entire algorithm flow; I'm not getting deep into it. I'll give you the GitHub link, and you can explore it yourself. You can see the entire explanation was generated by the AI. I had also generated some explanations beforehand to save time; one moment, let me go to that prompt.
Okay. So I have already given a prompt: "Can you explain the Flask application to me?" That Flask application, of course, is the one calling this diet plan generator app. You can see what it says; it has very beautifully explained the Flask application too: the application structure and the templates. Normally, when you build any website, we build templates, like index and result, where we embed the data; the user profile is kept here. Then there is diet_plan_generator.py, where the actual AI integration happens, the crux of everything, and the configuration settings are kept here. Then, here we are initializing the Flask application.
Then you see the forms are created here: age, weight, height, nationality, food habits, diseases, API token; we put all these fields here, and you will find them all when I run this. We have also done some form validations, and these are the routes that are defined. I'll show you; let me take you to the Flask application, app.py.
Okay, so you'll find all of that there. See, this is the UI that has been built, with all the validations. And these are the routes. If you have worked on web APIs, you will understand this: when we say abc/index, just as in Angular and other front-end applications, we have these routes, and abc/index takes me to the index page. We have defined those routes; you can see a route for /generate, so when a request goes to the generate page, it generates a diet plan for me. All these routes are defined in app.py.
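A minimal sketch of that routing pattern (simplified; the real app.py also does form validation and error handling):

```python
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route("/")
def index():
    # Serves the form page (the index template).
    return render_template("index.html")

@app.route("/generate", methods=["POST"])
def generate():
    # Reads the submitted form fields and would hand them to the
    # diet plan generator (call omitted in this sketch).
    age = request.form["age"]
    weight = request.form["weight"]
    return render_template("result.html", age=age, weight=weight)

if __name__ == "__main__":
    app.run(debug=True)
```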
And you can see Copilot has explained all of these things to me as well.
Okay. Then I have also generated an explanation of the deployment to Azure. How does the Azure deployment work? I'll take you there.
So: "Can you explain the Azure web app deployment process to me?" You can see it has very beautifully summarized the deployment steps. It says we have two Git repos. One is where the code is checked in; I will share this GitHub repo with you. The other is used for hosting: if you see this scm.azurewebsites.net address, that is your Azure website, or if you are familiar with Azure Web Apps, it is basically the Kudu URL. When you're deploying a Python application, you can use that SCM site: there is a Git repo that comes by default with your Azure Web App, so you can push your packages to the Kudu Git of your Azure Web App, and it will get deployed there. That is what is mentioned here: Azure Git is the direct Git deployment through Kudu; when you push your packages there, your application starts running. Then you have the source code repo; that is the repository I'll be sharing with you so you can take a look. Then these are the required files, it says. Then this is how Azure starts the app up: the code for startup.py.
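A hypothetical startup.py of the kind Azure App Service can be pointed at might be as small as this; the actual file in the repo may differ:

```python
# Expose the Flask app for the Azure startup command (e.g. gunicorn).
import os
from app import app  # the Flask instance defined in app.py

if __name__ == "__main__":
    # App Service on Linux commonly provides the port via the PORT env var.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8000)))
```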
Then a code change is made and committed to both places: one is the source code repo on GitHub, and it is also pushed to the Git repo on the Azure Web App through the Kudu URL; this is the code for that. That push actually triggers the Azure deployment. Then the Azure build process gets invoked, and these are the build steps listed; you can go through them, I'm just skipping past them in the interest of time. Then the environment variables and everything else are documented here.
Now, I have already pushed the packages to the Azure Web App, but let me first run this application locally. I will say: run the application. You can see GitHub Copilot, which has helped me generate all of this. I'll just click allow. Okay, the application has started; let me just run it locally.
Okay. So maybe I'll give my age; I'll spoof it, I don't want to reveal it, so let's say 43. I'll give my weight as, say, 90, then height as 72. I am an Indian. I eat both veg and non-veg; I could select both, but let me be vegetarian today, even though I love eating non-veg. Obesity and a few other criteria I'm just typing in; they're not my health conditions, it's just for demo purposes. And let me generate the diet plan.
Okay. So this is my personalized diet plan: my age is 43, weight 90, height 72. The BMI calculation marks me as overweight, thanks to it. And then this is the meal plan it has generated.
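As a sanity check on that "overweight" label (assuming the height field is in inches and the weight in kilograms):

```python
weight_kg = 90
height_m = 72 * 0.0254            # 72 inches ≈ 1.83 m
bmi = weight_kg / height_m ** 2   # ≈ 26.9
print(round(bmi, 1))              # 25-30 is the "overweight" band, matching the app
```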
Okay, now let me do one thing. Let me type in the same values: age 43, weight 90, height 72. But now let me say that I'm Italian; or, let's have some fun, I'll call myself Japanese, and a vegetarian. Let's see what it does. Here I'm really just testing the Llama model, seeing what it generates; I'm also curious to know.
Okay. See, it gave me some nice Japanese foods; the model has generated all of this. Now let me do one thing: I'll go back, stop the application running here, and say, "Can you redeploy the application to Azure?" I have already deployed it once; that's why I'm giving this prompt.
Anjini sir, do I have another 5 or 10 minutes to spend, or do I have a hard stop?
>> Well, it was meant to be half an hour, but anyway.
>> Okay, sorry.
>> No issues, you can spend 5 minutes more.
>> Okay.
>> Great. And yeah, we have also implemented Meta Llama, and there are a lot of use cases we are running on the same platform.
>> Oh, cool.
>> Definitely, yes.
>> Nice; then we are in sync, my session is in sync with you guys.
>> Yes, yes.
>> I'll speed up; I'm almost done, I just need to do this deployment, then I'll show the Azure part. I actually just wanted to give you a feel of how
>> yeah, since this is
>> the new-age developer should work.
>> Yeah, many people here are from my team as well, and not all are from the same domain, so this has been a great session for them. Nice to learn.
Okay. Now I am committing it, so that when I do this, it gets pushed to the SCM, the Git that sits in the Kudu of your Azure Web App.
While it's completing the deployment, let me show you a few more things. All these documents you see, README.md and the rest: I haven't hand-written any of it; it's all generated. But when you're generating code, one piece of advice: peer review it. Don't vibe code blindly; peer review it. The reason is that many of these are open models. If you have your own instance of GitHub Copilot, that's a separate story; you train your own model, so it will pick up from your organization's code. But if you're working in an organization, you may have multiple customers, and each of those customers' code can get into a clash: the model could generate customer A's code in customer B's application. So peer review the output and make sure you are not leaking any customer information and not violating any licensing agreements when you're doing this. That is one of the most important things you need to take care of.
Okay, I think the deployment is complete; it's just summarizing the conversation history. Then I will open the website and show it to you. Okay, it's taking a while. Okay, thank you. Now it has opened the application as well; everything it has done, it has summarized, so you can go ahead, check all of it, and also review the code. So what I'll do is say: open the application running on Azure, just from here.
So it will open it for you. See; I'll just open it outside, one moment, there, it opened. Okay.
Okay. So, since I've already demoed this application to you, you should be able to do it yourselves. With this, yeah, sorry, I have come to the end of my presentation. Apologies, my bad: I thought it was a one-hour session, so I brought in a lot of content.
>> Yes, really, really nice; especially the way you got the code checked into Azure and everything through GitHub Copilot was fantastic, and I think many people would love it.
>> Yep.
Open to questions, if there are any.
>> Yeah, any questions, please raise your hand or just put them in the chat. And there is a session quiz; team, can you send the link for the certification?
>> And also, this session is recorded, right? So we will all get the link for the session.
>> Yes, and there's a certification link as well, where you can take a quick quiz, and then there are some goodies. We don't actually see a link here yet; it isn't in the chat that has been sent.
>> Okay.
>> Subscription?
>> Yeah, there is a question on subscription. You can create a free subscription, but if you want to use it in a professional setting, or rather for professional development purposes, you can definitely refer to this pricing model. What I'm running is the community version; you can see the pricing here, but I'm running it from a community perspective. Whatever application I have hosted here, I did it just for this session, and I'll be bringing down the web app; I'm not monetizing this application.
>> Yeah. And the second thing is, as he said, you can also download Llama to your local server, if it has the capacity, and then set up your APIs to get it connected to wherever you like. Maybe you'll have to write a couple of layers of wrappers in different technologies, whatever your team prefers: Golang, .NET, Java, whatever, to utilize it in a real-world scenario. But yeah, Hugging Face is great if you just need to test and start with it at a very low cost.
>> Maybe I will propose one more session with Anjini sir on how to use the transformers package and get the same application running locally with those packages.
>> Yeah, definitely much needed.
Okay, so let's get back to the team here. Thank you so much, Dr. Abhishek Mishra, for a wonderful session, and big thanks to our audience who connected with us. I hope this session was very productive and informative for all of you. Keep learning, keep growing, and stay connected with the Fifth community for upcoming events. Thank you.