Hello everyone and welcome to the Agentic AI course for beginners, your starting point into the new era of intelligent, autonomous AI systems. Agentic AI is transforming how machines work. Instead of just responding to prompts, AI agents can now reason, plan, take actions, and work across tools to complete tasks on their own. From smart assistants to automated research agents and workflow bots, this is the next big shift in artificial intelligence. In this course, you will learn the core building blocks behind agentic AI, from understanding deep learning, LLMs, and transformers to exploring LangChain, RAG, LLMOps, and the modern agent frameworks. Whether you're completely new to AI or already exploring generative models, this course makes the concepts simple, practical, and beginner friendly. By the end, you will understand how agentic systems work, how to build basic agents, and how to start your journey in this fast-growing field. So before we begin, please like, share, and subscribe to Edureka's YouTube channel and hit the bell icon to stay updated on the latest content from Edureka. Also check out Edureka's agentic AI certification training. It is carefully crafted to meet industry demands and prepare you for the future of intelligent agents. You will gain practical skills in LangChain, RAG, LLMOps, and more through live instructor-led sessions and hands-on labs. Whether you're a beginner or a tech professional, this course helps you master the concepts and accelerate your AI career. So check out the course link given in the description box below. Now let us get started by understanding what agentic AI is.
Agentic AI is transforming industries by allowing machines to learn, adapt, and evolve independently, similar to living organisms.
Unlike traditional AI, these intelligent
agents investigate, optimize, and
develop solutions over time without
requiring direct human participation.
Recent advancements include OpenAI's
deep research, which automatically
analyzes massive amounts of data to
provide detailed reports, and Google's
Gemini 2.0, which improves AI's capacity to plan and reason across different data types. ServiceNow's AI agent orchestrator is transforming enterprise automation by coordinating many AI agents to address complex business problems. As these systems
become more powerful, they have the
potential to unlock ideas beyond the
human imagination, ranging from wind turbine blade design to AI-driven company
management. Let's start with our first
topic. What is agentic AI? Agentic AI
denotes artificial intelligence systems
capable of autonomously executing
actions to attain designated objectives. Unlike reactive AI, which only responds to inputs, agentic AI is proactive, capable of planning, adapting, and making decisions autonomously. So let's dive deeper into agentic AI and see its
capabilities. Agentic AI is a type of
artificial intelligence that exhibits
autonomous behavior, enabling it to take
actions and operate without continuous
human guidance. It is goal-driven,
actively working towards achieving
specific objectives rather than
passively responding to inputs like
reactive AI. And with advanced decision-making capabilities, it can evaluate
multiple options, select the optimal
course of action based on current
conditions and acquired knowledge and
adapt its strategies dynamically in
response to unforeseen changes in its
environment. Moreover, agentic AI
demonstrates proactiveness by taking the
initiative to act rather than waiting
for external triggers making it highly
effective in dynamic and complex
scenarios. Now let us see its relevance
in the current AI market. When AI
systems can act autonomously to
accomplish predefined objectives, we
call that agentic AI, making it highly
relevant in the current AI market. Its
autonomy allows it to operate without
continuous human guidance, making
decisions and adapting dynamically to
achieve objectives. This capability is
complemented by its advanced problem
solving skills, enabling it to evaluate
complex situations, strategize and
respond effectively to challenges.
However, the growing adoption of agentic AI also raises important ethical
considerations such as ensuring
responsible behavior, minimizing
unintended consequences and maintaining
transparency in its decision-making processes. Now that you know about agentic AI, let us discuss how it differs from other AI systems. Agentic AI
differs significantly from other AI
systems in its autonomy, decision making
and adaptability to achieve long-term
goals. Unlike reactive AI, which performs predefined tasks only when prompted, such as spam filters or image classifiers, agentic AI takes the initiative and operates independently. It also contrasts with generative AI, which focuses on creating content, like ChatGPT generating text, but is not goal-driven. By combining autonomous behavior, strategic decision making, and the ability to adapt dynamically, agentic AI stands out as a powerful
system designed to achieve specific
objectives in evolving environments. Now
since we know a bit of differences, let
us see the comparison between generative
AI and agentic AI. Generative AI and
agentic AI differ in several key aspects
that define their functionality and
applications. Generative AI is primarily
focused on creation, excelling in output-focused tasks such as generating text, images, or other forms of content. Its
adaptability is limited as it relies
heavily on prompts for guidance and
lacks the ability to operate
independently. In contrast, agentic AI
emphasizes autonomy, making it
goal-driven and capable of dynamically
adapting to changing environments.
Unlike the prompt dependent nature of
generative AI, agentic AI is
self-directed, enabling it to take the
initiative and execute strategic tasks effectively. These differences highlight the complementary roles of both AI types
in addressing distinct challenges. Now
let us see the impact of agentic AI on
various industries. Agentic AI has had a
profound impact across various
industries transforming operations and
solving long-standing challenges.
Autonomous logistics systems such as
those in Amazon warehouses have
significantly improved operational
efficiency by 30 to 40%. In healthcare, AI-enabled surgical robots like the da Vinci system have performed over 10 million minimally invasive procedures worldwide, enhancing precision and
patient outcomes. Scientific
advancements have also been transformed
by systems like DeepMind's AlphaFold, which successfully solved the decades-old protein folding problem. On a global
scale, the World Economic Forum predicts
that by 2025, AI will displace 85
million jobs while creating 97 million
new ones, reshaping the labor market.
And in the energy sector, AI-powered smart grids can reduce electricity waste by up to 10%, promoting greener energy solutions. Additionally, over 90 countries are investing in AI-enabled
military technology to modernize their
defense systems, showcasing the
strategic importance of agentic AI in
global security. Now, let us see the
applications of agentic AI. Agentic AI
is transforming various industries by
enabling systems to make autonomous
decisions, adapt to changing
environments, and achieve specific
goals. In autonomous vehicles, agentic AI powers self-driving cars and drones to navigate roads, avoid obstacles, and make real-time decisions, as seen with Tesla Autopilot and autonomous delivery drones. In robotics, agentic AI allows industrial, healthcare, and exploration robots to perform complex tasks independently, as demonstrated by Boston Dynamics robots used in logistics and rescue operations. Personalized virtual
assistants like Google Assistant and
Amazon Alexa leverage agentic AI to
predict user needs, manage schedules,
and execute tasks without direct commands. And in gaming, adaptive AI agents enhance the experience by creating challenging, human-like opponents, such as AlphaGo and AI bots in real-time strategy games. In healthcare,
agentic AI supports personalized
treatments, accurate diagnostics, and
surgical assistance, with examples including AI-driven surgical robots and
systems for remote patient monitoring.
These applications demonstrate the
transformative potential of agentic AI
across diverse domains. Agentic AI is
making a significant impact across
various industries by enabling autonomy,
adaptability, and efficiency in diverse
applications. In finance, it powers
algorithmic trading systems and fraud
detection tools, optimizing financial
operations such as managing investment
portfolios and identifying fraudulent
activities. In smart cities, AI systems manage energy consumption, optimize
traffic flow and enhance public safety
with examples like smart traffic lights
adapting in real time and autonomous
energy grid optimization. In space
exploration, autonomous spacecraft and
planetary rovers, such as NASA's Mars rovers, perform exploration tasks independently. In education, AI-powered
tutors like Carnegie Learning provide
personalized instruction by adapting to
individual learning styles. In military
and defense, autonomous drones and surveillance systems improve situational awareness and decision making, as with AI-driven surveillance drones in defense
applications. Now let us see the challenges and risks associated with agentic AI. While agentic AI offers tremendous potential, it also faces several challenges and risks that must be
addressed to ensure its safety and
ethical deployment. So one key concern is misalignment with human goals, where AI systems may pursue objectives that conflict with human intentions due to poorly defined parameters or unintended consequences, such as an autonomous robot prioritizing efficiency over safety. Ethical questions arise regarding accountability and decision-making,
demonstrated by the challenge of
determining who is responsible when an
autonomous vehicle causes an accident.
The complexity of decision-making in agentic AI can also lead to a lack of transparency, making it difficult to understand or explain its actions, particularly in sensitive fields like healthcare or finance. Ensuring safety
and reliability is another challenge as
AI systems must operate effectively in
unpredictable environments such as
autonomous drones encountering extreme weather or mechanical failures.
Additionally, agentic AI systems often
require substantial computational
resources making their deployment costly
as seen in advanced robotics and
self-driving cars. Security vulnerabilities pose further risks, as
autonomous systems could be targeted by
cyber attacks potentially leading to
harmful consequences like the
manipulation of autonomous vehicles.
Lastly, overdependence on AI may reduce
human oversight or lead to skill
degradation in critical areas such as
relying too heavily on autonomous
systems for medical diagnosis without
human validation. These challenges
highlight the need for robust design,
rigorous testing and ethical frameworks
to mitigate risks and maximize the
benefits of agentic AI. Now let's see
the future of agentic AI. The future of
agentic AI is set to be transformative
with advancements across various domains
influencing its deployment. Future
systems will exhibit increased autonomy
and adaptability, enabling them to make complex decisions in real time and
operate effectively in dynamic
environments without human intervention.
The integration of agentic AI with
advanced technologies like quantum computing, IoT, and edge computing will further enhance its capabilities, allowing for faster decision making and real-time processing at the edge. These
systems will have widespread applications in sectors such as healthcare, where they will enable autonomous medical diagnostics, personalized treatment plans, and robotic surgery; climate action, with advanced systems for environmental monitoring and response; and space exploration, where smart rovers and spacecraft will carry out missions on their own. As these
technologies evolve, ethical concerns
and accountability will need to be
addressed, promoting the development of regulatory frameworks to ensure responsible AI usage. Additionally,
agentic AI will foster human-AI collaboration, enhancing productivity and creativity in fields such as education, engineering, and research.
Imagine asking ChatGPT for a poem and it
writes one instantly. Now think about an
AI assistant planning your entire day,
booking meetings, and even handling
emails without your constant input.
That's the difference between generative AI, which creates content, and agentic AI, which acts with autonomy, making its own decisions. In 2025, as AI becomes more
than just a tool, understanding the
shift is very critical. Are we heading
towards just smarter chatbots or truly
independent digital agents? Let's break
it down through this video. To truly understand the shift, let's first break
down what generative AI is. Generative
AI is a type of artificial intelligence
designed to create content, whether it's
text, images, music, or even code.
Instead of making decisions or even
taking action on its own, it focuses on
producing outputs based on the patterns
it has learned from vast amounts of data. At its core, generative AI models
use deep learning techniques like
transformers to generate new content
that resembles human created work. For
example, ChatGPT generates human-like text based on prompts, Midjourney and DALL-E create stunning images from simple text descriptions, and GitHub Copilot helps developers by suggesting code snippets in real time. Generative AI has several
strengths. It enhances creativity and productivity, allowing artists, writers, and programmers to work faster and more efficiently. It scales effortlessly, generating unlimited variations of content in just a few seconds. It also
adapts responses based on user input
making interactions feel more
personalized. But it also comes with a few limitations. Generative AI lacks
autonomy. It doesn't think or act on its
own. It only responds when prompted. It
has no real decision-making abilities and cannot evaluate consequences or make independent choices. Additionally,
it can generate biased or inaccurate
content based on the data that it has
seen. While generative AI is powerful
for creating, it cannot act
independently. And that's where agentic
AI comes in. Let's explore what agentic
AI is. Agentic AI goes beyond just
generating content. It acts autonomously, making decisions and executing tasks without the need for constant human input. Unlike generative AI, which only responds to prompts, agentic AI can plan, adapt, and take initiative based on goals rather than specific instructions. At its core, agentic AI
combines reasoning, memory, and decision
making to operate more like an
independent agent. It doesn't just create; it analyzes, strategizes, and acts. Real-world examples include autonomous robots, which navigate and complete tasks on their own; AI-driven personal assistants, like those managing schedules, booking flights, and handling emails without human oversight; and even
self-driving cars which continuously
assess their environment and make
split-second driving decisions. Agentic
AI has its own strengths. It reduces the need for manual intervention by automating complex workflows. It adapts to real
world conditions, learning and improving
over time. It can even handle multi-step
tasks that require planning, execution,
and adjustment. But it also has its own
challenges. Developing truly autonomous
AI requires significant advancements in
reasoning and adaptability. There are
certain risks including unintended
behaviors and ethical concerns around
AI, which makes independent decisions.
And unlike generative AI which focuses
on creativity, agentic AI is limited in
how well it can generate novel content.
So while generative AI creates and agentic AI acts, the real power comes when the two work together. Let's see
the key differences between generative
AI and agentic AI. Generative AI and
agentic AI serve different purposes,
each with unique strengths and
applications. The key distinction comes
down to creativity versus decision
making. As previously discussed,
generative AI focuses on producing
content, whether it's text, image, or
code. It enhances creativity by
assisting writers, designers, and
developers. But it lacks true autonomy.
It only works when prompted and doesn't make any decisions on its own. Agentic AI, on the other hand, is designed for interaction and execution. Instead of
just generating responses, it can
analyze situations, make decisions, and
take actions. While it may not create
content like generative AI, it can
manage workflows, automate tasks, and adapt to real-world conditions. Another
key difference is user dependency.
Generative AI is entirely reactive,
meaning it requires human input to
function. It waits for prompts before
generating anything. In contrast,
agentic AI is proactive. It can initiate
actions independently, setting
reminders, optimizing schedules, or even
solving problems without human
intervention. The applications of these
AI types also differ. Generative AI is widely used in content creation, marketing, entertainment, and software development, while agentic AI powers autonomous systems like self-driving cars, AI-powered customer service, and personal assistants that can handle complex workflows. Both AI types are
transforming industries, but when they work together, they unlock even greater potential. Imagine an AI that not only generates a marketing campaign but also launches it, tracks engagement, and refines the strategy automatically.
The future isn't just about choosing
between generative AI and agentic AI.
It's about combining the two to build truly intelligent systems. Now that we
understand the key differences between
these two, let's explore the future of
AI by asking, will generative AI be
replaced? As AI continues to evolve, one
big question arises. Will agentic AI
replace generative AI? Right now,
generative AI is everywhere, helping
people write, design, and code faster
than ever before. But it has one major
limitation. It relies entirely on human
input. Agentic AI, on the other hand, takes things further. It doesn't just generate; it decides, plans, and even acts. It's the next step towards truly autonomous intelligence. Does that mean generative AI will become obsolete?
Not necessarily. The future of AI isn't
about one replacing the other. It's
about coexisting. Generative AI will
keep getting more creative and sophisticated, producing even higher-quality content. Agentic AI will become
even more autonomous, integrating deeper
with industries like healthcare,
finance, and robotics.
But this shift does come with some risks. As AI takes on decision-making power, we face new challenges: ethical concerns, unintended consequences, and the need for accountability. If an AI
the need for accountability. If an AI
agent makes a bad decision, who is
responsible? And how do we ensure it aligns with human values? The answer lies in balance. The real future of AI is a hybrid approach, where generative AI
fuels creativity and agentic AI drives
intelligent action. Imagine an AI system
that not only writes a research paper
but also submits it to journals, responds to reviews, and refines it automatically. And this is where we are
headed. Not just smarter AI, but AI that
truly works with us as both a creator
and an agent. The question isn't whether
agentic AI will replace generative AI.
It's how we'll harness both to shape the
future of intelligence. Now that we have
explored the differences between
generative AI and agentic AI, let's move
on to building an intelligent AI agent
that can interact with our database
using natural language. This means you
can simply ask a question like "show me all the students who have scored above 80," and the agent will automatically convert it into an SQL query, fetch the data, and return the exact result from the database. No need to write complex SQL queries manually; just ask, and the AI responds. Let's dive in and build
this powerful system. First, we need to set up a conda environment to manage our project dependencies. To do this, we open the terminal and run the following command: conda create -p venv python==3.10 -y. Here, conda create creates a new environment, -p venv specifies the environment path as venv, python==3.10 installs Python version 3.10 inside the environment, and -y automatically confirms the installation without asking for approval. Once the process is
complete, our virtual environment is
ready and we can move forward with
setting up our agentic AI project. Next, we'll create a file named requirements.txt, where we'll list all the necessary libraries for our project. This will help us easily install dependencies in one go. Additionally, we'll create a .env file to securely store our Google Generative AI API key, keeping sensitive information separate from our main code.
With these files in place, we ensure a
well structured and organized setup for
our agentic AI project.
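If you want to confirm that the key from the .env file is actually being picked up, a quick sanity check like the sketch below works. Note that the variable name GOOGLE_API_KEY is just my assumed example; use whatever name you put in your own .env file.

```python
# Quick sanity check (sketch) that the .env key is visible to Python.
# GOOGLE_API_KEY is an assumed variable name, not necessarily the one used in the video.
import os
from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from the .env file into environment variables
print("API key loaded:", os.getenv("GOOGLE_API_KEY") is not None)
```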
First, we will work with SQLite, a lightweight, self-contained database engine, to create and manage a student database. Let's break it down step by step. So, we'll create a file named sql.py and import the sqlite3 module, which allows us to work with SQLite databases. We'll write import sqlite3.
This module provides all the necessary
functions to create a database, insert
records, retrieve data, and manage
connections. Next, we create a connection to an SQLite database file named student.db. We'll write connection = sqlite3.connect("student.db"). If this file doesn't exist, SQLite will automatically create it. The connection object will allow us to interact with the database. Now we
create a cursor object which is used to
execute SQL commands in Python. We'll write cursor = connection.cursor().
Think of the cursor as a tool that helps
us send queries to the database and
retrieve results. Now we define an SQL command to create a table named STUDENT with four columns. We'll write table_info = followed by a triple-quoted string, and inside it we'll write CREATE TABLE STUDENT(NAME VARCHAR(25), CLASS VARCHAR(25), SECTION VARCHAR(25), MARKS INT). Then we'll write cursor.execute(table_info). The NAME column stores the student's name as a string of up to 25 characters, CLASS stores the class name, SECTION stores the student's section, and MARKS stores the marks obtained as an integer. Executing this command creates the table in the database. Next, we insert
five student records into the student
table using SQL INSERT statements. I've already created and inserted five records into the table; you can insert as many as you like. Each INSERT command adds a new row with the student's name, class, section, and marks. Now, we retrieve and display all records from the student table. For that, we'll write print("The inserted records are:"). On the next line, we'll write data = cursor.execute("SELECT * FROM STUDENT"), and then for row in data: print(row). The SELECT * FROM STUDENT query fetches all the data from the table, and the for loop iterates through the records and prints them one by one. And finally we
commit our changes and close the database connection. For that, we'll write connection.commit() and then connection.close(). The commit() call ensures all the changes are saved in the database, and close() closes the connection, freeing up system resources. And that's it.
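Putting the dictated steps together, sql.py roughly looks like the sketch below. The five sample rows are placeholder values of my own, since the exact records aren't shown on screen.

```python
# sql.py -- a rough sketch assembled from the steps above.
import sqlite3

# Connect to (or create) the student.db database file
connection = sqlite3.connect("student.db")
cursor = connection.cursor()

# Create the STUDENT table with four columns
table_info = """
CREATE TABLE STUDENT (NAME VARCHAR(25), CLASS VARCHAR(25),
                      SECTION VARCHAR(25), MARKS INT);
"""
cursor.execute(table_info)

# Insert five student records (hypothetical sample values)
cursor.execute("INSERT INTO STUDENT VALUES ('Asha', 'Data Science', 'A', 90)")
cursor.execute("INSERT INTO STUDENT VALUES ('Ravi', 'Data Science', 'B', 75)")
cursor.execute("INSERT INTO STUDENT VALUES ('Meera', 'DevOps', 'A', 82)")
cursor.execute("INSERT INTO STUDENT VALUES ('John', 'DevOps', 'B', 61)")
cursor.execute("INSERT INTO STUDENT VALUES ('Sara', 'Cloud', 'A', 53)")

# Display all inserted records
print("The inserted records are:")
data = cursor.execute("SELECT * FROM STUDENT")
for row in data:
    print(row)

# Save the changes and close the connection
connection.commit()
connection.close()
```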
We have successfully created a student database, inserted records, and retrieved them using SQLite and Python. Now let's
build an interactive Streamlit app that converts natural language questions into SQL queries using Google's Gemini model. It then retrieves data from the SQLite database and displays the result. Let's
break it down step by step. But before we start, we have to activate the environment. For that, we'll run conda activate venv/ in the terminal. And with that, our environment is activated.
First, we'll create a file named app.py and load environment variables using dotenv. For that, we'll write from dotenv import load_dotenv, and then call load_dotenv(), which loads all the environment variables. This ensures that sensitive information such as API keys is securely
necessary modules. For that we'll write
import stream lit as st. Then import OS.
Then import escalite 3 and then import
Google.generative AI as genai.
Streamlight here powers the web
interface. OS helps access the
environment variables. SQLite 3 allows
us to interact with the database and
Google generative AI enables the
conversion of natural language into SQL
queries. Now we configure the Google
Gemini API key. But before that we'll
have to create a API key through Google
studio itself. I've already generated
one. You can create yours through Google
studio itself.
Then we'll write genai.configure
in the bracket API_key
equals to os do.get env
key. This allows the app to use Gemini
1.5 Pro to generate SQL queries. Then we
define a function to generate SQL queries from natural language input using Gemini. For that, we'll write def get_gemini_response(question, prompt). Next, we'll write model = genai.GenerativeModel("models/gemini-1.5-pro"). Then we'll write response = model.generate_content([prompt[0], question]), and then return response.text. The function initializes
the Gemini model. It takes a question
and predefined prompt as input and the
AI model generates an SQL query as
output. Next, we define a function to execute SQL queries on the database and retrieve results. For that, we'll write def read_sql_query(sql, db). Next, we'll write con = sqlite3.connect(db), then cur = con.cursor(), and then cur.execute(sql). Then we'll write rows = cur.fetchall(), then con.commit() and con.close(). Then we'll create a loop with for row in rows to print each row, and then return rows. The function
connects to the student.db database, executes the given SQL query, fetches all the retrieved records, and prints them. Now we define the AI prompt
that instructs Gemini on how to convert
the questions into SQL queries. As you
can see, I've already created a prompt of my own, and you can create yours according to how you want your model to function. If you want the prompt I've used here, you can just comment on the video and I'll send it to you. This prompt ensures that Gemini
generates SQL queries accurately without
unnecessary text. Now we'll set the page configuration with a title and icon. For that, we'll write st.set_page_config(page_title="SQL Query Generator - Edureka", page_icon=...). Then we'll display the Edureka logo and header. For that, we'll write st.image("123.png", width=200), followed by st.markdown() with a heading like "Edureka's Gemini App: Your AI-powered SQL Assistant".
Next, we'll write another st.markdown() with the text "Ask any question and I'll generate the SQL query for you." The page title and icon are set, a logo is displayed at the top, and the app's purpose is introduced to the user. And
before we import the logo, just make
sure that you have the logo in your
folder. We take user input for a natural
language query.
For that, we'll write question = st.text_input("Enter your query in plain English:", key="input"). This allows users to type their questions, such as "show all students with marks above 80." A submit button triggers the
SQL generation process, and for that we'll write submit = st.button("Generate SQL Query"). When clicked, the app processes the query and retrieves the result. Now
we define what happens when the submit
button is clicked.
For that, we'll write if submit:, and on the next line response = get_gemini_response(question, prompt); this converts the question to SQL, and then we print the response. Then we'll write response = read_sql_query(response, "student.db"); this executes the SQL on the database. Then we'll write st.subheader("The response is:"). Next, we'll include a loop, for row in response:, and inside it we'll print the row and write st.header(row).
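Assembled from all of the dictated steps, app.py roughly looks like the sketch below. The prompt text and the environment-variable name GOOGLE_API_KEY are placeholders of mine (the instructor's actual prompt isn't shown), and the logo file 123.png is assumed to sit in the project folder.

```python
# app.py -- a rough sketch assembled from the steps above, not the instructor's exact file.
from dotenv import load_dotenv
load_dotenv()  # load variables (including the Gemini API key) from the .env file

import os
import sqlite3
import streamlit as st
import google.generativeai as genai

# Configure the Gemini client; GOOGLE_API_KEY is an assumed variable name
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

def get_gemini_response(question, prompt):
    """Ask Gemini 1.5 Pro to turn a natural language question into an SQL query."""
    model = genai.GenerativeModel("models/gemini-1.5-pro")
    response = model.generate_content([prompt[0], question])
    return response.text

def read_sql_query(sql, db):
    """Run the generated SQL against the SQLite database and return all rows."""
    con = sqlite3.connect(db)
    cur = con.cursor()
    cur.execute(sql)
    rows = cur.fetchall()
    con.commit()
    con.close()
    for row in rows:
        print(row)
    return rows

# Placeholder prompt -- the instructor uses their own, longer prompt
prompt = ["""
You are an expert in converting English questions into SQL queries.
The SQLite database has a table STUDENT with columns NAME, CLASS, SECTION, MARKS.
Return only the SQL query, with no extra text and no code fences.
"""]

# Streamlit user interface
st.set_page_config(page_title="SQL Query Generator - Edureka", page_icon=":robot_face:")  # icon is a placeholder
st.image("123.png", width=200)  # assumes the logo image is in the project folder
st.markdown("## Edureka's Gemini App: Your AI-powered SQL assistant")
st.markdown("Ask any question and I'll generate the SQL query for you.")

question = st.text_input("Enter your query in plain English:", key="input")
submit = st.button("Generate SQL Query")

if submit:
    response = get_gemini_response(question, prompt)     # convert the question to SQL
    print(response)
    response = read_sql_query(response, "student.db")    # execute the SQL on the database
    st.subheader("The response is:")
    for row in response:
        print(row)
        st.header(row)
```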
The user's question is converted into an SQL query using Gemini AI, the SQL query is executed on the student.db database, and the retrieved records are displayed in the Streamlit app. And that's it: the AI-powered Streamlit app allows users to ask natural language questions, which are automatically converted into SQL queries and executed on the student database. Now let's open the terminal
database. Now let's open the terminal
and run our streamllet app. To do this,
we simply type streamllet run app. py
and hit enter. It's running. And as you
can see, our agentic AI is up and
running, ready to interact with our
database. Let's test it by asking a
simple question.
We'll ask, give me the names of all the
students. The AI processes our request,
converts it into an SQL query, and
retrieves the student names from the
database. Perfect. As you can see, the
response is generated. Now, let's try
another query. We'll say, give me the
average of marks.
And just like that, the AI calculates
and returns the average marks. The response provided is 72.2. So
agentic AI that can understand natural
language, generate SQL queries and
interact with our data seamlessly.
Think about this: instead of you doing all your work, you have a machine that finishes it for you, or it can do something you thought was not possible. For instance, predicting the future, like predicting earthquakes and tsunamis so that preventive measures can be taken to save lives; chatbots; virtual personal assistants like Siri on iPhones and Google Assistant, which, believe me, are getting smarter day by day with deep learning; and self-driving cars, which will be a blessing for elderly and disabled people who find it difficult to drive on their own. And on top of that, they can also avoid a lot of accidents that happen due to human error. Next, the Google AI eye doctor. So
this is a recent initiative by Google
where Google is working with an Indian
eye care chain to develop an AI software
which can examine retina scans to
identify a condition called diabetic
retinopathy which can cause blindness.
Then there's the AI music composer. Who thought that we could have an AI music composer using deep
even machines will start winning
Grammys. And one of my favorites, a
dream-reading machine. With so many seemingly unrealistic applications of AI and deep learning that we have seen so far, I was wondering whether we could capture dreams in the form of a video or something. And I wasn't surprised to find out that this was tried in Japan a few years back on three test subjects, and they were able to achieve close to 60% accuracy, which is amazing. But I'm not sure whether people would want to be a test subject for this or not, because it can reveal all your dreams. Great. So this sets the base for
you and we are ready to understand what
is artificial intelligence.
Artificial intelligence is nothing but
the capability of a machine to imitate
intelligent human behavior. AI is
achieved by mimicking a human brain by
understanding how it thinks, how it learns, and how it works while trying to solve a problem. For example, a machine playing chess, or a voice-activated software which helps you with various things on your phone, or a number plate recognition system which captures the number plate of an overspeeding car and processes it to extract the registration number and identify the owner of the car so that he can be charged. And none of this was very easy to implement before deep learning. Now let's understand the
various subsets of artificial
intelligence. So till now you'd have
heard a lot about artificial
intelligence, machine learning and deep
learning. However, do you know the
relationship between all three of them?
So deep learning is a sub field of a sub
field of artificial intelligence. So it
is a sub field of machine learning which
is a sub field of artificial
intelligence. So when we look at
something like AlphaGo, it is often
portrayed as a big success for deep
learning, but it's actually a
combination of ideas from several
different areas of AI and machine
learning like deep learning,
reinforcement learning, self-play, etc.
And the idea behind deep neural networks is not new; it dates back to the 1950s. However, it became possible to practically implement it only when we had high-end computational resources available. So I hope that you have
understood what is artificial
intelligence. So let's explore machine
learning followed by its limitations.
So machine learning is a subset of
artificial intelligence which provides computers with the ability to learn
without being explicitly programmed. In
machine learning, we do not have to
define all the steps or conditions like
any other programming application.
However, we have to train the machine on
a training data set large enough to
create a model which helps the machine
to take decisions based on its learning.
For example, if we have to determine the species of a flower using a machine, then first we need to train the machine using a flower data set which contains various characteristics of different flowers along with the respective species. As you can see here in the image, we have got the sepal length, sepal width, petal length, petal width, and the species of
the flower too. So using this input data
set, the machine will create a model
which can be used to classify a flower.
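As a concrete, hedged illustration of this train-then-predict workflow, here is a short scikit-learn sketch using the classic iris flower dataset, which has exactly these sepal and petal measurements; the video only shows a generic flower table, so treat this as an illustrative stand-in.

```python
# A brief sketch of the train-then-predict workflow using the iris flower dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=42)

# Train a model on the labelled flower measurements
model = DecisionTreeClassifier().fit(X_train, y_train)

# Pass a new set of characteristics (sepal length/width, petal length/width in cm)
# and the model outputs the species of the flower
sample = [[5.1, 3.5, 1.4, 0.2]]
print("Predicted species:", iris.target_names[model.predict(sample)[0]])
print("Test accuracy:", model.score(X_test, y_test))
```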
Next, we'll pass on a set of
characteristics as input to the model
and it will output the name of the
flower. And this process of training a
machine to create a model and use it for
decision making is called machine
learning. However, this process had some
limitations. Machine learning is not capable of handling high-dimensional data, that is, where the input and output are large and present in multiple dimensions; handling and processing such data becomes very complex and resource exhaustive, and this is termed the curse of dimensionality.
So to understand this in simpler terms,
let us consider a line of 100 yards. And
let us assume that you dropped the coin
somewhere in the line. You'll easily
find the coin by simply walking on the
line. A line is a single dimension
entity. Now let's consider that you have got a square of side 100 yards each and you dropped a coin somewhere inside the
square. Now definitely you'll take more
time to find the coin within that
square. A square is a two-dimensional
entity. Now let's take it a step ahead
and consider a cube of side 100 yards
each and you dropped a coin somewhere
inside the cube. Now it is even more
difficult to find the coin. So we see that the complexity increases as the dimensions increase, and in real life the high-dimensional data we're talking about has got many dimensions, which makes it very complex to handle and process. High-dimensional
data can be easily found in use cases
like image processing, natural language processing, image translation, etc. And machine learning was not capable of solving these use cases, and hence deep learning came to the rescue. So deep learning is capable of handling high-dimensional data and is also efficient in focusing on the right features on its own, and this process is called feature extraction. Now let's try
and understand how deep learning works.
So in an attempt to re-engineer a human
brain, deep learning studies the basic
unit of the brain, called a brain cell or a neuron, and inspired by the neuron, an artificial neuron or perceptron was developed. So if we focus on the
structure of a biological neuron, it has
got dendrites and these are used to
receive inputs and these inputs are
summed up inside the cell body and using
the axon it is passed on to the next
biological neuron. So similarly a
perceptron receives multiple inputs
applies various transformations and
functions and provides an output. As we
know that our brain consists of multiple
connected neurons called neural network.
We can also have a network of artificial
neurons called perceptrons to form a
deep neural network. Let's understand
how a deep neural network looks like. So
any deep neural network will consist of
three types of layers. the input layer,
the hidden layer and the output layer.
So if you see in the diagram, the first
layer is the input layer which receives
all the inputs. The last layer is the
output layer which gives the desired
output. And all the layers in between
these layers are called hidden layers.
And there can be n number of hidden
layers thanks to the high-end resources
available these days. And the number of
hidden layers and the number of
perceptrons in each layer will be
entirely dependent on the use case that
you're trying to solve.
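To make the perceptron idea concrete, here is a tiny NumPy sketch of a single perceptron; the input values and weights are made up purely for illustration.

```python
# A single perceptron: weighted sum of inputs plus a bias, passed through an activation.
import numpy as np

def perceptron(inputs, weights, bias):
    z = np.dot(inputs, weights) + bias     # summation step (like the cell body)
    return 1.0 / (1.0 + np.exp(-z))        # sigmoid activation producing the output

# Made-up example values, for illustration only
x = np.array([0.5, 0.2, 0.9])              # three inputs (signals arriving at the dendrites)
w = np.array([0.4, -0.6, 0.3])             # one weight per input
print(perceptron(x, w, bias=0.1))          # output passed on to the next layer of perceptrons
```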
And there are ways to decide the number of hidden layers; however, we'll not get into that in this session. Now,
since you have a picture of deep neural
network, let's try to get a high-level view of how a deep neural network solves a
problem. For example, we want to perform
image recognition using deep networks.
So, we'll have to pass this
highdimensional data to the input layer.
And to match the dimensionality of the
input data, the input layer will contain
multiple sub layers of perceptron so
that it can consume the entire input.
And the output received from the input layer will contain patterns and will only be able to identify the edges in the images based on the contrast levels. And
this output will be fed to hidden layer
1 where it will be able to identify
various face features like eyes, nose,
ears, etc. Now this will be fed to
hidden layer 2 where it will be able to
form the entire faces and sent to the
output layer to be classified and given
a name. Now think: if any of these layers is missing or the neural network is not deep enough, then what will happen? Simple, we'll not be able to accurately identify the images, and this is the very reason why these use cases did not have a solution all these years prior to deep learning. So just to take this further,
we'll try to apply a deep network on the MNIST data set. So the MNIST data set
consists of 60,000 training samples and
10,000 testing samples of handwritten
digit images. And the task here is to
train a model which can accurately
identify the digit present on the image.
And to solve this use case, a deep
network will be created with multiple
hidden layers to process all the 60,000
images pixel by pixel and finally will
receive an output. So the output will be
an array of index 0 to 9 where each
index corresponds to the respective
digit. So index 0 contains the
probability of 0 being the digit present
on the input image. Similarly, index 2
which has a value of 0.1 actually
represents the probability of two being
the digit present on the input image. So if you see, the highest probability in this array is 0.8, which is present at index seven of the array. Hence the number present on the image will be seven. So this is how the handwritten
image processing happens. Let me
practically execute this use case for
you. So this is my PyCharm IDE. First of
all, let me show you the data set. So
this is my MNIST data set and it has got four GZ files which get extracted when my program gets executed. Now the
program or the deep neural network using
which I was able to create a model to
process all these images and train my
machine is this create_model_2.py, and it's quite a lengthy program. So
I'll not be explaining the entire
program for you. But let me tell you the
technology or the framework with which I
was able to implement this. So I've been
using TensorFlow which is one of the
open-source Google libraries for deep
learning. And right here I have imported
TensorFlow and then I'm using this MNIST data and finally going ahead and
creating a deep neural network. So these
all things are here. It is creating a
deep neural network and the hidden layer
that is required to process all these
images. And finally, I'm creating a
model and I'm saving this model with this name right here, model_2.ckpt.
Now, if I run this code, it is going to
take a very long time. So, give it some
time.
It has extracted all the files and it
has started its training. It is at step
zero now. So in order to completely
train this model, it is going to take
20,000 steps. Let me show you in the
program as well. So here it is. So in
here I've set the steps to 20,000. But
you can always configure it to a smaller number, say 1,000 or 2,000. However, you'll have
to run this code for n number of times
so that you can achieve a particular
accuracy.
So after executing for 20,000 times what
happens is a model is created with an
accuracy of 92%.
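If you want to reproduce something similar yourself, here is a minimal sketch of an MNIST digit classifier using the modern tf.keras API. This is not the instructor's create_model_2.py (which uses a lower-level TensorFlow graph trained for 20,000 steps); it is just an illustrative equivalent, and its accuracy will differ.

```python
# A minimal MNIST classifier sketch with tf.keras (illustrative, not the instructor's code).
import numpy as np
import tensorflow as tf

# 60,000 training and 10,000 test images of handwritten digits
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixel values to [0, 1]

# Input layer, two hidden layers, and a 10-way output layer
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),   # probabilities for digits 0-9
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5)

# Evaluate, then predict one image: the index with the highest probability is the digit
print("Test accuracy:", model.evaluate(x_test, y_test, verbose=0)[1])
probs = model.predict(x_test[:1])
print("Predicted digit:", np.argmax(probs[0]))
```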
So what does it mean? It means that out of 100 images passed to the model, 92 predictions will be correct; 92% of the time this model will be able to tell you the exact number that is present on the image. Let's now
wait for this program to execute completely; otherwise we would have to wait for hours. So I've already executed it once
and the model is already created. So
what is a model? A model is nothing but
a set of files and these three files
along with the checkpoint files. Now
there is another script, which is predict_2.py, in which I'm restoring the model, and let me show you the line where I'm restoring it. So it's right here: saver.restore with model_2.ckpt.
So this is the name of my model. So I'm
restoring my entire model that was
created after training for 20,000 steps on the MNIST data set, and I'm passing an image, 7_o.png. It is in this folder, in the test folders; I've got other images, and I've got 7_o here.
Now this is an image. It's the name of
the image. I'm not telling the program
what is the number. So I'll just stop
this training now and now we'll execute
the prediction part where I'm restoring
the model, and this model will tell me the handwritten number that
is there in the image. So this was the
image 7 O and the prediction for this
image is 7. So my model was accurately
able to identify or predict the number
that was there on the handwritten image.
So let me change the image now and let
me execute this again just to show you
the image.
All right, there's one good question
that I would like to take. So Akil asked: how are you saving the weights for the neural net, and can you show us? Sure, Akil. So in the previous file that I executed, I showed you that I'm using an object called saver. And using the saver
object I can save the entire model and
weights are also automatically saved
along with this model in the checkpoint
file. So using checkpoint we can
actually reach the final state of the
training and then we can use the
prediction model. I hope that is fine.
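For reference, here is a tiny sketch of the TF1-style Saver pattern being described; the variable shapes are made up, and this is not the instructor's actual create_model_2.py or predict_2.py code.

```python
# Sketch of saving and restoring weights with a TF1-style Saver (illustrative only).
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

W = tf.Variable(tf.random_normal([784, 10]), name="weights")
b = tf.Variable(tf.zeros([10]), name="bias")
saver = tf.train.Saver()  # collects all variables (weights and biases) in the graph

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, "./model_2.ckpt")     # writes the weights into checkpoint files

# Later, in a prediction script, the same graph is rebuilt and the weights restored
with tf.Session() as sess:
    saver.restore(sess, "./model_2.ckpt")  # loads the saved weights back into the variables
```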
All right. So if you see this seven,
this is a handwritten image. This is
somebody who writes seven like this with
a strike in between. And now I'm passing
a different image of 7. It's 7_1.
So it's different from the first one.
And I'll run this code again.
Now this time the seven is different.
And my machine learning model should be
able to predict that this is a seven
because people write seven in different
ways. Somebody writes seven like this; somebody writes seven and makes it look like a one by making the top part very small. So there are different ways of writing seven, and a machine learning model should be capable enough to figure that out as well. So let me just
close this and see the prediction.
Our model was able to predict this seven as well, and the predicted value is seven. So let me execute it again for
you.
So both the sevens were different but
still the prediction is correct for both
of them.
So now let us go back to our
presentation. So after the MNIST application, let me show you a few more
applications of deep learning. The very
first is face recognition. Let me give
you an example. So all of you are using
Facebook and you do spend some time on
it. So if you remember a few years back
when you used to upload pictures with your friends, Facebook would make a box around a human face, with a prompt appearing at the bottom asking you to type the name of the person to tag him. So it was able to
identify that it was a human face. But
now it is able to autotag. It is not
only able to detect faces but also
identify who it is. And how is this
possible? It is only possible using deep
learning. And Facebook also has a deep
learning library called Caffe2, using which they have applied all these
things. The next use case that is
implemented using deep learning is
Google lens. This is one of those
applications that has been recently
launched for smartphones by Google. What
does this app do? You just have to
install it, open it, point your camera
on a particular thing like this flower
over here and in real time image
processing happens and Google will get
back with the entire details of the
object like the name of the flower where
it is found etc etc. So if you point it
at a building or any shop it will tell
you what kind of a store it is. If it's a restaurant, it will show you the reviews, the ratings, the menu, etc. So what is
happening here is that in real time you
are able to use a deep learning net and
get all the information you want and
these applications are really amazing
because it directly brings deep learning
to the end users or the common people.
So they can easily use the benefits of
deep learning without worrying about
what is happening at the background. The
next use case is the machine
translation. This is again a very
important use case and there is also an
app in play store and this is called
translation app. So here is an image
that says more chocolate.
I don't know what it means. I don't even
know which language it is. But with this
app what you can do is that you can
capture the picture of the packet and
this app will first detect the text in
the image then extract the text like
this and then translate it for you. So
for example, it has detected the text, extracted it like this, and here it has translated "morg," which means dark, and then it writes it back again on the image.
So what is happening in this particular
use case is that first an image is
captured. Image processing takes place.
Text is extracted through processing and
once we have the text we translate it to
the desired language which is English in
this case and then again image
processing happens where we are writing
the text on an image again. So more
chocolate means dark chocolate. So this
is a really great use case because this
is a combination of multiple deep learning algorithms like CNNs and RNNs.
So these were a few more
applications of deep learning. I hope
that you found them interesting.
So, the first thing which comes to our mind: there has been a lot of emphasis on this term called artificial intelligence. So let's first try to understand what artificial intelligence is on a very high level, and why we may need it in the first place for solving a problem. Let's try to understand with an example. A person goes to a doctor and he wants to
Person goes to a doctor and he wants to
get checked at whether he got diabetes
or not. And what doctor would say is
okay there are some tests which you need
to get done and based on the test
results doctor would have a look and
from his experience from his studies and
previous examples the patients he had
seen he would be able to evaluate the
reports and say that the patient has
diabetes or not. So if you just take a
step back and think, I said the doctor has experience. So what do we mean by experience? The doctor has learned what the characteristics are of somebody
having diabetes. Would it be possible to provide this experience in the form of data to a machine and let the machine take the decision of whether a person has diabetes or not? In place of the experience which the doctor learned through his studies and his practice, what we are doing is taking patients' data, with different reports and different parameters, things like the glucose count in blood, the weight, the height, and all these parameters about a human being, feeding it to a machine, and telling it what the characteristics of a person with diabetes are. Let's say we have given one million patients' data to the machine; we then let the machine do this task from its experience, which comes from the data, or historical data to be precise, and do the same task which a doctor is doing. So what we have done is, if you
see from this example, what an artificially intelligent machine is doing is learning from the historical data and trying to do the same thing which an experienced and intelligent doctor was doing. These kinds of domain activities were things human beings were doing; if we can make a machine intelligent enough to do the same tasks, why should we create this artificial intelligence at all? On a very high level, there may be a lot of reasons, but one point is
that human beings have limited
computational power. We may be good at classifying things, you know, you can see a friend in a group photograph and easily say who is your friend and who the others are, and you can easily listen to a language and comprehend what a person is saying. But human beings are not very good at doing a lot of mathematics; if you try doing a good amount of mathematical computation, it is probably not very easy. And second, it's not possible for human beings
to work continuously, let's say 24 by 7 for 30 days straight. If we can make a machine do such stuff, then for one, it would be able to do these computations very fast; we spent a good amount of time discussing GPUs, and I also mentioned that Google is talking about TPUs, tensor processing units, which would hugely change the entire paradigm of computation, and machines would become more and more competitive with, or even better than, human beings in some of these fields.
This is the formal definition, but if I loosely translate it, it's basically that artificially intelligent machines are those machines which can do tasks which human beings can easily do. So, things like identifying what's written, let's say, on a license plate, or playing games; I'm sure some of you have already heard that machines have defeated the Go champion and top chess players. Now we have digital agents like Siri and others which can understand what we want them to do and can take intelligent decisions from text or from voice itself. Basically, these are very
high level, and some of the fields where deep learning has made great inroads are things like game playing, expert systems, self-driving cars, robotics, and natural language processing, and there may be different and new areas where we are implementing it. All of you know that every experience of human beings is getting digitalized: the kinds of things you buy, the kinds of things you watch, what your preferences are, who you like on Facebook, who you don't like, what kinds of movies you like, and all these things, in terms of reviews, are being captured online.
>> So once your data is going and being captured online, there are systems which can analyze this data. So we have this huge data generation, and now machines which can process it and make some intelligent decisions are available. That's why you will see there has been a lot of emphasis in the last couple of years, and a lot of new things are coming in. Some of you who have been reading papers on different subjects and different architectures would know that most of these architectures are not very old.
>> It's a very dynamic field. Every day, in fact on a weekly basis, you will be hearing about a new API or a new kind of architecture being developed by somebody. Most of the stuff we will be studying in these classes is not very old, like convolutional neural networks and recurrent neural networks; some of their variants are as new as last year. If you guys follow TensorFlow closely, they introduced an object detection API which TensorFlow has made available for everybody, and you would be able to see for yourself that this API works. There are five different options for selecting different deep learning architectures or convolutional neural networks for this API, but it's been able to identify human beings and about 90 other object classes with almost 99% accuracy, in some cases where even human beings would find it difficult to see and predict what the object is. So this machine has gone even beyond human capability in terms of identifying these objects.
Given that we have a fair, very high-level understanding (we haven't gone into the details) of artificial intelligence, basically the loose understanding that artificial intelligence is about machines making decisions for tasks which human beings were earlier doing, something like understanding language or driving a car, let's understand how machine learning and deep learning relate to artificial intelligence. Given that your machine is now able to understand and learn from data, we can solve multiple business problems with the help of this.
>> So let's take a very small example: whether a person has diabetes or not. I was mentioning that this kind of decision is taken by the doctor based on the reports he has got, and these reports have some numbers, like the number of times a patient had a particular kind of issue, what the glucose count is, what the BMI is, and what the person's age is, and based on these numbers the doctor was able to make this kind of decision. We can take the same analogy as where we were trying to predict which species of flower it is: we can take information about patients, with different attributes and different features from a report, and the machine would be able to learn from these data sets, and for a new customer or a new patient it would be able to classify whether the patient has diabetes or not.
So there are two sections to it. The first one is the information about the patient and different characteristics of his health; the columns from number of times through glucose count up to age are the information points about the patients, and the last column is the information on whether the patient has diabetes or not. This kind of problem, where we have some information points explaining what the situation is and the output information in the last column is in some kind of classes, is a specific type of machine learning problem; but as of now, the characteristic is that we have some information about the patient and the last column tells me whether the patient has diabetes or not. So what a machine basically learns is all those rules; in the example which I was quoting, in earlier cases people used to create hand-coded rules to predict whether an event will happen or not, but in machine learning your algorithm will learn from the historical data and see what the combinations are which decide whether a patient has diabetes or not.
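As a small, hedged aside, here is a sketch of how such rules can be learned automatically with scikit-learn; the column names and numbers are hypothetical, purely for illustration.

```python
# Learning if/else style rules from historical data with a decision tree (illustrative values).
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "glucose":  [85, 168, 122, 190, 99, 140],
    "bmi":      [22.1, 34.5, 28.0, 36.2, 24.3, 31.0],
    "age":      [25, 52, 40, 60, 30, 47],
    "diabetes": [0, 1, 0, 1, 0, 1],   # the output (teacher) column
})

X, y = data[["glucose", "bmi", "age"]], data["diabetes"]
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

# Print the learned rules, analogous to "if glucose < 99.5 then ..." drill-downs
print(export_text(tree, feature_names=["glucose", "bmi", "age"]))
```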
And these combinations would be something of this type; it is only for illustration, not the real numbers, but it shows that your machine learning algorithm has been able to identify these rules based on the historical data. So after learning, it has created a rule such as: is the glucose count less than 99.5? If yes, then go to the next check; if no, then check the next condition, and depending on that the person either has diabetes or does not, with further drill-down rules below. So all these combinations or rules are
dynamic in nature, and what I mean is that these rules would change if your data changes. The same model can do the work of deciding whether a patient has diabetes or not, and you can take the same model and make it learn on a new data set, let's say flower species, and it would be able to learn the new rules by itself. So the intuition is that, just as human beings learn from examples, your machine learning algorithms also learn from examples. But just to frame our
problem statement that machine learning
we know it learn by experiences and from
the data from the historical events
there are three kind of problems which
may be interested in solving. First one
is called a supervised learning problem
and supervised learning problem is
basically occurs when you have some
input variables and one output column.
Both the examples we discussed till now fit this. One was on the flower species, where we take data on different features of a flower and then which species of flower it was: the last column is the dependent variable or output variable we are trying to predict, and all the information variables are called input features or input variables.
>> So input features or input variables, these terms are used interchangeably. Input and output are the two different sections of the supervised learning process. Why is it called supervised learning? One way to explain it is that we have a column to guide the algorithm on whether it is making the correct decisions or not. Let's say your model says a person has diabetes, but the actual data says the person does not have diabetes. You have some kind of correction mechanism within your data itself which can help your model tune itself to make better predictions. This kind of output variable has in some texts also been called a teacher variable, because it guides the algorithm to decide those rules which I just went through.
Another type of machine learning is called unsupervised learning. In unsupervised learning we have only the input features, and our objective is that we should be able to identify the patterns within the data itself. Some of you who are working in the telecom domain or on marketing campaigns would be very familiar with segmentation analysis or cluster analysis, where the objective is to identify coherent groups within a larger population. We take the customers as the whole population and, based on different parameters and variables, we identify groups of customers or products, whichever the business problem is, which are similar in nature, so that we can either run a marketing campaign or develop new products for those specific groups.
The final one is called reinforcement learning. Reinforcement learning is a kind of learning where the agent learns from the environment, and it works on rewards and penalties. You can think of a self-flying helicopter: you leave it in the environment and it decides, based on the wind speed and other parameters in the environment, how it should fly, and the reward is that the fuel should be spent efficiently and it should spend more time in the air. So in supervised learning, as I said, the objective is that we have some input features, which hold information about different aspects of a given problem or about customers, and we have an output variable which explains whether the event happened or not, some kind of output variable. There are two types within supervised learning: one is called regression, the second is classification. The differentiation happens only because of the type of output column. If your output column contains numerical or continuous values, like numbers, then, at a very high intuition level, that is a regression kind of problem. For example, you're
working on a problem where you need to
predict how much would be the sale of
your company given the information that
how much they're spending on marketing,
how many employees are working, which
month it is of the year. If you have this information, you are going to predict how many millions of dollars of sales your company would be doing. These kinds of problems, where your output variable has numerical values, are regression problems. On the other hand, if the dependent variable is of a categorical nature, of discrete values, it signifies that it's a classification problem. Given that the values are categorical, your objective is to put the different customers or products into different classes, and that is why the name is classification problem. So as I said, there are two kinds of supervised learning problems: one is regression and
another one is classification. So let's
take a use case where we need to predict
the housing price of a particular
locality. And we have information about
these houses on different parameters and
these parameters are like these. So
let's say what is the crime rate in that area, how old the home is, how far it is from the city. This is from the Boston housing data. This column, if I'm not wrong, is the percentage of black population or some such variable; we have the description later on. And then there is the actual price of the home. All these features, from CRIM up to LSTAT, are the information about the house, so these are my input features, and the output feature is the price of the house, which is in thousands of dollars. Our objective is to fit an algorithm that can learn from all the historical homes which were sold, based on all the features and the price each was sold for, and once the model is trained, it should be able to predict what the price should be for a given home. Let's take an example. Let's say one of you is interested in buying a home in the Boston area and you would like to know the ballpark figure for a two-bedroom flat of some square footage, let's say 20 miles from a specific location: what should be
the price. One way would be to go and talk to people and try to understand what the average price has been. Or, if you have this kind of algorithm available, trained on data saying that given these features the price was this much, then a regression model would be able to tell you, given the features of the new home you are interested in buying, what its price should be. So as I said, there are two sections: one is the independent variables, all this information about the home, and the last variable is the dependent variable, which is the information about what the price was. You see, this is a kind of scatter
plot between the distance from the city and the price of the home, keeping all other variables constant. We are not looking at the influence of other features; we are only finding a relationship between the distance from the city and the price. From this graph you can make out that the farther the houses are from the city, the lower the price would be, if you keep all other things constant. If you can identify this kind of relationship, that's called linear regression. For a given distance from the city, you would be able to predict what the ballpark figure for a house should be, if you just have this information and not all the other information we talked about. In a similar fashion, there would be a relationship between the price of the house and all the features which we discussed. What we saw here is only the relationship between one of the variables, distance to the city, and the price of the home. In a similar
fashion we would be able to find
relationship between the price and all
other features. So all of the features
if you know like how old is the home,
how big it is, what is the crime rate
and all. So this kind of model is called a regression model, and it uses the very basic equation of a straight line, y = a + b·x. Here y is called the dependent variable, a is the intercept, b is called the slope, and x is the independent variable. If you go
deeper, what this equation is basically telling me is that if I already know the relationship, that is, if I know the values of a and b from my historical data about different homes, then given the value of x I should be able to calculate y, which in our particular example is the price of the home. Let me try explaining what slope means. The slope is the change in y when we change x, the independent variable, by one unit. So if I change x by one unit, how much change happens in y? That helps me understand what kind of relationship there is between x and y. And a is the value which tells you the value of y when x is zero. You can think of it like this: if you put the value of x equal to 0, whatever the value of y is, that is the intercept. But
example we were discussing the
relationship between the distance from
the city and the housing price. Even
though the house is exactly in the city
then there would be some value and even
though the house is 100 miles from the
city there would still be some price. So
it help us kind of intuitionally
understand the relationship. It also
help the line to understand where it
start whether it start from the origin
or some place within your axis. And this
kind of equation is called equation of
linear regression because if you see
here the power of x is one. So that's
why it's linear in nature. And it is
also that we are fitting the
relationship between x and only one of
the variables. Multiple regression where
instead of finding the relationship between only one variable and the dependent variable, we use the fact that in most practical scenarios the dependent variable y depends on more than one feature. For example, your house price depends on all these features, all these information points available, and all you want is that your regression model should be able to identify the relationship given all the information together and then predict the price. The equation becomes y = b0 + b1·x1 + b2·x2 + b3·x3 + …, and these coefficients b0, b1, b2, b3 help us understand what the contribution of a given variable is to your regression equation. So let's have
a look at how you can fit this model in Python. If you look at the first block of the code, we are saying import pandas as pd, import numpy as np, and import matplotlib.pyplot as plt. This is the convention in Python for importing the libraries we will be using; these are the libraries required for running this regression model. Once we import them, all the functions available in these libraries can be called very easily, and we will be seeing how you can call them. Once you have imported these libraries, you see that we are loading the data set called Boston, and this is the same data set which we have been discussing in the use case here.
So here we are importing and loading the Boston data set. In the next line of code we are calling the pandas library, because we have imported it as pd, and then calling a function called DataFrame so that we can create a data frame in Python from the Boston data; it will create boss as a data frame. A data frame, in very loose terms, you can think of as a spreadsheet kind of format, where your data is put in rows and columns; think of an Excel-file kind of framework for a data frame, though it is different, just for intuition purposes. After importing, I'm calling .head(), and what head does is give the top rows (five by default) of my data set. There are all 13 columns, and in Python the index starts from zero; you can see the index starts from 0, 1, 2, 3, so you can see what this data is. This line of
command which says .columns: .columns gives you all the features available in your data set, so these are the different feature names, the column names of the data, plus the price. Towards the end of the code I have also written one line which gives you all the details about what the target variable is, what the history of the data is, where it was recorded and so on, so you can easily look at that. We are calling Boston.target and assigning it to a price column in our boss data frame, because the target is available as a separate data vector in the Boston data set itself. Now we are saying y is equal to this price column; y will represent our dependent variable. Then boss.drop('PRICE', axis=1) means that we drop the price variable from the overall data frame, and axis=1 specifies that we are removing a column: axis=0 represents row-level operations and axis=1 represents column-level operations. In the print statement we are just printing x, and this x is all the input features of our data set, all the columns we will be using for predicting the housing price. This is not the actual model yet, but what will happen once we have created the model is that it will fit a line through the actual data set, something like: price is equal to some intercept term, plus b1 multiplied by crime, plus b2 multiplied by another variable, plus b3 multiplied by another variable, and so on. This intercept and b1, b2, up to bn are the coefficients which your model will learn from your already available data. And here we are showcasing the top five values of our housing price: y is the dependent variable and we are looking at its top five values. So this was only a very brief and very basic introduction to how you import the data and see the different columns. It has nothing to do with machine learning yet; it is only for people who are new to Python, or people who have been out of touch with Python and just want to brush up their skills.
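For reference, here is a minimal sketch of the data-loading steps being described, under the assumption that the classic load_boston loader is available (it was removed in scikit-learn 1.2, so on newer versions you would have to obtain the Boston data from another source or use a different data set):

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston   # removed in scikit-learn >= 1.2

boston = load_boston()                      # bunch with .data, .target, .feature_names, .DESCR
boss = pd.DataFrame(boston.data, columns=boston.feature_names)
boss['PRICE'] = boston.target               # add the target as a price column

print(boss.head())        # first five rows
print(boss.columns)       # all column names
print(boston.DESCR)       # full description of the data set

y = boss['PRICE']                   # dependent variable
x = boss.drop('PRICE', axis=1)      # input features (axis=1 drops a column)
print(x.head())
```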
code if you have a look which I'm
highlighting now it is we are using a
scikitlearn model for test train and
split and what it's doing is for both
because we have already x and y the test
sizes we are saying 33 so it's basically
we are randomly selecting 33% of the
data for we putting it sep separate in
the test bucket so that we can test it
later on. And this random state five
means because we'll be randomly
selecting if you specify a random state.
Every time you run this code, you will
be selecting the same set of elements
from your data set. It help you
understand that the variation if you run
the code multiple times by changing the
variation is not coming because of the
selection of sample. It should be
because of the different model changes
you are making. This dotshape function
in Python specify that what is the
dimensions and if I mention the first
one X train.shape is giving me 339 and
13. Basically it's telling me that there
are 339 rows and 13 columns in the data
set. X test there are 167 rows and 13
columns. And your Y test is just 339
rows and there's just one column or it's
just one vector. The number of rows in X
train and Y train are same. In X test
and Y test the number of rows are same
because they have been selected for the
same combination. So same houses we have
selected the input features as well as
the corresponding values of output and
same has been done for the test section.
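A minimal sketch of this split step, continuing the x and y assumed in the loading sketch above:

```python
from sklearn.model_selection import train_test_split

# 33% of the rows go to the test bucket; random_state=5 makes the split reproducible
X_train, X_test, Y_train, Y_test = train_test_split(
    x, y, test_size=0.33, random_state=5)

print(X_train.shape, X_test.shape)   # e.g. (339, 13) (167, 13)
print(Y_train.shape, Y_test.shape)   # e.g. (339,) (167,)
```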
What we are doing here, as I was saying, is that we have imported a library called scikit-learn, and scikit-learn has different modules for different machine learning algorithms; linear regression is one of them. So we can take the linear regression module from scikit-learn and, using what in Python is called an assignment, assign it to a variable called lm. Now, if you look at this line, lm.fit, we are basically telling it to use the linear regression module from scikit-learn and fit the model between X_train and y_train. In other words, we are telling the model to learn the coefficients for the different x values given the y values in the training data set. What the fit function does is calculate the values of your intercept and b1, b2, b3 for all the features in your input, for the given y variable. Once the model has been fit, we can use the same learned model, lm, with a function called predict on X_train. Once it has learned those coefficients, the intercept and b1, b2, b3 for all the input features, you can use the predict function to make predictions for your training data set; you already have the actual values as y_train, so you can compare how well your model is doing. The same thing has been done for the X_test data set, lm.predict(X_test), and again the prediction has been made. If you see here, I have put the actual and predicted values together as a data frame, y_test and y_test_pred, and the difference looks like this: for the first value the prediction was 37.6 and the actual value was 37, and so on for each predicted and actual value. This difference between the actual value and the predicted value signifies how much error there is in your predictions. Had your model given exactly the same prediction as the actual value, you would say there is no error, your model is 100% accurate and all the predictions being made by the model are absolutely bang on, but that normally doesn't happen. You end up having predictions which are a bit off from the actual values, and we measure that difference as one of the parameters to identify how correct your model is.
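A minimal sketch of the fitting and prediction steps just described, continuing the X_train / X_test / Y_train / Y_test variables assumed above:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

lm = LinearRegression()            # assign the linear regression model to lm
lm.fit(X_train, Y_train)           # learn the intercept and coefficients b1..bn

print(lm.intercept_)               # learned intercept
print(lm.coef_)                    # learned coefficient for each feature

Y_train_pred = lm.predict(X_train)   # predictions on the training houses
Y_test_pred = lm.predict(X_test)     # predictions on unseen test houses

# Put actual and predicted values side by side to eyeball the errors
comparison = pd.DataFrame({'actual': Y_test.values, 'predicted': Y_test_pred})
print(comparison.head())
```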
There are a couple of metrics used for this, and the most basic one is called mean squared error. Mean squared error is based on the difference between the actual value and the value predicted by the model. What I
mean by this is that let's say this is
your predicted value 37.6 and this is
your actual value. What you do is you
take the difference of these two and
then take the square of it. Why do we take the square? Because for some values the difference may be negative and for others positive, and if you simply sum them up the differences may cancel out to zero, and you may end up thinking the model is doing really well. To avoid that, what we
do is that we take the square of it so
that the difference between actual and
the predicted becomes positive and you
can sum it up to showcase that how far
your predicted values are from the
actual value and then you take a mean of
it to showcase that what is the mean
difference between the actual value and
the predicted value. It can also be used
for model comparisons. Here I can show you how it works. Let's say you have some actual values like these, and let's say you fit a model and I fit a model, so there is model prediction one and model prediction two. What you can do is take the difference between the actual and the predicted value, here the difference is 2, and then take the square of it, which is 4; for 23 and 21 the difference is again 2 and the square of 2 is 4; for the third one the difference is 5 and the square is 25; and so on for all the values. You get the total, the sum of squared errors, divide it by the number of inputs, which is five, and you get the mean of the squared errors. You do the same thing for the second model, and if you see, its value comes out much smaller. So this helps you understand which model is doing a better job of predicting the housing prices or any other numerical variable.
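As a small worked illustration of this comparison (the numbers below are made up, not the ones on the slide):

```python
import numpy as np

actual  = np.array([10, 23, 30, 18, 25])
model_1 = np.array([12, 21, 25, 20, 22])   # hypothetical predictions from model 1
model_2 = np.array([11, 22, 29, 19, 24])   # hypothetical predictions from model 2

mse_1 = np.mean((actual - model_1) ** 2)   # square the errors, then average
mse_2 = np.mean((actual - model_2) ** 2)
print(mse_1, mse_2)   # the model with the lower MSE is predicting better
```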
There is another statistic used for judging how good your model is, called mean absolute percentage error. That is basically the absolute difference between the actual value and the predicted value; you sum it up over all the entries and divide it by the sum of the absolute values of your actual values, and it helps you understand by what average percentage your predicted values differ from the actual values. So it comes out as a percentage. What I've done is taken the absolute difference between the actual values and the predictions, summed it up, and divided it by the sum of my actual values. Whatever value comes out, you can say, okay, it's 5%.
So it'll be fair to say that your model
is 5% off from the actual values or the
error term in your model is let's say 5%
or 6%. And whatever predictions you're
making from the model you can keep a
buffer of that percentage when you share
it with the team. And what I mean is
that let's say your mean absolute
percentage error is around 10% and it's
about the sales of a company. When you share this forecast, you can say that the predictions are around 90% accurate and the actual sales may be plus or minus 10 percent. This helps you communicate that kind of variability in your predictions. You can implement this in Python yourself, or, from the scikit-learn library, you can call the mean_squared_error function and it can help you calculate the MSE for a given model.
for a given model. So basically there
was a very quick introduction to linear
regression. Though there are different
applications but one thing remain common
that we are trying to predict the
dependent variable whose nature or the
type of dependent variable is a
numerical or continuous data. Some of
the applications like predicting life
expectancy based on these features like
eating patterns, medications, disease
etc. You can predict housing price. We
have already seen the example on that.
We can predict a person's weight from different features like sex, height, and prior information about the parents. And you can also predict the yield of a crop based on different parameters like rainfall and so on. As I said, this is a very limited list of use cases. I'm sure people who are working with a sales department have to predict how much the sales would be; people who are working with call centers need to predict the number of calls for next month; and people who are working in marketing need to predict the footfall in a given company or mall. So there are different applications of regression models, but one thing is common across all of them: the dependent variable which we are trying to forecast is of a continuous data type. So let's move to the next agenda item, logistic regression. At
the time of the introduction to machine learning, we discussed that there are two kinds of supervised learning techniques: one is regression and the other is classification. The major differentiating factor between the two was that in regression we had a dependent variable of continuous values, and in classification problems we had a dependent variable of categorical type. So let's take an example of how we can do it. Here is a use case where we have some information about some customers, and the data set looks like this: we have a customer ID or user ID, the gender of the customer or user, his or her age, the estimated salary every month, which you can think of in any one currency, either INR or dollars, and whether this user purchased an SUV or not. As I was saying earlier, the dependent variable here is 0 and 1, so it's a discrete or categorical value which we need to predict, and the features we'll be using in the model are age and estimated salary.
Why would we need a logistic regression kind of algorithm? It might seem a straightforward process: take purchase as a numerical value, 0 and 1, take some input features like age and estimated salary, and fit a linear regression, and you would be right in saying that to some extent this is possible. But there are two major problems if we follow this, and some of you can think about what the problems may be if we try using linear regression to solve this kind of problem. One limitation I can think of is that here I'm looking for an output which gives me some kind of probability of how likely a person is to buy a product or service. The restriction with probability is that it should be between zero and one: zero signifies that there is no likelihood of the event happening, and one means that it is certain the event will happen. There is no possibility of probability values less than zero or greater than one. But if I fit a linear model taking the purchased column as my dependent variable, my values can go beyond one or below zero, because linear regression has no such limitation. So what I require is to fit the model in a similar fashion to linear regression, with the equation I used earlier, so that the information still comes from my features in the same way, but with this y mapped to values between 0 and 1. Given the limitation we have just talked about, that a probability should be between 0 and 1 and should not go above one or below zero, I need to find a way to force this y value coming from the equation into a value between 0 and 1. To solve this
problem there is a function called the sigmoid activation function, which will also be used extensively in our deep learning sections at different places. Logistic regression comes from this activation function itself. The function looks something like this: the output value is 1 / (1 + e^-x), where x here is not one of the inputs but whatever value we give it. If you feed any value into this equation, it can take any value between minus infinity and plus infinity and map it to between 0 and 1. If you give it a highly negative value, the output will be very, very close to zero; if the input is highly positive, the output will be close to 1; and if the value of x becomes zero, what would the output be? One half, because any value to the power 0 is equal to 1, and 1 / (1 + 1) equals 1/2. So logistic regression is nothing but an extension of your linear regression with one additional fact: you want to force your output to lie between zero and one, and for that you use the sigmoid activation function, sometimes also called the logistic activation function. This is the intuition behind logistic regression: you take the value of your equation from the intercept and the different coefficients for your inputs, and you map these outputs between 0 and 1.
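As a quick illustration of this mapping, here is a small sketch of the sigmoid function (a generic illustration, not code from the course slides):

```python
import numpy as np

def sigmoid(x):
    # maps any real number to the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(-10))   # very close to 0
print(sigmoid(0))     # exactly 0.5
print(sigmoid(10))    # very close to 1
```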
To recap: logistic regression is nothing but an extension of linear regression with a restriction that the output is mapped between 0 and 1. Had we fitted a plain regression equation, we would have scenarios where the value goes beyond one or below zero, and to avoid this we fit logistic regression with the help of the sigmoid activation function, which looks like this. As I was saying, if this point is zero, then for all the values which are positive and greater than zero the curve goes tangentially towards one, for all the values which are less than zero it goes towards zero, and at zero the probability is 0.5, so there is a 50% probability when your model output is at or very close to zero. It can be used for multiple scenarios:
one example we are taking is whether somebody will buy an SUV or not, but you can also solve problems like whether somebody will say yes or no to a product or service, whether something is true or false, high or low, or any other two categories. Logistic regression can also easily be used for multiclass classification problems; to give you a very quick introduction to how that works, in multiclass classification it does a kind of mapping of one class versus the rest of the classes, and then the same analogy follows for deciding which class a particular event should be associated with: at the end of the day, whichever class or category has the highest probability, the model predicts that the example belongs to that class. Just like MSE, we have a statistic or a
parameter to evaluate how well a regression model is doing: it checks the difference between the actual value and the predicted value. We take the actual value, subtract the predicted value, take the square of it, do this for all the examples, and divide by the number of training examples, and it gives you a number. I was also saying that this number is helpful for comparing different models: let's say you fit a model, I fit a model, and we compare the MSE for both; whichever model gives a lower MSE is probably doing a better job of prediction. In a similar fashion, we need a statistic to see how well your model is doing when it is solving a classification problem. So here there
are four categories. Let's say we have only two classes, good and bad: the actual values are good and bad, and what your model predicts is also good or bad. For examples which belong to the good category where your model also predicts the good category, these events or examples are called true positives, because your model is making a correct prediction on positive examples. Another category is where the actual value for those examples is bad, they belong to the bad category, and your model also predicts them as bad; these are called true negatives, and these are also correct predictions, because whatever the actual value is, your model predicts the same thing. However, there are two categories of mistakes: where the actual value was bad but your model predicts good, these examples are called false positives, because your model is falsely predicting that they are good examples; and the last category is called false negatives, where the actual value was good and your model predicts bad. So how do you say which model is doing a good job? We calculate what percentage of examples have been predicted correctly. The sections in blue, true positives and true negatives, are the examples which your model has been able to predict correctly, and the two groups false positives and false negatives are the incorrect predictions. So what we do is simply take the percentage of correct predictions. This matrix is also called the confusion matrix.
Let's say there are some examples, out of which 65 are ones where they actually were good-category examples and your model also predicts them as good. 44 are those which belong to the bad category and your model also predicts them as bad. Eight are actually bad but predicted good, and four are actually good but predicted bad. What you do is sum up all the correct examples, 65 and 44, and divide by all the examples in your data set, correct and incorrect ones, and here you get about 89%. So you can say that your model is able to predict with roughly 89% accuracy. If you want to explain it to your business team, you can say: whenever I give you a prediction that 100 customers will churn and hand you a list of 100 customers, I can say with some certainty that at least around 89 of them will churn, because the model has about 89% accuracy. That's how it gets communicated to business teams: we think our model is about 89% accurate, and whatever prediction we give, we are quite certain about it. But if your model accuracy is 70 or 60%, then when you give the predictions to your business team you say, okay, we are giving you the predictions, but we are not very certain whether they will be correct or not. So this accuracy percentage plays a similar role to the MSE we used for linear regression: it is another metric to see how close or how correct the predictions have been. So now we can
see the implementation of logistic
regression in Python. So first few lines
if you see, we are importing the libraries we require, the machine learning libraries, to do the data manipulation. We are importing the data, which is in CSV format, and this data is already available on your LMS; you can easily import it from the LMS itself, unlike the Boston data set, which we imported from the library. Here we have a flat file, the social network ads CSV, and you can call the read_csv function of the pandas library to import the data. So you are importing the data as dataset, and as I said, head shows the top five rows of
your data. So here we have only five
columns. One is user ID, gender, age and
salary. And the last column is our
dependent variable which signifies
whether a customer or a user bought the
SUV or not. So it's 0 and one and one
means the person bought. In the previous code we used one convention for selecting X and y; here we are showcasing another way of selecting, called .iloc, which looks things up by location. Going through this convention: within the data set we are specifying the locations. The colon means that we want all the rows, and, as I was saying earlier, in Python the index starts from zero, so when we say we want columns 2 and 3, we mean this is zero, this is one, this is two and this is three; we want columns two and three as our input features, and .values creates an array of these two columns. We could have used gender as well, but I will leave that to you: first you need to encode gender as a vector of 0 and 1. You can write a function which says, if gender is equal to male then one, else zero, or you can create dummy variables; there are functions available in scikit-learn for that. So it's an exercise for you; the code is already available, but I would encourage you to also include the gender information in your model. The next line of code says that the dependent variable is all the rows and column number four: column number four is your purchase information, whether a customer bought the SUV or not, and again .values converts it into an array format. So we have specified two things: the two columns with the information about the user, in terms of how much money they make and what their age is, and the y information, whether a customer bought the SUV or not.
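A minimal sketch of the selection being described, assuming the ad-clicks data is saved locally as 'Social_Network_Ads.csv' (the exact file name is an assumption):

```python
import pandas as pd

dataset = pd.read_csv('Social_Network_Ads.csv')   # assumed file name
print(dataset.head())   # User ID, Gender, Age, EstimatedSalary, Purchased

X = dataset.iloc[:, [2, 3]].values   # all rows, columns 2 and 3 (Age, EstimatedSalary)
y = dataset.iloc[:, 4].values        # all rows, column 4 (Purchased: 0 or 1)
```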
In the next line we are doing the train and test split, for the same reason as before: to evaluate whether the model, which learns on the training data set, still makes correct classifications on the test data set, which was not involved at the time of training. The 0.25 means that we are selecting 25% of the data for the test set and the remaining 75% for training. What is the correct
split of train and test? Normally you choose something like 60/40, 70/30 or 80/20. If your data set is big enough then I think an 80/20 kind of split is good, and you can try different combinations, but as a rule of thumb most of the time I have seen people taking something like a 60/40 or 70/30 kind of distribution between train and test. Now there is one important thing
for data pre-processing, and that is the scaling step I have included here. Some of you who come from a machine learning background will already know how important it is to scale your data. What do I mean by scaling? If you look at the data set we are using for input, one column is age and the second is income. Age can be somewhere between, let's say, 1 and 100, or 120 at most, while your income is in the thousands or hundreds of thousands; both these values are on different scales. Scaling your data, say bringing all the values between 0 and 1, helps you understand the importance of each variable. For example, if you look at a regression equation and the coefficients b1, b2 for all the input features, these coefficients can give you an indication of how important a particular variable is. But this intuition is only correct if all the features are on the same scale. If features like age and salary are on different scales, you will not be able to compare what these coefficients really mean, because your values come from two different scales. So it is always a good idea to have all your features on the same scale. There are multiple ways of doing
it, and there are multiple types of scaling. The simplest one is called min-max scaling. What it means is that for a given column, say the age column with values like 19, 35, 26 and so on, I take each value, say 19, and the minimum value of the column, which here happens to be 19 as well. The formula is: x_i minus the minimum of the column, divided by the maximum of the column minus the minimum of the column, so for age it is (x_i - min(age)) / (max(age) - min(age)). If I apply this formula to all the values in the age column, it converts all of them to values between 0 and 1. There are other ways too; I also mentioned standardization, where you take the value minus the average and divide by the standard deviation, if I recall it correctly. Whichever method we apply, all I am saying is that these values of age and salary should be brought to the same scale. So if I'm applying min-max scaling, I'll apply it to both my columns so that both these variables are on the same scale and I can use them in my model. And this is
again a very important point: whenever you do scaling, you follow this process of fitting the scaler on the train data set and then using the same learned scaling from the train set on the test data set. Basically, the data set on which your model is being trained will have its values converted to between 0 and 1 based on its minimum and maximum values; if the test data set had different minimum and maximum values, the same number could end up with a different scaled value. That's why the process is that we fit our scaler on the training data set and use the same minimum and maximum values for scaling the test set as well. It gives the same scale for all the values, and for model predictions it's very helpful.
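A minimal sketch of this fit-on-train, apply-to-test pattern, continuing the X and y from the earlier iloc sketch and using scikit-learn's StandardScaler (the random_state value and the choice of scaler are assumptions):

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)   # learn scaling parameters from the training set only
X_test = sc.transform(X_test)         # reuse the same parameters on the test set
```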
In a similar fashion to how we called the LinearRegression object from scikit-learn in the previous example, in exactly the same way we can call LogisticRegression from scikit-learn: it lives in the scikit-learn linear model module, and we import LogisticRegression. Now we are
fitting the logistic regression between X_train and y_train, the same way we did it earlier for linear regression. Once it has been fit, we can do the prediction for the test data set: we call the predict function of the classifier on the X_test data set. Here the default probability threshold is 0.5, so what your algorithm does in the back end is that for every example where the predicted probability on X_test is greater than 0.5 it is tagged as one, and for every example where the probability is less than or equal to 0.5 it is tagged as zero. Now we are calling the confusion_matrix function between y_test and y_pred, so we are comparing the counts of your true positives, true negatives, false positives and false negatives. These are the true positives, these are the true negatives, and these are the misclassified values, and if you want to calculate the accuracy you can easily do it as (65 + 24) divided by (65 + 24 + 8 + 3). All we are doing is identifying the percentage of correctly predicted examples; that is what this section of code does.
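A minimal sketch of the fit, predict and evaluation steps just described, continuing the scaled X_train / X_test from the scaling sketch above:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score

classifier = LogisticRegression()
classifier.fit(X_train, y_train)        # learn the coefficients on the training set

y_pred = classifier.predict(X_test)     # probability > 0.5 is tagged 1, otherwise 0

cm = confusion_matrix(y_test, y_pred)   # layout: [[TN, FP], [FN, TP]]
print(cm)
print(accuracy_score(y_test, y_pred))   # e.g. (65 + 24) / (65 + 24 + 8 + 3) = 0.89
```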
If you see what it has done: your logistic regression model has fit this boundary, and you can see it's a straight line, which is why in some texts logistic regression is also called a linear classifier. Why is it called a linear classifier? Because it is predominantly built for fitting a linear equation. The logistic regression equation was y = a + b1·x1 + b2·x2 and so on, with these coefficients and their respective inputs, but the highest power of your inputs was one, and you would already know that a polynomial of power one stands for a straight line; that's why you see a straight line. Some of you would argue that we can fit a nonlinear boundary with the help of logistic regression, but you would also concede that those are tricks we use to create nonlinear boundaries, for example by introducing higher-order polynomial terms into your model so that the separation becomes nonlinear. These kinds of algorithms are really helpful only for problems where the objects are linearly separable. When the separation between the objects is not linearly separable, these algorithms are not very helpful, and we need to identify algorithms which can fit nonlinear hypotheses or separation boundaries between different
classes. So let's take a use case to understand the simple scenarios where unsupervised learning can be used and how it really works. Let's say we have some housing data, in terms of where the houses are located; the white dots on the screen against the blue background show where these homes are. The objective of an education officer is to find a few locations where schools can be set up, and the constraint is that students shouldn't have to travel much. Given this constraint, the officer needs to decide the locations. We can identify them even without using any algorithm: let's say I'm the officer, I need to open three schools in the locality, and I know where the homes are located. I can easily see that, okay, probably this is one location, I'm just highlighting it, keeping in mind the constraint that students shouldn't have to travel much. If you opened the school here instead, everybody would say it's not a great location, given that it's far away from the population, so that is not the correct location. From the perspective of the homes, these three locations, chosen by human judgement, or by any of you given the task without an algorithm, would mean that most students travel less to get to school. So given this problem, we
can easily see that we don't have a dependent variable as such telling us whether a location is correct or not. All we have is a number of school locations we need to find, and the home locations, and based on the distance from each home we need to identify the proper locations for these schools. Another thing which follows from the same logic is that there are no predefined classes for these locations. One more point, which some of you who have done clustering or segmentation work will recognize: this number, whether we say three or four or five, is not predefined. Most of the time it is given by the business, how many clusters or segments they are looking for, though there are statistical ways of identifying what the best number of clusters should be. Let me give an example: I was working for one of the Indian telecom companies quite some time back, and at that time their subscriber base was around 300 million customers. Imagine trying to create segments for this big a population: if you create only three or four clusters, you can easily understand that it would be very difficult for a marketing team or any product team to design products for such a big population. So though statistically it may look like there are four or five unique segments, you end up creating a lot of small segments, and there may well be 20 or 30 segments for such a big population. My intent in saying this is that the number, such as the three locations we are trying to identify within the population, has to be decided either by the business or by people like you who have knowledge about the data, about what kind of business is being run, and about the final usage of the segmentation exercise. So let's
see one way of selecting these school locations. One way is what we were already doing: somebody looks at the homes, sees where the density is high, and picks the locations manually. But there are also algorithms available to do this task, and I can give the name here itself: the K-means algorithm. First we would like to understand how the algorithm works when it needs to identify the best locations. If you're looking at my screen, let's say our objective is to identify two locations first; we have some data, a scatter plot is available, and we need to identify where the schools should be so that the distance from the homes is minimal. How can we do it? Let's say we randomly pick two points, and actually the easiest way is to randomly pick two points from your data set itself, and then you assign these two selected points as your cluster centroids, the centers of your selected population. Then, in the second step,
once you have initialized these two random points, you measure the distance of all the homes from the initially selected points. So you measure the distance of this home from this selected point, and again from that one: for each house, we measure the distance from each randomly initialized point and see which distance is smaller. We can easily see that this distance is smaller than that distance, so this point would be assigned to this particular group. So the first step is initialization: initialize as many centroids as the number of clusters you need. In the second step you do the cluster assignment: you measure the distance from each of the initialized points, see where the distance is minimal, and assign the home to that particular segment or cluster. This exercise is done for each home; I'm just showing it for a couple of them. Based on the distance, the assignment is completed, and the colors signify what we have done: after measuring the distance of each home from the initial points, we have assigned these points to the first cluster and these blue points to the second cluster. Once this assignment is complete, the algorithm moves the centroids: it takes the center of all the points assigned to a cluster and moves the centroid from its previous position to this new one, based on the assignment which has just been completed. Then the same cluster-assignment exercise is done again: we measure the distance of each home from this centroid and from that centroid, and wherever it is minimal, we assign the home to that cluster. This process is repeated, and once the distances have been measured from the updated centroids, the assignment process starts again. So you move the centroids, measure the distances, the assignment changes as it did in the previous step, and we continue this process until we reach a point where the change in assignment has stopped completely. Once we reach the scenario where, no matter how many times you measure the distances from the centroids to the different points, your centroids do not change, the model is said to have converged, and at that point you can say, okay, this is one group of points or homes, so this is one cluster, and the second one is that cluster. So this is how K-means works. It has a wide variety
of applications, and there is a function available in the scikit-learn library; you can try implementing it. The intent of showing you this example of unsupervised learning is that we will later cover two algorithms which come from the unsupervised learning section of machine learning, namely restricted Boltzmann machines and autoencoders, which work on a similar methodology of unsupervised learning. So, in a similar fashion to what we discussed at the beginning about where the locations of these schools should be, we can use the K-means algorithm: initialize three points randomly, measure the distance to each home, assign each home to the cluster where the distance is minimal, and continue this process of measuring distances and assigning homes to clusters until these values have converged.
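A minimal sketch of what that would look like with scikit-learn's KMeans, using made-up home coordinates:

```python
import numpy as np
from sklearn.cluster import KMeans

# made-up (x, y) coordinates of homes
homes = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(homes)
print(kmeans.cluster_centers_)   # candidate school locations
print(kmeans.labels_)            # which cluster each home was assigned to
```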
The most important task for any data scientist is not to remember which library is required or what the code is. In my understanding, the most important thing a data scientist should remember is that once you've been given a business problem, you should first be able to understand what kind of problem it is: whether it's a supervised learning problem or an unsupervised one, and, given it's a supervised learning problem, whether it falls into the regression type or the classification type. If you can make these decisions, then for implementing the algorithm you will find a lot of help; in fact scikit-learn has starter code for almost every algorithm, so you don't have to remember lines of code and algorithms. All you should be able to do, once the problem has been given to you, is identify what kind of problem it is. Most of the time in unsupervised learning, and specifically with K-means kind of models, we use the elbow method as an indicator to help you understand what number of clusters we should probably start with for the final implementation of your model. So let me
give you an intuition of how it works. SSE stands for sum of squared errors. To see what it means, let me go back a little: suppose you have identified these two clusters. The sum of squared errors means that you take the centroid and measure the distance to the points associated with that cluster. So you measure the distance for each point in the orange group, square and sum all those distances, do the same exercise for the blue points, and whatever total comes out of this exercise is your total sum of squared errors. With two clusters you will have some number; just for intuition, let's say this total sum of squares comes out as 100. That number is only for illustration. Now let's say somebody identified one more cluster here, and these three points, though they are blue in colour, now belong to this new segment, while the rest of the points remain assigned as before. As we saw, with two clusters our sum of squares was 100; with three, you can see that these points were a bit far from their old cluster, so the distance they contribute would be a bit less, given that there is now a centroid closer to them. Let's say the total goes down to 95; I'm just making up numbers. What that tells me is that the sum of squares is going down, and I'm probably finding clusters which are closer to the actual data points. As you would expect, if I keep increasing the number of clusters in the population, this total should keep going down, and it can go all the way to zero when every point becomes a cluster in itself: if I have 20 data points and I say every point is its own cluster, then the distance of each point from its centroid is zero, and the overall SSE becomes zero. So it may start from a very high number, but it will keep reducing with each cluster you add. This line is what is sometimes called the elbow method, and here is what it is actually showing you: if you had only one cluster anywhere in the population and you computed the sum of squared distances, this was the value; with two clusters, this was the value; with three, this; with four, this; but with five, the sum of squared errors did not reduce much. After that, even though you keep adding clusters, the sum of squared errors is not really going down. As I was saying, this is an indicative method: if I have done my cluster analysis with different numbers of clusters, I am measuring the sum of squared errors for each number of clusters, and I see that after four the sum of squared errors is not going down, it gives me an indication that I have probably found clusters which are more or less coherent and the population is not very far from the centroids. From that point you can make the assumption that four clusters is probably a good choice for my given population. But, as I also mentioned, it is just an indicative process and a good starting point: you need to see what the distribution of your clusters looks like, whether they solve the business problem you are trying to solve, and, if not, whether you need to further divide the clusters which your initial model has identified.
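A minimal sketch of how you might draw this elbow curve with scikit-learn, reusing the made-up home coordinates from the K-means sketch above (inertia_ is scikit-learn's name for the within-cluster sum of squared distances):

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

sse = []
ks = range(1, 6)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(homes)
    sse.append(km.inertia_)   # sum of squared distances to the nearest centroid

plt.plot(ks, sse, marker='o')
plt.xlabel('number of clusters')
plt.ylabel('sum of squared errors')
plt.show()   # look for the "elbow" where the curve flattens out
```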
Now let's take a very quick introduction to the third type of learning, which is called reinforcement learning. We have already seen the two other learning types, supervised learning and unsupervised learning: in the first one we are trying to predict some dependent variable, and in the second one we are trying to identify some kind of structure in the data set, or, to put it in other words, some kind of coherent groups in the population. The third one is
reinforcement learning and it's
basically that an object or a system
learns from the environment and there is
no right or wrong answer given to the
system explicitly or in the beginning
itself like in the case of supervised
learning. Here the agent keeps moving around in its environment. Think of self-flying helicopters: they fly on their own, sense the wind speed and the pressure around them, correct their behavior accordingly, and the objective they need to achieve is to stay in the air for a longer period of time. Here is another example. Let's say we have a robotic dog and somebody needs to train it to take correct decisions, where correct means it walks on the path where it is supposed to walk, it does not go off the path, and if some task is given, it carries it out properly. There are two components of reinforcement learning, called reward and penalty. If the agent does the correct thing, it receives some reward; obviously we will express everything in mathematical numbers. If it does the wrong thing, it receives a penalty, and on the basis of these signals it keeps refining its decisions. So for the dog, if it is walking correctly it receives points; if a ball is thrown and the robotic dog goes and picks it up, that is a reward; if it does the wrong thing, it receives a penalty. Almost all reinforcement learning agents work in a similar fashion. For those of you interested in implementing this, there is an algorithm called deep Q-learning, where you can design your own environment and assign what the rewards and penalties are. A toy reward-and-penalty sketch follows below.
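Here is a minimal, made-up sketch of the reward-and-penalty loop using tabular Q-learning rather than the full deep Q algorithm. The states, actions and reward values are entirely hypothetical; the point is only to show how rewards and penalties drive the agent's decisions.

```python
# A made-up sketch of the reward-and-penalty loop (tabular Q-learning,
# not deep Q). States, actions and rewards here are hypothetical.
import random

n_states, n_actions = 5, 2             # tiny toy environment
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration

def step(state, action):
    """Hypothetical environment: action 1 (move right) is rewarded, action 0 is penalised."""
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if action == 1 else -1.0
    return next_state, reward

state = 0
for _ in range(1000):
    if random.random() < epsilon:
        action = random.randrange(n_actions)                       # explore
    else:
        action = max(range(n_actions), key=lambda a: Q[state][a])  # exploit what was learned
    next_state, reward = step(state, action)
    # Q-learning update: nudge the value of (state, action) towards reward + discounted future value
    Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
    state = next_state

print(Q)  # action 1 ends up with higher values: the agent has learned from rewards and penalties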
In a similar fashion, a reinforcement learning agent keeps interacting with its environment, as I mentioned. A self-driving car is also
one of the examples which would be
receiving rewards and penalties based on
whether it's running on the track,
taking the right turns and moving at the
correct speed, maintaining distance from
other cars on the road. So reinforcement learning has a huge role in self-driving cars, or at least in some components of them; other components of a self-driving car, such as recognising what objects are in front of it, are supervised learning. So what are the real limitations of machine learning? Given that we already have all three types of algorithms available, supervised, unsupervised and reinforcement learning, why do we want a new architecture or a new type of algorithm for our artificial intelligence systems?
First and foremost is the form of the data. When I say form, I mean the type of data we get from a lot of sources. Let's say we receive images, which are grid-like: where are the pixels, and what is the intensity of each pixel in the image? Or take natural language processing: language data comes in different lengths, and the task itself is different. Suppose you need to design a machine learning algorithm which can do language translation. If you conceptualize language translation from a machine learning perspective, your input becomes a sequence of words and your output is also a sequence of words. Some of you who are working with machine learning algorithms, try to think whether any algorithm currently available, like logistic regression or a decision tree, could even fit this kind of problem, leaving aside how good the accuracy would be. These problems come from a different type of data source, and we are trying to solve a different kind of task, like language translation or a chatbot-style problem where you give a sequence and it returns a sequence. That kind of architecture is simply not available in classical machine learning. So that is one reason we need algorithms which can deal with data sets like images and language, and which can fit models that do more than predict or classify, models that can also output values like the sequences in the example I gave. That is the first reason we are looking for a different type of architecture for solving such problems. The second
problem, which machine learning algorithms are not very good at dealing with, is dimensionality. You may have seen data sets with, say, a thousand variables: with 100,000 rows and a thousand columns you can probably still fit some machine learning algorithms on top of it. But look at the kind of problems we are dealing with, like images. Say every image is 200 x 200, meaning 200 pixels by 200 pixels, and it's a colored image, which means there are three channels. Every image is essentially a matrix, and a colored image is represented in the system through three such grids, one each for red, green and blue, stacked on top of each other. So the number of values you need to represent your image in the system, or in your algorithm, is 200 x 200 x 3, and if my math is correct, that comes to 120,000 features. So even a simple image of such small dimensions gives you 120,000 features; the quick check below confirms the arithmetic.
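Just to double-check that arithmetic, here is a one-liner; NumPy is used only for illustration.

```python
# Quick arithmetic check: a 200 x 200 RGB image flattened into a plain
# feature vector really does give 120,000 numbers.
import numpy as np

image = np.zeros((200, 200, 3))   # hypothetical colored image: height x width x channels
print(image.size)                 # 200 * 200 * 3 = 120000 features if flattened
```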
Plus, if you're really working on a complex problem, say object identification in images, there may be five or six objects you need to identify and you may be dealing with, say, 100,000 images. Then the scale of the data becomes too huge for any classical machine learning algorithm to handle easily, and these algorithms fail to produce any interesting results. Coming to the solution: we need an architecture which can not only read such data, like images, but is also capable of dealing with such huge data dimensions. That is the second benefit which comes from deep learning algorithms, and we'll discuss how they manage such high-dimensional data when we talk about the different architectures. The third and most
important reason that we will be looking
for a different kind of model structure
or different kind of algorithm is for
identifying the features. In machine learning, we as data scientists spend a lot of time curating the important features: first you scale the features, then you create interaction variables, and if the separating boundary is still not clear, you introduce higher-dimensional transformations. Say these are your data points, and you see that the line you are fitting is not separating them clearly; then some of you would try higher-order polynomials of your input features. All of this is not only difficult to come up with, there is also a lot of trial and error over which kind of transformation and which kind of derived variable will really work for the classification problem. That's the first issue. The second is that if you're working with higher-order polynomials, what is the correct order of polynomial to create? Just to give you a sense of the scale, say you are dealing with only 100 features and you create second-order polynomial terms with interactions of all these 100 features: you end up with around 5,000 features from the second-order polynomial alone. If you go to third-order polynomials, cube terms and interactions of three variables together, those 100 variables come to around 170,000 features; the sketch below shows how quickly this blows up.
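To see that explosion in practice, here is a small sketch using scikit-learn's PolynomialFeatures on 100 made-up input features. The exact counts depend on which terms you include, but they are the same order of magnitude as the numbers mentioned above.

```python
# A small sketch of how quickly polynomial feature creation blows up,
# using scikit-learn's PolynomialFeatures on 100 made-up features.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.random.rand(5, 100)                      # 5 rows, 100 original features

for degree in (2, 3):
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    n_out = poly.fit_transform(X).shape[1]
    print(degree, n_out)                        # degree 2 -> 5150 columns, degree 3 -> 176850 columns
```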
So this creation of features is very, very difficult if we go and start creating them on our own. Say our objective is to identify a television in an image, and we have some pictures where we need to find it. Those of you who have been working in computer science for quite some time would know that earlier we used to hand-craft features such as SIFT, SURF and HOG features, but these are essentially static descriptors for a given object. The television may be here in this picture but somewhere else in another picture, so the feature I'm identifying has to be spatially invariant: it should work wherever the object appears in the image. The same goes for language: if you're dealing with language data, the model should not only understand the meaning of a word and how the word fits into the sentence, but also the context of each word. Word embeddings learned by neural networks help with exactly that: they capture which words are related to a given word, and from that you make predictions. So the broad problems with machine learning algorithms are: first, they cannot easily deal with different types of data like images and natural language; second, the dimensionality problem, when the data runs into 100,000 features or more; and third, this manual feature creation. So
these are the three basic reasons that
one of you or all of you would be
interested in going to one of the deep
learning architectures for solving such
problems. And fourthly, if I may add it, deep learning architectures, given the amount of computational power we put into them, end up giving you better accuracy for both classification and regression problems. That's the fourth benefit. And how does it really work? There are different stages in a deep learning model, and they are called deep because it's not just input and output like we have seen in regression, where y is some linear function of x. Here we have many intermediate layers doing complex calculations, so that all those features I mentioned, for example the ones you need to identify a television, get computed at different stages one after the other, and at the final stage you have very refined features. This is not only for image classification: for any problem you are trying to solve, through these multiple stages your model is able to learn intelligent features which are really important for your classification, regression, or whatever problem you are trying to solve. So these were the main benefits of deep learning, and these are the broad reasons someone would be interested in learning deep learning algorithms.
>> So this is the problem statement guys.
We need to figure out if the bank notes
are real or fake and for that we'll be
using artificial neural networks and
obviously we need some sort of data in
order to train our network. So let us
see how the data set looks like. So over
here I've taken a screenshot of the data
set with a few of the rows in it. The data were extracted from images taken of genuine and forged banknote-like specimens. After that, wavelet transform tools were used to extract features from those images, and these are the few features I'm highlighting with my cursor. The final column, the last one, represents the label. The label tells us which class a pattern belongs to, whether it represents a fake note or a real note. Let us discuss these features and labels one by one. The first feature, or the first column, is the variance of the wavelet-transformed image. The second column is its skewness. The third is the kurtosis of the wavelet-transformed image, and the fourth one is the entropy of the image. As for the label, which is the last column over here: if the value is one, the pattern represents a real note, whereas if the value is zero, it represents a fake note. So guys, let's
move forward and we'll see what are the
various steps involved in order to
implement this use case. So over here
we'll first begin by reading the data
set that we have. We'll define features
and labels. After that we are going to
encode the dependent variable. And what
is a dependent variable? It is nothing
but your label. Then we are going to
divide the data set into two parts. One
for training, another for testing. After
that we'll use TensorFlow data
structures for holding features, labels,
etc. And TensorFlow is nothing but a
Python library that is used in order to
implement deep learning models or you
can say neural networks. Then we'll
write the code in order to implement the
model. And once this is done, we will
train our model on the training data.
We'll calculate the error. The error is
nothing but your difference between the
model output and the actual output and
we'll try to reduce this error and once
this error becomes minimum we'll make
prediction on the test data and we'll
calculate the final accuracy. So guys
let me quickly open my PyCharm and I'll
show you how the output looks like. So
this is my PyCharm guys over here I've
already written the code in order to
execute the use case. I'll go ahead and
run this and I'll show you the output.
So over here as you can see with every
iteration the accuracy is increasing. So
let me just stop it right here. All
right. Any questions or doubts so far with respect to what our use case is or what the data set is about? So we'll
move forward and we'll understand why we
need neural networks. So in order to
understand why we need neural networks,
we are going to compare the approach
before and after neural networks and
we'll see what were the various problems
that were there before neural networks.
So earlier, conventional computers used an algorithmic approach: the computer follows a set of instructions
in order to solve a problem and unless
the specific steps that the computer
needs to follow are known the computer
cannot solve the problem. So obviously
we need a person who actually knows how
to solve that problem and he or she can
provide the instructions to the computer
as to how to solve that particular
problem. Right? So we first should know
the answer to that problem or we should
know how to overcome that challenge or
problem which is there in front of us.
Then only we can provide instructions to
the computer. So this restricts the
problem solving capability of
conventional computers to problems that
we already understand and know how to
solve. But what about those problems
whose answer we have no clue of. So
that's where our traditional approach
was a failure. So that's why neural
networks were introduced. Now let us see
what was the scenario after neural
networks. So neural networks basically
process information in a similar way the
human brain does and these networks they
actually learn from examples. You cannot
program them to perform a specific task.
They will learn from their examples from
their experience. So you don't need to
provide all the instructions to perform
a specific task and your network will
learn on its own with its own
experience. All right. So this is what
basically neural network does. So even
if you don't know how to solve a
problem, you can train your network in
such a way that with experience it can
actually learn how to solve the problem.
So that was a major reason why neural
networks came into existence. So these
neural networks are basically inspired
by neurons which are nothing but your
brain cells and the exact working of the
human brain is still a mystery though.
As I've told you earlier, neural networks work like the human brain, hence the name. And just as a newborn human baby learns from his or her experience, we want a network to do that as well, but we want it to do it very quickly. So here's a diagram of a
neuron. Basically a biological neuron
receives input from other sources
combines them in some way perform a
generally nonlinear operation on the
result and then outputs the final
result. So here if you notice these
dendrites these dendrites will receive
signals from the other neurons. Then
what will happen? It will transfer it to
the cell body. The cell body will
perform some function. It can be
summation, it can be multiplication. After performing that function on the set of inputs, the result is transferred via the axon to the next neuron. Now
let's understand what exactly are
artificial neural networks. It is
basically a computing system that is
designed to simulate the way the human
brain analyzes and processes information. An artificial neural network has self-learning capabilities that enable
it to produce better results as more
data becomes available. So if you train
your network on more data, it'll be more
accurate. So these neural networks, they
actually learn by example. And you can
configure your neural network for
specific applications. It can be pattern
recognition or it can be data
classification, anything like that. All
right. So because of neural networks we
see a lot of new technology has evolved
from translating web pages to other
languages to having a virtual assistant
to order groceries online to conversing
with chat bots. All of these things are
possible because of neural networks. So
in a nutshell if I need to tell you
artificial neural network is nothing but
a network of various artificial neurons.
All right. So let me show you the
importance of neural network with two
scenarios before and after neural
network. So over here we have a machine
and we have trained this machine on four
types of dogs as you can see where I'm
highlighting with my cursor and once the
training is done we provide a random
image to this particular machine which
has a dog but this dog is not like the
other dogs on which we have trained our
system on. So without neural networks
our machine cannot identify that dog in
the picture as you can see it over here.
Basically our machine will be confused.
It cannot figure out where the dog is.
Now when I talk about neural networks,
even if we have not trained our machine
on this specific dog, but still it can
identify certain features of the dogs
that we have trained on and it can match
those features with the dog that is
there in this particular image and it
can identify that dog. So this happens
all because of neural networks. So this
is just an example to show you how
important are neural networks. Now I
know you all must be thinking how neural
networks work. So for that we'll move
forward and understand how it actually
works. So over here I'll begin by first
explaining a single artificial neuron, which is called a perceptron. So this is an example of a perceptron. Over here we have multiple inputs x1, x2, and so on up to xn, and we have corresponding weights as well: w1 for x1, w2 for x2, and similarly wn for xn. Then what happens? We calculate the weighted sum of these inputs, and after doing that we pass it through an activation function. The activation function essentially provides a threshold value: above that value my neuron will fire, else it won't fire. So this is basically an artificial neuron, and a minimal sketch of it in code is shown below.
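Here is a minimal sketch of that single artificial neuron: a weighted sum of the inputs passed through a step activation. The weights and threshold are made-up values just for illustration.

```python
# A minimal sketch of a single perceptron: weighted sum of inputs passed
# through a step activation. Weights and threshold are made-up values.
def perceptron(inputs, weights, threshold):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0   # step activation: fire or don't fire

# Example: three inputs with made-up weights and a threshold of 1.0
print(perceptron([1, 0, 1], [0.5, 0.3, 0.9], threshold=1.0))  # 1.4 > 1.0, so the neuron fires -> 1
```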
So when I talk about a neural network it
involves a lot of these artificial
neurons with their own activation
function and their processing element.
Now we'll move forward and we'll
actually understand various modes of
this perceptron or single artificial
neuron. So there are two modes in a
perceptron. One is training, another is
using mode. In training mode, the neuron
can be trained to fire for particular
input patterns. Which means that we'll
actually train our neuron to fire on
certain set of inputs and to not fire on
the other set of inputs. That's what
basically training mode is. When I talk about using mode, it means that when a taught input pattern is detected at the input, its associated output becomes the current output. In other words, once the training is done and we provide an input the neuron has been trained on, it will detect that input and provide the associated output. That's what using mode basically is. So
first you need to train it then only you
can use your perceptron or your uh
network. So these were the two modes
guys. Next up we'll understand what are
the various activation functions
available. There are many more, but I've listed down three. First is the step function: the moment your input is greater than a particular value, your neuron will fire, else it won't. It works similarly for the sigmoid and sign functions as well. So these are the three most commonly used activation functions. Next up, what
we are going to do we are going to
understand how a neuron learns from its
experience. So I'll give you a very good
analogy in order to understand that and
later on, when we talk about neural networks, or you can say multiple neurons in a network, I'll explain the maths behind learning and how it actually happens.
So right now I'll explain you with an
analogy and guys trust me that analogy
is pretty interesting. So I know all of
you must have guessed it. So these are
two beer mugs and all of you who love
beer can actually relate to this analogy
a lot and I know most of you actually
love beer. So that's why I've chosen
this particular analogy so that all of
you can relate to it. All right, jokes
apart. So fine guys, so there's a beer
festival happening near your house and
you want to badly go there. But your
decision actually depends on three
factors. First is how is the weather,
whether it is good or bad. Second is
your wife or husband is going with you
or not. And the third one is any public
transport is available. So on these
three factors, your decision will depend
whether you'll go or not. So we'll
consider these three factors as inputs
to our perceptron and we'll consider our
decision of going or not going to the
beer festival as our output. So let us
move forward with that. So we'll move
forward and we'll see what are the
various inputs that I'm talking about.
So the first input is how is the
weather? We'll consider it as x1. So
when weather is good it'll be one and
when it is bad it'll be zero. Similarly
your wife is going with you or not. So
that be your x2. If she is going then
it's one. If she's not going then it's
zero. Similarly for public transport if
it is available then it is one else it
is zero. So these are the three inputs
that I'm talking about. Let's see the
output. So output will be one when
you're going to the beer festival and
output will be zero when you want to
relax at home. You want to have beer at
home only. You don't want to go outside.
So these are the two outputs whether you
are going or you're not going. Now what
a human brain does over here. Okay fine
I need to go to the beer festival but
there are three things that I need to
consider. But will I give importance to
all these factors equally? Definitely
not. There'll be certain factors which
will be of higher priority for me. I'll
focus on those factors more. Whereas few
factors won't affect that much to me.
All right. So let's prioritize our
inputs or factors. So here our most
important factor is weather. So if
weather is good, I love beer so much
that I don't care even if my wife is
going with me or not or if there is a
public transport available. So I love
beer that much that if weather is good
that definitely I'm going there. That
means when x1 is high output will be
definitely high. So how we do that? How
we actually prioritize our factors or
how we actually give importance more to
a particular input and less to another
input in a perceptron or in a neuron. So
we do that by using weights. So we
assign high weights to the more
important factors or more important
inputs and we assign low weights to
those particular inputs which are not
that important for us. So let's assign weights, guys. Weight w1 is associated with input x1, w2 with x2 and similarly w3 with x3. Now, as I've told you earlier, weather is a very important factor, so I'll assign a pretty high weight to weather and keep it as six. W2 and w3 are not that important, so I'll keep them as two each. After that, I've defined a threshold value of five, which means that only when the weighted sum of my inputs is greater than five will my neuron fire, or you can say only then will I be going to the beer festival. All right. So I'll use my pen and
we'll see what happens when the weather is good. When the weather is good, x1 is 1 and its weight is six, so we multiply 1 by 6. Now say my wife decides she is going to stay at home, she'll be busy cooking and doesn't want to drink beer with me, so she's not coming; that input becomes zero, and 0 times 2 makes no difference. Then again, there is no public transport available either, so that is also 0 times 2. So what output do I get? I get six. And notice the threshold value: it is five, and six is definitely greater than five. That means my output will be one, or you can say my neuron will fire, or I'll actually go to the beer festival. So even if these two inputs are zero, meaning my wife is not willing to go with me and there is no public transport available, the weather is good, which has a very high weight value and matters a lot to me. If that input is high, it doesn't really matter whether the other two are high or not, I will definitely go to the beer festival. All right, now I'll explain a different scenario. Over here our threshold was five, but what if I change this threshold to three? In that scenario, even if my weather is not good, I'll give it a zero, so 0 times 6, but my wife and public transport are both available: 1 times 2 plus 1 times 2 equals 4, which is definitely greater than three. Then also my output will be one; that means I will definitely go to the beer festival even if the weather is bad, and my neuron will fire. So these are the two scenarios I have discussed with you; a quick code sketch of both follows.
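Here is a small sketch of the beer-festival perceptron from the analogy, with the same made-up weights of 6, 2 and 2 and the two thresholds we just walked through.

```python
# A small sketch of the beer-festival perceptron from the analogy above.
# Inputs: weather, wife going, public transport (1 = yes, 0 = no);
# the weights 6, 2, 2 and the thresholds are the made-up numbers from the example.
def go_to_festival(weather, wife, transport, threshold=5):
    weighted_sum = 6 * weather + 2 * wife + 2 * transport
    return 1 if weighted_sum > threshold else 0

print(go_to_festival(1, 0, 0))               # good weather only: 6 > 5, output 1 (go)
print(go_to_festival(0, 1, 1, threshold=3))  # bad weather but 2 + 2 = 4 > 3, output 1 (go)
```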
All right. So there can be many
other ways in which you can actually
assign weights to your problem or to
your learning algorithm. So these are
the two ways in which you can assign
weights and prioritize your inputs or
factors on which your output will
depend. So obviously in real life all
the inputs or all the factors are not as
important for you. So you actually
prioritize them and how you do that in
perceptron you provide high weight to
it. This is just an analogy so that you can relate a perceptron to real life. We'll actually discuss the maths
behind it later in the session as to how
a network or a neuron learns. All right.
So how the weights are actually updated
and how the output is changing that all
those things we'll be discussing later
in the session. But my aim is to make
you understand that you can actually
relate to a real life problem with that
of a perceptron. All right? And in real
life problems are not that easy. They
are very very complex problems that we
actually face. So in order to solve
those problems, a single neuron is
definitely not enough. So we need
networks of neuron and that's where
artificial neural network or you can say
multi-layer perceptron comes into the
picture. Now let us discuss that
multi-layer perceptron or artificial
neural network. So this is how an
artificial neural network actually looks
like. So over here we have multiple
neurons present in different layers.
The first layer is always your input
layer. This is where you actually feed
in all of your inputs. Then we have the
first hidden layer. Then we have second
hidden layer and then we have the output
layer. The number of hidden layers depends on your application, on what you are working on and what your problem is; that actually determines how many hidden layers you'll have. So let me explain what is actually happening here. You provide some
input to the first layer which is
nothing but your input layer. You
provide inputs to these neurons. All
right? And after some function the
output of these neurons will become the
input to the next layer which is nothing
but your hidden layer one. Then these
hidden layers also have various neurons.
These neurons will have different
activation functions. So they'll perform
their own function on the inputs that it
receives from the previous layer. And
then the output of this layer will be
the input to the next hidden layer which
is hidden layer 2. Similarly, the output
of this hidden layer will be the input
to the output layer and finally we get
the output. So this is how basically an
artificial neural network looks like.
Now let me explain you this with an
example. So over here I'll take an
example of image recognition using
neural networks. So over here what
happens? We feed in a lot of images to
our input layer. Now this input layer
will actually detect the patterns of
local contrast and then we'll feed that
to the next layer which is hidden layer
one. So in this hidden layer one the
face features will be recognized we'll
recognize eyes nose ears things like
that and then that will be again fed as
input to the next hidden layer and in
this hidden layer we'll assemble those
features and we'll try to make a face
and then we'll get the output that is
the face will be recognized properly. So
if you notice here with every layer we
are trying to get a more abstract
version or the generalized version of
the input. So this is basically how an artificial neural network works.
All right. And there's a lot of training and learning involved, which I'll show you now: training a neural network. So how do we actually train our neural network? The most common algorithm for training a network is called backpropagation. What happens in backpropagation is this: after taking the weighted sum of the inputs, passing it through an activation function and getting the output, we compare that output to the actual output that we already know. We figure out how big the difference is, we calculate the error, and based on that error we propagate backwards and see what happens when we change the weights: does the error decrease or increase, and does it increase when we increase the values of the variables or when we decrease them? We calculate all of that and update our variables in such a way that our error becomes minimum, and
guys it takes a lot of iterations. We
get output a lot of times and then we
compare it with the model with the
actual output. Then again we propagate
backwards. We change the variables and
again we calculate the output. We
compare it again with the desired output
of the actual output. Then again we
propagate backwards. So this process
keeps on repeating until we get the
minimum value. All right. So there's an
example that is there in front of your
screen. Don't be scared of the terms
that I used. I'll actually explain you
with an example. So this is the example
over here. We have 0, 1 and 2 as inputs, and our desired output, the output that we already know, is 0, 2 and 4. So over here we can figure out that the desired output is nothing but twice the input. But
I'm training a computer to do that.
Right? The computer is not a human. So
what happens? I actually initialize my
weight. I keep the value as three. So
the model output will be 3 into 0 is 0.
3 into 1 is 3. 3 into 2 is 6. Now
obviously it is not equal to your
desired output. So we check the error.
Now the error that we have got here is 0
1 and 2 which is nothing but your
difference. So 0 - 0 is 0 3 - 2 is 1 6 -
4 is 2. Now this is called an absolute
error. After squaring this error we get
square error which is nothing but 0 1
and 4. All right. So now what we need to
do we need to update the variables. We
have seen that the output that we got is
actually different from the desired
output. So we need to update the value
of the weight. So instead of three our
computer makes it as four. After making
the value as four, we get the model
output as 0 4 and 8. And then we saw
that the error has actually increased.
Instead of decreasing, the error has
increased. So after updating the
variable, the error has increased. So
you can see that square error is now 0 4
and 16. And earlier it was 0 1 and 4.
That means we cannot increase the weight
value right now. But if we decrease it and make it two, we get an output which is exactly equal to the desired output. But is
it always the case that we need to only
decrease the weight? Definitely not. So
in this particular scenario, whenever
I'm increasing the weight, error is
increasing and when I'm decreasing the
weight, error is decreasing. But as I've
told you earlier as well, this is not
the case every time. Sometimes you need
to increase the weight as well. So how
we determine that? All right. Fine guys,
this is how basically a computer decide
whether it has to increase the weight or
decrease the weight. So what happens
here? This is a graph of square error
versus weight. So over here what
happens? Suppose your square error is
somewhere here and your computer it
starts increasing the weight in order to
reduce the square error and it notices
that whenever it increases the weight
square error is actually decreasing. So
it'll keep on increasing until the
square error reaches a minimum value and
after that when it tries to still
increase the weight the square error
will increase. So at that time our
network will recognize that whenever it
is increasing the weight after this
point error is increasing. So therefore
it will stop right there and that will
be our weight value. Similarly there can
be one more scenario. Suppose if we
increase the weight but then also the
square error is increasing. So at that
time we cannot increase the weight. At
that time computer will realize okay
fine whenever I'm increasing the weight
the square error is increasing. So it'll
go in the opposite direction. So it'll
start decreasing the weight and it'll
keep on doing that until the square
error becomes minimum. And the moment it
decreases more the square error again
increases. So our network will know that
whenever it decreases the weight value
the square error is increasing. So that
point will be our final weight value. So
guys this is what basically back
propagation in a nutshell is. If you
have any questions or doubts you can go
ahead and ask me. All right fine we have
no doubts here. Fine. So we'll move
forward and now is the correct time to
understand how to implement the use case
that I was talking about at the
beginning. That is how to determine
whether a node is fake or real. So for
that I'll open my PyCharm. This is my
PyCharm again guys. Uh let me just close
this. All right. So this is the code
that I've written in order to implement
the use case. So over here what we do we
import the important libraries which are required first. Matplotlib is used for visualization, TensorFlow, as we know, to implement the neural network, NumPy for arrays, pandas for reading the data set, and similarly sklearn for label encoding as well as for shuffling and for splitting the data set into training and testing parts. All right, fine guys.
So we'll begin by first reading the data
set as I've told you earlier as well
when I was explaining the steps. So what
I'll do I'll use pandas in order to read
the CSV file which has the data set.
After that I'll define features and
labels. So x will be my feature and y
will contain my label. Basically, x includes all the columns apart from the last column, which is the fifth one. Because indexing starts from zero, we have written zero till four, so the slice will not include the column at index four. And so our last column will actually be our label. Then
what we need to do, we need to encode
the dependent variable. So the dependent
variable as I've told you earlier as
well is nothing but your label. So I've
discussed encoding in TensorFlow
tutorial. You can go through it and you
can actually get to know why and how we
do that. Then what we have done, we have
read the data set. Then what we need to
do is to split our data set into
training and testing. And these are all
optional steps. You can print the shape
of your training and test data. If you
don't want to do it, it's still fine.
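Just to make those data-preparation steps concrete, here is a hedged sketch of what they might look like. The file name, column layout and helper names are assumptions for illustration only, not the exact code in the video.

```python
# A hedged sketch of the data-preparation steps described above. The file name,
# column layout and variable names are assumptions for illustration only.
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle

df = pd.read_csv("banknote.csv", header=None)    # hypothetical CSV of the banknote data
X = df.iloc[:, 0:4].values                       # four wavelet features
y = df.iloc[:, 4].values                         # last column is the label (0/1)

# encode the dependent variable (the label) as one-hot vectors for the network
y_encoded = pd.get_dummies(LabelEncoder().fit_transform(y)).values

X, y_encoded = shuffle(X, y_encoded, random_state=1)
train_x, test_x, train_y, test_y = train_test_split(X, y_encoded, test_size=0.2, random_state=415)
print(train_x.shape, train_y.shape, test_x.shape, test_y.shape)
```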
Then we have defined learning rate. So
learning rate is actually the steps in
which the weights will be updated. All
right. So that is what basically
learning rate is. Then, when we talk about epochs, those are simply iterations. Then we have defined cost_history, which will be an empty NumPy array, with shape one, holding float-type values. Then we have defined n_dim, which is nothing but the shape of x along axis one, that is, the number of columns, and we'll print that. After that we have defined the number of classes: there can be only two classes, a note can be fake or real. And this model
path I've given in order to save my
model. So I've just given a path where I
need to save it. So I'll just save it
here only in the current working
directory. Now is the time to actually
define our neural network. So we'll
first make sure that we have defined the
important parameters like hidden layers,
number of neurons in hidden layers. So
I'll take 10 neurons in every hidden
layer and I'm taking four layers like
that. Then x will be my placeholder, and the shape of this placeholder is (None, n_dim); the n_dim value I'll get from here, and None can be any value. I'll define one variable W, initialize it with zeros, and this will be the shape of my weights. Similarly for the bias as well, this will be its shape. And there will be one more placeholder, y_dash, which will be used to provide the actual output of the model.
There'll be one model output and
there'll be one actual output which we
use in order to calculate the
difference. Right? So we'll feed in the
actual values of the labels in this
particular placeholder ydash. And now
we'll define the model. Over here we have named the function multi-layer perceptron, and in it we'll first define
the first layer. So the first hidden
layer and we are going to name it as
layer_1 which will be nothing but the
matrix multiplication of x and weights
of h1 that is the hidden layer 1 and
that'll be added to your biases b1 after
that we'll pass it through a sigmoid
activation function. Similarly in layer
2 as well matrix multiplication of layer
1 and weights of h2. So if you can
notice layer 1 was the network layer
just before the layer two right. So the
output of this layer 1 will become input
to the layer 2 and that's why we have
written layer_1 it'll be multiplied by
weight h2 and then we'll add it with the
bias. Similarly for this particular hidden layer as well, and for this layer as well, but over here we are going to use the ReLU activation function instead of sigmoid. A hedged sketch of what this model function might look like is shown below.
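Here is a hedged sketch, in TensorFlow 1.x style (placeholders and variables, matching the session-based code shown in the video), of a multilayer-perceptron function like the one described. The exact mix of sigmoid and ReLU layers, the layer sizes and the variable names are assumptions, not the author's exact code.

```python
# A hedged sketch in TensorFlow 1.x style (these APIs were removed in TF 2).
# Layer sizes, the sigmoid/ReLU mix and the names are assumptions.
import tensorflow as tf

n_dim, n_class = 4, 2                      # four banknote features, two classes
n_hidden_1 = n_hidden_2 = n_hidden_3 = n_hidden_4 = 10

x = tf.placeholder(tf.float32, [None, n_dim])
y_ = tf.placeholder(tf.float32, [None, n_class])

weights = {
    'h1': tf.Variable(tf.truncated_normal([n_dim, n_hidden_1])),
    'h2': tf.Variable(tf.truncated_normal([n_hidden_1, n_hidden_2])),
    'h3': tf.Variable(tf.truncated_normal([n_hidden_2, n_hidden_3])),
    'h4': tf.Variable(tf.truncated_normal([n_hidden_3, n_hidden_4])),
    'out': tf.Variable(tf.truncated_normal([n_hidden_4, n_class])),
}
biases = {
    'b1': tf.Variable(tf.truncated_normal([n_hidden_1])),
    'b2': tf.Variable(tf.truncated_normal([n_hidden_2])),
    'b3': tf.Variable(tf.truncated_normal([n_hidden_3])),
    'b4': tf.Variable(tf.truncated_normal([n_hidden_4])),
    'out': tf.Variable(tf.truncated_normal([n_class])),
}

def multilayer_perceptron(x, weights, biases):
    # each layer: multiply the previous layer's output by this layer's
    # weights, add the bias, then apply an activation function
    layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['h1']), biases['b1']))
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['h2']), biases['b2']))
    layer_3 = tf.nn.relu(tf.add(tf.matmul(layer_2, weights['h3']), biases['b3']))
    layer_4 = tf.nn.relu(tf.add(tf.matmul(layer_3, weights['h4']), biases['b4']))
    return tf.matmul(layer_4, weights['out']) + biases['out']   # linear output layer

y = multilayer_perceptron(x, weights, biases)
```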
Then we are going to define the weights and biases. So this is how we basically define the weights: weights
h1 will be a variable which will be a
truncated normal with the shape of n dim
and n hidden_1. So these are nothing but
your shapes. All right. And after that
what we have done we have defined biases
as well. Then we need to initialize all
the variables. So all these things
actually I've discussed in brief when I
was talking about tensorflow. So you can
go through tensorflow tutorial at any
point of time if you have any question.
We have discussed everything there.
Since in tensorflow we need to
initialize the variables before we use
it. So that's how we do it. We first
initialize it and then we need to run
it. That's when your variables will be
initialized. After that we are going to
create a saver object and then finally
I'm going to call my model and then
comes the part where the training
happens. Cost function. Cost function is
nothing but you can say an error that
will be calculated between the actual
output and the model output. All right.
So y is nothing but our model output and
ydash is nothing but actual output or
the output that we already know. All
right. And then we are going to use a
gradient descent optimizer to reduce
error. Then we are going to create a
session object as well. And finally what
we are going to do we are going to run
the session. So this is how we basically
do that. For every epoch we will be
calculating the change in the error as
well as the accuracy that comes after
every epoch on the training data. After
we have calculated the accuracy on the
training data, we going to plot it for
every epoch how the accuracy is. And
after plotting that we going to print
the final accuracy which will be on our
test data. So using the same model we'll
make prediction on the test data and
after that we are going to print the
final accuracy and the mean squared
error. A hedged sketch of this training loop is shown below. So let's go ahead and execute this, guys.
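Here is a hedged sketch of the training loop just described, continuing the TensorFlow 1.x names from the previous sketches (y, y_, x, train_x, train_y, test_x, test_y are assumptions). Mean squared error is used here as the cost and 0.3 as the learning rate; the exact cost function and hyperparameters in the video's code may differ.

```python
# A hedged sketch of the training loop, in TF 1.x style; names, the MSE cost
# and the learning rate are assumptions.
learning_rate, training_epochs = 0.3, 100

cost = tf.reduce_mean(tf.square(y - y_))          # error between model output and actual output
training_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
correct = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())   # variables must be initialized before use
    for epoch in range(training_epochs):
        sess.run(training_step, feed_dict={x: train_x, y_: train_y})
        acc = sess.run(accuracy, feed_dict={x: train_x, y_: train_y})
        print("epoch", epoch, "training accuracy", acc)
    print("test accuracy", sess.run(accuracy, feed_dict={x: test_x, y_: test_y}))
```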
All right. So training is done and this
is the graph we have got for accuracy
versus epoch. The y-axis represents accuracy, whereas this axis is epochs. We have taken 100 epochs, and our
accuracy has reached somewhere around
99%. So with every epoch it keeps increasing, apart from a couple of instances. And the more data you train your model on, the more accurate it will be.
Let me just close it. So now the model
has also been saved where I wanted it to
be. This is my final test accuracy and
this is the mean squared error. All
right. So these are the files that will
appear once you save your model. These
are the four files that I've
highlighted. Now what we need to do is
restore this particular model and I've
explained this in detail how to restore
a model that you have already saved. So
over here what I'll do I'll take some
random range. I've taken it actually
from 754 to 768. So all the values in the rows from 754 to 768 will be fed to our model, and our model will make
prediction on that. So let us go ahead
and run this.
So when I'm restoring my model, it seems
that my model is 100% accurate for the
values that I have fed in. So whatever
values that I have actually given as
input to my model, it has correctly identified its class, whether it's a fake note or a real note, because zero stands for a fake note and one stands for a real note. Okay. So the original class is whatever is there in my data set; it is zero, and the prediction my model has made is also zero, which means it is fake. So the accuracy becomes 100%.
Similarly for other values as well.
Fine guys. So this is how we basically
implement the use case that we saw in
the beginning. So in the slide you can
notice that I've listed out only two
applications although there are many
more. So neural networks in medicine.
Artificial neural networks are currently
a very hot research area in medicine and
it is believed that they will receive
extensive application to biomedical
systems in the next few years and
currently the research is mostly on
modeling parts of human body and
recognizing diseases from various scans.
For example, it can be cardiograms, CAT
scans, ultrasonic scans etc. And
currently the research is going mostly
on two major areas. First is modeling
and diagnosing the cardiovascular
system. So neural networks are used
experimentally to model the human
cardiovascular system. Diagnosis can be
achieved by building a model of the
cardiovascular system of an individual
and comparing it with the real-time
physiological measurements taken from
the patient. And trust me guys, if this
routine is carried out regularly,
potential harmful medical conditions can
be detected at an early stage and thus
make the process of combating disease
much easier. Apart from that it is
currently being used in electronic noses
as well. Electronic noses have several
potential applications in telemedicine. Now let me just give you an introduction to telemedicine: telemedicine is the practice of medicine over long distances via a communication link. So what would the electronic noses do? They would identify odors in the remote surgical environment. These identified odors would then be electronically transmitted to another site, where an odor-generation system would recreate them. Because the sense of smell can be an important sense to the surgeon, tele-smell would enhance telepresent surgery.
So these are the two ways in which you
can use it in medicine. You can use it
in business as well, guys. Business is basically a diverse field with several general areas of specialization, such as accounting or financial analysis, and almost any neural network application would fit into some business area or into financial analysis. Now there is some potential
for using neural networks for business
purposes including resource allocation
and scheduling. I have listed down two
major areas where it can be used. One is
marketing. So there is a marketing
application which has been integrated
with a neural network system. The
airline marketing tactician is a
computer system made of various
intelligent technologies including
expert systems. A feed forward neural
network is integrated with the AMT which
is nothing but airline marketing
tactician and was trained using back
propagation to assist the marketing
control of airline seat allocations. So it
has wide applications in marketing as
well. Now the second area is credit
evaluation. Now I'll give you an example
here. The HNC company has developed
several neural network applications and
one of them is a credit scoring system
which increases the profitability of the existing model by up to 27%. So these are
few applications that I'm telling you
guys neural network is actually the
future. People are talking about neural
networks everywhere, and especially after the introduction of GPUs and with the amount of data we have now, neural networks are spreading like wildfire right now.
What is the KNN algorithm? Well, K-nearest neighbors is a simple algorithm that stores all the available cases and classifies new data, or a new case, based on a similarity measure. It suggests that if you are similar to your neighbors, then you are one of them, right? For example, if an apple looks more similar to a banana, an orange or a melon than to a monkey, a rat or a cat, then most likely the apple belongs to the group of fruits. In general, KNN is used in search applications where you're looking for similar items, that is, when your task is some form of "find items similar to this one"; you call this search a KNN search. But what is this K in KNN? The K denotes the number of nearest neighbors which vote on the class of the new data, or the testing data. For example, if K equals 1, then the testing data is given the same label as the closest example in the training set. Similarly, if K equals 3, the labels of the three closest points are checked and the most common label is assigned to the testing data. So this is what the K in the KNN algorithm means. Moving on ahead, let's see some example scenarios where KNN is used in industry, starting with recommender systems. Well, the biggest use case of KNN search is a recommender system. A recommender system is like an automated form of a
shop counter guy: when you ask him for a product, he not only shows you that product but also suggests or displays a relevant set of products related to the item you're already interested in buying. The KNN approach applies to recommending products, as on Amazon, to recommending media, as in the case of Netflix, or even to choosing which advertisement to display to a user. If
I'm not wrong, almost all of you must
have used Amazon for shopping. Right? So
just to tell you more than 35% of
Amazon.com's revenue is generated by its
recommendation engine. So what's their
strategy? Amazon uses recommendation as
a targeted marketing tool in both the
email campaigns and on most of its
website pages. Amazon will recommend
many products from different categories
based on what you are browsing and it
will pull those products in front of you
which you are likely to buy like the
frequently bought together option that
comes at the bottom of the product page
to tempt you into buying the combo.
Well, this recommendation has just one main goal: increase average order value, that is, upsell and cross-sell customers by providing product suggestions based on items in the shopping cart or on the product they are currently looking at on the site. The next industrial application of the KNN algorithm is concept search, that is, searching
semantically similar documents and
classifying documents containing similar
topics. So, as you know the data on the
internet is increasing exponentially
every single second. There are billions
and billions of documents on the
internet. Each document on the internet contains multiple candidate concepts. The main problem is to extract concepts from a set of documents, as each page could have thousands of word combinations that could be potential concepts; an average document could have millions of them. Combine that with the vast amount of data on the web and we are talking about an enormous number of data sets and samples. So what do we need here? We need to find concepts in that enormous amount of data, and for this purpose we can use the KNN algorithm. More advanced examples could include handwriting detection, as in OCR, or image recognition, or even video recognition.
All right. So now that you know various use cases of the KNN algorithm, let's proceed and see how it works. So how does the KNN algorithm work? Let's start by plotting these blue and orange points on
our graph. So these blue points they
belong to class A and the orange ones
they belong to class B. Now you get a
star as a new point and your task is to
predict whether this new point it
belongs to class A or it belongs to
class B. So to start the prediction the
very first thing that you have to do is
select the value of K. Just as I told
you K in KN&N algorithm refers to the
number of nearest neighbors that you
want to select. For example, in this
case K equal 3. So what does it mean? It
means that I'm selecting three points
which are the least distance to the new
point or you can say I'm selecting three
different points which are closest to
the star. Well, at this point you might ask how the least distance is calculated; we'll come to that in a moment. Once you calculate the distances, you'll get one blue and two orange points which are closest to the star. Since in this case we have a majority of orange points, you can say that for K = 3 the star belongs to class B, or that the star is more similar to the orange points. Moving on, what if K equals 6? For this case you have to look for the six points which are closest to the star. After calculating the distances, we find that we have four blue points and two orange points closest to the star. As you can see, the blue points are now in the majority, so you can say that for K equals 6 the star belongs to class A, or that the star is more similar to the
blue points. So by now I guess you know how the KNN algorithm works and what the significance of K is. So how will you choose the value of K, keeping in mind that K is the most important parameter in the KNN algorithm? When you build a K-nearest-neighbor classifier, how will you choose a value of K? Well, you might have a
specific value of K in mind or you could
divide up your data and use something
like cross validation technique to test
several values of K in order to
determine which works best for your
data. For example, if n equals a thousand cases, then the optimal value of K may lie somewhere between 1 and 19. But yes, unless you try it, you cannot be sure of it; a small cross-validation sketch for picking K is shown below.
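Here is a small sketch of testing several values of K with cross-validation. The iris data is used just as a convenient stand-in, and the range of K values tried is an assumption.

```python
# A small sketch of using cross-validation to test several values of K.
# The iris data and the K range are just illustrative choices.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in range(1, 20, 2):                       # odd values of K from 1 to 19
    knn = KNeighborsClassifier(n_neighbors=k)
    score = cross_val_score(knn, X, y, cv=5).mean()
    print(k, round(score, 3))                   # pick the K with the best average score
```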
So now you know how the algorithm works at a high level. Let's move on and see how things are predicted using the KNN algorithm. Remember I told you the KNN algorithm uses a least-distance measure in order to find its nearest neighbors. So let's see how these distances are calculated.
Well, there are several distance measures which can be used; to start with, we'll mainly focus on Euclidean distance and Manhattan distance in this session. So what is this Euclidean distance? The Euclidean distance is defined as the square root of the sum of squared differences between a new point X and an existing point Y. For example, here we have points P1 and P2: point P1 is (1, 1) and point P2 is (5, 4). So what is the Euclidean distance between them? You can say that the Euclidean distance is the direct distance between two points. We can calculate it as the square root of (5 - 1) squared plus (4 - 1) squared, which results in 5. Next is the Manhattan distance. The Manhattan distance between real-valued vectors is calculated using the sum of their absolute differences. In this case, the Manhattan distance between P1 and P2 is the absolute value of 5 - 1 plus the absolute value of 4 - 1, which is 4 + 3, that is 7. This slide shows the difference between the Euclidean and Manhattan distances from point A to point B: the Euclidean distance is the direct, or least possible, distance between A and B, whereas the Manhattan distance is the distance between A and B measured along the axes at right angles. A quick sketch of both measures is shown below.
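Here is a quick sketch of both distance measures for the two points from the slide, P1 = (1, 1) and P2 = (5, 4).

```python
# A quick sketch of the two distance measures for P1 = (1, 1) and P2 = (5, 4).
import math

p1, p2 = (1, 1), (5, 4)

euclidean = math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))
manhattan = sum(abs(a - b) for a, b in zip(p1, p2))

print(euclidean)   # 5.0  (straight-line distance)
print(manhattan)   # 7    (distance measured along the axes)
```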
Now let's take an example and see how things are predicted using the KNN algorithm in practice. Suppose we have a
data set which consists of height,
weight and t-shirt size of some
customers. Now, when a new customer comes, we only have his height and weight as information, and our task is to predict the t-shirt size of that particular customer. For this we'll be using the KNN algorithm. The very first thing we need to do is calculate the Euclidean distance. Say we have new data with height 161 cm and weight 61 kg. The first thing we'll do is calculate the Euclidean distance, which is the square root of (161 minus 158) squared plus (61 minus 58) squared, and the square root of that is 4.24. Let's drag and drop the formula; these are the Euclidean distances for the other points. Now suppose K equals 5. What does the algorithm do? It searches for the five customers closest to the new customer, that is, the ones most similar to the new data in terms of their attributes. For K equal to 5, let's find the five smallest Euclidean distances: these are the distances we are going to use, ranked in order from first to fifth. For K equal to 5 we have four t-shirts which come under size M and one t-shirt which comes under size L, so obviously the best guess, or the best prediction, for the t-shirt size at height 161 cm and weight 61 kg is M; you can say that our new customer fits into size M. A small sketch of this majority-vote step follows.
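Here is a small sketch of that prediction done by hand: compute the distances, take the K nearest rows, and let them vote. The training rows below are made-up (height, weight, size) values chosen just to mirror the example.

```python
# A small by-hand KNN prediction for the t-shirt example: compute distances,
# take the K nearest, and take a majority vote. The training rows are made up.
import math
from collections import Counter

train = [(158, 58, 'M'), (158, 59, 'M'), (160, 60, 'M'), (163, 61, 'M'),
         (165, 61, 'L'), (168, 66, 'L')]
new = (161, 61)
k = 5

nearest = sorted(train, key=lambda row: math.dist(new, row[:2]))  # nearest first
top_k_sizes = [row[2] for row in nearest[:k]]
print(Counter(top_k_sizes).most_common(1)[0][0])   # majority vote -> 'M'
```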
Well, this was all about the theoretical session. But before we drill down to the coding part, let me just tell you why people call KNN a lazy learner. KNN for
classification is a very simple
algorithm. But that's not why they are
called lazy. KNN is a lazy learner
because it doesn't have a discriminative
function from the training data. But
what it does it memorizes the training
data. There is no learning phase of the
model and all of the work happens at the
time a prediction is requested. So as
such this is the reason why KNN is
often referred to as lazy learning
algorithm. So this was all about the
theoretical session. Now let's move on
to the coding part. So for the practical
implementation of the hands-on part,
I'll be using the iris data set. This data set consists of 150 observations. We have four features and one class label. The four features are the sepal length, the sepal width, the petal length and the petal width, whereas the class label decides which flower belongs to which category. So this was the description of the data set we are using. Now let's move on and see the step-by-step solution to perform the KNN algorithm. First we'll start by
handling the data. What we have to do?
We have to open the data set from the
CSV format and split the data set into
train and test part. Next, we'll check
the similarity where we have to
calculate the distance between two data
instances. Once we calculate the
distance, next we'll look for the
neighbor and select K neighbors which
are having the least distance from a new
point. Now once we get our neighbor,
then we'll generate a response from a
set of data instances. So this will
decide whether the new point belongs to
class A or class B. Finally, we'll
create the accuracy function and in the
end we'll tie it all together in the
main function. So let's start with our code for implementing the KNN algorithm using Python. I'll be using a Jupyter notebook with Python 3 installed on it. Now let's move on and see how the KNN algorithm can be implemented in Python. So there's my Jupyter notebook, which is a web-based interactive computing notebook environment. Let me launch it. Yeah, it's launching. So there's our Jupyter notebook, and we'll be writing our Python code in it. So the first thing that we
need to do is load our file. Our data is
in CSV format without a header line or any quotes. We can open the file with the open function and read the data lines using
the reader function in the CSV module.
So let's write a code to load our data
file. Let's execute the run button. So
once you execute the run button, you can
see the entire training data set as the
output. Next, we need to split the data
into a training data set that KNN can use to make predictions and a test data
set that we can use to evaluate the
accuracy of the model. So, we first need to convert the flower measurements that were loaded as strings into numbers that we can work with. Next, we need to split the data set randomly into train and test. A ratio of 67 to 33 for train to test is a standard ratio which is used
for this purpose. So let's define a
function as load data set that loads a
CSV with a provided file name and split
it randomly into training and test data
set using the provided split ratio. So
this is our function load data set which
is using file name split ratio, training
data set and testing data set as its
input. All right. So let's execute the
run button and check for any errors. So
it's executed with zero errors. Let's
test this function. So this is our
training set, testing set, load data
set. So this is our function load data
set and inside that we are passing our
file iris data with a split ratio of
0.66 and training data set and test data
set. Let's see how it divides the data into training and test sets. So it's giving a count of the training data set and the testing data set: the total number of training rows it has split into is 97, and the total number of test rows we have is 53. All right. Okay. So our
function load data set is performing
well. So let's move ahead to step two
which is similarity. So in order to make a prediction, we need to calculate the similarity between any two given data instances. This is needed so that we can locate the k most similar data instances in the training data set and in turn make a prediction. Given that all four flower measurements are numeric and have the same unit, we can directly use the Euclidean distance measure, which is nothing but the square root of the sum of squared differences between the two arrays of numbers. Additionally, we want to control which fields to include in the distance calculation; specifically, we only want to include the first four attributes. So our approach will be to limit the Euclidean distance to a fixed length. All right. So let's define our Euclidean function. So this is a Euclidean distance function which takes instance one, instance two and length as parameters. Instance one and instance two are the two points between which you want to calculate the Euclidean distance, whereas the length denotes how many attributes you want to include. Okay. So there's our Euclidean function. Let's execute it. It's executing fine without any errors. Let's test the function.
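As a reference, a minimal sketch of the kind of distance function described here might look like this (an illustration, not the exact code shown on screen):

```python
import math

def euclidean_distance(instance1, instance2, length):
    # sum the squared differences over the first `length` attributes only
    distance = 0.0
    for i in range(length):
        distance += (instance1[i] - instance2[i]) ** 2
    return math.sqrt(distance)
```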
Suppose the data 1 or the first instance
consists of the data point as 222 and it
belongs to class A, and data 2 consists of 444 and it belongs to class B. So when we calculate the Euclidean distance from data 1 to data 2, what we have to do is consider only the first three features of them. All right. So let's print the distance. As you can see here, the distance comes out to be 3.464. So this distance is nothing but the Euclidean distance, and it is calculated as the square root of (4 - 2) squared + (4 - 2) squared + (4 - 2) squared, which is nothing but 3 times (4 - 2) squared, that is 12, and the square root of 12 is nothing but 3.464. All right. So now that we have
calculated the distance now we need to
look for k nearest neighbors. Now that
we have a similarity measure, we can use it to collect the k most similar instances for a given unseen instance. Well, this is a straightforward process of calculating the distance for all the instances and selecting a subset with the smallest distance values. So for that, we'll be defining a function as get neighbors. What will it do? It will
return the k most similar neighbors from
the training set for a given test
instance. All right. So this is how our
get neighbors function look like. It
takes training data set and test
instance and K as its inputs. The K is nothing but the number of nearest neighbors you want to check for. All right. So basically what you'll be getting from this get neighbors function is K different points having the least Euclidean distance from the test instance. All right. Let's execute it.
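A rough sketch of such a get neighbors function, reusing the euclidean_distance sketch from above, could look like this (it assumes the last column of each training row is the class label):

```python
import operator

def get_neighbors(training_set, test_instance, k):
    # rank every training row by its distance to the test instance
    length = len(test_instance) - 1            # skip the class label column
    distances = []
    for row in training_set:
        dist = euclidean_distance(test_instance, row, length)
        distances.append((row, dist))
    distances.sort(key=operator.itemgetter(1))
    # return the k rows with the smallest Euclidean distance
    return [distances[i][0] for i in range(k)]
```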
So the function executed without any
errors. So let's test our function. So
suppose the training data set includes
the data like 222 and it belongs to
class A and other data includes 444 and
it belongs to class B and our testing
instance is 555. And now we have to
predict whether this test instance
belongs to class A or it belongs to
class B. All right. For K equal 1, we
have to predict its nearest neighbor and
predict whether this test instance it
will belong to class A or will it belong
to class B. All right. So let's execute
the run button. All right. So on
executing the run button, you can see
that we have output as 444 and B. Our
new instance 555 is closest to 444 which
belongs to class B. All right. Now once
you have located the most similar
neighbor for a test instance, next task
is to predict a response based on those
neighbors. So how we can do that? Well,
we can do this by allowing each neighbor
to vote for their class attribute and
take the majority vote as a prediction.
Let's see how we can do that. So we have
a function as get response which takes
neighbors as the input. Well, this
neighbor was nothing but the output of
this get neighbor function. The output
of get neighbor function will be fed to
get response. All right, let's execute
the run button. It's executed. Let's
move ahead and test our function get
response. So we have a neighbor as 111.
It belongs to class A. 222 it belongs to
class A. 333 it belongs to class B. So
this response variable will store the value returned by get response when we pass in these neighbor values. All right.
So what we want to check is we want to
predict whether our test instance 555 it
belongs to class A or class B when the
neighbors are 111 A 222 A and 333 B. So
let's check our response. Now that we
have created all the different functions which are required for the KNN algorithm, the important concern now is how to evaluate the accuracy of the
prediction. An easy way to evaluate the
accuracy of the model is to calculate a
ratio of the total correct prediction to
all the prediction made. So for this
I'll be defining a function as get
accuracy, and inside that I'll be passing my test data set and the predictions. The get accuracy function executed without any error. Let's check it for a sample data set. So we have our test data set as 111 which belongs to class A, 222 which again belongs to class A, and 333 which belongs to class B. And my
predictions is for first test data it
predicted that it belongs to class A
which is true. For next it predicted
that belongs to class A which is again
true. And for the next again it
predicted that it belongs to class A
which is false in this case cuz the test
data belongs to class B. All right. So
in total we have two correct predictions out of three. All right. So the ratio will be 2/3, which is nothing but 66.66%. So our accuracy rate is 66.66%. So now that we have created all the functions that are required for the KNN algorithm, let's compile them into one single main function. All right. So this is our main function and we are using the iris data set with a split of 0.67 and the value of K is three. Let's see
what is the accuracy score of this.
Check how accurate our model is. So in
training data set we have 113 values and
in the test data set we have 37 values.
These are the predicted and the actual
values of the output. Okay. So in total
we got an accuracy of 97.29%.
Which is really very good. All right.
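To tie the walkthrough together, here is a minimal end-to-end sketch of the remaining pieces, assuming an iris.data CSV file is available locally and reusing the euclidean_distance and get_neighbors sketches from above; it illustrates the approach, not the instructor's exact notebook code:

```python
import csv
import random
import operator

def load_dataset(filename, split, training_set, test_set):
    # read the CSV and randomly assign each row to the train or test list
    with open(filename) as f:
        for row in csv.reader(f):
            if not row:
                continue
            features = [float(x) for x in row[:4]] + [row[4]]
            (training_set if random.random() < split else test_set).append(features)

def get_response(neighbors):
    # majority vote over the class labels of the neighbors
    votes = {}
    for n in neighbors:
        votes[n[-1]] = votes.get(n[-1], 0) + 1
    return max(votes.items(), key=operator.itemgetter(1))[0]

def get_accuracy(test_set, predictions):
    correct = sum(1 for row, pred in zip(test_set, predictions) if row[-1] == pred)
    return correct / len(test_set) * 100.0

def main():
    training_set, test_set = [], []
    load_dataset('iris.data', 0.67, training_set, test_set)
    k = 3
    predictions = [get_response(get_neighbors(training_set, row, k)) for row in test_set]
    print('Accuracy:', get_accuracy(test_set, predictions), '%')

main()
```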
Now the success of the human race is because of the ability to communicate and share information, and that is where the concept of language comes in. Over time, many standards came up, resulting in many languages, with each language having its own set of basic shapes called alphabets. And the combination of
alphabets resulted in words and the
combination of these words arranged
meaningfully resulted in the formation
of a sentence. Now each language has a
set of rules that is used while
developing these sentences and these set
of rules are also known as grammar. Now
coming to today's world that is the 21st
century. According to the industry
estimates only 21% of the available data
is present in the structured format.
Data is being generated as we speak, as
we tweet, as we send messages on
WhatsApp, Facebook, Instagram or through
text messages. And the majority of this
data exists in the textual form which is
highly unstructured in nature. Now in
order to produce significant and
actionable insights from the text data,
it is important to get acquainted with
the techniques of text analysis. So
let's understand what is text analysis
or text mining. Now it is the process of
deriving meaningful information from
natural language text. And text mining
usually involves the process of
structuring the input text, deriving
patterns within the structured data and
finally evaluating the interpreted
output. Compared with the kind of data
stored in database, text is
unstructured, amorphous and difficult to
deal with algorithmically.
Nevertheless, in the modern culture,
text is the most common vehicle for the formal exchange of information. Now as
text mining refers to the process of
deriving highquality information from
text the overall goal here is to turn
the text into data for analysis and this
is done by the application of NLP or
natural language processing. So let's
understand what is natural language
processing. So NLP refers to the
artificial intelligence method of
communicating with an intelligence
system using natural language. By
utilizing NLP and its components, one
can organize the massive chunks of
textual data, perform numerous automated tasks and solve a wide range of
problems such as automatic
summarization, machine translation,
named entity recognition, speech
recognition, and topic segmentation. So
let's understand the basic structure of
an NLP application. Considering the
chatbot here as an example, we can see
first we have the NLP layer which is
connected to the knowledge base and the
data storage. Now the knowledge base is
where we have the source content that is
we have all the chat logs which contain
a large history of all the chats which
are used to train the particular
algorithm and again we have the data
storage where we have the interaction
history and the analytics of that
interaction which in turn helps the NLP
layer to generate the meaningful output.
So now if we have a look at the various
applications of NLP. First of all we
have sentiment analysis. Now this is a
field where NLP is used heavily. We have
speech recognition. Now here we are also
talking about the voice assistants like
Google assistant, Cortana and the Siri.
Now next we have the implementation of
chatbot as I discussed earlier just now.
Now you might have used the customer
care chat services of any app. It also
uses NLP to process the data entered and
provide the response based on the input.
Now machine translation is also another
use case of natural language processing.
Now considering the most common example
here would be the Google translate. It
uses NLP and translates the data from
one language to another and that too in
real time. Now other applications of NLP
include spell checking. Then we have the
keyword search which is also a big field
where NLP is used. Extracting
information from any particular website
or any particular document is also a use
case of NLP. And one of the coolest
application of NLP is advertisement
matching. Now here what we mean is
basically recommendation of the ads
based on your history. Now NLP is
divided into two major components that
is the natural language understanding
which is also known as NLU and we have
the natural language generation which is
also known as NLG. The understanding
involves tasks like mapping the given input in natural language into useful representations and analyzing different
aspects of the language. Whereas natural
language generation it is the process of
producing the meaningful phrases and
sentence in the form of natural
language. It involves text planning,
sentence planning and text realization.
Now NLU is usually considered harder
than NLG. Now you might be thinking that
even a small child can understand a
language. So let's see what are the
difficulties a machine faces while
understanding any particular languages.
Now understanding a new language is very
hard. Taking our English into
consideration, there is a lot of ambiguity, and that too at different levels. We have lexical ambiguity,
syntactical ambiguity and referential
ambiguity. So lexical ambiguity is the
presence of two or more possible
meanings within a single word. It is
also sometimes referred to as semantic
ambiguity. For example, let's consider
these sentences and let's focus on the
italicized words. She is looking for a
match. So what do you infer by the word
match? Is it that she is looking for a
partner or is it that she's looking for
a match be it a cricket match or a rugby
match? Now the second sentence here we
have the fisherman went to the bank. Is
it the bank where we go to collect our
checks and money or is it the river bank
we are talking about here. Sometimes it
is obvious that we are talking about the
river bank but it might be true that
he's actually going to a bank to
withdraw some money. You never know. Now
coming to the second type of ambiguity
which is the syntactical ambiguity in
English grammar. This syntactical
ambiguity is the presence of two or more
possible meanings within a single
sentence or a sequence of words. It is
also called as structural ambiguity or
grammatical ambiguity. Taking these
sentences into consideration, we can
clearly see what are the ambiguities
faced. The chicken is ready to eat. So
here what do you infer? Is the chicken
ready to eat his food or is the chicken
ready for us to eat? Similarly, we have
the sentence like visiting relatives can
be boring. Are the relatives boring or
when we are visiting the relative it is
very boring? You never know. Coming to
the final ambiguity which is the
referential ambiguity. Now this
ambiguity arises when we are referring
to something using pronouns. Take the sentence: the boy told his father about the theft; he was very upset. Now I'm leaving this up to you.
You tell me what does he stand for here?
Who is he? Is it the boy? Is it the
father or is it the thief?
So coming back to NLP. Firstly, we need
to install the NLTK library that is the
natural language toolkit. It is the
leading platform for building Python
programs to work with human language
data, and it also provides easy-to-use interfaces to over 50 corpora and
lexical resources. We can use it to
perform functions like classification,
tokenization, stemming, tagging and much
more. Now once you install the NLTK
library, you will see an NLTK
downloader. It is a pop-up window which
will come up and in that you have to
select the all option and press the
download button. It will download all the required files: the corpora, the models and all the different packages which are available in NLTK. Now
when we process text there are a few
terminologies that we need to
understand. Now the first one is
tokenization. So tokenization is a process of breaking strings into tokens, which in turn are small structures or units that can be used for further processing. Now tokenization involves three steps: breaking a complex sentence into words, understanding the importance of each word with respect to the sentence, and finally producing a structural description of the input sentence. So if we have a look at the
example here considering this sentence
tokenization is the first step in NLP.
Now when we divide it into tokens, as
you can see here, we have 1 2 3 4 5 6
and seven tokens here. Now NLTK also
allows you to tokenize phrases
containing more than one word. So let's
go ahead and see how we can implement
tokenization using NLTK. So here I'm
using Jupyter notebook to execute all my
practicals and demo. Now you are free to
use any sort of IDE which is supported
by Python. It's your choice. So let me
create a new notebook here. Let me
rename as text mining and NLP.
So first of all let us import all the
necessary libraries. Here we are importing os, NLTK and the NLTK corpus.
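In code, these imports and the downloader step described earlier would look roughly like this (which resources you fetch is up to you):

```python
import os               # imported in the video for general file handling
import nltk
from nltk import corpus  # gives access to the bundled corpora

# nltk.download() opens the downloader pop-up described earlier; selecting the
# "all" option fetches every corpus, model and package. Individual resources
# can also be fetched directly, e.g. nltk.download('punkt').
nltk.download()
```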
So as you can see here we have various
files which represent different types of
words, different types of functions. We
have samples of Twitter.
We have different sentimental word net.
We have product reviews. We have movie
reviews. We have non-breaking prefixes
and many more files here.
Now let's have a look at the Gutenberg
file here and see what are all the
fields which are present in the
Gutenberg file. So as you can see here
inside this we have all the different
types of text files. We have Austen's Emma, we have Shakespeare's Hamlet, we have Moby Dick, we have Carroll's Alice in Wonderland and many more. Now this is
just one file we are talking about and
NLTK provides a lot of files. So let's
consider a document of type string and
understand the significance of its
tokens. So if you have a look at the
elements of the Hamlet, you can see it
starts with The Tragedy of Hamlet by William Shakespeare.
So if you have a look at the first 500
elements of this particular text file.
So as I was saying, it reads The Tragedy of Hamlet by William Shakespeare 1599, Actus Primus, and so on. We can use a lot of these files
for analysis and text for understanding
and analysis purposes and this is where
NLTK comes into picture and it helps a
lot of programmers to learn about the
different features and the different
application of language processing. So
here I have created a paragraph on artificial intelligence. So let me just
execute it. Now this AI is of the string
type. So it will be easier for us to
tokenize it. Nonetheless, any of the
files can be used to tokenize. For
simplicity here, I'm taking a string
file. The next what we are going to do
is import the word tokenize under the
NLTK tokenize library. Now this will
help us to tokenize all the words. Now
we will run the word tokenize function
over the paragraph and assign it a name.
So here I'm considering AI tokens and
I'm using the word tokenize function on
it. Let's see what's the output of this
AI tokens.
So as you can see here it has divided
all the input which was provided here
into the tokens.
Now let's have a look at the number of tokens we have here. So in total we have 273 tokens. Now these tokens are a list of words and special characters, each a separate item of the list.
Now in order to find the frequency of
the distinct elements here in the given
AI paragraph, we are going to import the FreqDist (frequency distribution) function, which falls under nltk.probability. So let's create an fdist object using the FreqDist function. And basically what we are doing here is finding the word count of all the words in the paragraph.
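A minimal sketch of the tokenization and frequency steps described so far might look like this; the AI string below is just a stand-in, since the exact paragraph used in the video isn't shown:

```python
from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist

AI = ("Artificial intelligence is the simulation of human intelligence by machines. "
      "Artificial intelligence systems can learn, reason and act on their own.")

AI_tokens = word_tokenize(AI)          # requires the 'punkt' tokenizer models
print(len(AI_tokens))                  # total number of tokens

fdist = FreqDist()
for word in AI_tokens:
    fdist[word.lower()] += 1           # lowercase so 'The' and 'the' are counted together
print(fdist.most_common(10))           # the 10 most frequent tokens
```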
So as you can see here, we have the comma 30 times, the full stop nine times, accomplished once, according once, and so on; we have computer five times.
Now here we are also converting the
tokens into lower case so as to avoid
the probability of considering a word
with upper case and lower case as
different. Now suppose we were to select
the top 10 tokens with the highest
frequency. So here you can see that we have the comma 30 times, 'the' 13 times, 'of' 12 times and 'and' 12 times, whereas the meaningful words, intelligent and intelligence, appear around six times each. Now there is another type of tokenizer, which is the blank line tokenizer. Let's use the blank line tokenizer over the same string to tokenize the paragraph with respect to blank lines. Now the output here is nine, and this nine indicates how many paragraphs we have, where the paragraphs are separated by a blank line. Although it might seem like one paragraph, it is not. The original structure of the data remains intact. Now other important key terms in tokenization are bigrams, trigrams and ngrams. What do these mean? Bigrams refers to tokens of two consecutive written words, known as a bigram. Similarly, tokens of three consecutive written words are known as a trigram. And similarly, we have ngrams for n consecutive written words. So, let's go ahead and execute a demo based on bigrams, trigrams, and ngrams. First of all, what we need to do is import bigrams, trigrams, and ngrams from nltk.util.
Now, let's take a string here on which
we'll use these functions. So taking
this string into consideration, the best
and the most beautiful thing in the
world cannot be seen or even touched.
They must be felt with the heart. So
first what we are going to do is split
the above sentence or the string into
tokens. So for that we are going to use
the word tokenize. So as you can see
here we have the tokens. Now let us create the bigrams of the list containing the tokens. For that we are going to use nltk.bigrams and pass all the tokens, and since the result is a generator we are going to wrap it in the list function. So as you can see in the output we have pairs like 'the best', 'best and', and so on; the tokens are in the form of two words, in pair form. Similarly, if we want to find out the trigrams, what we need to do is just replace bigrams with trigrams. So as you can see we have tokens in the form of three words, and if you want to use ngrams, let me show you how it's done. For ngrams what we need to do is define a particular number; instead of n I'm going to use, let's say, four. So as you can see we have the output in the form of four tokens.
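A small sketch of the bigram, trigram and ngram steps just described, using an approximation of the quoted string:

```python
from nltk import word_tokenize
from nltk.util import bigrams, trigrams, ngrams

string = ("The best and most beautiful things in the world cannot be seen or even "
          "touched, they must be felt with the heart.")
tokens = word_tokenize(string)

print(list(bigrams(tokens)))      # pairs of consecutive tokens
print(list(trigrams(tokens)))     # triples of consecutive tokens
print(list(ngrams(tokens, 4)))    # n consecutive tokens, here n = 4
```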
Now once we have the tokens we need to
make some changes to the tokens. So for
that we have stemming. Now stemming usually refers to normalizing words into their base form or root form. So if we have a look at the words here, we have affectation, affects, affections, affected, affection and affecting. So as
you might have guessed the root word
here is affect. So one thing to keep in
mind here is that the result may not be
the root word always.
Stemming algorithms work by cutting off the end or the beginning of the word, taking into account a list of common prefixes and suffixes that can be found in an inflected word. Now this
indiscriminate cutting can be successful
in some occasions but not always. And
this is why we affirm that this approach
presents some limitations. So let's go
ahead and see how we can perform
stemming on a particular given data set.
Now there are quite a few types of stemmers. Starting with the Porter stemmer, we need to import it from nltk.stem. Let's get the output of the word having and see what is the stemming of this word. So as you can see, we have have as the output.
Now here we have defined words to stem
which are give, giving, given and gave.
So let's use the Porter stemmer and see what is the output of this particular stemming. So as you can see, it has given give, give, given and gave. We can see that the stemmer removed only the ing from giving and replaced it with an e. Now let's try to do the same with another stemmer called the Lancaster stemmer. You can see this stemmer stemmed all the words, and as a result you can conclude that the Lancaster stemmer is more aggressive than the Porter stemmer.
Now the use of each of these stemmers depends on the type of task that you want to perform. For example, if you want to check how many times the stem giv is used above, you can use the Lancaster stemmer, and for other purposes you have the Porter stemmer as well.
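A quick sketch of the stemmers being compared here; the Snowball stemmer mentioned next is included as well, since it only differs in needing a language:

```python
from nltk.stem import PorterStemmer, LancasterStemmer, SnowballStemmer

words_to_stem = ["give", "giving", "given", "gave"]

porter = PorterStemmer()
lancaster = LancasterStemmer()
snowball = SnowballStemmer("english")    # the Snowball stemmer needs the language specified

print([porter.stem(w) for w in words_to_stem])      # gentler stemming
print([lancaster.stem(w) for w in words_to_stem])   # more aggressive stemming
print([snowball.stem(w) for w in words_to_stem])
```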
Now, there are a lot of stemmers. There
is one snowball stemmer also present
where you need to specify the language
which you are using and then use the
Snowball stemmer. Now, as we discussed, stemming algorithms work by cutting off the end or the beginning of the
word. On the other hand, lemmatization takes into consideration the morphological analysis of the word. Now, in order to do so, it is necessary to have a detailed dictionary which the algorithm can look into to link the form back to its lemma. What lemmatization does is group together the different inflected forms of a word under a single base form, called the lemma. It is somewhat similar to stemming, as it maps several words onto a common root. Now, one of the most important things to consider here is that the output of lemmatization is a proper word, unlike stemming, where we got the output GIV. Now GIV is not a word, it's just a stem. For example, if lemmatization is run on go, going and went, they all map to go, because that is the root of all three words. So let's go ahead and see how lemmatization works on the given input data. For that we are going to import the lemmatizer from NLTK. We are also importing WordNet here. As I mentioned earlier, lemmatization requires a detailed dictionary because its output is a root word, a proper word, not just any random string. So to find that proper word it needs a dictionary. Here we are providing the WordNet dictionary and we are using the WordNet lemmatizer, passing the word corpora into the WordNet lemmatizer. So can you guys tell
me what is the output of this one? I'll
leave this up to you guys. I won't
execute the sentence. Let me remove this
sentence here. You guys tell me in the
comments below what will be the output
of the lemmatization of the word corpora.
And what will be the output of the
stemming? You guys execute that and let
me know in the comment section below.
Now let's take these words into
consideration. Give, giving, given and
gave and see what is the output of the
lemmatization.
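As a reference, a minimal sketch of the lemmatization step (it assumes the wordnet corpus has been downloaded):

```python
from nltk.stem import WordNetLemmatizer   # backed by the WordNet dictionary

word_lem = WordNetLemmatizer()
for w in ["give", "giving", "given", "gave"]:
    # without a POS tag, every word is treated as a noun
    print(w, "->", word_lem.lemmatize(w))
```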
So as you can see here, the lemmatizer has kept the words as they are, and this is because we haven't assigned any POS tags here, and hence it has assumed all the words are nouns. Now you might be wondering what POS tags are. Well, I'll tell you about POS tags later in this video. For now, let's keep it simple: POS tags usually tell us what exactly the given word is. Is it a noun? Is it a verb? Or
is it different parts of speech?
Basically POS stands for parts of
speech. Now, did you know that there are several words in the English language, such as I, at, for, above and below, which are very useful in the formation of sentences, and without them a sentence would not make any sense? But these words do not provide much help in natural language processing, and this list of words is also known as stop words. NLTK
has its own list of stop words and you
can use the same by importing it from
nltk.corpus.
So the question arises are they helpful
or not? Yes, they are helpful in the
creation of sentences but they are not
helpful in the processing of the
language. So let's check the list of
stop words in NLTK.
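A small sketch of checking and removing stop words, assuming the stopwords corpus has been downloaded and reusing the AI_tokens list from the earlier sketch:

```python
from nltk.corpus import stopwords

stop = stopwords.words('english')
print(len(stop))   # the video reports 179; the exact count depends on the NLTK version

# keep only the tokens that are not stop words
filtered_tokens = [t for t in AI_tokens if t.lower() not in stop]
```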
So from nltk.corpus we are importing the stop words, and then we can ask for all the stop words that are there in the English language. Let's see. So as you can see
here we have the list of all the stop
words which are defined in the English
language and we have 179 total number of
stop words. Now as you can see here, we have words such as few, more, most, other and some. Now these words are very
necessary in the formation of sentences.
You cannot ignore these words but for
processing these are not important at
all. So if you remember, we had the top 10 tokens from that particular text, that is, the AI paragraph I mentioned earlier, which was given by fdist. Let's take that into consideration, and what you can see here is that except for intelligent and intelligence, most of the tokens are either punctuation or stop words and hence can be removed. Now
we'll use the compile from the re module
to create a string that matches any
digit or special character and then
we'll see how we can remove the stop
words. So if you have a look at the
output of the post punctuation,
you can see there are no stop words here
in the particular given output. And if
you have a look at the output of the
length of the post punctuation, it's 233
compared to the 273 the length of the AI
tokens. Now this is very necessary in language processing, as it removes all the unnecessary tokens which do not hold much meaning. Now coming
to another important topic of natural
language processing and text mining or
text analysis is the parts of speech.
Now generally speaking, the grammatical type of a word, which can be a verb, noun, adjective, adverb or article, indicates how the word functions in meaning as well as grammatically within the sentence. Now a word can have
more than one part of speech based on
the context in which it is used. For
example, if you take the sentence into
consideration, Google something on the
internet. Now here Google acts as a verb
although it is a proper noun. So as you can see here, we have so many types of POS tags and we have the descriptions of those various tags. We have the coordinating conjunction CC and the cardinal number CD. We have JJ as adjective and MD as modal. We have the proper noun, singular and plural. We have verbs, different types of verbs. We have the interjection and the symbol. We have the wh-pronoun and the wh-adverb. Now POS tagging can be used as a statistical NLP task. It distinguishes
the sense of the word which is very
helpful in text realization and it is
easy to evaluate as in how many tags are
correct and you can also infer semantic
information from the given text. So
let's have a look at some of the
examples of pos. So take the sentence
the dog killed the bat. So here, 'the' is a determiner, dog is a noun, killed is a verb, and again 'the' and bat are a determiner and a noun respectively. Now let's
consider another sentence. The waiter
cleared the plates from the table. So as
you can see here all the tokens here
correspond to a particular type of tag
which is the parts of speech tag. It is
very helpful in text realization. Now
let's consider a string and check how
NLTK performs POS tagging on it. So let's take the sentence Timothy is a natural when it comes to drawing. First we are going to tokenize it, and under NLTK we have the pos tag function, to which we'll pass all the tokens. So as you can see, we have Timothy as a proper noun, is as a verb, a as a determiner, natural as an adjective, when as a wh-adverb, it as a pronoun, comes as a verb, to as 'to', and drawing as a verb again. So this is how you get the POS tags.
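A minimal sketch of this tagging step (the averaged perceptron tagger resource must be downloaded first):

```python
import nltk
from nltk import word_tokenize

sent = "Timothy is a natural when it comes to drawing"
tokens = word_tokenize(sent)
print(nltk.pos_tag(tokens))
# e.g. [('Timothy', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('natural', 'JJ'), ...]
```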
The pos tag function does all the work
here. Now let's take another example
here. John is eating a delicious cake.
And let's see what's the output of this
one. Now here you can see that the
tagger has tagged both the word is and
eating as a verb because it has
considered is eating as a single term.
This is one of the few shortcomings of
the POS taggers, and one important thing to keep in mind. Now after POS tagging,
there is another important topic which
is the named entity recognition. So what
does it mean? Now the process of
detecting the named entities such as the
person name, the location name, the
company name, the organization, the
quantities and the monetary value is
called named entity recognition. Now in named entity recognition we have three steps. First we have noun phrase identification; this step deals with extracting all the noun phrases from a text using dependency parsing and parts of speech tagging. Then we have phrase classification; this is the classification step in which all the extracted noun phrases are classified into their respective categories, which are locations, names, organizations and much
more. And apart from this, one can
curate the lookup tables and
dictionaries by combining information
from different sources. And finally we
have the entity disambiguation. Now
sometimes it is possible that the
entities are misclassified. Hence creating
a validation layer on top of the result
is very useful and the use of knowledge
graphs can be exploited for this
purpose. Now the popular knowledge
graphs are Google knowledge graph the
IBM Watson and Wikipedia.
So let's take a sentence into
consideration: the Google CEO Sundar Pichai introduced the new Pixel at the Minnesota Roy Center event. So as you can see here, Google is an organization, Sundar Pichai is a person, Minnesota is a location, and the Roy Center event is also tagged as an organization. Now for
using NER in Python, we'll have to import the ne chunk function from the NLTK module. So let's consider some text data here and see how we can perform NER using the NLTK library. First we need to import the ne chunk function. Let's consider the sentence: the US president stays in the White House. So we need to do all these processes again: we tokenize the sentence first and then add the POS tags, and then we use the ne chunk function and pass the list of tuples containing the POS tags to it. Let's see the output. So as you can see, the US here is recognized as an organization, and White House is clubbed together as a single entity and is recognized as a facility. Now this is only possible because of the POS tagging. Without the POS tagging it would be very hard to detect the named entities of the given tokens.
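A rough sketch of the NER steps just described (ne_chunk needs the maxent_ne_chunker and words resources to be downloaded):

```python
from nltk import word_tokenize, pos_tag, ne_chunk

sentence = "The US president stays in the White House"
tags = pos_tag(word_tokenize(sentence))   # tokenize first, then add POS tags
tree = ne_chunk(tags)                     # named entities appear as labelled subtrees
print(tree)
```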
Now that we have understood what named entity recognition is, let's go ahead and understand one of the most important topics in NLP and text mining, which is syntax. So what is syntax?
So in linguistics syntax is the set of
rules, principle and the processes that
govern the structure of a given sentence
in a given language. The term syntax is
also used to refer to the study of such
principles and processes. So what we
have here are certain rules as to what
part of the sentence should come at what
position. With these rules, one can
create a syntax tree whenever there is a
sentence input. Now syntax tree in
layman terms is basically a tree
representation of the syntactic
structure of the sentence of the
strings. It is a way of representing the
syntax of a programming language as a
hierarchical tree structure. This
structure is used for generating symbol
tables for compilers and later code
generation. The tree represents all the
constructs in the language and their
subsequent rules. So let's consider the
statement the cat sat on the mat. So as
you can see here, the input is a sentence and it has been classified into a noun phrase, a verb and a prepositional phrase; the noun phrase is further classified into an article and a noun, then we have the verb which is sat, and finally we have the preposition on, and the article and the noun which are the and mat. Now in order to render syntax
trees in our notebook, you need to install Ghostscript, which is a rendering engine. Now this takes a lot of time, so let me show you from where you can download Ghostscript. Just type in download Ghostscript and select the latest version here.
So as you can see we have two types of
license here. We have the general public
license and the commercial license, since creating syntax trees and following them is an important part of commercial tools as well, and a commercial license is also available and very useful. So I'm not going to go much deeper into what a syntax tree is and how we can draw one. So now that we have
understood what are syntax trees, let's
discuss the important concept with
respect to analyzing the sentence
structure which is chunking. So chunking
basically means picking up individual
pieces of information and grouping them
into bigger pieces. And these bigger
pieces are also known as chunks. In the
context of NLP and text mining, chunking
means grouping of words or tokens into
chunks. So let's have a look at the
example here. So the sentence under consideration is: we caught the black panther. We is a pronoun, caught is a verb, the is a determiner, black is an adjective and panther is a noun. So what it has done here, as you can see, is that black, which is an adjective, panther, which is a noun, and the, which is a determiner, are chunked together into a noun phrase.
So let's go ahead and see how we can
implement chunking using the NLTK.
So let's take the sentence: the big cat ate the little mouse who was after the fresh cheese. We'll use the POS tags here and also use the tokenizing function. So as you can see here, we have the tokens and we have the POS tags. What we'll do now is create a grammar for a noun phrase, and we'll mention the tags that we want in our chunk phrase within the curly braces; that will be our grammar NP. Now here we have created a regular expression matching string. We now have to parse the chunks, and hence we'll create a chunk parser and pass our noun phrase grammar string to it. So as you can
see we have a certain error and let me
tell you why this error occurred. So this error occurred because we did not install Ghostscript, so the syntax tree cannot be drawn. But in the final output we still have a tree structure here, which is not exactly a visualization, but it's there. So as you can see here, we have the NP noun phrase for the little mouse, and again we have a noun phrase for fresh cheese also.
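A minimal sketch of the chunking steps walked through above, with a simple noun phrase grammar as an assumption, since the exact grammar used in the video isn't shown:

```python
import nltk
from nltk import word_tokenize, pos_tag

sentence = "The big cat ate the little mouse who was after the fresh cheese"
tags = pos_tag(word_tokenize(sentence))

# NP = optional determiner, any number of adjectives, then a noun
grammar_np = r"NP: {<DT>?<JJ>*<NN>}"
chunk_parser = nltk.RegexpParser(grammar_np)
chunk_result = chunk_parser.parse(tags)
print(chunk_result)   # the tree prints as text even without Ghostscript installed
```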
Although fresh is an adjective and
cheese is a noun, it has grouped these two words into a noun phrase. So this is how you execute chunking with the NLTK library. So by now we have learned
almost all the important steps in text
processing and let's apply them all in
building a machine learning classifier
on the movie reviews from the NLTK
corpora. So for that first let me import
all the libraries
which are the pandas the numpy library.
Now these are the basic libraries needed
in any machine learning algorithm.
We are also importing the count
vectorizer. I'll tell you why it is used
later. Now let's just import it for now.
So again if we have a look at the
different elements of the corpora as we
saw earlier in the beginning of our
session we have so many files in the
given NLTK corpora. Now let's access the movie reviews corpus under the NLTK corpora. As you can see here, we have the movie reviews. So for that we are going to import the movie reviews from the NLTK corpus. So if you have a
look at the different categories of the
movie reviews we have two categories
which are the negative and the positive.
So if you have a look at the positive we
can see we have so many text files here.
Similarly if we have a look at the
negative, we have 1,000 negative files here as well, which contain the negative feedback.
So let's take a particular positive one
into consideration which is the
cv0029590.
You can take any one of the files here
doesn't matter. Now, as you can see here, the file is already tokenized. Tokenization is generally useful for us, but here it has actually increased our work, because in order to use the CountVectorizer and TF-IDF we must pass strings instead of tokens. Now, in order to convert the tokens back into strings, we could use the detokenizer within NLTK, but that has some licensing issues as of now with the conda environment. So instead of that, we can use the join method to join all the tokens of the list into a single string, and that's what we are going to use here. So first
we are going to create an empty list and
append all the tokens within it. We have
the review list that is an empty list.
Now what we are going to do here is remove all the extra spaces and commas from the list while appending it to the empty list, and perform the same for the positive and the negative reviews. So
this one we are doing it for the
negative reviews and then we'll do the
same for the positive reviews as well.
So if you have a look at the length of
this negative review list, it's 1,000.
And the moment we add the positive
reviews also, I think the length should
reach 2,000.
So let me just define the positive
reviews.
Now execute the same for positive
reviews. And then again, if we have a
look at the length of the review list,
it should be 2,000. That is good. Now
let us create the targets before creating the features for our classifiers. So while creating the targets, we are denoting the negative reviews as zero and the positive reviews as one, and we will create an empty list and add 1,000 zeros followed by 1,000 ones into it. Now we'll create a pandas Series for the target list. The type of Y must come out as a pandas Series, and if you have a look at the output of type(Y), it is pandas.core.series.Series. That is good. Now let's have a look at the first five entries of the series.
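A hedged sketch of how the review list and the targets described above could be built (it assumes the movie_reviews corpus is downloaded):

```python
import pandas as pd
from nltk.corpus import movie_reviews

# join the pre-tokenized words of each review back into one string
rev_list = []
for category in ['neg', 'pos']:                 # 1,000 negative, then 1,000 positive
    for fid in movie_reviews.fileids(category):
        rev_list.append(' '.join(movie_reviews.words(fid)))

target = [0] * 1000 + [1] * 1000                # 0 = negative, 1 = positive
Y = pd.Series(target)
print(len(rev_list), type(Y), Y.head())
```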
So as you can see, it is 1,000 zeros followed by 1,000 ones, so the first five entries are all zeros. Now
we can start creating features using the
CountVectorizer, or bag of words. For that we need to import the CountVectorizer. Once we have initialized the vectorizer, we need to fit it onto the review list. Now let us have a look at the dimensions of this particular matrix. So as you can see, it's 2,000 by 16,228. Now we are going to create a list with the names of all the features by asking the vectorizer for its feature names. So as you can see here, we have our list. Now what we'll do is create a pandas data frame by passing the sparse matrix as the values and the feature names as the column names. Now let us check the dimensions of this particular pandas data frame. So as you can see, it's the same dimension, 2,000 by 16,228. Now if we have a look at the top five rows of the data frame, you can see here we have 16,228 columns with five rows, and the entries shown are all zero. Next, we are going to split the data frame into training and testing sets, and let us examine the training and the test sets as well. So as you can see, the test size here we have defined as 0.25, that is, the test set gets 25% and the training set will have 75% of the particular data frame. So if you have a look at the shape of X train, we have 1,500 rows, and if you have a look at the dimensions of X test, it is 500 rows.
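A sketch of the vectorizing and splitting steps, continuing from the sketch above and assuming a recent scikit-learn version:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(rev_list)     # sparse matrix of shape (2000, vocabulary size)

# dense data frame with the words as column names
df = pd.DataFrame(X.toarray(), columns=vectorizer.get_feature_names_out())

X_train, X_test, y_train, y_test = train_test_split(df, Y, test_size=0.25)
print(X_train.shape, X_test.shape)         # roughly 1,500 and 500 rows
```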
So now our data is split. Now we'll use
the Naive Bayes classifier for text classification over the training and testing sets. So now most of you guys might already be aware of what a Naive Bayes classifier is. It is basically a classification technique based on Bayes' theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. To know more, you can watch our Naive Bayes classifier video, the link to which is given in the description box below. If you want to pause at this moment and quickly check what a Naive Bayes classifier does and how it works, you can watch that video and come back here. Now, to implement the Naive Bayes algorithm in Python, we'll use the following libraries and functions. We are going to import GaussianNB from the sklearn library, which is scikit-learn, instantiate the classifier, and fit the classifier with the training features and the labels. We are also going to import the Multinomial Naive Bayes, because our features here are word counts, that is, multinomial features, rather than just two-valued features. So now we have passed the training data to this particular Multinomial Naive Bayes, and then we will use the predict function and pass the training features.
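A minimal sketch of the classification step, continuing from the sketches above; note that it evaluates on the held-out test set rather than on the training features, which is why it will usually not report a perfect score:

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix

clf = MultinomialNB()
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```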
Now let's have a look and check the accuracy of this particular model. So as you can see here, the accuracy is one, which is very highly unlikely; since it has given one, that means the model is overfitting and is overly accurate on the data it was trained on. And you can also check the confusion matrix for the same. For that, what you need to do is use the confusion matrix on these variables, which are y test and y predicted. So as you can see here, although it has predicted 100% accuracy, an accuracy of one is very highly unlikely, and you might have got a different output for this one. I've got the output here as 1.0; you might have got an output of 0.6, 0.7 or any number between 0 and 1.
Now let's answer the fundamental
question that is what exactly is
generative AI? Generative AI refers to
algorithm capable of creating new
content whether text, images, audio or
even videos. It's like having a creative
AI assistant that can take a simple
input and produce engaging outputs. For
example, GPT and LLaMA can write essays or code, while image generation models like DALL-E and Stable Diffusion can visualize unique scenes from
descriptions. But let's look at some of
the popular tools driving this
innovation. Well, some of the standard tools in generative AI include GitHub Copilot, which assists developers with code suggestions and chat interactions. Image generation tools like Stable Diffusion and Midjourney help creators bring visual concepts to life. Google's Gemini merges text and image capabilities, while Adobe Firefly extends AI's reach to creative work. So if you
want to know how to use these tools then
check out our generative AI examples
video link in the description. So you
might wonder where are these tools being
applied. Now let's explore them.
Generative AI is transforming multiple
creative fields. Image generation tools
power visual design, music composition algorithms create original scores, and AI assists video editors with automated tasks.
LLMs help generate and translate text
while code generation tools like GitHub
copilot boost developer efficiency. AI
generated voices are even being used in
audio books and voice assistants. So now
let's take some of these tools and
check. So this time we will use Pictory AI and Fliki AI. First let's explore Pictory AI. So for that, let's go to its site and check its functions. So we are at the Pictory AI site. On the left side we have the home, projects and brand kits, and on the main screen we have the different features Pictory AI provides. So let's choose text to video. Here let's write
some names and description and press
generate.
Pictory AI is a tool designed for video
creators that helps transform long-form content such as articles or blog posts into short, engaging videos. It uses AI to automatically extract key highlights and create professional-looking videos with minimal effort. Due to its simplicity and time-saving capabilities, Pictory AI is a popular choice for social media content creation and marketing. Now let's see
our next tool, which is Fliki AI. So now we are at the fliki.ai site. Here we
have different features like videos
where you can create videos from all of
these blogs, prompts etc. You can also
create audios from these features and
then we also have a design feature and
on the left hand side you can see
options like files, templates, brand
kits, voice clones etc. So now let's
take an idea and convert it into a
video. Now let's write our topic and
generate. Fliki AI is a content
creation tool that turns text into
videos using AI generated voices and
visuals. It helps users create
professional videos quickly by pairing
written content with stock images, animations, and voiceovers. Fliki is
ideal for marketers, content creators,
and educators looking to create engaging
video content efficiently. Now that we
have seen the applications, so let's
step back and look at the journey that
brought us here. So basically our
journey starts in 1947 with Alan
Turing's concept of intelligent
machines. By 1961, Joseph Venbomb
introduced Ela, the first chatbot. The
1980s saw the birth of recurrent neural
networks, while 1997 brought long
short-term memory networks to tackle
sequential data. And then GANs emerged
in 2014 transforming creative task. Fast
forward to 2017 when transformers like
GPT entered the scene. By 2023, GPT-3.5 and Google's PaLM marked significant
milestones. And by 2025, we are on the
brink of AI breakthroughs in chemistry
and genome editing. So what exactly are
these LLMs and why are they so powerful?
An LLM, or large language model, analyzes and understands natural language using machine learning. Examples include OpenAI's GPT, Google's PaLM and Meta's LLaMA. These
models drive applications such as
chatbots, language translation and more
by learning from extensive data to
predict and generate text sequences. But
before this, there was a very famous
term called language model. A language
model is a machine learning model that
uses probability statistics and
mathematics to predict the next sequence
of words. Suppose you have a sentence like: I have a boy who is my ____. Here,
if we ask a language model to predict
the next word, it considers the context
provided by the words before the blank.
Based on common usage patterns from its
training data, it may predict words like
boyfriend, brother, or friend which fit
naturally. However, it's less likely to
predict colleague or sibling as those
words may not commonly follow this type
of phrases. So, this process shows how
language models predict text by
calculating probabilities for each
possible word based on their likelihood
in context.
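To make the idea concrete, here is a toy sketch of next-word prediction with a simple bigram model; real language models use far more data and far richer statistics, so this is only an illustration of the probability idea:

```python
from collections import Counter, defaultdict

# a tiny toy corpus; an LLM would be trained on vastly more text
corpus = "he is my brother . he is my friend . she is my friend .".split()

# count which word follows each word (a bigram model)
follows = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    follows[w1][w2] += 1

# probability of each candidate word after "my"
total = sum(follows["my"].values())
for word, count in follows["my"].most_common():
    print(word, count / total)     # friend 0.67, brother 0.33
```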
So, when a language model is trained on
massive amounts of diverse text, it
gains a wider vocabulary and more
understanding of language, enabling it
to make more accurate predictions. For
example, if we give it a phrase like: you are a ____ to me, a model trained on
extensive data might suggest various
fitting words, for example friend or inspiration, based on the sentiment or context it has learned from the data. Now, here
reinforcement learning is used to
improve the model's responses over time.
By giving feedbacks be it positive or
negative on the responses we help the
model learn which type of responses are
preferred in specific context. For
example, if the model frequently
misinterprets the tone or intent, the
reinforcement learning helps adjust its predictions to be more contextually
appropriate and aligned with the
intended meaning. But what do these
models look like under the hood?
Well, LLMs are built on neural networks
composed of input, hidden, and output
layers. The hidden layers process information to learn complex patterns, and more layers mean the model can capture deeper insights. This structure allows LLMs to perform tasks from generating text to complex code
completions. Now, how do these layers
interact and function in real time? Now,
LLM is based on the transformer and a
transformer uses deep learning to
process any information coming to it.
Now, let me tell you a story of three
friends. Imagine we have three
characters. First is our friend. The
next character is Minion Bob and the
third character is Gru. So, our friend
asks Bob, "What's the price of the jet?
It must be $50,000." Minion Bob isn't
sure. So he goes to Gru and asks, "Is
the jet $50,000?" Gru replies, "No,
it's $70,000."
In this back and forth, Minion Bob is
like the neural network layer trying to
make an accurate guess. So each time he
goes back to Gru, like receiving more data or feedback, he gets corrected if his guess is wrong, leading him to refine his response. Now after the first check, Minion Bob returns to our friend,
saying, "I guess it's more than
$60,000."
Our friend assumes it might be around
$65,000. And sends Bob back to Gru to
verify. Again, Gru corrects him, "No,
it's actually $70,000."
So this process repeats with Bob
adjusting his guess each time.
Eventually, he learns that the correct
answer is $70,000 and updates his
knowledge. So just like Minion Bob
neural networks make initial guesses
based on available information with each
feedback loop, like Bob going back to Gru, the model's hidden layers adjust their parameters to refine its guesses,
ultimately arriving at the most accurate
prediction possible. So after getting
corrected multiple times minion Bob's
guesses improve until he knows the price
is $70,000.
Similarly, a neural network gradually learns the correct answer through training. So once the network has learned, it
can give accurate answers in future
cases without checking every time. Now
let us move on to understand how LLMs
work. LLMs begin with the collection of data sets, then tokenize the text and break it into manageable pieces. Using a transformer architecture, they process the data sequence all at once, leveraging vast training data. LLMs contain millions of learned parameters that predict text tokens and generate coherent outputs. Models often undergo pre-training for general knowledge and fine-tuning for specific tasks. So now
let's see some practical uses of LLMs.
LLMs power content generation, creating anything from articles to code. They excel at language translation, enhance search engines, and support personalized recommendations, code development assistance, and sentiment analysis, all of which owe much to LLMs' predictive capabilities. So guys, are you ready to
use all that knowledge in coding and
witness how these LLMs come together to
drive innovation? Whether through
developing applications, analyzing data
or building smart assistants, the gears of technology keep turning to unlock
AI's full potential. So now let us look
at our problem statement. So one of the
difficulties in the healthcare industry
is effectively evaluating medical
pictures such as MRIs, CT scans and
X-rays in order to identify anomalies
and illnesses. This procedure takes a
lot of time and calls for specialized understanding. Automated methods must be
developed to help medical personnel
recognize possible health problems in
medical imaging. In order to provide
better patient care, a system that
integrates cutting-edge machine learning
models with image analysis can greatly
help in the early detection of diseases
including cancer, infections, and other
illnesses. So, the method uses
generative AI to evaluate medical photos
and generate a thorough diagnosis report
based on the findings. This technology
allows users to upload medical images
which the AI model then processes.
Now let us build our project on a
medical image analysis application using Streamlit, Python and an LLM, Google's Gemini AI. So this app helps healthcare professionals analyze medical images such as X-rays, MRIs and CT scans to detect anomalies and diseases. First let's import the necessary libraries. So first, import streamlit as st. If this is not working or is showing an error, then open the terminal and write pip install streamlit. Then from pathlib import Path. Next, import google.generativeai as genai.
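Put together, the imports dictated above look like this:

```python
import streamlit as st
from pathlib import Path
import google.generativeai as genai
```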
So we are importing streamlit for the app interface, Path from pathlib for handling file paths, and Google's generative AI library, which allows us to interact with the Gemini AI model. Next
we will configure Google's Gemini API by
setting up our API key. So this will
allow us to connect to the AI model and
generate insights from medical images.
So before proceeding let's get our API
key, and we will go to Google to generate an API key.
So on your left there is an API key
option and after clicking you will get
the create API option. So just select
your model and create your API key.
So as you can see on the screen, just copy this API key and go back to the code. Now let's configure our model: just type genai.configure, and inside the brackets give api_key equal to, and over here paste the key.
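In code, the configuration step looks roughly like this; the string below is only a placeholder, never commit a real key:

```python
genai.configure(api_key="YOUR_API_KEY_HERE")   # placeholder, paste your own key
```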
Now we set up the system prompt which
defines the role of the AI model. So the
prompt specifies that our AI is a
medical image analysis system capable of
detecting diseases like cancer,
cardiovascular issues, neurological
conditions and more. So guys I have
already researched the prompt and written it here. So basically, the system prompt should be inside triple quotes.
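The exact wording of the instructor's prompt isn't shown in full, but an illustrative placeholder in the same spirit could be:

```python
# illustrative placeholder only; substitute your own carefully worded prompt
system_prompt = """You are a medical image analysis assistant for healthcare professionals.
Analyze the uploaded medical image (X-ray, MRI or CT scan), describe any visible
anomalies such as tumors, fractures or infections, and produce a structured
diagnostic report. Always recommend consulting a qualified doctor before acting
on the findings."""
```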
So this prompt guides the model to
analyze medical images for conditions
such as cancer, fractures, infections
and more making it a valuable tool for
healthcare professionals. Now let's
Now, let's configure the model settings for generating responses. We define parameters like temperature and top-k to control the creativity of the model's output. First, type generation_config equal to a dictionary, and inside it give temperature, which is 1, then top_p, which is 0.95, next top_k, which is 40, then max_output_tokens, which is 8192, and finally response_mime_type, which is text/plain. So over here, the temperature of 1 controls randomness; a value of 1 gives balanced output diversity. Next, top_p of 0.95 uses nucleus sampling, selecting tokens from the smallest set whose cumulative probability reaches 95%, for diverse responses. Next, top_k of 40 limits token selection to the 40 most probable tokens, narrowing possible outputs to high-probability tokens. Next, max_output_tokens allows for longer responses by capping the maximum length of the generated text at 8,192 tokens. And then we have response_mime_type, which specifies the format of the output as plain text. For more information, read the Google Gemini documentation.
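Put together, and assuming the google-generativeai SDK, the configuration dictionary looks roughly like this:

```python
generation_config = {
    "temperature": 1,                    # randomness: 1 gives balanced output diversity
    "top_p": 0.95,                       # nucleus sampling over the top 95% probability mass
    "top_k": 40,                         # consider only the 40 most probable tokens
    "max_output_tokens": 8192,           # cap on the length of the generated report
    "response_mime_type": "text/plain",  # return plain text
}
```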
Next, we will also configure safety settings to ensure that the model doesn't generate harmful content. For example, we block categories like harassment, hate speech, and sexually explicit content. Here we are using two things for each entry: first the category and then the threshold. Then repeat this for each category, such as harassment, hate speech, and sexually explicit content.
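A sketch of those safety settings, using the harm category names from the Gemini SDK:

```python
# Block responses in these harm categories at medium-or-above severity.
safety_settings = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
]
```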
Now let's set up the layout for our Streamlit application. For that we will configure the title and the layout of the page and even add a logo to make the interface more user friendly. First, type st.set_page_config and, inside the brackets, give page_title equal to, inside double quotes, Diagnostic Analytics, then a comma and page_icon equal to a robot emoji. Now let us type column1, column2, column3 equal to st.columns and, inside the brackets, give 1, 2, 1. Next, with column2, I'll be using the Edureka and medical images, so this will show you how to set up images using Streamlit. Now type st.image, open a bracket and, inside double quotes, type edureka.png, give a comma, and set width equal to 200. Now let us copy and paste it for the medical image, so let's type medical.png. Here we are using Streamlit's columns to center the logo and title, and this makes the app look professional and visually appealing.
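As a sketch, the layout code looks like this (the two image files are whatever local logo assets you have, so the names are placeholders):

```python
import streamlit as st  # already imported earlier in the walkthrough

st.set_page_config(page_title="Diagnostic Analytics", page_icon="🤖")

# Three columns with a wider middle column, used to center the logos.
col1, col2, col3 = st.columns([1, 2, 1])
with col2:
    st.image("edureka.png", width=200)   # brand logo (local file)
    st.image("medical.png", width=200)   # medical illustration (local file)
```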
Next, let's allow the user to upload medical images for analysis. We use Streamlit's file uploader widget to accept image files in PNG, JPG, or JPEG format. For that, let's type uploaded_file equal to st.file_uploader and, inside the brackets, inside double quotes, let's type please upload the medical image for analysis, then a comma, then type equal to a list containing png, jpg, and jpeg. Next, let us type submit_button equal to st.button and, inside the brackets, let's give generate image analysis. So here, when the user uploads a file and clicks the generate image analysis button, the model processes the image and prepares it for analysis.
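In code, the uploader and the button look roughly like this:

```python
uploaded_file = st.file_uploader(
    "Please upload the medical image for analysis",
    type=["png", "jpg", "jpeg"],
)
submit_button = st.button("Generate image analysis")
```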
Once the user submits the image, we send it to the AI model for analysis, and then the model generates a response based on the prompt and image, which we then display in the app. So here, as you can see on the screen, we have another block. The if submit_button check runs the code when the submit button is pressed. Next, image_data equals uploaded_file.getvalue(), which gets the raw image data from the uploaded file. Next we have image_parts, which creates a list with the image data in a structured format. Then we have prompt_parts, which combines the image data and a text prompt for the model. This part of the code sends the image and text prompt to the model to generate a response. And then we have st.write, which displays the model's response in the app. So here we use the image data and the system prompt to generate content with the Gemini AI model. The result is displayed as a detailed report with insights about the medical image.
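A sketch of that block. It assumes a model object was created earlier with genai.GenerativeModel(...) using the generation and safety settings above; the model name and the mime type here are assumptions you would adjust to your setup and to the uploaded file type.

```python
# Assumed earlier in the script:
# model = genai.GenerativeModel("gemini-1.5-flash",
#                               generation_config=generation_config,
#                               safety_settings=safety_settings)
if submit_button and uploaded_file is not None:
    image_data = uploaded_file.getvalue()                          # raw bytes of the upload
    image_parts = [{"mime_type": "image/jpeg", "data": image_data}]
    prompt_parts = [image_parts[0], system_prompt]                 # image + instructions
    response = model.generate_content(prompt_parts)
    st.write(response.text)                                        # show the generated report
```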
Now it's time to test the code. So open the terminal and type streamlit run main.py. Once you hit enter, it will redirect you to our app's interface. And there you go. So the app is ready. Here's a live demo of the app. We will upload a sample image and the app will analyze it and provide a detailed diagnosis based on the AI model's insights. So this is how we use Streamlit and Google's Gemini AI model to create a medical image analysis app. This app can help medical practitioners by offering precise and thorough analysis of medical images. Now it is time for testing. So let's take one image of any disease and test it. Upload the image from your computer. Then we will select an image and press the generate button. As you can see, it's running. So it generates a fabulous response and can help doctors in assisting their patients, saving time and money. So this is how we built a real-time medical diagnostic helper using Streamlit, Python, and Google Gemini AI.
[music]
LLMs like GPT-4 and Gemini 2.0 are massive models trained on huge data sets, capable of generating highly sophisticated and nuanced responses. On the other hand, SLMs like DistilGPT or TinyGPT are smaller, more efficient models designed for faster and more lightweight tasks. So understanding the differences
between them is crucial for selecting
the right model for your needs. Now
let's dive right in with our first
question. What exactly are LLMs and SLMs?
LLMs which are large language models are
powerful AI systems trained on vast data
sets offering deep contextual
understanding and sophisticated
responses. Models like GPT-4 and Gemini 2.0 are examples, whereas SLMs like DistilGPT or TinyGPT are streamlined for speed and efficiency, excelling in lightweight tasks. So both serve distinct purposes, balancing quality, cost, and
performance. All right. Now that we have
got a good idea of what LLMs and SLMs
are, let's talk about why this
comparison is so important. As AI
adoption grows across industries, the
choice between LLMs and SLMs becomes
more important. LLMs offer deep
contextual understanding and complex
outputs while SLMs provide efficiency
and speed. So choosing the wrong model can lead to excessive costs, slow performance, or subpar results. And by
understanding the strengths and
trade-offs, you can make more informed decisions and optimize your AI-driven solutions. So now let's dive into the core differences between LLMs and SLMs
and see what sets them apart. So first
let us compare in terms of model size
and complexity. So when it comes to
model size and complexity, LLMs often
have billions of parameters and require
vast computational resources to train
and run. Their large size enables them
to generate high-quality, context-rich
responses. And on the other hand, SLMs
are designed with fewer parameters,
often in millions, making them lighter
and faster. They prioritize efficiency
over complexity, which makes them ideal
for simpler tasks. Next, let us compare
in terms of performance and output
quality. So when it comes to performance
and output quality, LLMs are known for
their exceptional ability to handle
complex conversations, creative writing
and deep analysis. Their vast training
data ensures diverse and sophisticated
responses. On the other hand, while SLMs
are efficient, they may sometimes
struggle with nuanced or open-ended
queries. However, they excel in
straightforward, well-defined tasks. Next,
let's compare them with speed and
latency. When it comes to speed and
latency, LLMs can experience longer
response times and higher latency due to
their large size, especially when
processing extensive input data. Whereas
SLMs are designed for speed, offering
quicker responses and making them well
suited for real-time applications where
low latency is crucial. Next, in terms of
cost and resource efficiency. So when it
comes to cost and resource efficiency,
LLMs require significant hardware
investments such as powerful GPUs and
extensive cloud resources which lead to
higher operational costs, whereas SLMs, with their smaller footprints, are more
affordable to deploy and maintain making
them accessible even with limited
computational resources. Now let us
explore the real-world use cases of LLMs and SLMs. LLMs are ideal for creative
content generation, customer service
chat bots with advanced capabilities,
deep data analysis, and long- form
conversations. On the other hand, SLMs
are perfect for lightweight virtual
assistants, real-time customer support,
simple automation and tasks that require
quick turnaround times. Now, let us see the advantages and disadvantages of using LLMs and SLMs. The key advantages
of LLMs include their superior
understanding of complex language, the
ability to generate high-quality, nuanced responses, and better generalization across a wide range of diverse tasks. The main
drawbacks of LLMs are their high
computational and cost demands along
with slower response times due to their
large size and complexity. Now let us
have a look at the advantages of SLMs.
SLMs offer several advantages, including
their speed and efficiency, lower
operational cost and easier deployment
even on limited resources. The primary
disadvantages of SLMs are their limited
contextual understanding and their
tendency to struggle with complex
open-ended queries. Now that we have
explored the strengths and limitations,
let's take a look at what the future
holds for LLMs and SLMs in AI
development. So both LLMs and SLMs will
play a vital role in the future of AI. We
can expect ongoing improvements in
efficiency, quality and adaptability.
Hybrid approaches that combine the
strengths of both models could become
more common offering balanced
performance and scalability. So the
conclusion we get is that the choice between LLMs and SLMs depends on your specific needs. If you prioritize depth, nuance, and high-quality output, LLMs are the best choice. If speed,
efficiency, and cost are more important,
SLMs are the way to go. So by
understanding their strengths and
limitations, you can select the right
model and unlock AI's full potential for
your projects.
[music]
Have you seen how tools like ChatGPT with vision can look at an image you upload and describe it? Or how DALL-E and Midjourney can generate stunning images from just a text prompt? And now some AI models can even do both at the same time. They can see, read, listen, and even create, all in one go. So how is that possible? Well, that's because of something called multimodal AI. AI that
doesn't just work with one type of data
like only text or only images, but can
understand and combine multiple types of
information together just like we humans
do. So, in this video, we are going to
break down what multimodal AI really means and how multimodal AI works, and explore some amazing real world examples that you are probably already using without even realizing it. So first, let's break down the word multimodal. Multi means many, and modal refers to the modes of information like text, images, sound, or video. So multimodal AI is an
AI that can understand and work with
multiple types of data at the same time.
For example, a single AI model that can
read text, look at images, listen to
audio, watch videos, and combine all of
this to give a better answer. It sounds
a bit like how humans process
information, right? So, why do we need multimodal AI? Think about how we
interact with the world. If you're
watching a movie, you're seeing visuals,
listening to dialogue, and understanding
the story together. Or when you're
explaining a recipe to someone, you
might show pictures, describe steps, and
maybe even play a video. So, humans naturally combine different senses to understand things. Old AI models were single-modal. They could only process one type of data: a text model could only read and write, and a vision model could only look at images. But real world problems are not just text or just images. They are mixed. So multimodal AI bridges this gap and lets AI connect the dots between text, visuals, audio, and more. So how does multimodal AI work? In simple terms, it
works like this. It takes different
types of input. For example, it could
take a photo and a text question about
that photo. Then it converts them into a
common language inside the AI model. So
think of it like translating text,
images and audio into one shared
understanding. Next, it reasons over all
the data together. Then it gives you a
smart answer that considers all the
inputs. So for example, you show AI a
picture of a dog and ask what breed is
this? So it looks at the image,
understands the features and responds
that looks like a golden retriever. So
it's combining vision plus language to
answer. Now let us go through a working diagram of a full multimodal pipeline. As you can see on the screen, first it takes different inputs. It could be text, an image, or even a video. Then it uses encoders for each modality. Later these inputs are translated into a common AI language. Then a multimodal transformer
uses cross attention to connect
relationship across text, images and
audio. And finally the model generates a
response. So let me take another example
to explain this diagram. So as you can
see we have different inputs. So the
model can take text, images, audio or
even video as input. Next is the
encoders for each modality. That means a
text encoder converts words into vectors
and an image encoder converts pixels
into vectors and then an audio encoder
converts sound waves into vectors.
Next is the shared embedding space where
all these different inputs are
translated into a common AI language
which is a vector space where similar
meanings are close together. For
example, the word car and a picture of a
car are mapped close together. Next is
the fusion plus reasoning layer where a
multimodal transformer uses cross
attention to connect relationship across
text, images and audio. For example, it
links the word red to the red region of
the car image. Next is the output
generation. So finally, the model
generates a response which could be
text, a caption, an image (like DALL-E generates), or
even sound. All right, I hope this is
clear now. So now let's look at some
real world examples that make it easier
to understand. So first we have ChatGPT with vision. If you upload an image to ChatGPT and ask what's in this picture, then it can describe the objects and text, or even analyze data like a chart. So that's multimodal AI. It's using both
image understanding and text generation
together. The next example is Google
Lens. So when you point your camera at
something, Google Lens can recognize the
object, read the text in the image and
translate it into another language. Again, it's vision plus language plus translation, all in one model. The next example could be self-driving cars. Autonomous cars like Tesla's use multimodal AI because they have to see
the road through cameras, read traffic
signals, hear alerts and also process
maps and text instructions. So, they
combine all these modes to make driving
decisions. Next is the healthcare AI. So
doctors now use AI that can look at
medical images like X-rays and also read
patient reports combining the
information to help diagnose diseases
more accurately. But why is multimodal AI a game-changer? Multimodal AI is powerful
because it's closer to human
intelligence. We don't rely on one
sense, we combine many. And it makes AI
more flexible because one model can
handle text, images, audio, and more. It
can solve more complex problems like
explaining what's happening in a video
or understanding a full conversation
with context. All right. Now, for those of you who want a bit more technical depth, here's a quick peek behind the scenes. As I discussed earlier, multimodal AI uses transformer-based models, the
same type of models behind GPT. So the
text, images and audio are all converted
into a common representation, like a
shared language of numbers called
embeddings. For example, a picture of a
dog and the word dog are both mapped
into a similar space. So the AI knows
they mean the same thing. Then the model
can reason across all modalities
together and generate an output. A great example is CLIP from OpenAI, which connects images and text. Another is Google Gemini, designed from the ground up as a truly multimodal model. So what
is the biggest challenge? So the
different types of data have different
formats and complexity. Combining them
efficiently without losing meaning is
still an ongoing research area. So it's not just magic. It's smart design that lets the AI translate everything into one common understanding. Let's now look at some of the most important multimodal models, how they work, and where they are used. So here are the key multimodal models. First on the list we have CLIP, which stands for Contrastive Language-Image Pre-training, from OpenAI. So let's see
how it works. It has two encoders, a text encoder and an image encoder. Both encoders map inputs into the same embedding space. During training it learns that this caption matches this image and that this caption does not match that image. So it uses contrastive learning. It pushes correct pairs closer and incorrect pairs further apart. So here is the working diagram. It takes the input, be it an image or text, and then encodes it: an image goes through a vision encoder and text goes through a text encoder, then both are projected into a shared embedding space, and finally it generates the output. So
let's have a look at the use cases. It is used in DALL-E and Stable Diffusion to align text prompts with images. Next, it is used in zero-shot classification, where you give it a photo of a dog versus a photo of a cat and it recognizes which one matches the image without retraining. And it is used in search, where it finds images similar to a given caption.
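To make zero-shot classification concrete, here is a small sketch using the Hugging Face transformers implementation of CLIP; the image file name is just a placeholder.

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("dog.jpg")                       # placeholder image file
texts = ["a photo of a dog", "a photo of a cat"]    # candidate captions

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Higher probability = caption whose embedding sits closest to the image embedding.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs[0].tolist())))
```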
Next, moving on to the second model, which is BLIP-2. It stands for Bootstrapping Language-Image Pre-training. So, let us see how it works. First, it connects a frozen vision encoder, for example a CLIP or ViT encoder, with a frozen large language model (LLM). A querying transformer acts as a bridge where it converts visual
features into a language friendly
representation. So here is the working
diagram. The AI first looks at the image
and turns it into features like objects, colors, and shapes. Then a small bridge model called the Q-Former takes those
visual features and converts them into a
format the language model can
understand. Next the large language
model then reasons about the image
features just like it reasons about
text. And finally it generates a text
answer or a caption describing the
image. So the vision encoder sees, the Q-Former translates, and the LLM explains.
So let's have a look at the use cases.
So first it is used in visual question
answering for example what's in this
picture. Next, in image captioning, where it can give a caption like a man riding a horse on a beach. Next, in chatbots with vision, for example where you upload an image and ask questions. Okay. The next model on the list is Flamingo from
Deep Mind. So let's have a look at its
working. So here's how it works. So
first, it's a few-shot multimodal model. It doesn't need huge fine-tuning for a new task. And then it uses gated cross attention layers to integrate image plus text inside a frozen LLM, and it can
reason across multiple images and a long
text sequence. So it looks at the image,
reads your question, connects both
through cross attention and then
explains it. So let's have a look at the
use cases. It is used in multimodal chatbots, like look at these five images and now answer this question. Next, it is used in educational AI, where it reads diagrams and answers questions. Next, in document understanding, where it reads text plus images in a PDF. And the next multimodal model on the list is PaLM-E from
Google. So, here's how it works. So, as
you can see, this is the working
diagram. So first the AI gets both
visual input like a photo or a live
camera feed and the text instructions
like pick up the red apple on the table.
Next, the vision transformer understands what's in the image, like objects, colors, and positions, and the PaLM language model
understands the instruction and reasons
about what needs to be done. So it's
combining both. The AI creates a
step-by-step action plan for the robot
like move forward, grab the red apple
and place it in the basket. So, here are
the use cases. It is used in robotics
like pick up the red apple on the table.
Next, it is used in real world reasoning
for embodied AI. Then, it is also used
in visual navigation tasks. And the next multimodal model is Google Gemini. So, here's how it works. It's natively multimodal, trained from scratch on text, images, audio, and video. So unlike CLIP, which aligns two encoders, Gemini has a single
model handling all modalities and it
uses joint training with cross
attention. So this is the working
diagram. So let me explain this. The AI
takes in all types of inputs at once
such as written text, pictures, sound,
and even video. Then instead of using
separate models for each type, it uses
one powerful transformer model that can
understand and combine all these inputs
together. And from that combined
understanding, it can give any kind of
output, a text answer, a generated image
or even an audio response. So basically
it understands everything together and
responds in any form you need. So let us
have a look at its use cases. It is used
in complex queries such as summarize
this video and create a chart. It is
used in advanced digital assistants and also in future AR/VR multimodal applications. The next model is GPT-4o from OpenAI. It's an optimized multimodal model. It accepts text, images, and audio in real time, and it uses fused embeddings and parallel processing
for speed and it works as a true
interactive assistant. So here are its
use cases. It is used in conversational
AI with vision plus audio and in
real-time assistance, where you upload an image and get an explanation instantly, and
also in accessibility tools for example
describe surroundings for visually
impaired users. So these models
represent different approaches to
multimodality.
Some align separate encoders, like CLIP. Some bridge vision plus an LLM, like BLIP-2. And some are natively multimodal, like Gemini and GPT-4o. So now let us see how multimodal models are trained. Training multimodal models is much more complex than training single-modal models. So first is the data set
alignment. So you need paired data sets
such as images plus captions, videos
plus transcripts and audio plus text. So
the challenge is the text and images
don't always align perfectly. Next is
the contrastive learning. So train a
model to pull matching pairs closer and
push non-matching pairs apart. For
example, image of a cat plus caption a
cat is a matching pair. Whereas image of
a cat plus caption as a a dog is not a
matching pair. Next is the masked
modeling. Mask parts of the input such
as image patches text tokens as model
predicts missing information. Then it
forces the model to reason across
modalities. For example, mask the object
in a caption a dash is sitting on the
table plus providing image. Next is the
fusion and cross attention training
where models like Flamingo or Gemini
train cross attention layers to
integrate modalities. It requires huge
compute clusters. Next is scaling laws: like LLMs, multimodal models get better with size and data diversity. Gemini and GPT-4o are trained on massive multimodal corpora. So here are the training
requirements. You need to have high-quality paired data sets, billions of
parameters and TPUs, GPUs for weeks or
months and advanced optimizations such
as mixed precision or sharded training. So why is true multimodal AI still hard? It's because of data mismatch. Text is sequential, images are spatial, and audio
is temporal. So aligning them perfectly
is difficult. Next is limited high
quality data. So billions of image text
pairs exist, but they are noisy and biased. Next is bias and fairness: models learn cultural and social biases from multimodal data, for example stereotypes in images and captions. The next challenge is compute cost. Training needs huge GPU clusters, for example hundreds of A100 GPUs, and fine-tuning multimodal models is even
more expensive than text only. And the
final challenge is the evaluation
difficulty. So how do you measure
reasoning across modalities? So there's
no single easy benchmark. So while
multimodal AI is powerful, it's also data hungry, compute heavy, and still evolving. So in simple words, multimodal AI can process and combine multiple types of data such as text, images, audio, and video. It's already in use, in tools such as GPT-4 with vision, Google Lens, self-driving cars,
and healthcare AI. It's a big step
towards AI that can understand the world
like humans do. So what do you think?
Will multimodal AI make AI more
humanlike? So drop your thoughts in the
comments.
>> [music]
>> So what is a transformer? Transformers
operate on a concept called sequence-to-sequence learning. Essentially, they take a sequence of tokens as an input and predict the next token in the output. A great example of this is language translation. Imagine inputting good morning in English, and the transformer processes this and outputs the translation in languages like Japanese, Korean, or German. The key is how efficiently it processes the relationships between words.
Since we know what a transformer is,
let's dig a bit deeper into them. A
transformer has two primary components: an encoder and a decoder. The encoder identifies relationships between parts of the input sequence, whereas the decoder uses these
relationships to generate the output
sequence. This division is what allows
transformers to handle tasks like text translation or summarization with
remarkable accuracy. Now that we have
the idea of transformers, let's discuss
how they evolved. Before transformers, there were other neural networks like RNNs, recurrent neural networks, invented by David Rumelhart in 1986.
However, RNNs faced significant
challenges. They would forget early
parts of the sequence as they processed
longer ones and couldn't handle
dependencies efficiently. Additionally,
RNNs relied on recurrence, which made them
inefficient and incapable of
parallelization.
Then came long short-term memory
introduced by Hochreiter and Schmidhuber in 1997. Long short-term memory improved on this by remembering sequences for a longer duration and addressing some of the memory issues in RNNs.
However, they were slow to train and
difficult to manage at scale. Finally,
transformers transformed neural
networks. First introduced in the landmark paper Attention Is All You Need, transformers addressed all the problems faced by RNNs and LSTMs. They
used a completely attention-based
mechanism, eliminating reliance on
recurrence. This made transformers
capable of remembering context
efficiently, training faster and being
parallelized, enabling multitasking and
significantly speeding up processes.
Now, let's discuss on the attention
mechanism. Think about the sentence.
This cat wants to jump on the box. The
attention mechanism identifies the most
relevant parts of this sentence like
cat, jump and box and focuses on these
elements while processing the data. Now
that we know how transformers have
evolved, now let's discuss their
architecture. A transformer consists of
two main components, an encoder and a
decoder. Each typically consists of six layers. Inside the encoder, there is one
attention layer and one feed forward
layer. While the decoder contains two
attention layers and one feed forward
layer. The magic of parallelism comes
from how data is fed into the network.
In the attention layer, all the words
are processed simultaneously with each
word forming combinations with others in
the sentence. This allows the model to
capture relationships and context
efficiently. After processing in the
attention layer, the data is sent to the
feed forward layer where it is learned
layer by layer. The input to the encoder
and decoder are the raw input embeddings
which are numerical representations of
words. On top of these embeddings,
positional encodings are added to help
the model understand the position and
the order of each word in the sequence.
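For the technically curious, the sinusoidal positional encoding from the original transformer paper uses PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)); here is a minimal sketch of it:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # One row per position, one column per embedding dimension.
    pos = np.arange(seq_len)[:, None]            # (seq_len, 1)
    i = np.arange(d_model)[None, :]              # (1, d_model)
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])         # even dimensions use sine
    pe[:, 1::2] = np.cos(angle[:, 1::2])         # odd dimensions use cosine
    return pe

print(positional_encoding(seq_len=4, d_model=8).shape)  # (4, 8)
```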
If we simplify embeddings, they are essentially vector representations of words in an n-dimensional space. At the top of the architecture, there are two layers producing the output probabilities, converting the final output into a form humans can understand. These output vectors have a length corresponding to the size of the vocabulary.
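At the heart of each attention layer is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Here is a minimal NumPy sketch:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # similarity of each query with each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                           # weighted mix of the values

# Toy example: 3 tokens with 8-dimensional queries, keys, and values.
Q = K = V = np.random.randn(3, 8)
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 8)
```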
Now, what truly makes transformers unique is the inclusion of normalization layers, which normalize the output from the sub-layers. Additionally, skip connections, the dark arrows in the architecture, forward critical information that bypasses the self-attention or feed-forward layers directly to the normalization
layers. This ensures the model does not
forget important details and effectively
passes vital information further into
the network. Now, moving forward, let's
discuss why transformers are important.
Transformers are vital because they
utilize semi-supervised learning. They
are trained on massive unlabelled data
sets enabling them to generalize across
a wide range of tasks. Unlike older
models, transformers don't need to
process data sequentially. Their
attention mechanisms allow them to focus
on the most relevant context which
significantly speeds up training.
Transformers revolutionize data
processing by eliminating the need to
handle data sequentially allowing for
parallel processing and significantly
enhancing efficiency. The attention
mechanism lies at the core of
transformers, enabling the model to
focus on the most relevant parts of the
input sequence and improving accuracy
and understanding of context.
Furthermore, transformers excel at
providing context, ensuring that the
meaning of each word or token is
accurately interpreted within its
surroundings. Lastly, these models
dramatically speed up the training
process, making them faster and more
efficient compared to traditional neural
networks, thus redefining AI's
capabilities across diverse
applications. Now that we know why
transformers are important, let's
discuss some applications.
We have OpenAI's GPT, a groundbreaking
model that leverages the power of
transformers for natural language
processing task. Additionally, Google
has developed several transformer-based models, including the Vision Transformer for image recognition, BERT (Bidirectional Encoder Representations from Transformers) for understanding the context of words in a sentence, and T5, which stands for Text-to-Text Transfer Transformer, for a wide range of text generation tasks. Microsoft has also
contributed with DeBERTa, Decoding-enhanced BERT with Disentangled Attention, a model designed to improve contextual understanding and enhance NLP applications. These models
demonstrate the versatility and impact
of transformer architecture across
various domains. Now that we know the
application of transformers, how about
checking their real-world products? Transformers have become an integral part of many real world products that we use daily. Examples include Grammarly, which leverages transformers for advanced grammar and writing assistance; Google Search and its translation tools, powered by models like BERT and T5; and ChatGPT, OpenAI's conversational AI that relies on the Generative Pre-trained Transformer architecture. Additionally, Meta's deepfake detector uses transformer-based models for facial recognition tasks. These applications
highlight how transformers have
revolutionized technology, seamlessly integrating into tools that enhance
our everyday lives. In conclusion,
transformers are changing the tech world
by enabling smarter, faster, and more
efficient AI systems. Whether it's
generating text, translating languages,
or enhancing search engines, these
models are the cornerstone of modern AI.
[music]
So what are RNNs, right? Well, RNN basically stands for recurrent neural network, and we usually use this in order to deal with sequential data. Sequential data can be something like time series data or textual data of any format. So why should one use RNNs, right? This is because there's a concept of internal memory here. RNNs can remember important things about the input they have received, which allows them to be very precise in predicting what the next outcome can be. So this is the reason why they are preferred for sequential data. Okay. And some of the
examples of sequential data can be
something like time series, speech,
text, financial data, audio, video,
weather and many more. Although RNNs were the state-of-the-art algorithm for dealing with sequential data, they come with their own drawbacks. Some of the notable drawbacks here are that, due to the complexity of the algorithm, the neural network is pretty slow to train, and as there is a huge number of dimensions here, the training is very long and difficult to do. Okay. Apart from that, the most
decisive drawback of RNNs, and the driver for improvements to them, is that of the vanishing gradient. What this vanishing gradient means is that, as we go deeper and deeper into our neural network, the earlier data is lost. This is because of a concept called the vanishing gradient, and due to this we cannot work on a large or longer sequence of data. Okay. To overcome this,
we came up with some new upgrades to the current recurrent neural networks, or RNNs. Starting off with the bidirectional recurrent neural network. You see, bidirectional recurrent neural networks connect two hidden layers of opposite directions to the same output. With this form of deep learning, the output layer can get information from past and future states simultaneously. So as you can see here, we have two layers over here, and as they are bidirectional, what happens is, when the algorithm feels that it is kind of losing its gradients or the previous data, it can go back and get the data from the past. So why do we need bidirectional recurrent neural networks? Well, a bidirectional recurrent neural network duplicates the RNN processing chain so that the input is processed in both forward and reverse time order, thus allowing the bidirectional recurrent neural network to look into future context as well. The
next one is long short-term memory. Long short-term memory, also sometimes referred to as LSTM, is an artificial recurrent neural network architecture used in the field of deep learning. Unlike standard feed-forward neural networks, LSTM has feedback connections. It can process not only single data points but also entire sequences of data. So as you can see here, what I'm trying to say is that with LSTM, or long short-term memory, we can feed a longer sequence compared to what we could with a bidirectional RNN or a plain RNN. So why is LSTM better than RNN? We can say that when we move from RNN to LSTM, we are introducing more and more control over the sequence of data that we can provide; LSTM gives us more controllability, and that leads to better results. All right. So the next type of
recurrent neural network is the gated recurrent unit, also referred to as GRU. You see, GRU is a type of recurrent neural network that in certain cases is advantageous over long short-term memory. GRU makes use of less memory and is also faster than LSTM. But the thing is, LSTMs are more accurate when using longer sequences of data.
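To see these three layer types side by side, here is a small PyTorch sketch on a toy batch of sequences; the sizes are arbitrary.

```python
import torch
import torch.nn as nn

# Toy batch: 4 sequences, 10 time steps, 8 features each.
x = torch.randn(4, 10, 8)

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True, bidirectional=True)
gru = nn.GRU(input_size=8, hidden_size=16, batch_first=True)

out_rnn, h_n = rnn(x)          # plain RNN: hidden state only
out_lstm, (h, c) = lstm(x)     # LSTM adds a cell state; bidirectional doubles the output size
out_gru, h_gru = gru(x)        # GRU: fewer gates, less memory than LSTM

print(out_rnn.shape, out_lstm.shape, out_gru.shape)
# torch.Size([4, 10, 16]) torch.Size([4, 10, 32]) torch.Size([4, 10, 16])
```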
I'm sure by now you might have got a hint about the trend that has led to these improvements, right? So the trend over here is that the model should be capable of remembering and taking in a longer input sequence. The game changer for sequential data was developed when we came up with something called transformers, and this work was based on a paper called Attention Is All You Need. All right. So let's take a
look at this. The paper Attention Is All You Need introduces a novel architecture called the transformer. Like LSTM, the transformer is an architecture for transforming one sequence into another, with the help of two parts, that is, encoders and decoders. But it differs from previously described sequence-to-sequence models because it does not work like GRUs. Okay, so it does not implement recurrent neural networks. Recurrent neural networks until now were one of the best ways to capture the temporal dependencies in a sequence. However, the team presenting this paper, that is Attention Is All You Need, proved that an architecture with only an attention mechanism, without using RNNs, can improve its results in translation tasks and other NLP tasks. One of the best examples of
transformers is Google's BERT. So what
exactly is this transformer? Right? You
see here we have encoder on the top and
decoder on the bottom. Both encoder and
decoder are comprised of modules that can be stacked on top of each other multiple times. So what happens here is that the inputs and outputs are first embedded into an n-dimensional space, since we cannot use them directly. So we obviously have to encode whatever inputs we are providing here. One slight but important part of this model is the positional encoding of the different words. Since we have no recurrent neural network that can remember how sequences are fed into the model, we need to somehow give every word or part of our sequence a relative position, since a sequence depends on the order of its elements. Okay, these positions are added to the embedded representation of each word. All right, so this was a brief overview of transformers. So let us now
move ahead and see some of the popular
language models that are available in
the market. All right, so let us now
start off by understanding OpenAI's
GPT-3. The successor to GPT and GPT-2 is GPT-3, one of the most controversial pre-trained models by OpenAI. The large-scale transformer-based language model has been trained with 175 billion parameters, which is 10 times more than any previous non-sparse language model. The model has been trained to achieve strong performance on many NLP data sets, including tasks like translation and question answering, as well as several other tasks. Then we have Google's BERT.
BERT stands for Bidirectional Encoder Representations from Transformers. It is a pre-trained NLP model developed by Google in 2018. With this, anyone in the world can train their own question answering module in as little as 30 minutes on a single Cloud TPU, or a few hours using a single GPU. The company then released it, showcasing its performance on 11 NLP tasks, including the very competitive Stanford Question Answering Dataset. Unlike other language models, BERT has been pre-trained on 2,500 million words of Wikipedia and 800 million words of BooksCorpus, and has been successfully used as a pre-trained model in deep neural networks. According to researchers, BERT has achieved 93% accuracy, which has surpassed previous language models. Next, we have
ELMo. ELMo, also known as Embeddings from Language Models, is a deep contextualized word representation that models the syntax and semantics of words as well as their linguistic context. The model, developed by Allen NLP, has been pre-trained on a huge text corpus and learns its functions from a bidirectional language model, that is, a biLM. ELMo can easily be added to existing models, which drastically improves performance across a broad range of NLP problems, including question answering, textual entailment, and sentiment analysis.
[music]
Prompt engineering is an interesting
field that combines artificial
intelligence and human language
understanding. In this field,
professionals and researchers work to
create prompts or instructions that
effectively guide AI systems to produce
the expected outcome. Whether it's
fine-tuning language model, designing
prompts for specific task, or optimizing
human machine communication, prompt
engineering is crucial for leveraging
the power of AI for a variety of
applications. Imagine you are developing
a virtual assistant application using a
large language model such as GPT-3. The
goal is to provide users with an
engaging and helpful experience by
designing effective prompts that
generate informative and relevant
responses from the model. So let's
consider a scenario in which the virtual
assistant assists users with travel planning. So here's how prompt
engineering plays a major part. So the
scenario is you're planning a trip to
Paris and want the virtual assistant to
provide recommendations for activities,
restaurants, and landmarks to visit
during your stay. So let's say you're looking for help with a traditional prompt and you ask, what should I do in Paris, and the virtual assistant will respond with something like, here are some recommendations for activities in Paris. And here's how the enhanced prompt, through prompt engineering, responds to
your queries. So if you input a query
that goes like hey there I'm super
excited about my upcoming trip to Paris.
So could you please recommend some must
visit places and activities for me then
the virtual assistant will generate the
response as something like this. So of
course Paris is an amazing city with so
much to offer. So here are some must
visit places and activities and it
continues with the explanation about
each place. I hope you got the idea of
how enhanced prompt provides users with
an engaging and helpful experience by
designing effective prompts that
generate informative and relevant
responses from the model. So now let us
understand what exactly is prompt
engineering. Prompt engineering is a
method used in natural language
processing that is NLP and machine
learning. It's all about crafting clear
and precise instructions to interact with large language models like GPT-3 or BERT. So these models can generate humanlike
responses based on the prompts they
receive. Think of prompt engineering as
giving direction to these models. By
crafting specific and concise prompts,
we guide them to produce the response we
want. So to do this effectively, we need
to understand the capabilities of the
model and the problem we are trying to
solve. Finetuning prompts allows
researchers and developers to improve
the performance and usability of LLMs for a variety of applications, including text generation, question answering, language translation, and others. Effective prompt engineering necessitates a thorough understanding of the underlying model's capabilities as
well as the problem domain and desired
result. Now let's find out why prompt
engineering matters for AI. So prompt
engineering is important in AI because
it improves model performance,
customization, and reliability. By
creating clear and tailored prompts,
developers can help AI models produce
more accurate and relevant result,
reduce biases, improve user experience,
and address ethical concerns. In simple
terms, prompt engineering ensures that
AI systems produce useful and reliable results that meet the needs of users
while adhering to ethical principles. So
now let's consider an example in the
context of text generation for
generating product description. Assume
you're using an AI model to create
product description for an online store.
So without prompt engineering, you may
issue a generic prompt such as generate
a product description for a smartphone.
So without prompt engineering, you would
get something like this. This smartphone
has a high resolution display, powerful
processor, and a longlasting battery
life. The given prompt is less effective
because it lacks specificity. So it
simply says, generate a product
description for a smartphone. So this
may make it difficult to come up with an
idea and write something engaging and
informative. So having a good prompt can
make a significant difference in your
writing. They give you a clear idea of
what you need to write about and keep
you focused and organized making it
easier to generate ideas and express
yourself. On the other hand, by using
prompt engineering techniques, you can
provide more specific instructions or
constraints that will tailor the
generated descriptions to the target
audience or brand style. So with prompt
engineering, if you input a query such
as create a product description for a budget-friendly smartphone perfect for young professionals, highlighting that it's affordable, sleek, and packed with top-notch camera features, then the generated response would be something like this: introducing our sleek and affordable smartphone designed for young professionals; with its stylish design and advanced camera features, capturing life's moments has never been easier, and it goes on giving its key
features along with it. So through this
example, we understood that prompt engineering enables the creation of a product description that is useful to the target audience and highlights specific features based on the instructions provided. So this shows how prompt engineering can improve the relevance and effectiveness of AI-generated content for specific applications.
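As a small sketch of the difference in practice, the two prompts could be sent to a model like Gemini through the google-generativeai SDK; the model name and prompt wording here are illustrative, not a fixed recipe.

```python
import google.generativeai as genai

genai.configure(api_key="<your-api-key>")
model = genai.GenerativeModel("gemini-pro")   # illustrative model name

generic_prompt = "Generate a product description for a smartphone."
engineered_prompt = (
    "Create a product description for a budget-friendly smartphone aimed at "
    "young professionals. Highlight that it is affordable, sleek, and packed "
    "with top-notch camera features. Keep it under 80 words."
)

for prompt in (generic_prompt, engineered_prompt):
    response = model.generate_content(prompt)
    print(response.text)
    print("---")
```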
To help AI models give accurate answers, it's important to create clear prompts. So here are some
simple rules for generating effective
prompts. First, make it clear. So
clearly explain what you want the AI to
do. Unclear prompts might confuse the AI
and lead to wrong answers. So make sure
that the prompt is clear. For example, an unclear prompt is something like write about cars, where we have not
mentioned which type of car or anything
much in detail, whereas the clear prompt
is write a description of a red
convertible sports car. Next give
context. So provide enough information
so that the AI understands the task. So this helps it give accurate responses that make sense in the given situation.
So for example, prompt without context
is write a story. Prompt with context is
write a story about a girl who discovers
a magic book in her attic. Next, show
examples. Use examples to show the AI
what you are looking for. So this helps
it understand the type of response you
want. So for example, the prompt without
an example is describe a beach scene, and the prompt with examples is describe a beach scene with palm trees, crashing waves, and people playing
volleyball. Next is keep it short. So
don't overload the AI with too much
information. Short prompts help the AI
focus and give quicker, more accurate
responses. For example, long prompts are
like this. Write a detailed essay
discussing the impact of climate change
on biodiversity and ecosystems in
tropical rainforests. And short prompts look something like this: write about climate change effects on rainforests.
Next, avoid biases. So, make sure your
prompts are fair and don't include any
unfair assumptions. So, biased prompts
can lead to biased answers which isn't
helpful. So, for example, a biased prompt is write about a woman who struggles with her weight, and the unbiased prompt is write about a person overcoming challenges. Next, set
limits. So tell the AI any rules or
restrictions it needs to follow. This
helps guide its response and ensures
they meet your specific needs. For
example, the prompt without limits is write a story, and the prompt with limits is write a story set in a haunted house with a maximum word count of 500 words. And I hope it's very clear. Next, moving on to some examples of prompts for generating text using ChatGPT. For text generation tasks, prompts usually consist of textual instructions or a starting point
that directs the model to produce
coherent and relevant text. Prompts can
be story prompts, questions, or
incomplete sentences. Text generation
prompts provide context and directions
to the model, allowing it to generate
humanlike text responses. They influence
the generated text's tone, style, and
context. So let's say the prompt is
write a short story about a character
who discovers a hidden treasure. So by
providing a specific story line and
theme in the prompt, the model is guided
to generate a coherent and engaging
narrative centered around the discovery
of a hidden treasure. So the picture
illustrates how ChatGPT crafts stories with an engaging touch, making them more
captivating and interesting for readers.
Next question answering. So prompt is
can you describe the common signs and
symptoms of COVID 19 along with any
precautions that can be taken to stay
safe and just like that it can generate
answers to all your questions in mere
seconds. So by framing the prompt as a
question the model is directed to
provide a concise answer regarding the
symptoms of COVID 19 ensuring relevant
and informative responses. Next,
language translation. Translate the
given English sentence, the quick brown fox jumps over the lazy dog, into Spanish
while maintaining its original meaning.
So, by specifying the source and target
language in the prompt along with the
input sentence, the model is instructed
to perform a precise translation task
ensuring accurate language conversion.
Next, code auto completion using OpenAI
Codex or ChatGPT, you can perform code auto-completion tasks. So here we go with ChatGPT. Code generation prompts are usually
partial code snippets or descriptions of
programming task. They specify the
desired functionality or behavior that
the model should show. Code generation
prompts allow the model to generate code
that satisfies specific programming
requirements such as implementing
algorithms, defining functions, or
solving coding problems. So the prompt
is complete the following Python
function to calculate the factorial of a
number and here you have also added the
function. So by presenting an incomplete
code snippet along with clear
instructions the model is directed to
suggest appropriate code completion
helping developers write code more
efficiently. Now moving on to the text
to image generation. Image generation
prompts specify the visual scene,
objects or concept that the model should
generate. They may include textual
descriptions, keywords or images. So
image generation prompts tell the model what visual content to generate. They influence the generated image's composition, style, and detail. For
example, the prompt is imagine a tree
where the branches are made of stacks of
books. So can you paint me a picture of
that? And for the given prompt, we got
the image generated as something like
this. An imaginative portrayal of a tree
with branches composed of stacked books, each book representing a leaf, with its cover visible.
And the next prompt is picture a cloud
in the sky that looks like a huge heart.
Can you draw that for me? And here we
go. These AI tools leverage prompt
engineering techniques to generate text,
perform language translation, code auto
completion, and text to image
generation, demonstrating the
versatility and power of prompt-based interactions with AI models. Next, why
is machine learning useful in prompt
engineering? Machine learning is very
helpful in prompt engineering especially
in linguistic and language models
because it helps create better prompts
and interactions by analyzing lots of
data and finding patterns. So first
understanding language patterns. Machine
learning algorithms can analyze large
amounts of text to understand linguistic
patterns like grammar, syntax, semantics
and context. So this understanding is
critical for developing effective
prompts that generate desired responses
from language models. Next, generating
relevant prompts. Machine learning
models can suggest or generate prompts
based on input data and user
preferences. These prompts can be
tailored to specific task, domains, or
user requirements, making them more
useful and efficient for guiding
language models. Next, optimizing
prompts design. Machine learning
techniques can be used to optimize
prompt design by comparing the
performance of various prompts and
selecting the one that produces the best
result. This iterative process improves
prompt engineering practices and the
overall performance of language models.
And the next is personalizing
interactions. Machine learning enables
personalized interactions by tailoring prompts to individual users' preferences, history, and context. This personalization increases user engagement
and satisfaction with the language model
interaction. Next, improving model
performance. Machine learning algorithms
can be used to fine-tune language models
based on prompt-response pairs, increasing
their performance and accuracy over
time. Language models can be trained on a variety of data sets and prompts to
produce more relevant and contextually
appropriate responses. And next,
mitigating bias and misinformation.
Machine learning techniques can help
identify and mitigate biases in prompt engineering by examining prompt-response pairs for potential biases or
inaccuracies.
Language models can produce more fair,
inclusive, and reliable results by
detecting and correcting for these biases. And I hope it is clear why
machine learning is useful in prompt
engineering.
>> [music]
>> Now let us understand what LangChain is and why it is a valuable tool for building AI applications. You must be aware of popular applications such as ChatGPT and Gemini. These applications utilize APIs to process prompts: ChatGPT uses OpenAI's API, while Gemini operates through the Gemini API. They leverage models like GPT-3.5, GPT-4, PaLM, and Gemini 1. Additionally, there are other advanced models such as Llama, Gemini, Cohere, Claude version 1, Falcon, PaLM, GPT-4, and GPT-3.5. LangChain is a
framework designed to help developers
build flexible and powerful AIdriven
applications by integrating and
utilizing these diverse models
effectively. But why exactly do we need LangChain? You must be thinking, if LLMs are this capable on their own, then why do we need LangChain? So let's break down
this question using some real world
examples. So imagine simply asking an
LLM a prompt and getting an answer.
That's easy. But what happens when the
complexity increases? For example, let's
say you're working with data from SQL
databases, CSV files, PDFs, or Google
Analytics, and you need the model to
write code, perform searches, or send
emails. Handling such intricate
workflows manually can get overwhelming.
This is where LangChain steps in. It
simplifies the process by offering
components like document loaders, text
splitters, vector databases, prompt
templates, and tools. So this helps you
assemble tasks such as document
summarization, question and answer
systems or even advanced workflows like
Google searches or customer support
automation. Let's visualize this process with a diagram. Here's how it works. First, you load a document like a CSV file using a document loader. Then use a text splitter to divide it into smaller chunks, store those chunks in a vector database, and add a prompt template to guide the model. And finally, use an LLM like GPT-4 or Llama to perform tasks like searching the web or automating workflows. LangChain also offers chains that help you assemble components to achieve a single task, such as summarization, and agents that figure out what each component must do, for use cases like customer service, etc.
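As a rough sketch of that pipeline; import paths change between LangChain releases, so treat the module names below as assumptions to check against your installed version, and the file name and query are placeholders.

```python
from langchain_community.document_loaders import CSVLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# 1. Load a document with a document loader (data.csv is a placeholder).
docs = CSVLoader("data.csv").load()

# 2. Split it into smaller chunks with a text splitter.
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

# 3. Store the chunks in a vector database.
vectorstore = FAISS.from_documents(chunks, OpenAIEmbeddings())

# 4. Retrieve relevant chunks; a prompt template plus an LLM (e.g. GPT-4)
#    would then use this context to answer or summarize.
relevant_chunks = vectorstore.similarity_search("What does the report say about revenue?")
```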
Now that we understand LangChain's core components, let's explore how it streamlines the LLM application life
cycle.
So it typically involves three key
stages. First is the development where
you build and test your application.
Then productionization where the system
is fine-tuned for a real world use. And
finally deployment where the final
product is launched for users. So LangChain simplifies this life cycle, allowing you
to focus on building without worrying
about the underlying complexity. Now
let's take a step back and understand
the role of APIs in powering these LLM
applications and how LangChain effectively
integrates them. In all these
applications and models, one thing is
common that is they use API. So now
let's discuss APIs. APIs act as an
intermediaries that enable different
systems to communicate with each other.
For example, they allow apps like Swiggy
or Blinket to display your delivery
driver's location in real time. So now
let's look at the steps to explain APIs
and API keys. So apps like Zepto, Swiggy, and Blinkit use APIs to show the
location of your delivery driver. So
these apps don't communicate directly
with Google Maps but follow a layered process involving servers and security mechanisms. First, the app sends a request
to the Google Maps API. Then the API forwards the request to Google's servers. Then the servers validate the request with the security system. So once approved, the response flows back through the servers and APIs, and finally to the app. So previously,
apps like Swiggy allowed login using
phone numbers. Now they use API for
login via platforms like Google or
Facebook. So this demonstrates the
versatility of APIs in enabling
seamless user interactions. To prevent
misuse, APIs require API keys which are
unique identifiers for secure access. So
these keys authenticate requests and ensure that only authorized users can interact with the APIs. Next, security systems closely monitor API usage to detect and prevent misuse. This ensures that APIs remain safe and functional for their intended purpose. These steps explain, at a high level, how APIs and API keys work in real-world
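As a rough illustration of that request-plus-key pattern, here is a small Python sketch; the endpoint URL, parameters, and key value are hypothetical placeholders, not a real Google Maps or delivery API.

```python
# Minimal sketch of an authenticated API call (hypothetical endpoint and key).
import requests

API_KEY = "YOUR_API_KEY"  # unique identifier that authorizes this client
response = requests.get(
    "https://maps.example.com/v1/driver-location",  # placeholder URL
    params={"order_id": "12345"},
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
response.raise_for_status()   # the server rejects requests with a missing or invalid key
print(response.json())        # e.g. {"lat": 12.97, "lng": 77.59}
```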
applications. So this is how LangChain leverages APIs to connect your LLM applications with external tools, making them versatile and secure. Now that we understand the role of APIs, let's explore some real-world applications of LangChain. So what can you build with LangChain? Here are a few applications.
First application we have is customer
support. So customer support for your
shopping websites to interact with
customers. Next, conversational chat
bots for helping you study, content
generation tools for blogs or social
media. We also have question answering
systems for knowledge bases and then
document summarizers for legal or
academic content. LangChain simplifies AI development by integrating LLMs with various data sources and tools. Its applications are vast, from chatbots to document summarization. So, let's examine a practical example to see LangChain in action.
All right. In today's data-driven world,
understanding and effectively using SQL
queries is crucial for managing and
analyzing large data sets. However,
beginners and even experienced users
often need help with complex SQL
queries, their syntax, and how they
work. This creates a barrier to
efficiently interacting with databases
and limits their potential to solve real
world problems. To address this
challenge, we propose a SQL query
fetcher application that leverages the
Gemini AI, Python, and Streamlit to
simplify SQL learning and usage. The
application allows users to input or
select a query, generates the SQL
syntax, and provides a detailed
explanation of its components and
functionality. This tool bridges the gap
between technical understanding and real
world database operations, empowering
users with an intuitive and interactive
SQL learning experience. Let's jump
right into the code. So the first step
is setting up your dependencies. Here we
import streamlit for the user interface
and then Google generative AI for using
Gemini. So first, import streamlit as st, and next import google.generativeai as genai.
So to get this API you have to go to the
Google Gemini API key and here click on
get a Gemini API key in Google AI studio
and then once you scroll there is a
button on the left called create API.
Now click on it and select your model
here
and let's copy it. And now let's go back
to our VS code editor and paste it here.
So to paste let's type
API
key and inside the double quote let's
paste it. And now let's type genai.configure, and inside the brackets let's pass api_key equal to google_api_key. Now let's type model equal to genai.GenerativeModel, and inside the brackets let's keep it as gemini-pro.
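Put together, the setup narrated above looks roughly like this; the key value is a placeholder and should live in an environment variable or a .env file rather than in the source.

```python
import streamlit as st
import google.generativeai as genai

GOOGLE_API_KEY = "YOUR_GEMINI_API_KEY"       # placeholder: keep real keys out of code
genai.configure(api_key=GOOGLE_API_KEY)      # register the key with the SDK
model = genai.GenerativeModel("gemini-pro")  # the Gemini model that will write SQL
```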
So we use the Google Gemini API to
generate SQL queries dynamically. So
make sure to configure your API keys
securely. Now let's write the Streamlit layout code.
Now let's set up the app's user
interface. So we use Streamlit to create
an interactive page where users can
input plain English queries and get SQL
code in return. So we write st.set_page_config, and inside the brackets let's give page_title equal to "Edureka SQL Query Generator", then a comma, and page_icon equal to a robot emoji. Now let's put in some images. So I'm using the Edureka image and the SQL logo, and to center them we will type col1, col2, col3 equal to st.columns, and inside the brackets let's keep the ratio as 1, 2, 1. Next, let us type with col2 colon, and then st.image, and inside the brackets let's give the image path and then width equal to 200.
Now let us add another image. So let's
copy the same and give the other image
address.
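Assembled, the layout narration above corresponds roughly to the following; the image file names are placeholders for whatever logos you use.

```python
import streamlit as st

st.set_page_config(page_title="Edureka SQL Query Generator", page_icon="🤖")

# Center the logos by placing them in the middle of three columns (ratio 1:2:1).
col1, col2, col3 = st.columns([1, 2, 1])
with col2:
    st.image("edureka.png", width=200)    # placeholder file name
    st.image("sql_logo.png", width=200)   # placeholder file name
```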
Our layout includes a title, logo and
text input box to keep the interface
simple and intuitive. So here's where
the magic happens. So when a user clicks
the generate SQL query button, we format
their input into a prompt for the Gemini
model to generate SQL code. So let's create the template by writing template equal to, and inside the triple quotes let's type: create a SQL query snippet using the below text. Next, let us also add the text input placeholder, and we'll also type: I just want a SQL query.
Now let's type the response. So type response equal to model.generate_content, and pass it template.format, and inside the brackets give text_input equal to text_input. Next, let's type the SQL query: so give sql_query equal to response.text, then chain the strip, lstrip, and rstrip functions to clean it up. So the AI generates the SQL
query and we clean up the output for
display. So once the SQL query is ready,
we take it a step further by generating
a sample expected output and a clear
explanation of the query. Now let's type
the logic for showing explanation and
output. So let's type st dot markdown
and inside the bracket we will give HTML
tags. So first div style is equal to we
will align the text and center. So text
align center and next let's give H1 tag
and inside H1 tag we will write SQL
query generator
and let's close the H1 tag. Next, let's
open H3 tag and write I can generate SQL
queries for you. And let's close the H3
tag. And inside the H4 tag, let's type
it as with explanation as well.
Now close the H4 tag and let's open the
paragraph tag which is the P tag. And
inside the P tag, let's type it as this
tool allows you to generate
SQL queries based on your data. Now let
us close the P tag. Also close the div
tag.
Now, to make the HTML in the markdown render, let us type unsafe_allow_html equal to True. Now let's write text_input equal to st.text_area, and inside the brackets let's give it as enter your query here in plain English. Now let us give a submit
button. So for that let us type submit
button equal to st dot
button and inside the bracket let us
give generate SQL query. Now, with if submit_button colon, write with st.spinner, and inside the brackets let's keep "generating SQL query", and then let's create a template, and inside the triple quotes let's type: create a SQL query snippet using the below text.
Now, using the above template, we will build the further prompts around the text input and the generated SQL query, one for the expected output and one for the explanation.
Now to merge all the templates together
we will make a container. So we will
write with st.container.
So it's a function and let us also write
it as st dot
success
and inside the bracket let us give it as
SQL query generated successfully
Also, we will give "here is your query below". Next, st.code, and inside the brackets let us give sql_query, comma, language equal to "sql". Now let us give once again st.success, and inside this let us keep it as "expected output of this query will be", and then st.markdown with the expected output inside the brackets. Once again st.success, and inside this let us keep it as "explanation of this SQL query".
Next, let us give st.markdown, and inside this function let us keep the explanation. So over here, this shows a green success message indicating the SQL query was generated successfully, and next we show the SQL query: this displays it as a formatted code block highlighted as SQL. Next is the expected output: st.success provides a success message for the query's expected output, followed by st.markdown, which displays that expected output in markdown format. Then the next lines introduce the explanation, and st.markdown displays it in markdown format for clarity. So this makes the tool valuable for both learning and debugging SQL.
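For reference, the core generate-and-display logic narrated above can be consolidated roughly as follows; it assumes the st and model objects from the earlier setup sketch, and the two extra prompts for the expected output and explanation are paraphrased, not the exact wording from the video.

```python
text_input = st.text_area("Enter your query here in plain English")
submit_button = st.button("Generate SQL Query")

if submit_button and text_input:
    with st.spinner("Generating SQL query..."):
        template = """Create a SQL query snippet using the below text:
        {text_input}
        I just want a SQL query."""
        response = model.generate_content(template.format(text_input=text_input))
        sql_query = response.text.strip()   # the video also trims markdown fences here

        # Two follow-up prompts (paraphrased) for the expected output and explanation.
        eoutput = model.generate_content(
            "Show a small sample of the expected output of this SQL query: " + sql_query).text
        explanation = model.generate_content(
            "Explain this SQL query in simple terms: " + sql_query).text

        with st.container():
            st.success("SQL query generated successfully. Here is your query below:")
            st.code(sql_query, language="sql")
            st.success("The expected output of this query will be:")
            st.markdown(eoutput)
            st.success("Explanation of this SQL query:")
            st.markdown(explanation)
```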
So now let's see it in action. Open the terminal, type streamlit run, and give your file name.
Now, as you can see on the screen, your SQL
query generator is ready to go. Now
let's test it. So for that here I will
input a prompt asking for a query which
is give me the query for create table.
Now let's click on generate SQL query
and as you can see it's running. So
let's wait for it to generate.
So as you can see on the screen the app
generates a SQL query expected output
and even a plain English explanation in
seconds. So how cool is that right? And
that's it. Our SQL query generator, powered by LangChain, the Gemini API, and Streamlit, is complete. So this project
is perfect for simplifying SQL learning
and enhancing productivity.
[music]
So before we talk about agents, let's
quickly understand LangChain. LangChain is a
framework designed to help you connect
large language models such as GPT with
external tools, APIs, memory, and custom
logic. Normally, LLMs like ChatGPT can
only generate responses based on the
text you give them. But what if you
wanted to search the web, run Python
code, query a database, or use a
calculator? That's where LangChain comes
in. It acts as a bridge between the LLM
and the tools it can use to interact
with the real world. And one of the most
powerful features in lang chain is
agents. So what exactly is a LangChain agent?
So think of it like this. Instead of you
telling the AI exactly what to do, you
just give it a goal and the agent
figures out how to get it done. An agent
combines the power of reasoning,
decision making, and tools. It uses the
LLM to understand the task, choose which
tools it needs, call those tools in the
right order, and then return the final
result to the user. It's like giving
your AI assistant a toolbox and letting
it decide which tools to use based on
the question you ask. So, let's look at
a real example to make it clear. Imagine
this prompt. Check the current stock
price of Apple and calculate the average
over the past 5 days. A regular chatbot
can't do that. But a LangChain agent can
use a web search API to find today's
stock price and use a Python tool to
calculate the average and then respond
with the results all automatically. So
here's what's happening under the hood.
The LLM receives your prompt and it
decides it needs to search and calculate
and then it picks the right tools, maybe SerpAPI for search and Python for math,
and it performs each step in a sequence
and gives you the final output. So this
is all done dynamically meaning you
don't hardcode each step. So the agent
figures it out using the language models
reasoning. Now let's look at the inner
working of a LangChain agent. So when you create an agent in LangChain, you define three things: first, the LLM to use, like GPT-4 or Claude; next, the tools available, like a calculator, web search, or database query; and finally, the agent type. LangChain supports types like the zero-shot agent and the conversational agent.
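As a minimal sketch of defining those three things, here is one way to do it with LangChain's classic initialize_agent API (newer releases favor LangGraph-based agents); the tool list and model are illustrative, and the SerpAPI tool needs its own API key.

```python
# Minimal agent: an LLM, a list of tools, and an agent type.
from langchain_openai import ChatOpenAI
from langchain.agents import initialize_agent, load_tools, AgentType

llm = ChatOpenAI(model="gpt-4")                        # 1. the LLM
tools = load_tools(["llm-math", "serpapi"], llm=llm)   # 2. the tools (calculator + web search)
agent = initialize_agent(                              # 3. the agent type
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)

agent.run("Check the current stock price of Apple and average it over the past 5 days.")
```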
So now that you understand how LangChain agents work, let's quickly talk about the
two most popular types. So first we have
the zero-shot agent. So this is the most
commonly used agent. It works by giving
the language model a list of tools along
with a description of what each tool
does. Then the model uses that
information to figure out on the fly
which tool to use and in what order. So
it's called zero-shot because the model
doesn't get examples. It just reasons
based on the tool descriptions. And it
is best for tasks that don't need memory or a back-and-forth conversation, like data lookups, calculations, or
API calls. And the next type is the
conversational agent. This one is more
advanced. It is designed for multi-turn
conversations. That means the agent
remembers previous steps and keeps track
of what's already been done. So it uses
a chat history and a memory module to
maintain context across multiple
prompts. And it is best for chat bots,
virtual assistants or tools where the
user asks follow-up questions or expects
the AI to remember context. So in short, I can say that the zero-shot agent is for fast, simple, one-shot tasks, whereas the conversational agent is for context-aware, back-and-forth dialogues. There are also other agents, like tool-using agents, plan-and-execute agents, or multi-action agents, for more advanced workflows, which are perfect for future deep dives. Now,
to understand lang chain agents let's
quickly explore its core building
blocks. So first we have the LLMs. This is the brain of the system. LangChain supports models like OpenAI's GPT-3.5 or GPT-4, as well as Anthropic's Claude, Hugging Face models, Ollama, Cohere, etc. And the next
component is prompts these are the
templates that guide the LLM's behavior.
You can use static prompts or chat
prompt template for more dynamic and
multi-turn interactions.
Next we have chains. It is a sequence of
calls or logic. Next tools. These are
the external functions the LLM can call
such as Python calculator, web search
API or SQL query executor. Then we have
agents. Agents dynamically decide which
tool to use and when based on your
input. So agents are what turn LangChain from a chatbot into a multi-tool problem solver. So over here, the tools are just Python functions wrapped in LangChain's tool format. For example, a simple wrapped Python function is sketched below. It's like giving your AI assistant a toolbox and letting it decide what to use based on your prompt.
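A minimal version, using the @tool decorator from langchain_core; the function itself is just an illustration.

```python
# Wrapping a plain Python function as a LangChain tool.
from langchain_core.tools import tool

@tool
def average_price(prices: str) -> float:
    """Return the average of a comma-separated list of stock prices."""
    values = [float(p) for p in prices.split(",")]
    return sum(values) / len(values)

# The agent sees the function's name and docstring as the tool description
# and decides on its own when to call it.
```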
Next, the agent then follows a process
called react, which stands for reasoning
plus acting. So, here's what that looks
like. So, first, the agent receives the
prompt. Then the LLM decides, "I need to search the web." So LangChain calls the search tool, and the tool returns the result. Next, the LLM reasons, "Now I need to do a calculation." So LangChain calls the calculator tool, and the agent returns the final answer. So here
is the example of the react loop. So the
thought is I need to find today's
weather. The action it takes is that it
uses a weather API. Next is the
observation. For example, it's a 28ยฐC in
Bangalore. The thought is now I can tell
the user the temperature. And the final
answer would be it's currently 28ยฐC in
Bangalore. So this entire flow is
written and passed by the LLM itself
using intermediate steps called scratch
packs. So the lang passes those steps
and knows when to call a tool or stop.
Now let's look at where Linen agents are
used in real world projects. So first it
is used in AI customer assistance. So
the agents can look up user info, reset
passwords, and respond to queries
automatically. So the users can ask
things like what was my profit margin
last quarter. So the agent pulls data
from a database, does the math and
explains it. Next, LangChain agents can
be used in research tools. So you can
build a research bot that searches
multiple sources, summarizes and gives
you an answer step by step. then in
automated workflows, like send a message, create a task in Trello, and update the
CRM all with one prompt. So that's the
power of Langchain agents. So they allow
your language models to take action, use
tools and solve real world tasks step by
step. So let me know in comments if you
want a full coding tutorial on building
your first LangChain agent.
RAG is a hybrid approach in artificial intelligence that combines retrieval systems with generative models to produce highly accurate, contextually relevant responses. It bridges the gap between factual accuracy and natural
language generation. Now let's
understand it with the help of a diagram.
So it's a hybrid approach involving
artificial intelligence that combines a
retrieval system with a generative
system to produce highly accurate
responses. Now that we know what RAG is, let's explore why it is crucial for
large language models and see a real
world example. So RAG addresses several limitations of traditional LLMs. It mitigates hallucinations by grounding responses in factual retrieved data. By dynamically accessing up-to-date information, RAG stays relevant in rapidly changing domains. It improves accuracy and relevance by fetching specific, relevant documents during inference. By outsourcing factual knowledge retrieval, RAG enables smaller, more efficient models, and it can adapt to domain-specific knowledge bases for specialized applications.
Additionally, RAG provides explainability by showing the retrieved documents or data sources, increasing
trust and transparency. Now let us see
some of the use cases.
So without RAG, for a question like "When was the last Mars rover launched?", the model might give an outdated or incorrect response. With RAG, the answer would be dynamically retrieved from NASA's database, and it would be: the Perseverance rover was launched on July 30, 2020. Now that we have seen why RAG
is important. So let's dive into how it
works. Well, RAG operates in a three-step process. A user submits a query, which triggers the retrieval stage. Here, a retriever searches a database or knowledge base using techniques like BM25 to fetch the most relevant information. The
retrieved data is then fed into a generative model like GPT or T5, which processes it and generates a coherent, contextually grounded natural language response. Now let's take an example here. The query is: who wrote 1984? The retrieval step would fetch a document containing "George Orwell wrote 1984." The generated response would then be: the author of 1984 is George Orwell. This hybrid approach makes RAG ideal for real-world applications like chatbots and knowledge systems.
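Here is a toy sketch of that three-step loop, using the rank_bm25 package for the retrieval step; the documents, query, and the placeholder generation call are illustrative.

```python
# Toy retrieve-then-generate loop: retrieve with BM25, then hand the context to an LLM.
from rank_bm25 import BM25Okapi

documents = [
    "George Orwell wrote 1984.",
    "The Perseverance rover was launched on July 30, 2020.",
]
bm25 = BM25Okapi([doc.lower().split() for doc in documents])

query = "who wrote 1984"
context = bm25.get_top_n(query.lower().split(), documents, n=1)[0]  # steps 1-2: retrieve

prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# answer = llm.generate(prompt)  # step 3: generate (placeholder call, depends on your model)
print(context)
```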
Now that we understand how RAG works, let's explore some of its real-world applications.
RAG's versatile applications span
various domains. In knowledge
management, it can summarize large
databases or documentation, aiding
corporate teams. Legal and compliance tasks benefit from RAG's ability to answer queries based on case law and regulations, while in healthcare, it can
support medical professionals by
summarizing research papers and
guidelines. Education and e-learning can leverage RAG for virtual tutoring, providing detailed explanations based on textbooks and research papers. Interactive virtual assistants like Alexa and Siri can utilize RAG to
generate accurate and informative
responses to user queries such as news
headlines or product recommendations.
RAG's unique ability to combine retrieval and generation makes it essential for tasks demanding both factual accuracy and fluent natural language responses. Now let's compare
retrieval augmented generation with traditional AI models across a few features. First we have factual accuracy: RAG provides highly accurate responses by using real-time data, whereas traditional models may give less accurate answers and may make errors.
Next is context adaptability: RAG adapts quickly to new queries using live data, whereas traditional models offer fixed answers based only on pre-trained knowledge. Next we have knowledge updates: RAG is easy to update, you just change its data source, whereas traditional models need retraining, which takes time. Then we have scalability: RAG scales by growing or swapping its knowledge base, whereas traditional models are limited by model size and training data. And then we have use cases: RAG is great for tasks like legal advice or customer support, whereas traditional models work well for creative writing or casual queries. So
here I want to conclude that RAG is ideal for knowledge-based tasks needing accuracy and flexibility, while traditional models are better for creative uses. While RAG offers
significant advantages, it's essential
to acknowledge its limitations. So let's
discuss the challenges and future of RAG. So the first challenge is latency. RAG systems can suffer from latency issues, especially when dealing with large data sets or complex queries.
Next is the data quality dependency. The
quality of the generated responses
heavily depends on the quality of the
underlying data. The next challenge is
complex integration. Integrating RAG
systems with existing applications and
infrastructure can be challenging due to
the need for data synchronization, query
optimization and model management. And
finally, scalability issues. As RAG systems become more complex and are
deployed at scale, they can face
scalability issues. This includes
handling increased query loads,
maintaining data freshness, and ensuring
model performance. Now, while RAG faces limitations, its potential is undeniable. So now let's discuss RAG's future. The future of RAG holds immense
potential. It will power dynamic
real-time applications like news summarization, financial analytics, and
live sports commentary. Rag will be
customized for specific domains like
healthcare, law, and science through
integrations with specialized knowledge
bases. Advances in retrieval models and
compression techniques will reduce
latency and enhance efficiency. RAG will expand to handle multimodal data, enabling use cases like multimedia question answering. Additionally, RAG will facilitate personalized AI assistants and improve transparency and explainability by attributing sources and providing clear explanations.
Now let us move on to generative AI
project using RAG. So imagine you're
working with a massive library of
documents. You need a way to quickly
search and answer questions based on the
content. So manually flipping through
pages takes time and effort. Wouldn't it
be great to have a system that retrieves
relevant information and answers your
questions directly within those
documents? So that's where our Streamlit
app comes in. This app utilizes the
power of natural language processing and
advanced retrieval techniques to turn
your complex document collections into a
powerful question and answer system. So
let's take a look at the code behind
this app. This app will allow users to
ask questions about a collection of PDFs
and get answers directly from the
documents using the power of natural
language processing. Now, first let's
create a virtual environment. Now, in
the terminal, let's type the command for
setting up the environment in your
editor. For that, let's type conda create -p venv python==3.10 -y and press enter. In this command, -p venv specifies the path and the environment name, while -y skips the prompts for a smoother install. Now, while that's setting up, let's create a few essential files. Then let's activate the new environment with the command conda activate venv/.
So as you can see our environment is
ready. Now let's import libraries.
So let's start by importing the
libraries we will need. In the first
line, we will import streamlit as st.
This gives us access to all the
functionalities of Streamlit for
building our web app interface. So next
we will import OS for various operating
systems functionalities.
After that now we will import libraries
from LangChain, which is a framework for
building NLP pipelines. So we will use
these for task like text splitting for
document chain creation, prompting,
retrieval and more. So we will explain
each library in detail as we use them.
So let's type from langchain_groq import ChatGroq. Next, we will type from langchain.text_splitter import RecursiveCharacterTextSplitter. Again, let us type from langchain.chains.combine_documents import create_stuff_documents_chain, then from langchain_core.prompts import ChatPromptTemplate, and from langchain.chains let's import create_retrieval_chain. Next, import FAISS from langchain_community.vectorstores; this will help us create a vector index for efficient document retrieval. So let us type from langchain_community.vectorstores import FAISS.
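Collected in one place, the import block looks roughly like this, including the loader, embeddings, and dotenv imports that are introduced a little later; the exact package split (langchain, langchain-groq, langchain-community, langchain-google-genai, python-dotenv) is an assumption about the installed versions.

```python
import os
import streamlit as st

from langchain_groq import ChatGroq
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from dotenv import load_dotenv
```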
Similar imports will follow for other
functionalities like document loading
and generating embedding. But we will
introduce them as they appear in the
code. But before this, go to the Groq Cloud website, and on your left you have the API key option. So select it and create your API key
and copy this. And if you want to check
your model then go to the playground and
at the top right corner click on the
Llama model and check; there are so many of them, including the latest. So choose your
model and generate your free API key. Now go to your editor and paste it into the .env file using the variable GROQ_API_KEY.
Now, again, go to Google AI Studio. There you have the create API key option. So select your model and create your API key. Now copy the key and paste it into your environment file, that is, the .env file, using the variable GOOGLE_API_KEY.
Now we will load the environment variables from the .env file that securely stores our API keys. For that we use python-dotenv: let's type from dotenv import load_dotenv, and also call the load_dotenv function. Next, we use os to retrieve the Groq API key and Google API key from the environment variables with os.getenv. So let us type groq_api_key equal to os.getenv("GROQ_API_KEY"), and in the next line let us type os.environ["GOOGLE_API_KEY"] equal to os.getenv("GOOGLE_API_KEY"). So here, these keys are required to use the specific NLP services.
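That key-loading step, written out, is roughly the following; the variable names GROQ_API_KEY and GOOGLE_API_KEY match the .env entries created earlier.

```python
import os
from dotenv import load_dotenv

load_dotenv()                                               # reads the .env file
groq_api_key = os.getenv("GROQ_API_KEY")                    # used by ChatGroq
os.environ["GOOGLE_API_KEY"] = os.getenv("GOOGLE_API_KEY")  # used by the Gemini embeddings
```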
Now let us write the code for displaying the app title and images. So for that, load your image.
Since I'm using edureka image name
edureka.png along with the app title
edureka document question and answer we
will use st.image and st.title for this
purpose. So for that, let us type st.image, and inside the double quotes let us keep the image name, then comma, width equal to 200. And let us also keep the title. So for that, st.title, and let us type Edureka Document Question and Answers. Now the next step is to
initialize ChatGroq and the prompt template. Now it's time to interact with the Groq API through LangChain. So initialize the ChatGroq object using the Groq API key, and specify llama3-8b-8192, which is the language model we will be using for our NLP task. So for that, let us type llm equal to ChatGroq, and inside the brackets give groq_api_key equal to groq_api_key, comma, and we will give the model name as well. So for that, type model_name equal to, and inside the double quotes give the model name; here we are using llama3-8b-8192.
All right. Now let us define a prompt
template using chat prompt template. So
this template ensures that AI responses
are based on the context provided and
user questions. So keeping answers
accurate and concise. So for that we
will type prompt equal to ChatPromptTemplate.from_template, and inside the brackets let us paste the prompt. So here we have the prompt, which says: please answer the question strictly based on the provided context. Also ensure the response is accurate, concise, and directly addresses the question.
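Roughly assembled, the model and prompt setup looks like this; the prompt text is paraphrased from the one shown in the video.

```python
llm = ChatGroq(groq_api_key=groq_api_key, model_name="llama3-8b-8192")

prompt = ChatPromptTemplate.from_template(
    """Please answer the question strictly based on the provided context.
Ensure the response is accurate, concise and directly addresses the question.
<context>
{context}
</context>
Question: {input}"""
)
```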
Now let's create a function for embedding vectors. For that, let's define the vector embedding function: so type def vector_embedding, open and close brackets, and give a colon.
Next, in the next line, give if, and under the double quotes give "vectors" not in st.session_state, then a colon. Then type st.session_state.embeddings equal to GoogleGenerativeAIEmbeddings, and inside the brackets give model equal to, and inside the double quotes let us type models/embedding-001.
Now make a folder where you will load
your PDF. So I am creating ed PDF and
paste your PDFs here. Now set the session state: let us type st.session_state.loader equal to PyPDFDirectoryLoader, and inside the double quotes let us paste the path of the PDF folder. Next is the data ingestion. For that, let us type st.session_state.docs equal to st.session_state.loader.load(). So this particular line of code is for data ingestion, and here this particular
line is for document loading. Next, let us type st.session_state.text_splitter equal to RecursiveCharacterTextSplitter, and inside the brackets let us give chunk_size and mention the size; here I'll give 1000, and comma, chunk_overlap equal to 200. Now, these are for the chunk creation.
Now let us type st.session_state.final_documents equal to st.session_state.text_splitter.split_documents, and inside the brackets let us give st.session_state.docs with the slice colon 20, to take the first 20 documents. So this line of code is for splitting. Now let us type st.session_state.vectors equal to FAISS.from_documents, and inside the brackets let us again type st.session_state.final_documents, comma, st.session_state.embeddings.
Okay, so this line of code creates the vector embeddings.
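Assembled from the steps above, the embedding function looks roughly like this; ed_pdf is the folder name used for the PDFs in this walkthrough.

```python
def vector_embedding():
    if "vectors" not in st.session_state:
        st.session_state.embeddings = GoogleGenerativeAIEmbeddings(
            model="models/embedding-001")
        st.session_state.loader = PyPDFDirectoryLoader("ed_pdf")      # load PDFs from the folder
        st.session_state.docs = st.session_state.loader.load()        # document loading
        st.session_state.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000, chunk_overlap=200)                        # chunk creation
        st.session_state.final_documents = (
            st.session_state.text_splitter.split_documents(
                st.session_state.docs[:20]))                           # split the first 20 docs
        st.session_state.vectors = FAISS.from_documents(
            st.session_state.final_documents, st.session_state.embeddings)  # vector store
```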
Now, for the input field for the question, let us type prompt1: so give prompt1 equal to st.text_input, and inside let us type "Enter your question from any document".
Now, to create a button to load the embeddings, let us type if st.button, and inside the function let us give, under the double quotes, "Load Edureka DB", then give a colon. In the next line let us call the vector_embedding function, and next type st.success, and the message would be "Edureka DB is ready for queries".
If the question is asked, that is, if prompt1 is true, then type document_chain equal to create_stuff_documents_chain, and inside the brackets let us give llm comma prompt. In the next line, to retrieve, let us type retriever equal to st.session_state.vectors.as_retriever(). In the next line, let us type retrieval_chain equal to create_retrieval_chain, and inside the brackets let us give retriever comma document_chain. Now, to measure the response time, let us type start equal to time.process_time(). For the response, type response equal to retrieval_chain.invoke, and inside the brackets give a dictionary with input equal to prompt1. In the next line, let us type response_time equal to time.process_time() minus start. Next, let us write code to display
the response. So for that, let us type st.markdown, and inside the brackets let us keep it as "AI Response". Now, in the next line, let us type st.success, and inside the brackets give response with "answer" inside single quotes as the key. In the next line, write st.write with an f-string, and inside the curly brackets let us type response_time with the :.2f format specifier, followed by "seconds". Now,
moving on, let us write the code to display similar documents in an expander. For that, let us type with st.expander, and inside the brackets, under double quotes, type "Document similarity search results", and give a colon. In the next line, let us type st.markdown, and inside the brackets let us type "Below are the most relevant document chunks:". Come to the next line. Here, let us type for i comma doc in enumerate, and inside the function give response, get, "context". Now, in the next line, let us type st.markdown, and inside the brackets keep an f-string, and let us give the HTML tag, which is div class equal to card; open the p tag, and inside the p tag let us keep doc.page_content, and now close the p tag. Now let us close the div tag as well. Now let us come outside the triple quotes, give a comma, and type unsafe_allow_html equal to True.
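Putting the question-answer block together, the narrated steps correspond roughly to the following; it assumes the llm, prompt, and vector_embedding pieces defined in the earlier sketches.

```python
import time

prompt1 = st.text_input("Enter your question from any document")

if st.button("Load Edureka DB"):
    vector_embedding()
    st.success("Edureka DB is ready for queries")

if prompt1:
    document_chain = create_stuff_documents_chain(llm, prompt)
    retriever = st.session_state.vectors.as_retriever()
    retrieval_chain = create_retrieval_chain(retriever, document_chain)

    start = time.process_time()
    response = retrieval_chain.invoke({"input": prompt1})
    response_time = time.process_time() - start

    st.markdown("AI Response")
    st.success(response["answer"])
    st.write(f"{response_time:.2f} seconds")

    with st.expander("Document similarity search results"):
        st.markdown("Below are the most relevant document chunks:")
        for i, doc in enumerate(response["context"]):
            st.markdown(
                f"<div class='card'><p>{doc.page_content}</p></div>",
                unsafe_allow_html=True,
            )
```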
So you can also add inline styles and
HTML tags and also icons and emojis to
make your application fabulous for the
user. Now it's time for testing. For
that open your terminal and write
streamlit run and give your file name.
So once you press enter, there we go, here's our document question and answer app. Now pick a question from the PDF you have loaded and ask it here.
So as you can see this is my PDF. So I'm
going to copy some question from here.
So let me just copy this. Okay once
copied. So I'm going to paste it here.
So I'm going to click on Load Edureka DB. So guys, as you can see, it provides an answer based on the context given in the PDF. So this is the answer that it has generated. And that's all; we have used simple Python code, LangChain techniques for RAG, and some inline HTML and styles.
[music]
Have you ever wondered how massive AI
models like ChatGPT are managed and optimized? That's where LLM ops, which stands for large language model operations, comes in. LLM ops is the key to training, deploying, and scaling
large AI models efficiently while
keeping cost low and performance high.
It ensures faster responses, ethical AI,
and seamless integration into real world
applications. Large language model
operations is a set of practices, tools
and frameworks designed to efficiently
manage, deploy and maintain large
language models like ChatGPT, Claude, and Gemini in real-world applications. Just
like MLOps streamlines the development
of machine learning models, LLM ops
optimizes the life cycle of LLMs from
data processing and training to
deployment and monitoring. Now that you
know what LLM ops is, so let's explore
why it's important.
As LLMs become widely integrated into
business applications such as customer support chatbots, content generation tools, and automation systems, they need to be
continuously monitored and optimized.
Without proper LLM ops practices, large
language models can become inefficient,
leading to slower response times and
increased computational costs. So, they
may also become unreliable, generating
outdated or biased outputs that impact
user trust and decision making.
Additionally, these models can be
difficult to scale and struggle to
handle increasing user demand, which
can result in performance bottlenecks
and degraded user experience. For
example, imagine running a GPT-style AI on
a customer support chatbot. Without LLM
ops, responses would be slow,
repetitive, and expensive. LLM ops
optimizes the entire workflow. So now
that we understand why LLM ops is
important, so let's take a look at how
it differs from MLOps and what makes it
unique. All right. So LLM ops is a
specialized branch of MLOps, but it is
tailored for large scale language models
rather than traditional machine learning
models. So here are the key differences
between LLM ops and MLOps.
LLM ops differs from MLOps in several
key aspects. So in terms of data
complexity, LLM ops require vast amounts
of diverse text data whereas MLOps
typically works with structured or
tabular data. Next, compute power is another major difference, as training LLMs demands high-performance GPUs and massive cloud resources, while traditional ML models generally require lower compute power. And when it comes
to real-time processing, LLM ops necessitates scalable deployment to handle continuous inference efficiently, whereas MLOps often relies on batch processing or periodic inference. Lastly, ethical and bias considerations are more prominent in LLM ops, requiring constant monitoring to
detect and mitigate biases and
misleading outputs. Whereas bias
monitoring in MLOps is important but
generally less complex compared to LLMs.
Next, let us see how LLM ops works. So,
LLM ops follows a structured workflow to
ensure the efficient management of large
language models. So, it begins with data
collection and pre-processing where
large text data sets are cleaned and
structured for training. Next, model
training and fine-tuning help the AI
learn to understand and generate text
effectively. Once trained, the model
moves to deployment where it is run on
cloud servers, edge devices or APIs for
real world applications. And during
inferences and optimization, the model's
response speed is improved while
minimizing computational cost. Next,
monitoring and feedback loops play a
crucial role in tracking performance and
making adjustments based on real-world
usage. Finally, continuous improvement
ensures the model remains relevant by
updating it periodically with fresh
data. And here are some real world
examples. Companies like Open AI, Google
and Meta use LLM ops to maintain their
AI products without frequent manual
retraining. So now that we understand
how LLM ops works, so let's explore some
of the popular tools and frameworks that
make it possible to manage and optimize
large language models efficiently. LLM ops professionals rely on specialized tools to manage the model life cycle efficiently. One of the most popular platforms is Hugging Face, an open-source ecosystem for NLP and transformer models. MLflow is widely used for tracking experiments, model versions, and training metrics, while Kubeflow provides a scalable MLOps framework for deploying AI on Kubernetes. Companies use a combination of these tools to streamline their LLM ops pipelines and ensure smooth deployment.
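As a small illustration of the experiment-tracking side, here is a minimal MLflow sketch; the experiment name, parameters, and metric values are made up for the example.

```python
import mlflow

mlflow.set_experiment("llm-finetuning")       # hypothetical experiment name
with mlflow.start_run():
    mlflow.log_param("base_model", "llama3-8b")
    mlflow.log_param("learning_rate", 2e-5)
    mlflow.log_metric("eval_loss", 1.83)       # illustrative values
    mlflow.log_metric("latency_ms", 240)
```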
Now that we have covered the tools and
frameworks used in LLM ops, next let's
explore the career opportunities and the
future prospects in this rapidly growing
field. LLM ops is a rapidly growing
field with a high demand for skilled
professionals. A machine learning
engineer focuses on designing and
optimizing LLM models, ensuring their
efficiency and effectiveness. An AI
product manager oversees AI model
deployment for businesses, ensuring
smooth integration into real world
applications. The role of LLM ops
engineer involves managing AI
infrastructure and scaling models for
optimal performance. And if you have a
background in machine learning, cloud
computing or DevOps, transitioning into
LLM ops is a great move.
So, LLM ops plays a crucial role in managing large AI models efficiently, ensuring optimal performance, reduced costs, and ethical AI development. By leveraging top tools like Hugging Face, MLflow, and Kubeflow, professionals can streamline model training, deployment, and monitoring. And with the increasing adoption of AI across industries, career opportunities in LLM ops are booming, making it an exciting and rewarding field for AI enthusiasts looking to build
a future in artificial intelligence. And
what do you think about LLM ops? Drop
your answers in the comments.
[music]
It seems like everywhere you look today,
businesses are turning to AI agents to
automate complex workflows, boost
productivity, and make smarter decisions
across industries like finance,
healthcare, retail, and tech. More and
more companies are already using AI
agents to handle tasks that once
required entire teams of people. Maybe
you're here because you have heard the
buzz around agentic AI and want to
finally understand what it actually
means. Or maybe you're curious about
which framework is best to build your
own intelligent systems. And you might
be wondering what exactly is an agentic
AI framework? How is it different from
traditional AI tools? And how can you
use it to build autonomous AI systems
that don't just respond but actually
think, plan, and act. In this video, we
are going to break it all down step by
step. From understanding what AI agents
really are to exploring popular
frameworks, their key features, how to
choose the right one, and where they are
used in real world. Whether you're a
developer looking to build powerful
agents or a business leader exploring
automation, by the end of this video,
you will have a clear road map to
agentic AI frameworks. Now let's take a
quick look at how AI agents are making
real impact. Starting with a powerful
example in customer experience. Imagine
reaching out to customer support and
getting instant personalized help. No
repeating details. That's what agentic
AI brings to customer experience. With
rising customer expectations and growing
burnout among support teams, AI agents
are stepping in to make every
interaction smarter and faster. These
agents don't just respond, they learn,
remember, and act. Unlike traditional
chatbots that follow scripts, agentic AI
understands context, predicts needs, and
can even take proactive action like
offering a refund, opening a support
ticket, or escalating an issue before it
becomes a complaint. They use natural
language processing to hold real
conversations, sentiment analysis to
sense emotions and can smoothly hand
over complex cases to a human agent when
needed. And behind the scenes, these
agents assist customer service
representatives too, fetching data, troubleshooting problems, and suggesting solutions in real time, because they can interact with multiple systems and remember customer details. Agentic AI
delivers support that's not just quick, it's deeply personal and proactive. And the result is happier customers, reduced workload for agents, and improved efficiency for businesses, all
powered by intelligent evolving AI
systems. When most people hear AI agent,
they picture a simple chatbot. But real
AI agents go far beyond just replying to
questions. An AI agent is a system that
can understand a goal, plan how to
achieve it, act on that plan, and learn
from the results without constant manual
input. Let me break this down. First,
the agent understands the task. For
example, if you say, "Give me a daily
sales report," it knows it needs
numbers, trends, and summaries. Next, it
creates a plan like pulling data from
your CRM, cleaning it, and generating
insights. Then, it connects to external
tools such as APIs, databases, web
searches, or even other AI agents to
gather what it needs. After that, it
executes the plan step by step. And
finally, it learns from the feedback,
storing that experience in memory to
perform better next time. So, that's the
real power of AI agents. They don't just
react, they reason, act and improve over
time. Now, here's the challenge.
Building such a capable agent from
scratch is not easy. You would have to
design its architecture, handle
communication between components, manage
memory, integrate external tools, and
make sure everything runs smoothly.
That's a lot of time, effort, and
maintenance. Agentic frameworks solve
this problem by giving developers and
organizations a readymade structure to
build agents like how a game engine
saves developers from writing the entire
physics of a game from scratch. So think
of it like this. An AI agent is like a
skilled driver. An agentic framework is
like the road system providing
direction, structure, rules and smooth
connections between destinations. So
without a framework, your agent can
work, but it's slow, fragile, and hard
to scale. With a solid framework, it can
move fast, connect to multiple tools,
and collaborate with other agents
effectively. So I hope now it's clear to
you all why we need agentic frameworks.
Now, a good agentic framework isn't just
a tool. It's a complete environment that
handles the heavy lifting behind the
scenes. And here are some of the most
important features. First is the defined
architecture, a clear blueprint for how
the agent plans, decides and interacts.
Next comes the communication layer. It
enables smooth interaction with APIs,
databases, humans, and other agents.
Next is the task management. It handles
multi-step tasks, priorities, and
dependencies. Next comes the tool
integration. Pre-built connectors make
setup faster and easier. Next is memory
and learning. It allows agents to
remember past interactions and improve
over time. And finally is the monitoring
and control. It gives visibility into
what the agent is doing making it easier
to debug and optimize. So in short, I
can say that the framework gives the
foundation so you can focus on building
intelligence instead of worrying about
infrastructure. So why are these
frameworks gaining so much attention?
because they allow businesses to scale
AI beyond single, isolated tasks. With
agentic frameworks, organizations can
build multiple agents that collaborate
on complex workflows, automate entire
processes end to end, keep systems
stable and consistent as they grow, and
deploy solutions faster since the core
infrastructure is already in place. So,
this is what turns a simple chatbot
experiment into a real AI-driven
operation. Now let's look at some of the
most popular agentic frameworks today.
So first on the list we have lang chain.
It's ideal for connecting language
models with external tools and creating
multi-step reasoning pipelines. Lang
chain is an open-source framework that
helps developers build applications using large language models like GPT, Claude, or Llama. These apps can think, use tools, and remember context. It connects LLMs to
things like APIs, databases, documents,
computational tools, and memory system,
letting them follow multi-step logic.
Basically, LangChain turns an LLM into an intelligent agent that can perceive, plan, and act. With LangChain, you can build agents fast and your way: use ready-made templates and patterns like ReAct to create agents in minutes, swap
models, tools or databases easily with
over a thousand integrations. You can
also customize agents with simple middleware for approvals and conversation management for handling sensitive data. And with LangGraph's durable runtime, your agents get persistent checkpoints and human-in-the-loop support automatically.
So with that said, next we have
LangGraph. It is powerful for designing
complex workflows where multiple agents
interact. Keep your agents on track with
human in the loop checks and easy
moderation. You can guide, approve, and
control what your agents do. LangGraph
makes it simple to build customizable
workflows, whether it's a single agent,
multiple agents, or a hierarchical setup.
It also remembers conversations for
richer long-term interaction. Plus, with
real-time streaming, users can see what
the agent is thinking and doing, making
the experience smooth and interactive.
The next framework on our list is
Autogen. It is great for building multi-agent
systems that collaborate like a team.
So, Autogen is an open-source framework
by Microsoft that lets you create AI
agents that can collaborate not just
with humans, but also with other AIs.
These agents can chat, reason, plan,
write code, use tools, and even review
each other's work to finish complex tasks
automatically. Autogen works like a
team. The user proxy agent represents
the user. It gives instructions or goals
and the assistant agents plan, reason,
and execute tasks. And a critic agent can
review the output and suggest
improvements. So these agents
communicate through messages just like
people chatting in a group project
discussing steps until they reach the
right answer. Autogen makes it easy to build multi-agent systems for research automation, software development, data analysis, and content generation, and it saves time, reduces human effort, and improves quality through self-review and collaboration. So you can create an Autogen setup where one agent writes code, another tests it, and a third reviews it,
all automatically. Autogen turns AI
models into autonomous collaborators
that can think, talk and work together.
It's like having AI teammates, each with their own role, solving problems
faster and smarter. Next on our list, we
have Crew AI. It focuses on
orchestrating specialized agents to work
together on complex tasks. Crew AI makes it easy to build and manage collaborative AI agents that can handle complex tasks on their own and at scale. It's easy, trusted, and scalable, helping businesses adopt AI across teams
with centralized management and
monitoring. You get LLM and tool
configuration, role-based access and
serverless containers. Next, moving on to OpenDevin, built for developer agents. It enables coding AIs that can write, test, and deploy code independently. OpenDevin is an
open-source project aimed at creating an
autonomous software engineer, an AI
agent that can understand software tasks, write code, debug, test, and deploy solutions automatically. OpenDevin uses large language models combined with specialized tools and environments to perform end-to-end software development tasks. It can understand developer instructions, plan coding steps, write and modify code, run and test scripts, fix bugs, and deploy solutions.
Basically, it acts as an AI pair
programmer or even a full autonomous
coding agent. OpenDevin is part of the new wave of agentic AI frameworks, systems that can act, learn, and collaborate. It helps developers automate repetitive coding tasks, debug faster, build prototypes independently, and ultimately improve productivity. Now let us move on to the next framework. We
have semantic kernel. A lightweight and
flexible framework for integrating
external skills easily. Semantic kernel
is an open-source toolkit that makes it
easy to build AI agents and connect the
latest AI models to your C#, Python, or
Java projects. It works like smart
middleware, turning model requests into
function calls and sending results back
fast. You can plug in your existing code
as extensions, integrate AI services
easily, and share them across your team.
It's modular, flexible, and built to
rapid enterprise solutions. The next
framework on our list is Llama Index. It
is perfect when your agent needs to work
with structured data. Llama index is an
open-source data framework that helps
developers connect large language models
like GPT or Llama to external data
sources such as databases, PDFs,
documents, APIs, or websites. LLMs are great at reasoning and generating language, but they don't naturally have access to your private data like internal documents, business reports, or real-time information. LlamaIndex acts as a bridge between LLMs and your data.
It ingests data from any source like
text files, PDFs, SQL databases, APIs,
Notion, etc., indexes that data efficiently for retrieval, and feeds the most relevant information back to the LLM when you ask a question. This process is known as RAG, which is retrieval augmented generation. So each framework brings something unique: some emphasize tool integration, others collaboration or data handling. The right choice depends on your use case and goals.
So there is no single best framework
only the best one for your specific
needs. So, here's a simple way to choose
the right agentic framework. First,
start with your goal. Are you building a
chatbot, an autonomous flow, or a multi-agent ecosystem? Next, check integration
needs. Can it connect easily to your
tools and data sources? Then, consider
scalability. Think, will you need more
agents later? Next, look for
flexibility. Can you customize how your
agents reason and act? And then evaluate
community support. Good documentation
and an active community save time. Then
balance cost and performance. Choose
something that fits your resources and
growth plans. So your framework should
align with the problem you want to solve
and not the other way around. So there
is a common confusion for beginners. AI
agent builders are like ready-to-use kits. You can drag, drop, and launch an agent fast, which is perfect for simple tasks like a support bot, but they are
limited in flexibility. Agentic
frameworks, on the other hand, give
developers the freedom to design
powerful customized systems from the
ground up. So, it's just that the
builders are like instant cake mix, fast
but limited. Frameworks are like having
all the ingredients. It takes more
effort, but you control the flavor,
shape, and result.
So agentic frameworks are becoming the
backbone of modern AI automation. They
make it possible to build intelligent
autonomous agents that don't just
respond but plan, act and learn. As
agentic AI continues to grow, these
frameworks will play a key role in
shaping the future of automation. If
this video helped you understand the
concepts clearly, don't forget to like,
share, and subscribe for more deep dives
on AI agentic systems and genai tools.
>> [music]
>> Let's imagine it's 8:30 in the morning
and you're having your first cup of
coffee. While you're getting ready, your
personal AI assistant has already
rescheduled your meeting through your
calendar agent, replied to a few emails via your email agent, and even negotiated a delivery update with your vendor's AI system, all by itself. No
waiting, no follow-ups, just intelligent
agents talking to each other in a
digital language of their own. Sounds
great, right? But this is exactly where
technology is headed. A world powered by
AI agent protocols. Today we will break
down what AI agents are, how they
interact, why these protocols matter,
and what makes system like A2A, MCP, ACP
and others so important. So, first
things first, what exactly is an AI
agent? Think of it as a digital helper
that doesn't just follow commands, but
can understand goals, make decisions,
and take actions. Unlike a simple
chatbot, an AI agent can observe what's
happening around it, think or reason
about what to do next, and act either by
talking to another agent or by
performing a task. For example, a
customer support AI agent can understand
a complaint, talk to a refund processing
agent, check the company's payment
system, and resolve the issue without
needing human help. But here's the
catch. For all the smart AI agents to
actually work together, they need a
common language. Think about how
computers talk to each other today. When
you visit a website, your browser and
the server don't guess what the other
means. They use standard internet
protocols like HTTP or TCP IP. These
rules make sure every message from your
YouTube video to bank transaction is
sent, received and understood the right
way. Now imagine a world of AI agents.
One built by Open AI, another by
Enthropic, another by Google, all trying
to collaborate without a shared
communication protocol. It's like having
a French agent trying to talk to a
Japanese agent with no translator in
between. Total chaos. That's where AI
agent protocols come in. Just like
humans use spoken languages, AI agents
use protocols to share context, exchange
information, and coordinate actions
securely and consistently. These
protocols define what agents can say,
how they say it, and when they act on
it. They make sure that when one agent
says schedule a meeting, another agent
understands exactly what that means, and
not just the words but the intent behind
them. Now when we talk about
communication, AI agents interact in
three main ways. So let's make this
simple. So we are going to talk about
the types of interactions in agentic
systems. So first of the list we have
agent to agent which is A to A. This is
when two agents talk directly. For
example, a travel booking agent might
ask a hotel reservation agent to find
available rooms. The A2A protocol
defines how they share requests, responses, and confirmations. Next, we have agent to user, which is AG-UI. This is when the agent talks to you, the human, like when you chat with ChatGPT, ask Siri to set a reminder, or use Google Assistant. Here
the focus is on understanding intent and
making communication natural and human
friendly. Next we have agent to resource
which is A2R. This is when an agent
accesses external resources like
databases, APIs or files. For example,
your AI analyst agent might pull data
from a company database or a stock
market API to prepare a report. These
three types of interactions are the
backbone of how intelligence systems
think, talk, and act. Now, why do protocols matter? Imagine if every company built
their own agents in isolation. One using
lang chain, another using crew AI,
another built on open AI. So if they all
spoke different languages, none of them
could collaborate. It would be like
trying to run the internet without a
standard like HTTP. Complete chaos.
That's why standardized protocols are
essential. They give AI agents a shared
way to exchange data and context,
coordinate actions, maintain trust and
transparency and collaborate securely
across systems. In short, I can say that
protocols are the glue that make
intelligent systems truly work together.
All right, now let's explore the five
key protocols shaping how AI agents
communicate with real examples so it's
easy to visualize. So again first on the
list we have A2A which is agentto agent
protocol. A2A defines how two or more
agents talk to each other, how they
share tasks, exchange results, and build
trust. It sets the structure for message
exchange, trust verification and intent
sharing. It's like a meeting protocol
between agents. They introduce
themselves, share goals, negotiate and
agree on the next step all without human
involvement. So, it is used in multi-agent environments. In logistics, a
delivery scheduling agent might talk to
a warehouse inventory agent to confirm
if products are ready before dispatching
a truck. Now, think of Microsoft Copilot
in Excel working together with Copilot
in Outlook. One analyzes your data while
the other summarizes insights in an
email draft. That's A2A communication in
action. Seamless collaboration between
agents behind the scenes. Next on the
list we have MCP which is model context
protocol. This protocol allows an AI
model like Claude or GPT to securely
connect to external tools like APIs and
databases without manual coding. In
simple terms, MCP helps AI models access
real world data safely and on demand.
and it is used whenever an AI needs
up-to-date or private data like
financial reports, product inventories
or real-time analytics. So, let's say
you're a sales manager. Your AI
assistant uses MCP to fetch live sales
numbers from your company's CRM, checks
targets, and generates insights
instantly. So, let's have a look at the
real world example. Anthropic cloud
models already use MCP to connect to
APIs, databases, or even Google Sheets,
allowing them to pull live information
rather than relying only on past
training data. It's what turns an AI
model into a contextaware decision
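As a rough sketch of the general idea MCP standardizes (this is not the official MCP SDK or wire format, just the shape of a tool-call flow):

```python
# Rough sketch of a tool-call flow: the model asks for a tool by name, the host
# runs it against the real data source, and hands the result back.
# NOT the official MCP SDK or wire format; the tool and fields are hypothetical.

def get_live_sales(region: str) -> dict:
    # Stand-in for a CRM/API lookup the host would actually perform.
    return {"region": region, "revenue": 128_450, "target": 120_000}

TOOLS = {"get_live_sales": get_live_sales}

# 1. The model emits a structured tool request instead of guessing from stale training data.
tool_request = {"tool": "get_live_sales", "arguments": {"region": "EMEA"}}

# 2. The host looks up and executes the requested tool, then returns the fresh result.
result = TOOLS[tool_request["tool"]](**tool_request["arguments"])
print(result)   # the model would now use this live data to write its answer
```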
Now comes ACP, or the agent
communication protocol, the backbone of
reasoning and goal sharing among agents.
ACP defines how agents express their
beliefs, intentions, and plans,
essentially their thought process. It
lets one agent tell another why it's
doing something, what it believes is
true, and what it plans to do next. It
is used in large collaborative systems
where multiple agents must plan
together. In a smart city, different AI
agents manage traffic, energy, and
emergency systems. The traffic agent
might signal that it plans to reroute
cars due to congestion, and the energy
agent adjusts street-light timings to
save power in that area. So now let's
have a look at the real-world example.
Research labs at MIT and Stanford are
experimenting with ACP-style
communication to build multi-agent
reasoning systems where agents plan
together instead of acting individually.
This is what enables collective
intelligence: agents thinking as a team.
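To picture that smart-city exchange, here is a hypothetical sketch of ACP-style messages in Python; the field names and the agents' behavior are invented for illustration, not taken from any real ACP implementation.

```python
# Hypothetical ACP-style exchange: one agent states a belief, an intention, and a plan,
# and a second agent adjusts its own plan in response. Fields are illustrative only.
traffic_msg = {
    "from": "traffic-agent",
    "belief": "Congestion on 5th Avenue exceeds 80% capacity",
    "intention": "reroute_vehicles",
    "plan": ["divert traffic to 7th Avenue", "extend green-light cycle on 7th"],
}

def energy_agent_react(msg):
    # The energy agent reads the other agent's stated intention, not just raw data,
    # and updates its own plan accordingly.
    if msg["intention"] == "reroute_vehicles":
        return {"from": "energy-agent",
                "plan": ["brighten street lights on 7th Avenue",
                         "dim street lights on 5th Avenue to save power"]}
    return {"from": "energy-agent", "plan": []}

print(energy_agent_react(traffic_msg))
```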
Now, what if these agents aren't all
inside one company, but spread across
the internet? That's where ANP, or the
Agent Network Protocol, comes in. It's
designed for peer-to-peer communication
between agents across different networks
or organizations, kind of like a
decentralized messaging layer for AI. It
is used in large distributed systems
like global logistics, financial trading,
or smart city networks where different
companies' agents must cooperate. So
imagine several hospitals using AI agents
to detect disease patterns. With ANP,
each hospital's agent can share
anonymized insights with others, improving
predictions without revealing patient
data. As for the real-world example,
projects like Fetch.ai are already using
ANP-like concepts, where agents trade
data and digital services securely in
decentralized marketplaces. ANP is like
the internet of agents, enabling
large-scale collaboration. Finally, let's talk
about AGUI, or the agent-user interface
protocol, the bridge between humans and
agents. This protocol defines how agents
understand and respond to human input,
whether it's through text, speech, or
visuals. It focuses on intent
recognition, clarity, and explainability,
ensuring humans always stay in control.
It is used in every interface you
interact with, like voice assistants,
customer chatbots, or AI copilots in
software tools. So a financial advisor
AI might use AGUI principles to explain
why it suggests a certain investment,
showing risks and returns in simple
visuals. So now let us have a look at
the real-world example. ChatGPT,
Gemini, and Meta's AI Studio all follow
AGUI-like standards, focusing on tone,
clarity, and transparency to make
interaction humanlike. At companies like
JPMorgan, AGUI-style systems ensure
AI tools explain financial advice
clearly so users remain in control. I
hope this is clear now. Now let's see
how all of these work together in one
real-world scenario. Imagine a global
e-commerce company. MCP lets its AI
agents pull live inventory and customer
data. A2A allows marketing and logistics
agents to coordinate automatically. ACP
ensures they plan campaigns and deliveries
in sync. ANP connects these agents
securely with partner companies
worldwide. And finally, AGUI delivers
the final insights to human managers
through a simple dashboard. All these
protocols together form a digital
nervous system where every agent and
every human stays connected and
informed. We are entering a time when AI
agents won't just live inside one app or
company. They will work across
platforms, industries, and even nations.
Imagine healthcare agents from different
hospitals sharing data securely to speed
up diagnosis, or financial agents running
cross-border compliance checks in seconds.
That's not science fiction. It's already
happening, and it's only possible because
of protocols like A2A, MCP, ACP, ANP, and
AGUI. They are not just technical
frameworks. They are the language of
intelligent collaboration. So next time
you hear about agentic AI, remember it's
not just about powerful models or
automation. It's about creating a world
where intelligent agents can talk,
think, and work together, a world
connected through these silent yet
powerful AI agent protocols. They are
the bridge between isolated AI systems
and truly connected intelligence.
Amazon just dropped a major AI upgrade,
Alexa plus, and it's unlike anything we
have seen before. It's not just an
update, it's a complete transformation
powered by generative AI. But what
exactly makes Alexa smarter, more
conversational, and more capable? Well,
in this video, we will break down how
Amazon has leveraged state-of-the-art AI
models to make Alexa a true AI
assistant, how it compares to
competitors like ChatGPT Voice and
Google Assistant, and whether it's the
future of voice AI. Let's rewind a bit.
Alexa started as a simple voice
assistant in 2014. It could set
reminders, play music, and control smart
devices. But it had one major
limitation: it wasn't really thinking,
just following predefined rules. As AI
advanced, assistants like Apple's Siri and
Google Assistant improved. But Amazon
saw an opportunity to turn Alexa into a
true conversational AI. And that's where
generative AI comes in. Enter Alexa
Plus, a brand new AI powered version of
Alexa that understands context,
remembers conversations, and sounds more
natural than ever. Launched on February
26, 2025, Alexa Plus is Amazon's
next-generation AI assistant, designed to
provide more natural conversational
interactions and enhanced capabilities.
This upgrade enables Alexa to perform
complex tasks such as planning events,
managing schedules, and controlling
smart home devices more efficiently.
Alexa Plus represents a significant
evolution from the original Alexa,
introducing several key enhancements. So
let us see what are they. First we have
conversational abilities. Alexa Plus
offers more natural and expansive
interactions, understanding colloquial
expressions and complex ideas, making
conversations feel smoother and more
intuitive. Building on that, it also
takes a more proactive approach to
assisting users. Unlike the original
Alexa, which primarily responded to
direct commands, Alexa Plus can
anticipate user needs such as suggesting
earlier dispatches due to traffic or
notifying about sales on desired items.
In addition, it has become more
personalized than ever. Alexa Plus can
remember user preferences, dietary
restrictions, and important dates,
tailoring responses and actions to
individual needs. Whereas the original
Alexa had limited personalization
capabilities. Beyond personalization, it
also enhances task management. The new
Alexa can handle complex tasks like
making reservations, ordering groceries,
and coordinating multiple services
seamlessly, surpassing the more basic
functionalities of the original Alexa.
Not just that, it also integrates with
more services than before. Alexa Plus
connects with a broader range of
services and devices, including Grubhub,
OpenTable, Ticketmaster, and various
smart home products making it even more
versatile. On top of all these
improvements, it now has the ability to
act independently. Agentic capability
is a notable advancement in Alexa Plus.
Now that we have seen how Alexa Plus has
improved, let's dive into the
technology behind it and understand how
generative AI models and agentic AI
capabilities power this next-generation
assistant. Alexa Plus is built on
cutting-edge generative AI and agentic AI,
leveraging powerful models and
algorithms to process language,
understand context, and execute tasks
autonomously. So let's break down the
key technologies that make this
possible. First, large language models,
or LLMs: the brain behind conversations.
At the core of Alexa Plus is an advanced
transformer-based language model, similar
to GPT-4, Claude, and Amazon's proprietary
Titan model. These LLMs are trained on
vast data sets, allowing Alexa to
understand complex queries and respond
naturally, maintain context across
conversations so interactions feel
more fluid, and generate humanlike
responses, reducing robotic and
repetitive phrasing. And by using
techniques like reinforcement learning
with human feedback, Alexa Plus
continuously improves its conversational
ability based on real-world
interactions. The next technology is
agentic AI, enabling proactive and
autonomous actions. Beyond just
responding to commands, Alexa Plus
integrates agentic AI models, which allow
it to act independently. Built on RAG
and action models, it can plan
multi-step tasks, for example finding a
restaurant, booking a table, and
arranging transportation. It retrieves
real-time web data to provide the latest
information and executes actions across
multiple apps and services without user
micromanagement.
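As a toy illustration of what that kind of multi-step task execution looks like (the "tools" below are fake stand-ins, not Alexa's real services):

```python
# Toy sketch of multi-step task execution: search, book, then arrange transport.
# The tool functions are fake stand-ins; a real agent would call live services.
def find_restaurant(cuisine):  return {"name": "La Piazza", "cuisine": cuisine}
def book_table(restaurant, time):  return {"restaurant": restaurant["name"], "time": time}
def arrange_ride(destination, time):  return {"pickup": time, "dropoff": destination}

def plan_evening(cuisine, time):
    steps = []
    restaurant = find_restaurant(cuisine)            # step 1: search
    steps.append(f"Found {restaurant['name']}")
    booking = book_table(restaurant, time)           # step 2: act on the result
    steps.append(f"Booked table at {booking['time']}")
    ride = arrange_ride(restaurant["name"], time)    # step 3: chain a follow-up action
    steps.append(f"Ride arranged to {ride['dropoff']}")
    return steps

for step in plan_evening("Italian", "19:30"):
    print(step)
```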
This enables a fully autonomous AI
assistant experience, reducing the need
for manual user input. After agentic AI,
the technology that makes Alexa so
versatile is neural network
architectures enabling speech and
context awareness. Alexa Plus utilizes
deep learning techniques such as
sequence-to-sequence models for natural
language generation; BERT, which stands
for Bidirectional Encoder Representations
from Transformers, for understanding user
intent with greater accuracy; and
Whisper ASR (automatic speech recognition)
for improved voice processing, making
Alexa more responsive to different
accents and speech patterns. These
advancements enable highly accurate
speech recognition, contextual
understanding, and real-time adaptation to
user behavior. Alexa Plus integrates
long-term memory storage using vector
databases like FAISS or Amazon Aurora,
allowing it to remember user preferences
over time, adapt to individual habits
and routines for a more personalized
experience, and provide contextual
reminders based on past interactions.
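As a rough sketch of how that kind of long-term memory can sit on a vector index like FAISS (the random vectors below stand in for real text embeddings, and the stored "memories" are made up):

```python
# Toy sketch of long-term "memory" backed by a FAISS vector index.
# Random vectors stand in for real text embeddings from an embedding model.
import numpy as np
import faiss

dim = 64                                   # embedding dimension (toy value)
memories = [
    "User is vegetarian",
    "User's anniversary is June 12",
    "User prefers morning meetings",
]
rng = np.random.default_rng(0)
vectors = rng.standard_normal((len(memories), dim)).astype("float32")

index = faiss.IndexFlatL2(dim)             # exact L2 search over stored vectors
index.add(vectors)                         # store the "memories"

query = rng.standard_normal((1, dim)).astype("float32")  # embedding of a new user request
distances, ids = index.search(query, 1)    # recall the nearest stored memory
print("Recalled:", memories[ids[0][0]])
```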
This deep personalization is what makes
Alexa Plus feel more like a true digital
assistant rather than just a voice
control device. Then comes the
technology that makes Alexa Plus capable
of understanding and interacting with
users in multiple ways, which is
multimodal AI. Alexa Plus leverages
multimodal AI, combining natural language
processing for text-based queries,
computer vision for Echo Show devices,
enabling it to process and analyze
on-screen content, and speech synthesis
to generate humanlike voice responses.
This makes Alexa Plus capable of
understanding and interacting with users
in multiple ways, enhancing its overall
functionality. And by combining LLMs,
agentic AI, deep learning models, and
real-time data retrieval, Alexa Plus
represents a significant leap in AI-driven
virtual assistance. It is no longer just
a voice assistant. It is an autonomous,
context-aware, and highly personalized AI companion
designed to make daily life easier. Now
that we have explored the technology
behind Alexa Plus, let us see how it
stacks up against other leading AI
assistants. Alexa Plus enters the AI
assistant space with generative AI and
agentic AI, making it smarter and more
proactive. But how does it compare to
the top AI models available today?
Let's break it down across key aspects.
We will compare them based on five key
factors: AI power and capabilities,
personalization and memory, proactive
and autonomous tasks, ecosystem and
third-party integration, and finally
conversational abilities. So first, let
us compare them on AI power and
capabilities. How powerful is the AI
behind each assistant? Alexa Plus uses
Amazon Titan plus custom LLMs with
generative AI and agentic AI for smart,
proactive responses. ChatGPT Voice runs
on GPT-4, great for deep conversation but
lacking real-world task execution.
Google Assistant uses Gemini AI, best
for search and multimodal inputs such as
text, voice, and images. And Apple's Siri
uses the Ajax LLM, improving in language
but still rule-based and limited. So
Alexa Plus leads in proactive AI, while
GPT-4 dominates in conversation. Next, let us
compare them in terms of personalization
and memory. Can the assistant remember
your preferences and adapt? Let us see.
Alexa Plus has long-term memory of
routines, preferences, and contextual
adaptation. ChatGPT Voice has limited
memory that resets after sessions.
Google Assistant remembers preferences
inside Google apps but lacks deep
personalization. Apple's Siri has minimal
memory and mostly relies on Apple's preset
commands. So, Alexa Plus leads in
remembering and adapting to users. Next
is proactive and autonomous task
execution. Can it handle tasks on its
own? Let's see. Alexa Plus uses agentic
AI for multi-step automation, for
example booking, ordering, and reminders.
ChatGPT Voice assists with planning but
can't perform real-world automation.
Google Assistant can set reminders and
retrieve information but lacks deep
automation. Apple's Siri is limited to
commands and relies on shortcuts for
basic automation. So here, Alexa Plus is
the most proactive, handling tasks
automatically. Next is the ecosystem and
third-party integration. How well does it
work with other devices and apps? Well,
Alexa Plus is best for the smart home,
with Amazon Echo, Ring, and third-party
integrations. ChatGPT Voice can connect
to some external tools but has no smart
home control. Google Assistant has deep
integration with Google apps and
services, whereas Apple's Siri is limited
to Apple devices with minimal third-party
support. So here again, Alexa Plus and
Google Assistant lead, but Alexa has
better smart home control. And finally,
conversational abilities. How natural
and humanlike are the conversations?
Alexa Plus is natural, expressive, and
context-aware. ChatGPT Voice is best for
deep, intelligent conversations. Google
Assistant is accurate but more
search-focused. Apple's Siri is still
command-based with limited depth. So
ChatGPT Voice is best for deep
conversations, but Alexa Plus is most
natural for voice interactions. So now
let us see the future of AI assistants.
Let's see what's next. So here we have
smarter AI memory: assistants will
remember and personalize even better.
Next, more autonomy: AI will handle
complex multi-step tasks independently.
Then, more humanlike conversations: AI
will feel more natural and intuitive.
Next, seamless integration: AI will
connect effortlessly across devices and
services. Next, real-time decision
making: AI will anticipate needs and
offer proactive help. So Alexa Plus is
best for automation, memory, and smart
home control. ChatGPT Voice is best for
deep, intelligent conversation. Google
Assistant is best for search and Google
productivity. And Apple's Siri is best
for Apple users but still limited in AI
features. Alexa Plus is not just an
upgrade; it's a redefinition of AI
assistants, with generative AI and
agentic AI for smarter, proactive help.
So what do you think? Which AI assistant
is your favorite? Let me know in the
comments below.
[music]
Did Alibaba just do the impossible?
Their latest AI model has outperformed
both GPT-4 and DeepSeek in some key
benchmarks. But how did they manage to
do it? And what does this mean for the
AI race? Stick around as we dive into
the shocking details behind this
breakthrough and what it means for the
future of AI. Alibaba just dropped a
bombshell during the Lunar New Year: a
new AI model called Qwen 2.5 Max, and
they say it outperforms OpenAI's GPT-4o,
Meta's Llama, and even China's own rising
star, DeepSeek. Is this the new benchmark
in AI? Let's break it down. First off,
what exactly is Qwen 2.5 Max? Developed
by Alibaba Cloud, this model is being
hyped as a major rival to GPT-4o.
According to their benchmarks, it crushes
competitors in reasoning, coding, and
multilingual tasks. So let's look at the
numbers. In Arena-Hard, a benchmark for
complex problem solving, Qwen 2.5 Max
scored 85.3%, beating GPT-4o's 80.2% and
DeepSeek V3's 77.5%. But here's the twist.
It's not just about the raw power.
Alibaba built this model for businesses.
Think customer service bots that speak
10 languages or AI coders that debug
Python faster than your engineering team.
And unlike OpenAI's premium pricing,
Alibaba is offering Qwen 2.5 Max at a
fraction of the cost. But why drop this
during the Lunar New Year, when half of
China is on vacation? Well, that's where
the discussion begins. Meet DeepSeek, the
20-month-old startup that's been shaking
up Silicon Valley. Three weeks ago, they
dropped DeepSeek V3 and the R1 model.
And the secret is insanely low cost. We
are talking $0.14 per million tokens.
That's like charging pennies for a
Lamborghini. DeepSeek's cheap open-source
models triggered an AI price war in
China. Alibaba reduced prices by up to
97% overnight, but DeepSeek's founder
Liang Wenfeng isn't sweating it. In a
rare interview, he said, "We don't care
about the price. AGI is our goal." AGI,
that's artificial general intelligence,
AI that can outthink humans. And here's
the twist. DeepSeek isn't some corporate
giant. They are a tiny team of grad
students and researchers working out of
Alibaba's hometown, Hangzhou. Meanwhile,
Alibaba's got 200,000 employees. So, how does Qwen
2.5 Max actually stack up? Let's
compare. First, let's talk about
reasoning. Qwen takes the lead here.
Next, when it comes to coding, Qwen
continues to shine. And if multilingual
support is what you are after, Qwen
speaks Mandarin, English, Spanish, you
name it. But DeepSeek V3 still holds the
crown for affordability. And GPT-4? It's
holding on to its reputation. But even
OpenAI's Sam Altman admitted DeepSeek's
progress is impressive. And Alibaba's
timing is strategic. Releasing Qwen 2.5
Max during the Lunar New Year, when
everyone's distracted, is a power move.
It's like dropping a diss track on
Christmas Day. No one's looking, but
everyone will hear it. So, here's why
this matters. DeepSeek's R1 model wiped
$1 trillion off US tech stocks in a day.
Nvidia, Meta, Microsoft all dropped.
Why? Because if a tiny Chinese startup
can match GPT-4 at a hundredth of the
cost, investors wonder, are we
overspending on AI? And China's giants
aren't sitting still. ByteDance updated
its AI model days after DeepSeek's
launch. Tencent and Baidu are in a
price-cutting frenzy. Meanwhile,
Alibaba's betting big on Qwen to dominate
enterprise AI. Think hospitals, banks,
and mega corporations. And the real
question is: who's closer to AGI,
DeepSeek's agile team or Alibaba's
corporate powerhouse? Share your thoughts
in the comments. So, is Qwen 2.5 Max the
new AI champion? Maybe. But this isn't
just about benchmarks. It's a glimpse
into the future. A future where AI isn't
just built by Silicon Valley giants, but
by startups in Hangzhou and
open-source communities worldwide.
A game-changing development has taken
the tech world by surprise.
>> I think we should take the development
out of China very very seriously. A free
open-source AI model emerged seamlessly
out of nowhere. It not only matched but
surpassed some of the most advanced
systems on the market. What made this
even more remarkable was its origin. It
wasn't a new release from OpenAI nor a
breakthrough from Anthropic. It was
DeepSeek, an AI model developed in China,
and its development left top AI
researchers in the United States in
amazement especially when they learned
about the staggering cost behind it.
>> It's opened a lot of eyes of like what
is actually happening in AI in China.
The training cost for Deepseek version 3
was just $5.576 million. And in
comparison, OpenAI spends a massive $5
billion annually. While Google's capital
expenditures are projected to exceed $50
billion by 2024.
Microsoft on the other hand invested
over $13 billion just in OpenAI. And yet
deep model outperformed these highly
funded AI models from leading American
companies. And the contrast is truly
mind-blowing
>> to see the deepseek um um new model.
It's it's super impressive in terms of
both how they have really effectively
done an open source model that does what
uh is this inference time compute and
[music] it's super compute efficient.
>> DeepSeek didn't stop at the success of its
powerful open-source AI model. Instead,
it quickly introduced R1, a
next-generation reasoning model that has
already surpassed the advanced OpenAI o1
model in several third-party benchmarks.
This rapid innovation highlights
DeepSeek's ability to surpass even the most
well-funded United States AI giants,
proving that agility and creativity can
disrupt the established leaders in the
race for AI dominance. As we dive
deeper into this, let's hear from Martin
Vechev, the director of the Bulgarian
Institute for Computer Science,
Artificial Intelligence, and Technology.
He recently made some interesting
statements about the AI industry, and
they are shaking things up. So, he
pointed out that a Chinese AI startup
claimed to have developed its R1 LLM
model with less than $6 million while
other companies are pouring in billions.
That announcement alone caused Nvidia
stock to drop. And according to Martin,
these models are built by strong
researchers and engineers in the field,
many of whom actively publish their
work. But developing these AI models can
be incredibly expensive. Just to give
you an idea, running 2048 H800 GPUs
could cost anywhere between $50 and $100
million. And he also mentioned that the
company handling the data center is
backed by a massive Chinese investment
fund with far more GPUs than just those
2048 H800 units. As for the architecture
behind the DeepSeek R1 and V3 models,
Martin explained that they use a
mixture-of-experts (MoE) approach. Simply
put, this means that at any given time
only a small percentage of the model is
active, making it much more efficient in
real-time use. This raises a lot of
questions about cost efficiency, AI
development strategies, and how companies
are competing in this space. Now the
question is: if DeepSeek's development is
being reported to have cost only $5 to $6
million, how does this figure align with
the extensive infrastructure, data center
operations, and substantial backing from
Chinese investment funds? Could there be
more to the story that isn't being
disclosed? Let us know in the comment
section below. As far as the research
indicates, DeepSeek V3 has been utilized
as the base model for DeepSeek R1, and
this progression highlights DeepSeek's
strategic approach of building on its
existing architecture while pushing the
boundaries of AI capabilities. DeepSeek
R1 distinguishes itself by relying
heavily on reinforcement learning
fine-tuning, a focused and efficient
method that contrasts sharply with
OpenAI's GPT infrastructure. OpenAI's
GPT (generative pre-trained transformer)
framework employs a combination of
supervised learning, unsupervised
learning, and reinforcement learning to
train its models. While this
multifaceted approach has proven
effective, it also requires significant
computational resources and time. In
contrast, DeepSeek R1's emphasis on
reinforcement learning fine-tuning
demonstrates a more streamlined and
targeted methodology, which not only
reduces cost but also enhances
performance in specific tasks. This
difference in training strategies
highlights DeepSeek's remarkable ability
to innovate efficiently: by building on
its foundational V3 model, it developed
R1, a reasoning model that has already
surpassed OpenAI's advanced systems in
some key benchmarks. DeepSeek's focus on
reinforcement learning has allowed it to
carve out a unique position, directly
challenging the dominance of United
States AI giants. This approach
demonstrates that with strategic,
resource-conscious innovation,
groundbreaking results are not only
possible but are already happening.
So what do you think? Does DeepSeek's
open-source approach give it a long-term
advantage? Or will OpenAI's heavy
investment in research and proprietary
models keep it ahead? Share your
thoughts in the comments.
[music]
Did a Chinese AI model just shake up the
entire US market? Let me tell you what
happened. On January 27, 2025, the stock
market took a serious hit. Tech stocks
dropped hard, and the biggest shock was
that Nvidia, a prominent player in the AI
hardware sector, saw its stock crash by
17%. That's a $590 billion loss in
market value. And it was because of an
AI model called DeepSeek R1. Yes, you
heard that right. A Chinese AI model
just sent shock waves through the
industry, raising big concerns about
China's growing AI dominance and what it
means for companies like OpenAI,
Google, and even Nvidia. So why is
DeepSeek R1 such a big deal? Well, it's
not just another AI model. It's a
game-changer. With models like DeepSeek
R1, DeepSeek V2, and DeepSeek Coder, it's
going head-to-head with top players
like OpenAI and Google, offering
powerful AI at a fraction of the cost.
Here I have added a screenshot for your
reference, and this table compares
large language models based on their
accuracy and calibration error. Among
the models mentioned, DeepSeek R1 has
the highest accuracy at 9.4%,
outperforming o1 at 9.1%, Gemini
Thinking at 6.2%, and other models such
as GPT-4o at 3.3% and Grok-2 at 3.8%.
Furthermore, DeepSeek R1 has the lowest
calibration error at 81.8%, indicating
improved confidence calibration over
other models with errors greater than
88%. This demonstrates that DeepSeek R1
not only produces the most accurate
results but also has higher forecast
reliability. This benchmark graph shows
DeepSeek R1's exceptional performance
across a variety of evaluation tasks,
solidifying its position as a top-tier
LLM. Notably, DeepSeek R1 achieves the
highest scores in AIME 2024 with 79.8%,
Codeforces with 96.3%, MATH-500 with
97.3%, and MMLU with 90.8%, indicating
superior reasoning, problem-solving, and
coding skills. When compared to OpenAI's
o1 models and DeepSeek V3, DeepSeek R1
consistently outperforms or equals top
models, particularly in domains requiring
precise logical reasoning and
mathematical skills. Furthermore, its
SWE-bench Verified score of 49.2%
demonstrates its suitability for
software engineering applications, and
this finding supports DeepSeek R1's
advancements in AI research, establishing
it as a formidable competitor in the LLM
space. First, let's talk about DeepSeek
R1-Zero and its successor, DeepSeek R1. So,
let's break it down. In reinforcement
learning, there are two main components,
the agent and the environment. The agent
interacts with the environment and based
on its actions, it receives rewards or
penalties. The goal of the agent is to
maximize these rewards by learning from
its mistakes and improving over time.
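To make that loop concrete, here is a tiny toy sketch of an agent learning from rewards; it's a simple epsilon-greedy bandit for illustration only, nothing like DeepSeek's actual training setup.

```python
# Toy agent-environment loop: an epsilon-greedy bandit.
# Illustrates "act -> get reward -> update", not DeepSeek's training pipeline.
import random

true_reward_probs = [0.2, 0.5, 0.8]   # environment: 3 actions with hidden reward rates
estimates = [0.0, 0.0, 0.0]           # agent's current value estimates
counts = [0, 0, 0]
epsilon = 0.1                          # exploration rate

for step in range(1000):
    # Agent chooses an action: explore sometimes, otherwise exploit its best estimate.
    if random.random() < epsilon:
        action = random.randrange(3)
    else:
        action = max(range(3), key=lambda a: estimates[a])

    # Environment returns a reward (1 or 0).
    reward = 1 if random.random() < true_reward_probs[action] else 0

    # Agent updates its estimate from the feedback (incremental average).
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print("Learned value estimates:", [round(e, 2) for e in estimates])
```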
Now, let's talk about DeepSeek R1-Zero.
This model was a pioneering attempt to
use reinforcement learning without
supervised fine-tuning. And the idea was
to let the model learn entirely through
interaction with its environment without
any pre-labelled data. However, this
approach had some challenges. DeepSeek
R1-Zero faced two major issues. First is
poor readability: the model's outputs
were often hard to understand. And next,
language mixing: the model sometimes
mixed languages, especially Chinese,
which affected its performance on
English tasks. To address these issues,
the team introduced DeepSeek R1. This
new model not only solved the problems
of readability and language mixing but
also achieved remarkable performance. In
fact, DeepSeek R1 matched the accuracy
of OpenAI's o1 model, specifically the
OpenAI o1-1217 model, on reasoning tasks.
That's a huge milestone. But that's not
all. DeepSeek R1 is also 24 to 28 times
cheaper to train compared to other
state-of-the-art models. And this makes
it not only highly effective but also
cost-efficient, opening up new
possibilities for research and
applications. So, to recap: DeepSeek
R1-Zero was an ambitious attempt at
reinforcement learning without supervised
fine-tuning, but it faced challenges with
readability and language mixing. DeepSeek
R1 addressed these issues, achieving
top-tier accuracy and being significantly
more cost-effective. Now, as far as GPT
is concerned, ChatGPT combines
unsupervised learning, supervised
fine-tuning, and RLHF, making it more
aligned for text-based reasoning and
safe AI interactions.
Now we will look at the model
comparison. DeepSeek and GPT are both
pushing the boundaries of what AI can
do, but they take very different
approaches. So let's break it down. We
asked OpenAI's o1 model and the DeepSeek
R1 model to generate a Python code where
a ball bounces inside a rotating triangle.
Sounds cool, right? Well, let's check
out the result.
First up, here's what o1 came up
with. It works, but the physics seems a
bit off and the movement isn't as smooth
as you would expect. Not bad, but it's
not quite there yet. Now, let's look at
what DeepSeek R1 generated.
Wow, this one looks way better, right?
The ball's movement feels more natural
and the rotation of the triangle is much
smoother. The overall gameplay
experience is just more polished. So, if
we compare the two, DeepSeek R1 definitely
outperformed o1 in this challenge. Of
course, both models are impressive in
their own ways, but when it comes to
designing this specific game, DeepSeek R1
takes the win. Now, we will go through
the differences in detail. So first
let's talk about the architecture.
DeepSeek uses a mixture-of-experts
design. Think of it like a team of
specialists: only the relevant experts
are activated for each task. So, for
example, DeepSeek V3 has 671 billion
parameters, but only 37 billion are
activated per token, making it super
efficient. On the other hand, GPT models
use a dense transformer architecture
where all the parameters are active at
once; GPT-3, for instance, has 175
billion parameters all working
simultaneously, and this makes GPT models
powerful but also computationally
expensive.
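To picture the mixture-of-experts idea, here is a toy sketch of top-k expert routing; the sizes and routing logic are purely illustrative, not DeepSeek's actual implementation.

```python
# Toy top-k mixture-of-experts routing: only a few "experts" run per token.
# Purely illustrative; real MoE layers are far larger and more sophisticated.
import numpy as np

rng = np.random.default_rng(0)
d, num_experts, top_k = 16, 8, 2

experts = [rng.standard_normal((d, d)) for _ in range(num_experts)]  # tiny expert "layers"
router_w = rng.standard_normal((d, num_experts))                     # gating network weights

def moe_forward(token_vec):
    logits = token_vec @ router_w                    # router scores each expert
    top = np.argsort(logits)[-top_k:]                # keep only the top-k experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen experts
    # Only the selected experts do any work; the rest stay idle.
    return sum(w * (token_vec @ experts[i]) for w, i in zip(weights, top))

out = moe_forward(rng.standard_normal(d))
print(out.shape)   # (16,) -- same shape as the input, computed by just 2 of 8 experts
```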
Now let's talk about cost and
efficiency. DeepSeek is a game-changer
here. It was developed on a budget of just
$5.5 million due to its efficient design
and that's a fraction of what other
models cost. GPT models like GPT-4
require massive computational
resources. Training GPT-4 reportedly cost
over a hundred million dollars, making it a
heavyweight in terms of both performance
and expense. And when it comes to
performance, both models shine in
different areas. DeepSeek is a powerhouse
in tasks like coding, translation, and
solving complex math problems. In
fact, DeepSeek R1 has been shown to match
the performance of advanced systems from
OpenAI and Google despite its smaller
budget. GPT models like GPT-4 are known
for their natural language
understanding, creative writing, and
complex reasoning; they are incredibly
versatile and can handle a wide range of
tasks with ease. Next, accessibility is
another key difference. DeepSeek is
open-source, meaning its code
is available to the public and this
promotes transparency, collaboration and
innovation within the AI community. GPT
models, on the other hand, are primarily
proprietary. While OpenAI has released
some tools and models, many advanced
versions are restricted and accessed
through APIs. Finally, let's talk about
the ethics and censorship. DeepSeek
implements strict content moderation,
especially for politically sensitive
topics. This ensures compliance with
regulatory standards, but can sometimes
limit its responses. GPT models also
have moderation mechanisms to prevent
harmful outputs. But they strive to
balance open access to information with
ethical guidelines. Now we will install
DeepSeek R1 and run a short demo on
it. So let's see how the DeepSeek R1 model
can be installed. First, let's open up
our browser and head over to
ollama.com. And once you're there, you will
see a download button. So go ahead and
click on that. Now select download for
Windows to start the download. And keep
in mind it's a pretty big file. So it
might take a minute to download. So
let's give it some time.
Once the download finishes, go to the
download folder and find the
installation file. Now here, double
click on the file to open the
installation window. And you will see an
install button. So simply click on that
and Ollama will start installing on your
system.
So once Ollama is installed, let's head
back to the Ollama website. Now click on
the models tab at the top of the page,
and here you will see a list of
available models. For this video we
are going to use the DeepSeek R1 model.
So we will select the 1.5B model but if
you want you can also choose the latest
7B model too. Now once you have selected
the model you will find the installation
command here. So go ahead and copy that
command. Now open your terminal on your
Windows system and paste the command we
copied earlier. So this is the command
and this command will start pulling the
model. So depending on your internet
speed, this might take a little time. So
be patient
and that's it. Once the process is done,
your DeepSeek R1 model will be all set up
and ready to use on your system.
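By the way, if you'd rather query the locally installed model from Python instead of typing in the terminal, here is a minimal sketch; it assumes Ollama is running on its default local port (11434) and that you pulled the 1.5B tag shown above.

```python
# Query the local DeepSeek R1 model through Ollama's local REST API.
# Assumes Ollama is running on its default port and the model tag below was pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:1.5b",   # the model tag pulled in the demo
        "prompt": "Write a Python one-liner to list all files in a directory.",
        "stream": False,               # return one JSON object instead of a stream
    },
    timeout=300,
)
print(resp.json()["response"])
```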
All right, now let's try out some
commands in the terminal. So let's say hello.
Okay, we got some response here. Now
let's ask it to tell us something about
itself.
All right, so it responded saying,
"I am DeepSeek R1, an AI assistant." Next,
let's ask it to design a Python code to
list all the files in a directory.
So as you can see, it has provided
the code we asked for. Great, right? So,
what does all of this mean for the
future of AI? Well, AI is becoming more
accessible. Like, for years, AI
development was mostly controlled by big
companies with massive budgets. But now,
with open-source models like DeepSeek
R1, anyone whether you have a small
startup or you are an independent
developer or a student, you can build AI
solutions without paying huge API fees.
This means faster adoption and
innovation worldwide. Next, the global
AI race is heating up. The AI race
between the United States and China is
getting even more competitive. While the
United States tries to limit AI chip
exports, China is finding ways to keep
up. No matter which side you support,
one thing is clear. AI is evolving
rapidly and staying informed is more
important than ever. Next, AI is
becoming more sustainable. Training AI
models consume massive amounts of
energy. But with advancements in
optimization, we are seeing a shift
towards more efficient and eco-friendly
AI. This means lower CO2 emissions and a
reduced environmental impact. Something
that was once a major concern in AI
development. Next, career opportunities
are growing. And if you're a developer,
AI engineer, or a data scientist, this
is your moment. Companies will need
skilled professionals to build and
deploy AI solutions at a faster pace than
ever. So if you have been thinking about
getting into AI, now is the time to
start. Well, Deepseek is making big
moves, but can it really compete with
OpenAI in the long run? Which one do you
think will dominate the future of AI? So
let me know your thoughts in the
comments below. [music]
AI is no longer just responding. It's
acting, planning, and automating entire
workflows. Welcome to the era of agentic
AI, where AI agents can write code, run
businesses, and make decisions without
human input. By 2030, AI automation is
projected to be a $200 billion industry.
And those who master agentic AI tools
like AutoGPT, Devin AI, and LangChain
will lead the future. Now let's dive
into the ultimate road map to mastering
agentic AI. So first let's see how you
can build a strong foundation in
generative AI. To truly master agentic
AI, you need a strong foundation in
generative AI. Understanding how AI
models work, their evolution, and their
impact on automation. So start by
exploring how AI has evolved from
rule-based systems to advanced models
like ChatGPT and AutoGen.
You can check out Edureka's video on what
is generative AI and generative AI
examples for valuable insights into the
fundamentals of generative AI, its real
world applications and how it is
transforming various industries. So
first understand the core concepts of
agentic AI where AI can perceive, plan
and act independently to automate
complex workflows. Next learn about real
world applications such as business
automation, AI powered software
engineering and autonomous research
agents. And to deepen your knowledge,
familiarize yourself with key AI models
like GPT-4 Turbo, Claude AI, Gemini, and
Mistral, and stay updated on multi-agent
systems and self-improving AI trends.
For hands-on exploration, leverage
OpenAI's API, Claude AI, and Llama 3, or
experiment with different AI models on
Hugging Face Spaces.
You can also stay updated with AI
research papers from arXiv and Hugging
Face to keep up with the latest
breakthroughs. Edureka's generative AI
certification and training will teach
you Python programming, data science,
artificial intelligence, natural
language processing, and many other
in-demand technologies that beginners or
advanced learners are seeking. And by
understanding these concepts and
experimenting with these tools, you will
have a strong foundation to start
working with agentic AI. Next, let's
dive into the programming for AI. To
build and experiment with agentic AI,
you need to understand the fundamentals
of programming, especially in Python,
which is the backbone of AI development.
Start by learning Python basics,
focusing on data structures, loops,
functions, and object-oriented
programming.
Then explore essential AI and machine
learning libraries like NumPy and Pandas
for data manipulation, Matplotlib and
Seaborn for data visualization, and
TensorFlow and PyTorch for deep
learning. To work with AI agents, you
must also understand API interactions,
as most AI tools like OpenAI's API,
LangChain, and Hugging Face models require
API calls. Additionally, learning
automation with FastAPI, Flask, and web
scraping can help you integrate AI into
real world applications. For hands-on
practice, start small projects like
building a chatbot, creating an AI
powered summarizer, or automating data
analysis. You can also explore Edureka's
Python training and certification course
designed by industry experts, where you
will learn Python from scratch along
with key libraries like NumPy, Pandas,
Matplotlib, and scikit-learn through
hands-on projects and real-world
applications. With these powerful
prompting techniques and tools, you will
be able to optimize AI responses and
unlock the full potential of agentic AI.
To leverage agentic AI, start
experimenting with cutting-edge tools
that enable autonomous workflows.
AutoGPT and CrewAI allow you to create
multi-agent AI systems where AI agents
collaborate to complete tasks. BabyAGI
is perfect for automated research and
decision-making, helping AI iterate on
tasks dynamically. Devin AI, the first AI
software engineer, showcases how AI can
independently write, debug, and deploy
code. For hands-on learning, build
real-world projects like AI-powered
automation assistants, autonomous
research tools, or self-improving
chatbots to see agentic AI in action. By
working with these tools, you will
understand how AI can move beyond just
responding to acting intelligently and
autonomously.
Next, explore LangChain and RAG,
powerful tools that give AI the ability
to retrieve real-time information,
process external data, and enhance
decision making. To build more powerful
and context-aware AI applications,
understanding LangChain and RAG is
essential. LangChain is a must-learn
framework that enables seamless
integration of LLMs with external data
sources, allowing AI agents to interact
with APIs, databases, and documents. RAG
enhances AI models by providing memory
and real-time knowledge retrieval,
making responses more accurate and
up-to-date. For hands-on learning, try
building your own AI chatbot with
LangChain, capable of retrieving real-time
information instead of relying on static
training data. A great project idea to
explore is an AI-powered research
assistant capable of summarizing papers,
fetching real-world data, and answering
domain-specific questions. And to dive
deeper, check out our dedicated video on
LangChain and RAG, where we cover
everything in detail.
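In the meantime, here is a bare-bones sketch of the core RAG idea in plain Python; simple keyword overlap stands in for a real embedding-based vector search, and the final LLM call is left out.

```python
# Bare-bones illustration of RAG: retrieve relevant text, then stuff it into a prompt.
# Keyword overlap stands in for a real vector search; the LLM call itself is omitted.
documents = [
    "LangChain is a framework for connecting LLMs to external data sources and tools.",
    "RAG retrieves relevant documents at query time and adds them to the model's prompt.",
    "Fine-tuning changes a model's weights; retrieval only changes what it reads.",
]

def retrieve(query, docs, k=1):
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return scored[:k]

query = "How does RAG keep answers up to date?"
context = "\n".join(retrieve(query, documents))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)   # in a real app, this prompt would now be sent to an LLM
```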
Next, here are some extra tips for your success. To excel in
agentic AI, consistent practice and
community engagement are key. So, start
by pushing your AI projects to GitHub
and using version control like Git to
track your progress and collaborate.
Join AI communities on Discord, Twitter,
and Hugging Face spaces where you can
interact with experts and stay updated
on trends and get feedback on your work.
Take advantage of AI internships and
open-source projects to gain real world
experience and build a strong portfolio.
Also stay updated by regularly reading
AI research papers on arXiv and Google
Scholar, keeping up with the latest
advancements in multi-agent AI and
automation. And by following these extra
tips, you will accelerate your AI
learning and career growth.
[music]
So let's begin our deep learning
interview questions and answer session
and understand what are the typical
questions which are being asked in deep
learning interview. So the first and
foremost question what any deep learning
interviewer asks is the basic
understanding or the relationship
between machine learning artificial
intelligence and deep learning. So
basically artificial intelligence is a
technique which enables machine to mimic
human behavior and machine learning is a
subset of artificial intelligence
technique which uses statistical methods
to enable machines to improve with
experience. Now deep learning on the
other hand is a subset of machine
learning which makes the computation of
multi-layer neural networks feasible. It
uses neural networks to simulate
humanlike decision making. Now coming to
the second question. Do you think deep
learning is better than machine learning
and if so, why? Though traditional
machine learning algorithms solve a lot
of our use cases, they are not very
useful while working with
high-dimensional data, that is, where we
have a large number of inputs and
outputs. For example, in the case of
handwriting recognition, we have a large
number of inputs associated with
different types of handwriting. Another
major challenge is to tell the computer
what features it should look for that
will play an important role in
predicting the outcome, as well as to
achieve better accuracy while doing so.
So these are a few of the shortcomings
that machine learning has, and deep
learning overcomes all of these
shortcomings. Now coming to our third
question which is what is a perceptron
and how does it work? Now actually our
brain has subconsciously trained itself
to do a lot of things over the years.
Now the question comes how does deep
learning mimic the functionality of the
brain? Well, deep learning uses the
concept of artificial neuron that
functions in a similar manner as the
biological neuron present in our brain.
Therefore, we can say that deep learning
is a sub field of machine learning
concerned with algorithms inspired by
the structure and the function of the
brain called artificial neural networks.
Now, if you focus on the structure of a
biological neuron, it has dendrites
which are used to receive inputs. Now
these inputs are summed in the cell body
and using the axon it is passed on to
the next biological neuron. Now
similarly a perceptron receives multiple
inputs applies various transformations
and functions and provides an output.
Now a perceptron is a linear model used
for binary classification. It models a
neuron which has a set of inputs each of
which is given a specific weight. Now the
neuron computes some function on these
weighted inputs and then finally it
provides the output. As we know that our
brain consists of multiple connected
neurons called the neural network. We
can also have a network of artificial
neurons called the perceptron to form a
deep neural network. Now coming to the
next question, what is the role of
weights and biases? For a perceptron,
there can be one additional input called
the bias. While the weights determine
the slope of the classifier line, the
bias allows us to shift the line towards
the left or right. Normally, the bias is
treated as another weighted input with a
constant input value of 1. If you have a
look at a typical perceptron, what it
receives is a set of inputs. These inputs
are not the only thing it takes in:
weights and a bias are additional inputs,
and according to those it computes and
provides an output. Now, which brings us
to the next question which is what
exactly are activation functions? So
activation function translates the
inputs into outputs and it uses a
threshold to produce an output. So the
activation function decides whether a
neuron should be activated or not by
calculating the weighted sum and further
adding the bias with it and the purpose
of the activation function is to
introduce a nonlinearity into the output
of a neuron. There can be many
activation functions, like linear or
identity, binary step, sigmoid, tanh,
ReLU, and softmax. These activation
functions are heavily used in the deep
learning industry, so one should
actually know about all of them.
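Just to make them concrete, here are a few of them sketched in NumPy for illustration:

```python
# A few common activation functions, sketched in NumPy.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))    # squashes values into (0, 1)

def tanh(x):
    return np.tanh(x)                   # squashes values into (-1, 1)

def relu(x):
    return np.maximum(0, x)             # passes positives, zeroes out negatives

def softmax(x):
    e = np.exp(x - np.max(x))           # subtract max for numerical stability
    return e / e.sum()                  # turns scores into probabilities summing to 1

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), relu(z), softmax(z))
```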
Now, talking about the perceptron, the
next question an interviewer might ask is: explain the
learning of a perceptron. So basically a
perceptron has four steps of learning.
The first step is initializing the
weights and the threshold, so that the
perceptron can activate a neuron by
calculating the weighted sum and adding
the bias. The second step is providing
the input and calculating the output
using the activation function. The third
step involves updating the weights: once
a perceptron learns something, it has to
update the weights so that it can keep
learning. And the final step is to
repeat steps two and three, that is,
provide the input, calculate the output,
and then update the weights
accordingly. Now, if you have a look at
the update equation, we have
w_j(t+1) = w_j(t) + η·(d − y)·x,
where w_j(t+1) is the updated weight,
w_j(t) is the old weight, η is the
learning rate, d is the desired output,
y is the actual output, and x is the
input. So this is the update equation
for the learning of a perceptron.
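As a small illustration (a toy example of my own, not from the interview answer), here are those four steps in NumPy on the AND problem, using the update equation above:

```python
# Minimal sketch of the four perceptron-learning steps on the AND problem.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # inputs
d = np.array([0, 0, 0, 1])                       # desired outputs (AND)

w = np.zeros(2)        # step 1: initialize weights
b = 0.0                # ...and the bias (acts like the threshold)
eta = 0.1              # learning rate (the eta in the update equation)

for epoch in range(10):                       # step 4: repeat steps 2 and 3
    for x, target in zip(X, d):
        y = 1 if np.dot(w, x) + b > 0 else 0  # step 2: output via a step activation
        w = w + eta * (target - y) * x        # step 3: w(t+1) = w(t) + eta*(d - y)*x
        b = b + eta * (target - y)

print(w, b)                                   # learned weights and bias
print([1 if np.dot(w, x) + b > 0 else 0 for x in X])  # predictions: [0, 0, 0, 1]
```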
Now the next question is
what is the significance of a cost or a
loss function. So a cost function is a
measure of accuracy of the neural
network with respect to a given training
sample and expected output. It provides
the performance of a neural network as a
whole. And in deep learning the goal is
to minimize the cost function. So for
that we use the concept of gradient
descent. Now which brings us to the next
question which is what exactly is
gradient descent and what are its
various types. So gradient descent is an
optimization algorithm which is used to
minimize some function by iteratively
moving in the direction of the steepest
descent as defined by the negative of
the gradient. Now think of it as a bowl
in which you start from any particular
point and the goal is to reach the
bottom of the bowl which is the gradient
descent. There are three types of
gradient descent: stochastic, batch, and
mini-batch. Stochastic gradient descent
uses only a single training example to
calculate the gradient and update the
parameters accordingly. Batch gradient
descent calculates the gradients for the
whole data set and performs just one
update at each iteration. Mini-batch
gradient descent is a variation of
stochastic gradient descent where,
instead of a single training example, a
mini-batch of samples is used, and it is
one of the most popular optimization
algorithms. Now, if we talk about
mini-batch gradient descent, one might
ask: what are the benefits of mini-batch
gradient descent, or how is it more
useful than the others? Mini-batch
gradient descent is more efficient when
compared to stochastic gradient descent,
and generalization improves because it
tends to find flatter minima; it also
helps approximate the gradient of the
entire training set, which helps us
avoid poor local minima. This is why
mini-batch gradient descent is preferred
over plain stochastic gradient descent. Now
one might ask what are the steps for
using a gradient descent algorithm.
First of all, initialize random weights
and biases. After that, pass an input
through the network and get values from
the output layer. Next, calculate the
error between the actual value and the
predicted value; this can be done in a
number of ways. The next step is to go
to each neuron which contributes to the
error and change its respective weights
to reduce the error, since our goal is
to reduce the cost of the model. After
that, reiterate until you find the best
weights of the network and the lowest
cost for the particular
network. So one might ask you to write a
gradient descent program, or the
pseudocode of a gradient descent
program. First of all, we define the
parameters, which are the hidden
weights, the output weights, the hidden
bias, and the output bias. We define a
function, sgd, with arguments for the
cost, the parameters we have discussed,
and the learning rate. We then define
the gradients of our parameters with
respect to the cost function; here we
use the Theano library to find the
gradients. Finally, we iterate through
all the parameters to find the updates
for each of them. So you can see that we
use vanilla gradient descent here; it
returns the updates, and we use them to
update the parameters and the cost. The
ultimate goal of any gradient descent
algorithm is to minimize the cost.
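Since that walkthrough leans on Theano, here is a rough NumPy-only sketch of the same vanilla gradient descent idea on a tiny made-up linear-regression cost:

```python
# Bare-bones NumPy sketch of vanilla gradient descent on a tiny linear-regression cost.
# The data, learning rate, and step count are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 1))
y = 3.0 * X[:, 0] + 2.0 + 0.1 * rng.standard_normal(100)   # true weight 3, true bias 2

w, b = 0.0, 0.0          # initialize the weight and bias
lr = 0.1                 # learning rate

for step in range(200):
    y_pred = w * X[:, 0] + b                 # forward pass through the "network"
    error = y_pred - y                       # difference between predicted and actual
    grad_w = 2 * np.mean(error * X[:, 0])    # gradient of the mean squared cost w.r.t. w
    grad_b = 2 * np.mean(error)              # gradient of the cost w.r.t. b
    w -= lr * grad_w                         # step in the direction of steepest descent
    b -= lr * grad_b

print(round(w, 2), round(b, 2))              # close to 3.0 and 2.0
```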
Now, talking about the perceptron, what
are the shortcomings of a single-layer
perceptron. Well, there are two major
problems. First, a single-layer
perceptron cannot classify non-linearly
separable data points. Second, complex
problems that involve a lot of
parameters cannot be solved by a
single-layer perceptron.
Consider an example here, and the
complexity which arises when many
parameters are involved in a decision
taken by a marketing team. First of all,
we have the categories, which are email,
direct, paid, referral program, or
organic. Inside these categories we have
subcategories, which are Google,
Facebook, LinkedIn, Twitter, and
Instagram. Inside those we have the types
of subcategory, which are search ads,
remarketing ads, interest ads, and
lookalike ads. And if we do a further
subdivision, we have the parameters to
consider, which are the customer
acquisition cost, the money spent, the
click rate, the leads generated, the
customers generated, and the time taken
to become a customer. So one neuron
cannot take in so many inputs, and that
is why more than one neuron would
be used to solve this problem. Now, which
brings us to the question what is a
multi-layer perceptron. So a multi-layer
perceptron or MLP is a class of feed
forward artificial neural network and it
is composed of more than one perceptron.
They are composed of an input layer to
receive the signal. An output layer that
makes a decision or the prediction about
the input and in between these two an
arbitrary number of hidden layers that
are the true computational engine of any
multi-layer perceptron. Now one might
ask what are the different parts of any
multi-layer perceptron or a neural
network. So first of all what we have
are input nodes. The input nodes
provide information from the outside
world to the network and are together
referred to as the input layer. No
computation is performed in any of the
input nodes. They just pass the
information to the hidden layers. Now
hidden nodes have no direct connection
with the outside world. Hence the name
hidden. And what they do is they perform
computation and transfer the information
from the input nodes to the output
nodes. Now a collection of hidden nodes
forms the hidden layer and while a
network will only have a single input
layer and a single output layer it can
have zero to n number of hidden layers
and a multi-layer perceptron has more
than one hidden layer. Now if we talk
about output nodes, the output nodes are
collectively referred to as the output
layer and are responsible for the
computation and transferring information
from the network to the outside world
and hence they are also
responsible for the prediction. Now
coming to our next question, what
exactly is data normalization and why do
we need it? Data normalization is a very
important pre-processing step used to
normalize the data: the data should not
be left-skewed or right-skewed, and it
rescales the values to fit in a
specific range to ensure better
convergence during back propagation. In
general, it boils down to subtracting
the mean from each data point and
dividing by the standard deviation, so
that we get normally distributed data,
and it makes computation easier during
back propagation in neural networks.
So this is a very important part of any
deep neural network.
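As a quick illustration with made-up numbers, the z-score version of this is just:

```python
# Z-score normalization: subtract the mean, divide by the standard deviation.
import numpy as np

data = np.array([12.0, 15.0, 9.0, 30.0, 18.0])
normalized = (data - data.mean()) / data.std()
print(normalized)
print(round(float(normalized.mean()), 6), round(float(normalized.std()), 6))  # ~0.0 and 1.0
```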
Now, talking about deep neural networks,
or neural networks in general, coming to
our next question: what is better, the
deep networks or the shallow ones and
why? Now both the networks be it shallow
or deep are capable of approximating any
function. What matters is how precise
that network is in terms of getting the
result. Now a shallow network works with
only a few features as it cannot extract
more. But a deep network goes deep by
computing efficiently and working on
more features or the parameters. Now
deeper networks are able to create deep
representation at every layer. The
network learns a new more abstract
representation of the input and hence
deep neural networks are better than the
shallow ones. So what exactly is weight
initialization in a neural network. Now
as we saw we had weight initialization
in perceptron. So weight initialization
is one of the very important steps. A
bad weight initialization can prevent a
network from learning but good weight
initialization can help it in giving
quicker convergence and a better overall
error. Now biases can be generally
initialized to zero. The rule for
setting the weights is to be close to
zero without being too small because
every time the weight is being
multiplied by the inputs, the result
gets smaller and smaller. Now talking
about neural networks, what is the
difference between a feed forward and a
back propagation neural network? Now a
feed forward neural network is a type of
neural network architecture where the
connections are fed forward that is they
do not form cycles. The term feed
forward is also used when you input
something at the input layer and it
travels from the input to the hidden and
from the hidden to the output layer. The
values are fed forward. Now, back
propagation is a training algorithm
which consists of two steps majorly. The
first one is feed forwarding the values
and the second one is to calculate the
error and propagate it back to the
earlier layers. So to be precise forward
propagation is a part of back
propagation algorithm but it comes
before the back propagation.
So one might ask the question which is
one of the most important questions is
that what are the hyperparameters in a
neural networks and name a few of these
hyperparameters.
So hyperparameters are the variables
which determine the network structure
that is for example the number of hidden
units and or the hidden layers and the
variables which determine how the
network is trained for example the
learning rate. There are usually two
types of hyperparameters. One type is the
network parameters, which are associated
with the network itself: the number of
layers, the network weight
initialization, and the activation
function. The other type is the training
parameters: the learning rate, momentum,
the number of epochs, the batch size,
and much more. Now a lot
of hyperparameters also differ when we
work along with different types of
neural networks. So as in CNN we get
extra parameters to work on when
considering CNN which are the
convolutional neural networks and
sometimes we have to deal with less
number of hyperparameters. It all
depends upon the type of neural network
which you are using. So uh which brings
us to the next question is that explain
the different hyperparameters related to
networking and training. So in training
we have first of all we have the number
of hidden layers. So hidden layers are
the layers between the input and the
output layers as we just discussed and
many hidden units within a layer with
regularization technique can increase
the accuracy as smaller number of units
may cause underfitting. Now another
important aspect is network weight
initialization. So ideally it may be
better to use different weight
initialization schemes according to the
activation function used on each layer.
Mostly uniform distribution is used or
the normal distribution. Now if we talk
about activation function so they are
also used to introduce nonlinearity to
the models. They're also used to
introduce nonlinearity to the models
which allows deep learning models to
learn nonlinear prediction boundaries.
Generally, the rectifier activation function, or ReLU, is the most popular. Those were the network parameters that have to be set for a deep neural network before training begins. Next come the training parameters, starting with the learning rate.
The learning rate defines how quickly a network updates its parameters. A low learning rate slows down the learning process but converges smoothly, while a larger learning rate speeds up learning but may not converge as smoothly. Usually a decaying learning rate is preferred, so that we get the best of both worlds. Another hyperparameter is momentum: momentum uses knowledge of the previous step to inform the direction of the next step, which helps prevent oscillation; a typical choice of momentum is between 0.5 and 0.9. If we talk about the number of epochs, an epoch is one full pass over the training data, so the number of epochs is the number of times the whole training set is shown to the network during training. You can increase the number of epochs until the validation accuracy starts decreasing even while the training accuracy keeps increasing, which is a sign of overfitting. And if we talk about the batch size, the mini-batch size is the number of samples given to the network after which a parameter update happens. A good default for the batch size might be 32, or alternatively 16 or 64; it depends on the size of your data, and it can be any arbitrary number, but it is usually better to keep it a power of two.
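To make those network and training parameters concrete, here is a minimal, hedged Keras-style configuration; the layer sizes, dataset placeholders and exact values are illustrative assumptions, not part of the original discussion.

```python
import tensorflow as tf

# Network parameters: layers, weight initialization, activation function
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu",
                          kernel_initializer="glorot_uniform",  # a uniform scheme
                          input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Training parameters: learning rate, momentum, epochs, batch size
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# x_train / y_train stand in for whatever dataset you actually use
# model.fit(x_train, y_train, epochs=20, batch_size=32, validation_split=0.1)
```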
So, while we were talking about overfitting, this brings us to our next question: what exactly is dropout? Dropout is a regularization technique to avoid overfitting, which improves validation accuracy and thus increases the generalization power of the model. Generally, use a small dropout value of 20% to 50% of the neurons, with 20% providing a good starting point: a probability that is too low has minimal effect, and a value that is too high results in under-learning by the network.
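As a hedged sketch of where dropout sits in a model, here is a small Keras-style stack; the 20% rate is just the starting point mentioned above, and the layer sizes are arbitrary.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dropout(0.2),   # randomly drops 20% of activations during training
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])
```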
In practice, use dropout together with a reasonably large network: you are likely to get better performance when dropout is used on a larger network, since it gives the model more opportunity to learn independent representations. Now our next question: in a neural network, you notice that the loss does not decrease in the first few epochs. What could be the possible reasons? The reasons could be that the learning rate is too low, that the regularization parameter is too high, or that the optimization is stuck at a local minimum; in that last case it might take a certain number of iterations to get out of the local minimum and finally reach a lower point, so a different approach to the problem may have to be tried at that particular point. Now, talking about deep learning, one might be asked to name a few deep learning frameworks that are used in the industry. First of all, the foremost and most popular deep learning library is TensorFlow.
Next we have Caffe, the Microsoft Cognitive Toolkit (CNTK), and Torch or PyTorch, which is standing out from the crowd, with many people now preferring PyTorch over TensorFlow. MXNet is another deep learning framework, and we also have Chainer and Keras. Keras, as you know, can be integrated with Theano as well as TensorFlow, and it is considered one of the best and simplest deep learning frameworks. Now, one might ask: what exactly are tensors? Tensors are nothing but the de facto standard for representing data in deep learning. What I mean is that tensors are just multi-dimensional arrays that allow you to represent data with higher dimensions. In deep learning you generally deal with high-dimensional data sets, where the dimensions refer to the different features present in the data set, so what you need is a multi-dimensional array, or that kind of data structure. That is exactly what a tensor is, and in fact the name TensorFlow is derived from the operations which the neural network performs on tensors; it is literally a flow of tensors.
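A quick, hedged illustration of tensors as multi-dimensional arrays; the shapes and values here are arbitrary examples.

```python
import tensorflow as tf

scalar = tf.constant(3.0)                       # rank-0 tensor
vector = tf.constant([1.0, 2.0, 3.0])           # rank-1 tensor, shape (3,)
matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # rank-2 tensor, shape (2, 2)
images = tf.zeros([32, 28, 28, 3])              # rank-4 tensor: batch, height, width, channels

print(matrix.shape)      # (2, 2)
print(tf.rank(images))   # 4
```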
Now, talking about TensorFlow: one might ask, since it is the most popular deep learning framework and companies prefer people who know TensorFlow and have worked with it, what are a few advantages of TensorFlow? First of all, it has platform flexibility. It is easily trainable on CPU as well as GPU and supports distributed computing. TensorFlow has automatic differentiation capabilities, advanced support for threads and asynchronous computation, and it is a customizable, open-source framework. Most importantly, the recently released TensorFlow 2.0 comes with a lot of interesting features: it has fully adopted Keras as its high-level API, so the coding side is much simplified, and eager execution is now on by default, so you do not have to write loads and loads of lines of code. If you want to know more about TensorFlow 2.0 and why it is currently one of the best deep learning frameworks in the industry, go ahead and check our TensorFlow 2.0 video; I'll leave the link in the description box below, so go check it out and see how exactly it improves on the previous version.
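As a small, hedged illustration of eager execution in TensorFlow 2.x, operations run immediately and return concrete values, with no session required; the matrices below are arbitrary examples.

```python
import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 1.0], [0.0, 1.0]])

# In TF 2.x this executes immediately and prints a concrete result,
# instead of building a graph to run later in a session.
print(tf.matmul(a, b))
print(tf.executing_eagerly())  # True by default in TensorFlow 2.x
```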
Now, talking about computational graphs, one might ask what exactly they are. A computational graph is a series of TensorFlow operations arranged as nodes in a graph. Each node takes zero or more tensors as input and produces a tensor as output. Basically, one can think of a computational graph as an alternative way of conceptualizing the mathematical calculations that take place in a TensorFlow program. The operations assigned to the different nodes of a computational graph can be executed in parallel, thus providing better performance in terms of computation.
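In TensorFlow 2.x, a graph is typically built by tracing a Python function with tf.function; here is a minimal, hedged sketch where the traced function itself is just an illustrative toy.

```python
import tensorflow as tf

@tf.function  # traces the Python function into a TensorFlow graph
def affine(x, w, b):
    return tf.matmul(x, w) + b  # each operation becomes a node in the graph

x = tf.ones([1, 3])
w = tf.ones([3, 2])
b = tf.zeros([2])
print(affine(x, w, b))  # runs the traced graph
```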
So one might ask: what exactly is a convolutional neural network? A convolutional neural network, also called a CNN or ConvNet, is a class of deep neural networks most commonly applied to analyzing visual imagery. CNNs use a variation of the multilayer perceptron designed to require minimal preprocessing. If you are going for an interview that requires you to work with a lot of images or videos, CNNs are used very heavily, so having a good knowledge of CNNs is always an advantage in that case. The next question we have here is: what are the various layers of a CNN? There are four layer concepts everyone should understand in convolutional neural networks: first the convolutional layer, second the ReLU layer, then the pooling layer, and finally the fully connected layer; a minimal sketch of this stack follows below.
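Here is a hedged Keras-style sketch of those four layer types stacked together; the filter count, kernel size and input shape are illustrative assumptions.

```python
import tensorflow as tf

cnn = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), input_shape=(28, 28, 1)),  # convolutional layer
    tf.keras.layers.ReLU(),                                        # ReLU layer
    tf.keras.layers.MaxPooling2D((2, 2)),                          # pooling layer
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),               # fully connected layer
])
cnn.summary()
```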
Now, if we talk about CNNs, we also have to talk about RNNs. So one might ask: what exactly is an RNN? RNNs, or recurrent neural networks, are a type of artificial neural network designed to recognize patterns in sequences of data, such as text, genomes, handwriting, the spoken word, and numerical time-series data from sensors, stock markets or government agencies. Recurrent neural networks use the backpropagation algorithm for training, but it is applied at every time step; this is commonly known as backpropagation through time, or BPTT.
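A minimal, hedged sketch of a recurrent layer processing a sequence follows; the sequence length, feature count and unit count are arbitrary examples.

```python
import tensorflow as tf

rnn = tf.keras.Sequential([
    # 20 time steps, 8 features per step; the layer carries a hidden state across steps
    tf.keras.layers.SimpleRNN(16, input_shape=(20, 8)),
    tf.keras.layers.Dense(1),
])
rnn.compile(optimizer="adam", loss="mse")  # training unrolls the network through time (BPTT)
rnn.summary()
```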
Now our next question is: what are some issues faced while training an RNN? Recurrent neural networks use the backpropagation algorithm, as I just mentioned, but it is applied at every time step, and there are some issues with backpropagation such as the vanishing gradient and the exploding gradient, where the gradient either vanishes or becomes too large to handle. That brings us to the next set of questions, the first of which is: what exactly is a vanishing gradient, and how is it harmful? When we do backpropagation, that is, move backward in the network calculating the gradients of the loss (the error) with respect to the weights, the gradients tend to get smaller and smaller as we keep moving backward through the network. This means that the neurons in the earlier layers learn very slowly compared to the neurons in the later layers of the hierarchy, so the earlier layers in the network are the slowest to train. Why is this harmful? The earlier layers are important because they are responsible for learning and detecting the simple patterns and are the actual building blocks of the neural network. If they give improper and inaccurate results, how can we expect the next layers and the complete network to perform well and produce accurate results? So the training process takes too long, and the prediction accuracy of the model decreases.
Another question that arises here is: what exactly is the exploding gradient problem? This is just the opposite of the vanishing gradient problem. Exploding gradients occur when large error gradients accumulate and result in very large updates to the neural network weights during training. The gradients are used during training to update the network weights, and this process works best when the weights stay small and controlled; when the magnitudes of the gradients accumulate, an unstable network is likely to occur, which causes poor predictions or even a model that reports nothing useful whatsoever.
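A tiny, hedged numeric illustration of both effects: repeatedly multiplying a gradient by per-layer factors below or above one shrinks it towards zero or blows it up. The factors 0.5 and 1.5 are arbitrary stand-ins for per-layer gradient magnitudes.

```python
# Illustrative only: a gradient signal passed back through 30 "layers"
grad = 1.0
for _ in range(30):
    grad *= 0.5          # factor < 1 at every layer -> vanishing gradient
print(grad)              # ~9.3e-10: early layers barely receive any signal

grad = 1.0
for _ in range(30):
    grad *= 1.5          # factor > 1 at every layer -> exploding gradient
print(grad)              # ~1.9e+05: weight updates become unstable
```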
So the vanishing gradient and the exploding gradient are two problems that occur during backpropagation in a recurrent neural network. Our next question is: what are LSTMs? Long short-term memory networks, or LSTMs, are an artificial recurrent neural network architecture used in the field of deep learning. Unlike standard feed-forward neural networks, an LSTM has feedback connections, which make it, in a sense, a general-purpose computer. It can process not only single data points but entire sequences of data. LSTMs are a special kind of RNN, or recurrent neural network, capable of learning long-term dependencies.
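A minimal, hedged sketch of an LSTM layer on sequence data follows; again, the shapes and unit counts are illustrative assumptions.

```python
import tensorflow as tf

lstm_model = tf.keras.Sequential([
    # 50 time steps, 10 features per step; the LSTM's gates and cell state
    # let it carry information across long spans of the sequence
    tf.keras.layers.LSTM(32, input_shape=(50, 10)),
    tf.keras.layers.Dense(1),
])
lstm_model.compile(optimizer="adam", loss="mse")
lstm_model.summary()
```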
Now one might ask: what are capsules in a capsule neural network? Capsules are vectors, that is, elements with a size and a direction, specifying the features of an object and its likelihood. These features can be any of the instantiation parameters, like the pose: position, size, orientation, deformation, velocity, albedo (light reflection), hue, texture and much more. A capsule can also specify attributes like angle and size, so it can represent the same generic information at varying poses. Just as a neural network has layers of neurons, a capsule network can have layers of capsules, so there can be higher-level capsules representing groups of the objects or capsules below them. This helps in getting a deeper understanding of a particular object or data set, with knowledge from different aspects or different angles.
So the next question that arises is: explain autoencoders and their uses. An autoencoder neural network is an unsupervised machine learning algorithm that applies backpropagation, setting the target values to be equal to the inputs. Autoencoders are used to reduce the size of our inputs into a smaller representation, and if anyone needs the original data, they can reconstruct it from the compressed data. Now one might ask: how does an autoencoder differ from PCA? An autoencoder can learn non-linear transformations, thanks to non-linear activation functions and multiple layers. It does not have to rely only on dense layers; it can use convolutional layers, which are better for video, image and series data. It is more efficient to learn several layers with an autoencoder than to learn one huge transformation with PCA. An autoencoder provides a representation of each layer as an output, and it can make use of pre-trained layers from other models to apply transfer learning and enhance the encoder or the decoder. These are a few of the reasons why autoencoders can be preferable to PCA, even though both of them perform the same task, which is mostly dimensionality reduction.
reduction. Now give some real life
examples where autoenccoders can be
applied. So the first of all we talk
about dimensionality reduction or the
first thing that should pop up in your
mind is dimensionality reduction. So the
recrossected image is the same as our
input image but with reduced dimensions.
Now it helps in providing similar image
with reduced pixel value and it can be
used in various areas where we have
limited storage or we have limited
processing power. So when there is a
high input or an image or a data with
high dimension or which has higher
values pixel values it can compress and
provide the same image with a lower
pixel value. Right? Or colors are used
for converting any black and white
picture into a colored image. Believe it
or not and depending on what is in the
picture it is possible to tell what the
color should be. Now feature variation.
If we talk about feature variation, it
attracts only the required features of
an image and generates the output by
removing any unnecessary noise or
unnecessary interruption. And if we talk
about dnoising image, the input seen by
an autoenccoder is not the raw input but
a stochastically corrupted version. A
dinoising autoenccoder is thus train to
reconstruct the original input from the
noisy version. Now talking about
Now, talking about autoencoders, one might ask about the different layers of an autoencoder. Basically, an autoencoder consists of three components: the encoder, the code and the decoder, which brings us to the next question: explain the architecture of an autoencoder. Talking about those three parts, the encoder is the part of the network that compresses the input into a latent-space representation: the encoder layer encodes the input image as a compressed representation in a reduced dimension, and this compressed image is a distorted version of the original image. The middle part, the code, represents the compressed input that is fed to the decoder; it is basically the channel between the two. And the decoder is the layer that decodes the encoded image back into its original dimension; the decoded image is a lossy reconstruction of the original image, reconstructed from the latent-space representation.
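Here is a hedged Keras-style sketch of that encoder–code–decoder structure; the 784-dimensional input (for example a flattened 28x28 image) and the 32-dimensional code are illustrative assumptions.

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(784,))
encoded = tf.keras.layers.Dense(128, activation="relu")(inputs)     # encoder
code = tf.keras.layers.Dense(32, activation="relu")(encoded)        # code / bottleneck
decoded = tf.keras.layers.Dense(128, activation="relu")(code)       # decoder
outputs = tf.keras.layers.Dense(784, activation="sigmoid")(decoded)

autoencoder = tf.keras.Model(inputs, outputs)
# Target equals input: the network learns to reconstruct what it is given
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.summary()
```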
Now one might ask: what exactly is the bottleneck in an autoencoder, and why is it used? The layer between the encoder and the decoder, that is the code, is also known as the bottleneck. It is a well-designed approach to decide which aspects of the observed data are relevant information and which aspects can be discarded. It does this by balancing two criteria: first, the compactness of the representation, measured as its compressibility; and second, how well it retains the behaviourally relevant variables from the input. Now one might ask: are there any variations of autoencoders? Surely there are: we have convolutional autoencoders, sparse autoencoders, deep autoencoders and contractive autoencoders.
All of these autoencoders have a different structure or a different code layer. If we talk about the convolutional autoencoder, it has a CNN-like structure: on the encoder side we have the convolutional layers, the ReLU layer and the pooling layer, and then finally we have the decoding layers.
Another question that might pop into the interviewer's mind is: what are deep autoencoders? A deep autoencoder is the extension of the simple autoencoder. The first layer of a deep autoencoder learns first-order features of the raw input, the second layer learns second-order features corresponding to patterns in the appearance of the first-order features, and the deeper layers tend to learn even higher-order features. A deep autoencoder is composed of two symmetrical deep belief networks: the first four or five shallow layers represent the encoding half of the net, and the second set of four or five layers makes up the decoding half. Interesting, right? Another important topic in deep learning is the restricted Boltzmann machine. One might ask: what exactly is an RBM, or restricted Boltzmann machine? An RBM is an undirected graphical model that has played a major role in deep learning frameworks in recent times, and it is an algorithm used for dimensionality reduction. Not only that, it is also used for classification, regression, collaborative filtering, feature learning and topic modelling.
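As a hedged, minimal example of training an RBM in practice, scikit-learn ships a Bernoulli RBM; the tiny binary dataset below is purely illustrative.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

# Toy binary data: 6 samples with 4 visible units each
X = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [0, 1, 0, 1],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 0]])

rbm = BernoulliRBM(n_components=2, learning_rate=0.05, n_iter=20, random_state=0)
rbm.fit(X)                      # unsupervised training
print(rbm.transform(X).shape)   # (6, 2): hidden-unit activations, a reduced representation
```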
When we talk about RBMs being useful for dimensionality reduction, another question arises: how does an RBM differ from an autoencoder? An autoencoder is a simple three-layer neural network where the output units are connected back to the input units. Typically the number of hidden units is much smaller than the number of visible ones, and the training task is to minimize a reconstruction error, that is, to find the most efficient compact representation of the input data. An RBM shares a similar idea, but it uses stochastic units with a particular distribution instead of deterministic units, and the training task is to find out how these two sets of variables are actually connected to each other. One aspect that distinguishes an RBM from an autoencoder is that it has two biases: the hidden bias helps the RBM produce the activations on the forward pass, while the visible-layer biases help the RBM learn the reconstruction on the backward pass.
Now this brings us to the final question of our deep learning interview: what are some limitations of deep learning? I bet you weren't expecting this one, but there are some limitations. Deep learning usually requires large amounts of training data, and deep neural networks are easily fooled. The successes of deep learning are largely empirical, and deep learning algorithms have been criticized as uninterpretable black boxes, because one important thing about deep learning is that you do not specify what you are looking for; the algorithm learns on its own. That is one of the shortcomings of deep learning, and so far deep learning has not been well integrated with prior knowledge. A lot of people still do not see it as a way to approach their problems, because they do not understand what exactly deep learning is, how it works, or how to set all of its variables, the hyperparameters. These are some of the limitations of deep learning as of now, and we hope that as the technology advances and people learn more about what deep learning is and how artificial intelligence can be achieved through it, they will be more open to it and these limitations will be overcome.
So guys, that's it from my side, and I hope you got to know a lot about deep learning interview questions, which might help you crack interviews and land a great job as a data scientist, machine learning engineer or artificial intelligence engineer. One important thing I would like to say is that data scientist roles are somewhat industry specific: if you are working in healthcare, you should know about the healthcare industry too, rather than just knowing about the data and the numbers, and if you are working with imagery, you should know about images and what you are dealing with. A good knowledge of the particular industry you are working in will give you a great advantage over other competitors, and since you now know a lot of this material, I am sure you will be able to land a great job in any of these industries. And with this, we have come to the end of this agentic AI course for beginners. If you enjoyed listening to this course, please be kind enough to like it, and you can comment with any of your doubts and queries; we will reply to them at the earliest. Do look out for more videos and playlists, and subscribe to Edureka's YouTube channel to learn more. Thank you for watching and happy learning.