Today we're going to do a crash course on LangChain, the Python framework for working with and building AI agents. It makes it super simple to interact with standalone models, build complex agents, and integrate all sorts of other components like embedding models or vector stores, and all of this without caring about the specific differences in the API definitions of the various providers. The goal of this video today is to cover as much as possible in a short amount of time. We're going to start by talking about LangChain and its ecosystem. Then we're going to take a look at a couple of simple examples for building agents and working with standalone models. We'll learn how to work with message histories, stream responses, use tools, generate structured output, handle multimodal input, pass context, and keep track of memory. After that, we'll also build a simple RAG example. And finally, we're going to take a look at LangChain's powerful middleware, as well as some interesting use cases for it. That's quite a few things to cover, and I think you can learn a lot today, especially since we're going to work with the latest version of LangChain, 1.0. If you like this video, let me know by hitting the like button and subscribing. But now, let us get right into it.
>> All right. So, we're going to cover quite a lot today, which is why I'm going to try to speed this up and cover as much as possible in a concise way, so we don't waste too much time here. I want to start by just briefly talking, one or two minutes, about LangChain, the LangChain ecosystem, and also its development history. Now, LangChain itself, as I mentioned, is a Python framework for building and working with AI agents. The main use case, or the main benefit, of using LangChain is that we can use all these models and related tools in an abstract way. So, regardless of the provider, regardless of whether you're using OpenAI or Anthropic or Google, you basically have the same classes and methods everywhere. And if you have a system built on LangChain that uses vector stores, embedding models, AI models, agents, whatever, you can easily just swap out the underlying technology and keep the code the same for the most part. So that's what I would consider the main selling point of LangChain: you have this abstract, high-level way of working with agents and related tooling. Now, in addition to LangChain, we also have LangGraph, which is more low-level. We have more granular control; we can build complex graph-based, event-driven agents or agentic systems. We're not going to cover LangGraph in this video today. I do have videos on this channel where we use LangGraph, but it's not going to be the topic of this crash course. And finally, we also have LangSmith, which is like a collection of tools for observing, evaluating, monitoring, and deploying models. We're not going to cover that either; we're going to focus fully on LangChain today. Speaking of which, LangChain is now on version 1.0. So, some of you guys watching this might have already worked with LangChain in the past. Maybe you're already familiar with the package to some degree. But if you take a look at 1.0, you will notice that some things have changed. So, when I first used LangChain, the structure was a little bit different. We had langchain, then we had langchain-core and langchain-community, and then you also had langchain-openai and langchain-anthropic as separate import packages. So you would do something like from langchain_core import something, from langchain_openai import something. Now, with 1.0, it seems like, if you look in the documentation, everything is going through the main langchain package. So from langchain dot something we import stuff. The second thing you will notice is that LangChain is now much more focused on being a library for agents. Not just for integrating tools, not just for using models and vector stores and all that, but actually for building agents. Which means that we now have here langchain.agents with create_agent, and in the past, in my opinion, it was more like LangChain was this general toolkit for working with these tools, and LangGraph was more about the agent side. So we can still use packages like langchain-core and langchain-community, but for the most part it's enough to just go with the main langchain package; that is the modern way of working with this framework. Cool.
So, now that we've covered that, let us go ahead and set up our environment. In my case, I'm going to use uv, which is a Rust-based Python package manager. Of course, feel free to use pip or pip3 install, and feel free to use virtual environments with virtualenv or venv, whatever you want to choose. In my case, I'm going to go to my tutorial directory, say uv init, and then I'm also going to say uv add langchain. Now, one thing that's important: if you want to use LangChain together with some AI provider, which you usually want to do, you need to also provide it in square brackets. So, for example, if I want to install LangChain with the OpenAI dependencies, what I will do is say openai in square brackets here. So in my case, uv add "langchain[openai]". If you use pip, you do pip or pip3 install "langchain[openai]". And this will also install packages like, of course, openai. And you have to do that basically for all the providers that you want to use. So you want to do that for Anthropic, you want to do that for Mistral AI, and I'm actually just going to do it now to show you how this works, Mistral AI, and you also want to do this for Google GenAI if you plan to use all these models.
Now, what we're also going to need are API keys. If you want to use models from providers, of course, you have to authenticate yourself. So we're going to create a file called .env, and this file will contain our keys. These keys will be the OPENAI_API_KEY, the MISTRAL_API_KEY, the ANTHROPIC_API_KEY, and I'm not sure what the correct name is for the Google one, but I'm just going to say GOOGLE_API_KEY, even though that's probably not the correct one. So, what you want to do is go to the providers, if you have accounts there and you have API keys, and paste them here. So, I'm going to go to the OpenAI console, to the Mistral console, the Anthropic console, and to my Google Cloud Platform. There I'm going to take the API keys and copy-paste them here. I'm not going to show how to obtain all of them, or maybe I'll show it for one of them. So yeah, here, for example, we go to platform.openai.com/api-keys, or you just go to API keys here on the left, and here you can create a new secret key, or API key. You do that for the various providers. Then you just copy-paste the keys here like this, or you use quotation marks if you want to, and that is our .env file. So in my case, I already did this, and now I'm going to install an additional package called python-dotenv. This is going to allow us to load these API keys into our Python script. Now, to get autocompletion, I'm also going to activate this environment here. This is something I just have to do for Neovim to get autocompletion, so you can ignore that if you're not coding in the terminal. And then we're going to go into our main.py file and get started
with a first simple example. So, instead
of just covering the concepts one by
one, I'm going to show you how to build
a simple agent. And we're going to cover
a couple of concepts while we're doing
that. So, we're going to start with the
import. Let's import requests. Then, as I said, from dotenv we're going to import load_dotenv; this is for loading the API keys into the environment. And then, from langchain.agents, as we saw already in the docs,
we're going to import the create agent
function. This is like the central
function that is going to create agents
that we then use to do stuff. And in
this function we can do a lot of things.
So we can provide middleware, we can
provide tools, we can do a lot of things
in that function or with that function.
In addition to that, I'm also going to
say now from langchain.tools
import tool. This is a decorator that we
can use to annotate a function to
basically make that function a tool. We
can also provide some uh information
like a description or the name of the
function. So for example, we can say add
tool and I want to define a tool that is
called get weather. This is going to
obviously get the weather and I can
provide a description here for the agent
to know what this function is about or
what this tool can do. In my case, I'm
going to say here return weather
information for a given city.
Theoretically, you can also say return
direct is equal to true if you want to
have just the output of that tool
immediately returned to the user. In
this case, set this to true. We can also
explicitly set it to false. I think it's
the default, but just so we know it
exists, we're going to set it to false.
And then we're going to say now the
function is get weather. It's going to
get a city, which is going to be a
string as input. And we're going to make
it very simple here. Here I don't want
to use a mock function. So I'm actually
going to ping a weather API. But this
one is actually open source and free to
use or maybe not open source, but you
don't need an account to use it. So
we're going to say here: response is equal to requests.get, and we're going to use https://wttr.in/{city}?format=j1. And of course, this needs to be an f-string, otherwise we cannot format the city into the string here. But this basically gives us a JSON object with the temperature and the weather information for a given city. And all we want to do here is return the JSON object that we get here. So return response.json(), and then the model, the
agent can do whatever it wants with that
information or whatever it thinks is the
most reasonable thing to do. So to keep
it simple we're going to have just this
one tool. And now we're going to say
agent is equal to create agent. And the
first thing we want to do here is we
want to provide the model. Now the model
can just be a string. For example, I can say GPT-4o. Now, 4o doesn't support tool use, so we're not going to use that. But I can go with 4.1 mini, for example. This would be OpenAI; it is automatically recognized as OpenAI. So if I don't have langchain-openai installed, it's not going to work. So for this, a couple of things need to be present. One is an OpenAI API key in the environment, and the other thing is that langchain-openai needs to be installed. These two things need to be given so that we can actually use this model. After that, I
can just say give me a list of tools and
this list will contain just get weather
as an entity. So we're not calling the
function we're passing it. And then we
can provide a system prompt like you are
a helpful weather assistant who always
cracks jokes and is humorous while
remaining
helpful. So just to give you an example
so that you can see that the system
prompt actually affects the agent and
then in order to get something from this
agent in order to send something to the
agent I should say we can just invoke
it. So I can say agent.invoke invoke and
here what we do is we pass a dictionary
that contains a field messages and this
messages has to point so this key
messages has to point to the value which
is a list and this list will contain all
the messages as dictionaries with role and content, as we know it from the typical API. So role is going to be user, and content is going to be whatever we want to ask for. For example: what is the weather like in Vienna?
So I need to then save this as a
response. And down here I can then print
the entire response. So print response
or I can say if I want to get just the
message, I can get from response the
messages. Maybe let me do that here with
square brackets and a string. And then I
want to have the last message. So
negative one. And then from this
message, I'm interested in the content.
So I can run this now. And I get a problem, because I'm of course not loading the environment variables. So I need to import load_dotenv, but I also need to call it to load the data from the .env file. So let's go ahead and say uv run main.py. I'm doing it outside here so I can see all the output. And what
you see up here is the raw response
object which contains the entire message
history also with all the data that was
provided by the API. And as a result
down here I get the actual message. The
weather in Vienna right now is partly
cloudy with a comfortable temperature of
about 15° C and then some information
about the wind speed and humidity and so
on. So actually quite comprehensive.
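To make that concrete, here is a minimal sketch of what this first example might look like in code, assuming langchain and langchain-openai are installed and OPENAI_API_KEY is set in the .env file (the exact model string and prompts are just the ones used in this video):

```python
import requests
from dotenv import load_dotenv
from langchain.agents import create_agent
from langchain.tools import tool

load_dotenv()  # load OPENAI_API_KEY etc. from the .env file


@tool("get_weather", description="Return weather information for a given city.", return_direct=False)
def get_weather(city: str):
    # wttr.in returns a JSON object with temperature, wind, humidity, ...
    response = requests.get(f"https://wttr.in/{city}?format=j1")
    return response.json()


agent = create_agent(
    model="gpt-4.1-mini",   # recognized as an OpenAI model
    tools=[get_weather],    # pass the function itself, don't call it
    system_prompt=(
        "You are a helpful weather assistant who always cracks jokes "
        "and is humorous while remaining helpful."
    ),
)

response = agent.invoke(
    {"messages": [{"role": "user", "content": "What is the weather like in Vienna?"}]}
)
print(response["messages"][-1].content)
```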
We're going to build on top of this
example and extend it later on. But
before we go deeper into agents, let me
remove all of this. I want to show you
how to use standalone models. So maybe
you don't want to have an agent. You
just want to interact with a simple
model and that's it. And you want to do
that in a more abstract way. So you can
replace the models that you're using.
For this, you can also use LangChain, but not agents. You want to use langchain.chat_models, and you want to import the function init_chat_model. And for this
we're also not going to use tools. We're
just going to do it like this. And it's
actually quite simple and
straightforward. I just say model is
equal to init chat model. Then I can
provide basic parameters like again the
model identifier which is going to be
4.1 mini again. And theoretically if I
want to I can do stuff like temperature
0.1 for example. And once I'm done with
all this I can just say response is
equal to model.invoke and here I just
provide a prompt now like hello what is
python question mark and then I can
print the entire response object or I
can just print the response content. So
in this case, response.content since we
didn't pass a message history or
conversation history. And of course, one more time, I forgot the load_dotenv call. So we're going to add this and run this again. And there you go. We get an answer that tells us what Python is: a high-level, interpreted programming language. And we
also have again this entire response
object if we want to access different
fields like the total amount of tokens
and stuff like this. But you can see how
easily this is done in LangChain. And we can also just swap the model if we want to use a different one. So instead of using GPT, I can say I want to use Mistral Medium. I just have to provide the proper string from the website, and then I can run this and everything else in the application stays the same. I'm now just using Mistral Medium instead of GPT-4.1 mini, and I'm going to get the response, and everything's going to work in the same way. There you go. It's a quite comprehensive response, but we get the answer here from Mistral. Let me now
switch back to 4.1 mini. Now if we want
to pass a conversation history and not
just a single prompt, we can do that as
well with this list and dictionary sort
of notation that we used before. But we
can also import specific classes for
that. I can also say from langchain.messages import HumanMessage, AIMessage, and SystemMessage. This makes it then
super simple to work with. I can just
say conversation is equal to a list and
in here I can say I first have a system
message. For example, you are a helpful
assistant for questions regarding
programming. Then the second message
could be something that a human asks.
Like for example, what is Python?
Maybe let's stay consistent here with the quotation marks. Let's use single
quotations everywhere. And then we're
going to say an AI already answered
that. We're going to say that it told us Python is an interpreted programming language, ending with a period, not a question mark. And then we're going to say
the human has a follow-up question which
relates to the previous messages and
this is when was it released question
mark. So now instead of invoking on a
string I can also invoke on a
conversation and actually I'm missing a
t here and here as well and now the rest
stays the same and we have a
conversation. So: Python was first released in 1991 by Guido van Rossum.
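A small sketch of that conversation flow, roughly as described above (the message contents are just the examples from this section):

```python
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
from langchain.messages import AIMessage, HumanMessage, SystemMessage

load_dotenv()

model = init_chat_model("gpt-4.1-mini", temperature=0.1)

# A conversation history as a list of message objects instead of dictionaries
conversation = [
    SystemMessage('You are a helpful assistant for questions regarding programming.'),
    HumanMessage('What is Python?'),
    AIMessage('Python is an interpreted programming language.'),
    HumanMessage('When was it released?'),  # follow-up that only makes sense with the history
]

response = model.invoke(conversation)
print(response.content)
```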
Now, we saw with Mistral that responses
can be quite long and we need to wait
for them to be finished before we can
start reading them. If we don't want
that, if we want to read them in real
time as they're generated, we can also
stream the response. So, let us maybe go
back here to the prompt and let us also change the model back to Mistral. So, here I'm going to say Mistral Medium 2508,
and we're going to ask the same
question, but instead of just getting
the response here, we're going to stream
it. So we're not going to invoke; we're going to say model.stream, and we're going to iterate over this generator here to get the chunks. So I'm going to say: for chunk in model.stream, I'm going to print the chunk.text.
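As a sketch, the streaming loop looks roughly like this (any chat model can be swapped in for the identifier used here):

```python
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model

load_dotenv()
model = init_chat_model("gpt-4.1-mini")

# Stream the answer chunk by chunk instead of waiting for the full response
for chunk in model.stream("Hello, what is Python?"):
    print(chunk.text, end="", flush=True)  # no line break, flush so output appears immediately
print()
```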
I'm going to have no line breaks after
each print, and I'm going to say flush equal to true, so I can see
the output in real time. And we're going
to delete that and run this. And now you
can see how this is generated in real
time and I can read while it's still
generating. So let us now come back to
our initial example with the agent here
and the weather function. We're now
going to extend it to incorporate more
concepts. So on the one hand I want to
have structured output. I want to have
the output message or summary and I also
want to have some key information like
the temperature or the humidity. Also I
want the agent to be able to realize
what location I'm asking this question
from. So I don't have to specify the
city. So I just want to say what's the
weather like? And the agent should
realize that I'm asking from a specific
city based on mock database entries that
we're going to provide here. And with
this context, it's going to then
retrieve the proper information. And
finally, I would also like to add memory
to this agent so it can remember that we
had a conversation and we're now
continuing that conversation. So we're
going to add some imports for all of
this. First of all, from core Python,
we're going to add here, from dataclasses, the dataclass decorator. And for langchain here, we're going to say from langchain.chat_models import init_chat_model. In addition to tool, we're also going to import ToolRuntime. And finally, here, from LangGraph. Now, I said we're not going to do LangGraph; we're not going to cover the LangGraph framework, but we're going to use one specific class from there: from langgraph.checkpoint.memory, the InMemorySaver, which is going to be important for remembering the message history. Cool.
So now let us create two data classes.
One is going to be for the context.
We're going to keep track of the user ID
that the model is communicating with so
that we can actually look up the
location of that user from our database
which we're going to just model as a
match case statement in a function. And
then we're also going to have a data
class for the response format. So we're
going to say here at data class and then
it's going to be a class called context.
Quite simple and it's just going to have
a user ID which is a string. And then
for the response format we're going to
say data class response format. And here
I want to have a summary which is going
to be a string. I want to have a
temperature in Celsius which is going to
be a float. I want to have the same in
Fahrenheit. And I also want to have a
humidity whatever the unit is here.
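As a sketch, those two data classes might look like this:

```python
from dataclasses import dataclass


@dataclass
class Context:
    user_id: str  # used later by the locate_user tool to look up the city


@dataclass
class ResponseFormat:
    summary: str
    temperature_celsius: float
    temperature_fahrenheit: float
    humidity: float
```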
Cool. So now what we're going to do is
we're going to add an additional tool
and this tool is going to be locate
user. So the name is going to be locate
user and it's going to have the
following description. Look up a user's
now I have to use double quotations
here. Look up a user's city based on the
context. Now the interesting thing is
we're not going to pass the user ID as a
parameter. We're going to have a tool
runtime which contains context. So the
context is going to contain the user ID
and we're going to get it from this
context. How do we do that? We say def locate_user, and here we have a runtime. This runtime is going to be a
tool runtime and we're going to pass
here in square brackets the context
class the data class we just created.
And here what we're going to do is we're
going to get the user ID from
runtime.context.
And I'm going to use a match statement.
So match runtime dot context dot user
ID. And depending on the value, we're
going to return a different city. So
let's make up some cases here. If the
user ID is ABC123,
I'm just going to return Vienna. Another
case could be if the user ID is XYZ456,
then I'm going to say that we're in
London. Then another case could be HJKL
for Vim and then 111. That would return
Paris. And if it's none of these, if
it's unknown, we're going to say case
default is just going to return unknown.
Now, you can define unknown behavior in
multiple ways. You can provide it in the
description. You can even provide it in
the return value itself. You can provide
it in the system prompt. You can also
add some custom logic. But basically, we
need to somehow instruct the model that
if it's unknown, just say it's unknown.
Maybe it can do it automatically as
well. Wherever you want to put that logic, decide somewhere how you want to handle unknown values. But we're going
to look up based on the user ID in the
context. So remember the connection here
we have the runtime the tool runtime
passed to this tool which is based on
the data class context which we defined
up here which contains the user ID. So
now we're going to go down here create
the model to show you that we can also
pass a model instance. So model is going
to be init chat model GPT-4.1-
mini and the temperature not this sort
of temperature but the model temperature
is going to be 0.3 for example and then
we're going to create a checkpointer
which is going to be an in-memory saver.
As I said this is for remembering
conversations. We're going to add to the
agent invocations a thread ID and this
is going to determine the conversation
that we're focusing on. So we can keep
asking questions about the same
conversation. And now we can combine all
of this into the create agent function.
So we're going to say agent is equal to
create agent. Model is equal to model.
Tools is equal to get weather and locate
user. System prompt can stay the same.
Now new stuff here is context schema.
This is going to be the class the data
class of our context. So just context.
Then also response format. I think it's
not surprising that this will be our
class response format. And finally,
checkpointer, also not surprisingly, is going to be the checkpointer. So what our agent now does is it has access to a model, GPT-4.1 mini. It has access to two
tools. One for getting weather
information about the city, one for
locating the user, so getting the city
of the user based on the user ID. We
have a system prompt here. We now also
have the ability to pass context. So for
this we use the data class context which
again contains the user ID. Then we have
a response format which means that our
model is forced now to answer in a
specific format. This format is going to
be a summary string and then three
floats for temperature and humidity. And
finally we add memory to the model so it
can keep track of conversations based on
a thread ID. So now when we invoke
something we also need to pass context
and thread ID. For this we're going to
start by saying config is equal to
dictionary which is going to have a key
called configurable. And this
configurable is going to point to
another dictionary which contains thread
ID which itself points to one for
example. And now for the invocation I'm
going to say what is the weather like
without specifying a city but I'm going
to specify context and I'm going to pass
the configuration. So for the
configuration just config equals config
and for the context we're going to say
context is equal to context an instance
of the data class where we set the user
ID to be equal to ABC123.
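Putting those pieces together, a sketch of the full setup might look like this (building on the Context, ResponseFormat, get_weather, and locate_user definitions from above):

```python
from langchain.agents import create_agent
from langchain.chat_models import init_chat_model
from langgraph.checkpoint.memory import InMemorySaver

model = init_chat_model("gpt-4.1-mini", temperature=0.3)
checkpointer = InMemorySaver()  # keeps conversations in memory, keyed by thread_id

agent = create_agent(
    model=model,
    tools=[get_weather, locate_user],
    system_prompt=(
        "You are a helpful weather assistant who always cracks jokes "
        "and is humorous while remaining helpful."
    ),
    context_schema=Context,          # what we pass as context when invoking
    response_format=ResponseFormat,  # forces structured output
    checkpointer=checkpointer,       # adds memory across invocations
)

config = {"configurable": {"thread_id": "1"}}

response = agent.invoke(
    {"messages": [{"role": "user", "content": "What is the weather like?"}]},
    config=config,
    context=Context(user_id="ABC123"),  # resolves to Vienna via locate_user
)

structured = response["structured_response"]
print(structured.summary)
print(structured.temperature_celsius)
```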
So that would result in Vienna again. We
should get the same response. And since
we're now working with this response
format, since we're forcing structured
output, we're not going to access just messages[-1].content. We could print the entire response object if we want to. Actually, I don't want to do that; I just want to get the structured output itself. And for that, I'm going to say here response structured_response, and this will give us the structured output object. If I'm interested in specific parts of that, I can just say .summary or .temperature_celsius, for example. So now
I can run this and we have a problem
because we're not closing this curly
bracket early enough. So of course it
belongs to this message history. But
these are now just keyword arguments. So
let me run this again. And we can see
the current weather in Vienna is partly
cloudy. And then I also get 15.0 for the Celsius temperature. Now, if I
change my user ID to something else like
XYZ, what was it? 456, then I should get
the weather for London. There you go.
The weather in London is currently sunny
with a temperature of 12. Then, if I try
something completely different like
this, something it doesn't recognize,
probably it's going to tell me unknown
or it's going to do something else. I
couldn't find your location, so I can't
tell the weather, but hey, if you tell
me your city, I'll fetch the weather.
and zero is the default value for
temperature. And to show you that this
actually works with a follow-up, if I
provide here again a valid user ID like
this for Vienna here, I can also follow
up with the same config to keep track of
this conversation. So I can just copy
this here, paste it down here and I can
say and is this usual question mark. So
when I run this now and of course maybe
before running this I should also print
the result. So just copy this from up
here. print structured response summary
and now it should keep track of the
information. So we have this inmemory
saver. So we have still the information
that uh the weather in Vienna is what it
is and then yes the weather in Vienna
being partly cloudy with mild
temperatures around 15° C and so on is
usual for this time of the year.
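The follow-up question reuses the same config, so roughly this sketch:

```python
# Same thread_id ("1"), so the agent remembers the previous exchange
followup = agent.invoke(
    {"messages": [{"role": "user", "content": "And is this usual?"}]},
    config=config,
    context=Context(user_id="ABC123"),
)
print(followup["structured_response"].summary)
```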
However, you will notice that if I take
this and I change this before I do that.
So if I say now the thread ID is two,
it's no longer going to be related to
that thread. So, it doesn't know what
I'm talking about. So, here I get the
information about Vienna. And now I
would need to know the specific weather
conditions or location you're referring
to in order to determine if it's usual
or not. So, since we're in a different thread, it doesn't know what we talked about up until this point. If I may for
a second, I would like to plug myself in
as the sponsor of my own video. If you
go to my website, neuralnine.com, you will
find a tab services and a tab tutoring.
Here you can hire me for all sorts of
stuff like data science, machine
learning, web development. If you need
help with something in a project here,
you can book me for one-on-one tutoring.
If you want me to teach you personally
something that you don't understand, if
you like my teaching style on both pages
at the bottom, you can contact me via
mail and also via LinkedIn. Just wanted
to let you know about this. Next, I want
to show you how we can work with
multimodal input. So, how can we pass to
a model not just text, but for example,
image data. For this, I'm going to say here: model is equal to init_chat_model.
I'm going to use again GPT-4.1-
mini. And we're now going to create a
message in the dictionary format. Again,
I'm going to show you a different way in
a second as well. The role for this
message is going to be user. And the
content field now is going to have
multiple values. So, we're going to say
content is pointing to a list. And this
list will contain multiple pieces of
content. For example, the first one will
be of type text and we're going to say
that the actual text content. So again
text here as a key not as a value will
be describe the contents of this image.
Now we can copy that and we can say type
image and now we have two ways to
provide image content. One is by using
URL. So this basically points to an
image somewhere on the web or we can
also pass base 64 encoded image bytes.
Now we're going to do both but I'm going
to start with a URL and for that I have
here a link from my website, neuralnine.com: the logo on my website.
I'm going to pass this here as image
content. And what I'm going to do then
is I'm going to say model.invoke.
I'm going to pass a list of messages and
just my one message in here. That's
going to be the response. And then I can
just say print response.content.
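A sketch of that multimodal call; the image URL below is a placeholder, so point it at any image you like:

```python
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model

load_dotenv()
model = init_chat_model("gpt-4.1-mini")

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe the contents of this image."},
        # Image referenced by URL (placeholder); a base64 variant with a mime_type also works
        {"type": "image", "url": "https://example.com/logo.png"},
    ],
}

response = model.invoke([message])
print(response.content)
```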
So if I run this, this will take a look
at my image and tell me that this is a
logo that reads neural 9 in orange text on a black background. The text is
stylized with a number one. I think this
is just a mistake in the font. And it
explains what my logo looks like
essentially. So this is what we want.
And we can do the same thing with an
image from disk. So if I open the
sidebar here, you can see I have the logo.png. We can also load this and encode it with base64. So in this case, we use base64 as the key here, and I'm also going to say from base64 import b64encode. So the idea is: we load the bytes, we encode them with base64, and then we decode the result into a string. So we're going to say here b64encode. What are we encoding? We're opening a file from disk called logo.png in read-bytes mode. Then we're reading the content of that file, we're encoding it with base64, and then we're decoding it into a string, so we can actually pass it here to the model. Now, what we also need to pass here, if we use base64, is a MIME type. So let's actually format it like this. And the MIME type is going to be image/png. And for the key, we need to use an underscore, not a dash: mime_type. So now when I run this you can see
the image shows a logo with a text
neural 9 written in orange and basically
the same thing as before. Now we can
also do it with the message classes but
we have to do it basically in the same way. So I can say from langchain.messages
import human message and the only thing
I would change is I would get rid of the
role but I would still keep the content
as it is. So I would say here the
message is equal to human message and
then I would say content is equal to the
list. So that is the small difference here. So content would be equal to the list of these two things, and then we
would close that with an ordinary
bracket but basically the rest stays the
same. So when I run this we should get
the same response. It's just a different
way to write it. Now for the next
example, we're going to build a simple RAG use case, so retrieval augmented generation. Basically, we're going to use a vector store and an embedding model to find the most similar pieces of content. In our case, simple messages or simple statements, let's say. And for this, we're going to use LangChain in the old-school way. So, we're actually going to say up here from langchain_openai import OpenAIEmbeddings. This is for the embeddings model. And then we're also going to use a vector store, so a vector database, in my case FAISS. And for this, we're going to say from langchain_community import FAISS, but actually not from community directly, from langchain_community.vectorstores. And since I'm not getting autocompletion, I assume I have to install this separately. So I'm going to say here now uv add langchain-community. This installs some additional packages now: langchain-community, langchain-classic, langchain-text-splitters. So if I now go back into the code, hopefully, if I type dot something here, there you go, we can see all the modules here. So: from langchain_community.vectorstores import FAISS. Now, I think for FAISS we also need to install the FAISS package itself. So, let me leave this and say uv add faiss-cpu. I'm just going to play it safe here and go with the CPU version, so I don't have to care too much about GPU stuff. And now, if I go back into the code, we should be able to
use them. So, the basic idea is I'm
going to have a list of statements.
These statements will be stuff like I
love apples or I like oranges or I like
pears and then something about computers
also related to Apple but semantically
different because Apple is a company.
Apple is also a fruit. So we're going to
see if the embeddings can distinguish
the concepts and we're going to retrieve
the most similar statements from the
vector store. So let us start by saying
embeddings is going to be equal to
OpenAIEmbeddings, and we're going to provide the embedding model text-embedding-3-large. I think this was the identifier, and of course we pass it as the model. So, model is equal to text-embedding-3-large; it's 3-large, not large-3.
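As a sketch, the embeddings-plus-vector-store flow described here and in the next few steps might look like this (the statements are just the examples we use):

```python
from dotenv import load_dotenv
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

load_dotenv()

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

texts = [
    "Apple makes very good computers.",
    "I believe Apple is innovative.",
    "I love apples.",
    "I am a fan of MacBooks.",
    "I enjoy oranges.",
    "I like Lenovo ThinkPads.",
    "I think pears taste very good.",
]

# Embed all statements and store the vectors in a FAISS index
vector_store = FAISS.from_texts(texts, embedding=embeddings)

# k=7 returns all statements, ranked by similarity to the query
print(vector_store.similarity_search("Apples are my favorite food.", k=7))
```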
Then I'm going to copy paste some very
very simple statements. Nothing too
fancy. Apple makes very good computers.
Whether that's true or not, I'm just
stating it here. I believe Apple is
innovative. I love apples. I'm a fan of
MacBooks. I enjoy oranges. I like Lenovo ThinkPads. I think pears taste very good. So, theoretically, this message
here, I like Lenovo ThinkPads, should be
closer to I'm a fan of MacBooks. And
even to stuff like Apple makes very good
computers. uh it should be closer and
more related than I love apples because
that has nothing to do with apple only
syntactically only in terms of like the
name it has to do something with apple
but it's a different concept. So we're
going to create a vector store from
these texts. I'm going to say vector
store is going to be equal to FAISS.from_texts, and we're going to pass the
list of texts here. Now in order for
this to actually work we need to pass an
embedding model as well. So embedding is
going to be equal to embeddings. And
this is going to basically take all
these texts, embed them into vector
space and then store them in the vector
store. So to see how this works, we can
say print vector store dot similarity
search and then I can add something new
like apples
are my favorite
food. K equals 7 to get all of them uh
just in a specific ranking. And then I'm
going to do the same thing with Linux is
a great operating system. Of course,
Linux is just a kernel, but you get the
idea. So, I'm going to comment this out
and run the first one alone to see what
are the most similar statements in the
vector store. So, you can see here the
most similar was I love apples. Then the
second one was I think pears taste very
good. So, even though the word apple occurred... or actually, there you go: I enjoy oranges was before that, then I
think pears taste very good and then
only it gives us Apple makes very good
computers. I know this is not the best
view. Uh but we have these individual
documents the most similar the second
most similar the third most similar and
you can see that everything related to
fruits was ranked as more important and
more relevant than apple. Even though
that apple here and apple here is the
same string it recognizes that the
concepts are different. So let's see
what we get for Linux. We get I like
Lenovo ThinkPads. I'm a fan of MacBooks.
And then we get I love apples. For
whatever reason, the company Apple is
ranked lower. Whatever. So that is how
you can use the vector store as a
separate component. So just interact
with embeddings and vector stores. But
now, what we want to do is make this part of agents with LangChain. So we want to take this capability, this RAG feature of doing a similarity
search. We want to take this and build
it into our agent. For this, I want to
use a slightly different example. So,
the code is the same. We still have the
same structure here of embeddings,
texts, and then from text similarity
search, but we have different content.
We have I love apples. I enjoy oranges.
I think pears taste very good. I hate
bananas. I dislike raspberries. I
despise mangoes. I love Linux. I hate
Windows. And then we have here again the
vector store. And here we look for what
fruits does the person like and what
fruits does the person hate. So if I run
this, you can see that these two lookups
give us the information that we want. I
enjoy oranges. I love apples. I think
pears taste very good. Then I despise
mangoes. I hate bananas. I dislike
raspberries. So we have the information
by using the similarity search. What we
can do now with this vector store is we
can turn it into a retriever for our
agent. So I can say down here retriever
is equal to vector store as retriever.
And we're just going to pass the search keyword arguments. So search_kwargs is going to be equal to a dictionary
where K is going to be equal to three.
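In code, that is roughly a one-liner, assuming the vector_store from above:

```python
# Turn the vector store into a retriever that always returns the top 3 matches
retriever = vector_store.as_retriever(search_kwargs={"k": 3})
```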
So this basically tells us just give me
the top three answers. We're hard coding
this to keep it simple here. But
basically this now is a retriever that
we can use in our agent. But in order to
actually use this retriever in our
agent, we need to turn it into a tool.
So what we're going to do is another old
school import up here: from langchain_core.tools, we're going to import the create
retriever tool. So the idea is we pass a
retriever and we turn it into a tool
that the agent can use. So quite simple,
the retriever
tool is going to be equal to create
retriever tool. We pass the retriever
and then we pass the name which is going
to be our fruit search, I guess, or actually let's call it knowledge base search, so kb_search, and then the
description is search the small
product/fruit
database for information. Let me call
this knowledge base and then we can
close that. And now the cool thing is
this is now just another tool. We can
just add it to the list of tools when we
create an agent. So for this we need to
import again from langchain. This is now the modern way of doing things: from langchain.agents import create_agent. And now I can just say, as we did before, agent is equal to create_agent, the model is going to be GPT-4.1 mini
again then we can say tools is equal to
retriever tool then I'm going to copy
paste the system prompt nothing too
fancy here just explaining again your
helpful assistant and if there's any
questions about Macs apples laptops
whatever use the tool that you have
first the retriever tool basically
retrieve the context answer in a concise
way and the hint that maybe you have to
use this tool multiple times because if
I'm asking a question that needs to
combine information, you might have to
do it multiple times, which the agent
can do since it's not just a prompt.
It's an agent that can use tools, then
think, then use tools again, and so on.
So then we do what we already did
before. We pass a prompt. I'm just going
to say here, agent invoke, what three
fruits does the person like and what
three fruits does the person dislike?
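A sketch of the whole RAG agent, assuming the vector_store and retriever from above (the tool name, description, and system prompt are just the ones we use here):

```python
from langchain.agents import create_agent
from langchain_core.tools import create_retriever_tool

retriever_tool = create_retriever_tool(
    retriever,
    "kb_search",
    "Search the small product/fruit knowledge base for information.",
)

agent = create_agent(
    model="gpt-4.1-mini",
    tools=[retriever_tool],
    system_prompt=(
        "You are a helpful assistant. For any question about fruits, Macs, or laptops, "
        "first use the kb_search tool to retrieve context, possibly multiple times, "
        "and answer concisely."
    ),
)

response = agent.invoke(
    {"messages": [{
        "role": "user",
        "content": "What three fruits does the person like and what three fruits does the person dislike?",
    }]}
)
print(response["messages"][-1].content)
```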
And then we get the answer. So, I'm
asking two things in a single prompt.
And up here, we use two different
similarity searches for that. Let's see
if the agent can handle that. I'm going
to run this. And it says here, the
person likes oranges, apples, and pears.
The person dislikes mangoes,
raspberries, and bananas. You can also see the retrieval... or actually, you cannot see it up here. You can see the first retrieval that was done, and then a second call to the tool was done down here. And this gave us the information
for this final response. Now we get to a very powerful concept in LangChain: middleware. Middleware basically means
it sits between request and response.
And middleware allows us to do a lot of
different things to enhance the
capabilities of our agents. For example,
we can choose different system prompts
depending on the context. We can choose
different models based on certain
criteria. We can summarize stuff. We can
have rate limits. All sorts of things
can be done in between this window of
getting a request and sending a
response. We can do a lot of stuff
behind the scenes. And instead of
talking about this too long in a
theoretical way, what I want to do here
is I just want to show you a couple of
examples of custom middleware of already
existing middleware of different use
cases and then you can just go and
explore it yourself. So I want to get
started with a simple example right
away. I want to show you how we can swap
the system prompt based on the level of
expertise that the user has. So for this
we're going to use context again and I'm
going to start by importing from langchain.agents.middleware the following things: ModelRequest, ModelResponse, and in this case also the decorator called dynamic_prompt.
So this is middleware that is already
implemented. So we don't have to build
it and define it ourselves. But what we
can do now is we can say that if the
user has a certain role like he's an
expert or he's a beginner or maybe even
he's a child, we're going to say adjust
your answer, adjust your style based on
that. So we're going to create a data
class again called context. And actually
we need to again import here from data
classes import data class. And this
context this time will not contain the ID but the user role. So I'm going to say here, in the data class, that user_role is going to be a string. And now we're going to
write a function that takes in the
context from the model request and
returns a system prompt dynamically
based on the user role. So I'm going to
use here the decorator dynamic_prompt. And we're going to call this function user_role_prompt, for example. It will get
a request as input which is a model
request and then we return the string.
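A sketch of that structure, filled in the way it is described next (the role names and prompt texts are just the examples from this section):

```python
from dataclasses import dataclass

from langchain.agents import create_agent
from langchain.agents.middleware import ModelRequest, dynamic_prompt


@dataclass
class Context:
    user_role: str  # "expert", "beginner", "child", ...


BASE_PROMPT = "You are a helpful and very concise assistant."


@dynamic_prompt
def user_role_prompt(request: ModelRequest) -> str:
    # Pick the system prompt based on the user role carried in the context
    match request.runtime.context.user_role:
        case "expert":
            return BASE_PROMPT + " Provide detailed, technical responses."
        case "beginner":
            return BASE_PROMPT + " Keep your explanations simple and basic."
        case "child":
            return BASE_PROMPT + " Explain everything as if you were talking to a 5-year-old."
        case _:
            return BASE_PROMPT


agent = create_agent(
    model="gpt-4.1-mini",
    middleware=[user_role_prompt],
    context_schema=Context,
)

response = agent.invoke(
    {"messages": [{"role": "user", "content": "Explain PCA."}]},
    context=Context(user_role="beginner"),
)
print(response["messages"][-1].content)
```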
So that is the structure and now we can
take the context from the request. So I
can say here: the user role is going to be request.runtime.context.user_role, and we can do the same thing as before
with the user ID just a match case
statement. So match user role and then
we can have different cases. For
example, the user can be an expert. And
in this case, we will return a different
system prompt. Now, what I want to do
here is I want to define a base prompt.
So, we're going to have just a base
prompt. You are a helpful and very
concise assistant. And now, if it's an
expert, we're going to say here, give me
the base prompt. And then just add to
it: provide detailed, technical responses. On the other hand,
if we have a beginner, we're going to
say keep your explanations
simple and basic. And finally, to see
the biggest difference, we're going to
say case child. And it's going to
basically be: explain everything as if you were literally talking to a 5-year-old. And then the
default case would just be okay. If it's
none of that, then we're just going to
return the base prompt. Cool. So now
basically we do the same thing as
before, but this time we don't pass it
as context. We don't pass it as a tool.
We pass it as middleware. So I say here
create agent. The model is going to be the same as before, GPT-4.1 mini. And now I'm going to say here
middleware is equal to and that's going
to be a list and it's going to contain
the user role prompt function. But of
course I still have to pass the context
otherwise it of course doesn't work. So
context schema still has to be this data
class. So then I can say response is
equal to agent.invoke.
I'm going to have to provide the
dictionary again here with messages role
is user and content is explain PCA. And
here now I pass the context. The context
is going to be that the user role let's
say in the beginning is going to be
beginner. So user role is equal to
beginner. Print response. And we can see
here content PCA principal component
analysis is the technique used to reduce
the number of variables in data while
keeping the most important information.
It transforms the original data into new
variables and so on. So quite simple but
still not something a 5-year-old would
understand. So let's go and see what the expert explanation would be. And there
we can see principal component analysis
is a dimensionality reduction technique
used to transform a large set of
correlated variables into a smaller set
of uncorrelated variables called
principal components. Yeah. Then we get here projection and component selection. Still... I mean, eigendecomposition; okay, it's a little bit more technical.
But if I go now to the lowest level here
which is the child then I should get
something really really simple. So here
okay imagine you have a big box of
crayons with lots of colors. Sometimes
you want to choose just a few crayons
that can still help you color many
pictures nicely. PCA is like a magic
helper. Yeah, you get the idea. So this
is something we can do with middleware
here. In between the request and
response, we can have this dynamic
prompt choice based on the user role, based on the context; we can instruct the model to do different things. Now, another thing that we can
do is we can not only select a prompt
dynamically, we can also select a model
dynamically. For this here, I'm also going to import from langchain.chat_models our trusted init_chat_model
function. I'm going to get rid of
dynamic prompt and I'm going to import
wrap model call. This basically means it
happens around the model call whenever
we want to call the model that is part
of the agent. We're going to run our
middleware in that moment or I should
say in between the model request and the
model response. So the idea here would
be: I want to have a basic model, which would be init_chat_model, and then I can say, for example, the model is equal to GPT-4o mini. And then I would also have an advanced model, which could be 4.1 mini.
The idea now is I define a function
called dynamic model selection. This
function takes in a request returns a
response. So I take in a model request
and the output of that function is a
model response. In addition to that we
also get a handler as parameter here and
I have to annotate this now with wrap
model call. So this actually happens
when the model is called. In case you're
interested in that, you can also go to
the langchain documentation. And there
you can see how this basically works.
Request comes in. Then we have before
agent, before model, wrap tool call,
wrap model call, after model, after
agent, result. And we're now doing this
here, wrapping the model call with our
middleware. So what kind of logic you
want to apply here is up to you. In my
case, I'm going to keep it simple. If I
have more than three messages, I'm going
to use the stronger model. Otherwise,
I'm going to use the basic model. Not
the most intelligent choice here, but
we're going to do it like this. So
message count is going to be equal to
length of request state and then
messages and then just if the message
count is greater than three the model is
going to be equal to basic model
otherwise we're going to say it's going
to be the advanced model. And then we say request.model is equal to model, and we return handler of that request. So
basically we're just taking the request
and setting the model to something else
depending on some criteria. This could
be something more intelligent and then
we just handle the request with this
change. So that's super easy to
integrate. Again agent is going to be
equal to create agent. Model is going to
be basic model by default. And then the
middleware is going to be our dynamic
model selection. Now to keep things a
little bit more beautiful here, I'm
going to import again from langchain
messages the system message, the human
message, and the AI message. And then
down here, I'm going to say response is
equal to agent invoke. I'm going to
invoke here on the following dictionary.
Messages is going to point to a list of
messages. I'm going to start with a
system message. You are a helpful
assistant. Human message: what is one plus one? Keeping it very basic here
and essentially we can take a look at
two things. Uh on the one hand you can
get the idea if you look at the quality of the output. I mean, one plus one is something that every model should be able to do. But you can say here again: print response messages[-1].content
to just get the answer but then you can
also take a look at the actual model
that was used to produce this response.
So instead of saying content you can say
response metadata and you can target the
field called model name. And if we run
this now, we can see that we used the wrong model. Why is that? Because we have to swap this: of course, if we have more messages, we should use the advanced model, otherwise the basic model. But in theory it works. So if I run this now, I should see that we're using 4o mini. And if I add more
messages. So, for example, let's just
add the same message a couple of times
here. This is going to trigger the
change and now we're using 4.1 mini.
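To recap this part, a sketch of the dynamic model selection might look like this (the message-count rule is the deliberately simple criterion from above):

```python
from langchain.agents import create_agent
from langchain.agents.middleware import ModelRequest, ModelResponse, wrap_model_call
from langchain.chat_models import init_chat_model
from langchain.messages import HumanMessage, SystemMessage

basic_model = init_chat_model("gpt-4o-mini")
advanced_model = init_chat_model("gpt-4.1-mini")


@wrap_model_call
def dynamic_model_selection(request: ModelRequest, handler) -> ModelResponse:
    # Use the stronger model once the conversation has more than three messages
    message_count = len(request.state["messages"])
    request.model = advanced_model if message_count > 3 else basic_model
    return handler(request)


agent = create_agent(
    model=basic_model,  # default model
    middleware=[dynamic_model_selection],
)

response = agent.invoke(
    {"messages": [SystemMessage("You are a helpful assistant."),
                  HumanMessage("What is one plus one?")]}
)
print(response["messages"][-1].content)
print(response["messages"][-1].response_metadata["model_name"])  # which model actually answered
```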
Now, in general, I want to show you how
you can define your own custom
middleware by using these hooks. So,
we're going to do that with a class
here. For this, I'm going to create a
class called hooks. Let's call it hooks
demo. It's going to inherit from a class
that we need to import called agent
middleware. So, we're now defining our
own agent middleware as a class here.
And we can just override these methods
that are representing the hooks. So in
order to do that, let me get rid of all
of this. We're also going to need
something called the agent state. And
basically the constructor is going to be
the init method taking self as a
parameter here calling the constructor
of the parent class. So super init. And
what we're going to do is we're going to
set a start time equal to 0.0. So this
is not going to be an actually useful
example. I'm just going to show you how
we can trigger when the different parts
are triggered. And now we can just
overwrite before agent. For example,
before agent is obviously before we get
to the agent, before the request comes
to the agent, we're going to be running
this. It takes self and an agent state.
So state, which is agent state, and also
the runtime. And in my case, what I want
to do here now is I want to say self dot
start time is going to be equal to and
now I need to import from core python
here time. I'm just going to say time.
So we're importing core python package
time. self.start time is equal to time.
And then just so we know it happened.
I'm going to say here before agent
triggered. Now I'm going to copy paste
it here. For the other ones, it's just
before model, after model, and after
agent. Here we're going to do something
else. So I didn't copy it. But basically
here we're just printing before model,
after model. Whatever you want to do
with that. There are also the wrap hooks, so wrap_tool_call and wrap_model_call; we saw those already. We're just going to keep doing it with these four.
And what I'm going to do here is I'm now
going to calculate the time difference.
And I'm going to print the result. So
I'm going to say here print
after agent and then colon time. So the
current time minus self start time. And
now we can go and say agent is equal to create_agent, GPT-4.1 mini. And then we
can say middleware is equal to and now
we're going to pass hooks demo as an
instance here. So we're creating
instance of this class. This is another
way to use middleware. And then I can
just do the usual stuff from before just
agent invoke some question. And now when
I run this we get a problem of course
because I cannot just call the module time. Of course, I need to say time.time or time.perf_counter, whatever you want to use here, but that shouldn't be too much
of an issue. So now we get before agent
triggered before model. I forgot to add
trigger here. But then we should get at
some point after model and then we
should get after agent and the
difference here in time. And since PCA
is probably quite complicated uh for a
model to put into words. It takes some
time but you can see 14 seconds is what
we measured here. This is essentially what you can do with these hooks in LangChain when it comes to middleware.
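As a recap, a sketch of that hook-based middleware class (the timing logic is just the toy example from this section, and the hook signatures follow what we used here):

```python
import time

from langchain.agents import create_agent
from langchain.agents.middleware import AgentMiddleware, AgentState


class HooksDemo(AgentMiddleware):
    def __init__(self):
        super().__init__()
        self.start_time = 0.0

    def before_agent(self, state: AgentState, runtime) -> None:
        self.start_time = time.perf_counter()
        print("before agent triggered")

    def before_model(self, state: AgentState, runtime) -> None:
        print("before model triggered")

    def after_model(self, state: AgentState, runtime) -> None:
        print("after model triggered")

    def after_agent(self, state: AgentState, runtime) -> None:
        print(f"after agent: {time.perf_counter() - self.start_time:.2f}s")


agent = create_agent(
    model="gpt-4.1-mini",
    middleware=[HooksDemo()],  # an instance of the class, not the class itself
)

agent.invoke({"messages": [{"role": "user", "content": "Explain PCA."}]})
```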
You can customize what happens at what
point in the workflow in the cycle. So
now, last but not least, I would just
like to show you a couple of examples of
already existing middleware. We're not
going to go too deep into this. I'm
going to just copy paste one of these
examples here into this uh file, which
is going to be summarizing conversation.
We're not going to actually use this.
I'm just showing you how to use this.
Basically, you go from langchain agents
middleware summarization middleware. One
of the many choices that you have. What
this basically does is you have a couple
of parameters here. You can specify the
model for the summarization, which is
not necessarily the same as the model
that the agent uses. The idea is that
after a certain number of tokens, you
summarize the conversation. You keep the last 20 messages, and after 4,000 tokens, GPT-4o mini in this case summarizes the
important key points of the conversation
up until this point. And then you just
continue the conversation. That is one
middleware that we can use. And then on
the middleware page in the
documentation, you can also find some
examples. For example, here human in the
loop middleware. This basically means on certain events, on certain tool uses, we get an interrupt, and then the user has
to manually say okay continue, edit,
approve, reject, whatever. This is of
course important in key steps. If you
have to do some payment, if you have to
send an email or something, it makes
sense to have this human in the loop
middleware. Then we also have model call limit, so basically saying we cannot call the model too often per run or per thread. The same thing exists for tool calls: basically just a limiter for how often you can call a tool in a specific thread or run. Then
also some interesting stuff here like
model fallback. If something doesn't
work, if you cannot use a model, fall
back to a different one. This can be
quite useful here. Also PII detection.
Basically personally identifiable
information. You can make it redact
certain key pieces of information for
compliance reasons for example and much
more other stuff like retrying tools or
having a to-do list. So basically a
planning middleware. You can take a look
at these examples but basically
middleware allows you to extend the capabilities of your model, or of your agent I should say, because you can tell it what to do in certain scenarios and make it more dynamic, more reactive so to say, which is very useful. So that's
it for this video today. I hope you
enjoyed it and hope you learned
something. If so, let me know by hitting
a like button and leaving a comment in
the comment section down below. Also, if
you're interested on my website, you
will find a tutoring tab and also a
services tab. There you can hire me
basically for one-on-one tutoring, for
one-on-one teaching you something or
also for services, machine learning,
backend development, consulting,
freelancing, whatever. If you're
interested in that, check it out. You
can contact me via mail or LinkedIn. And
besides that, don't forget to subscribe
to this channel and hit the notification
bell to not miss a single future video
for free. Other than that, thank you very much for watching. See you in the next video, and bye.
This video today is a full crash course on LangChain, the number one Python framework for building and working with AI agents.

📚 Programming Books & Merch 📚
🐍 The Python Bible Book: https://www.neuralnine.com/books/
💻 The Algorithm Bible Book: https://www.neuralnine.com/books/
👕 Programming Merch: https://www.neuralnine.com/shop

💼 Services 💼
💻 Freelancing & Tutoring: https://www.neuralnine.com/services

🖥️ Setup & Gear 🖥️: https://neuralnine.com/extras/

🌐 Social Media & Contact 🌐
📱 Website: https://www.neuralnine.com/
📷 Instagram: https://www.instagram.com/neuralnine
🐦 Twitter: https://twitter.com/neuralnine
🤵 LinkedIn: https://www.linkedin.com/company/neuralnine/
📁 GitHub: https://github.com/NeuralNine
🎙 Discord: https://discord.gg/JU4xr8U3dm

Timestamps:
(0:00) Intro
(1:10) LangChain Ecosystem
(3:50) Environment Setup
(6:26) Simple AI Agent Example
(11:45) Standalone Model Inference
(13:40) Conversations
(15:04) Streaming Responses
(16:00) Advanced Agent Example (Context, Memory, Structured Output)
(25:53) Multimodal Input
(29:27) RAG Example (Embeddings, Vector Stores, Retrieval)
(38:04) Dynamic System Prompts
(43:21) Dynamic Model Choice
(47:14) Custom Agent Middleware
(50:14) LangChain Middleware Examples
(52:28) Outro