Today we're going to do a crash course on LangChain, the Python framework for working with and building AI agents. It makes it super simple to interact with standalone models, build complex agents, and integrate all sorts of other components like embedding models or vector stores, and all of this without caring about the specific differences in the API definitions of the various providers. The goal of this video today is to cover as much as possible in a short amount of time. We're going to start by talking about LangChain and its ecosystem. Then we're going to take a look at a couple of simple examples for building agents and working with standalone models. We'll learn how to work with message histories, stream responses, use tools, generate structured output, handle multimodal input, pass context, and keep track of memory. After that, we'll also build a simple RAG example. And finally, we're going to take a look at LangChain's powerful middleware, as well as some interesting use cases for it. That's quite a few things to cover, and I think you can learn a lot today, especially since we're going to work with the latest version of LangChain, 1.0. If you like this video, let me know by hitting the like button and subscribing. But now, let us get right into it.
>> All right. So, we're going to cover quite a lot today, which is why I'm going to try to speed this up and cover as much as possible in a concise way, so we don't waste too much time here. I want to start by just briefly talking, one or two minutes, about LangChain, the LangChain ecosystem, and also its development history. Now, LangChain itself, as I mentioned, is a Python framework for building and working with AI agents. The main use case, or the main benefit, of using LangChain is that we can use all these models and related tools in an abstract way. So, regardless of the provider, regardless of whether you're using OpenAI or Anthropic or Google, you basically have the same classes and methods everywhere. And if you have a system built on LangChain that uses vector stores, embedding models, AI models, agents, whatever, you can easily just swap out the underlying technology and keep the code the same for the most part. So that's what I would consider the main selling point of LangChain: you have this abstract, high-level way of working with agents and related tooling. Now, in addition to LangChain, we also have LangGraph, which is more low-level. We have more granular control; we can build complex graph-based, event-driven agents or agentic systems. We're not going to cover LangGraph in this video today. I do have videos on this channel where we use LangGraph, but it's not going to be the topic of this crash course. And finally, we also have LangSmith, which is like a collection of tools for observing, evaluating, monitoring, and deploying models. We're not going to cover that either; we're going to focus fully on LangChain today. Speaking of which, LangChain is now on version 1.0. So, some of you guys watching this might have already worked with LangChain in the past. Maybe you're already familiar with the package to some degree. But if you take a look at 1.0, you will notice that some things have changed. So, when I first used LangChain, the structure was a little bit different. We had langchain, then we had langchain-core and langchain-community, and then you also had langchain-openai and langchain-anthropic as separate import packages. So you would do something like from langchain_core import something, from langchain_openai import something. Now, with 1.0, it seems like, if you look in the documentation, everything is going through the main langchain package. So from langchain dot something we import stuff. The second thing you will notice is that LangChain is now much more focused on being a library for agents. Not just for integrating tools, not just for using models and vector stores and all that, but actually for building agents. Which means that we now have here langchain.agents with create_agent, and in the past, in my opinion, it was more like LangChain was this general toolkit for working with these tools, and LangGraph was more about the agent side. So we can still use packages like langchain-core and langchain-community, but for the most part it's enough to just go with the main langchain package; that is the modern way of working with this framework. Cool.
So, now that we've covered that, let us go ahead and set up our environment. In my case, I'm going to use uv, which is a Rust-based Python package manager. Of course, feel free to use pip or pip3 install, and feel free to use virtual environments with virtualenv or venv, whatever you want to choose. In my case, I'm going to go to my tutorial directory, say uv init, and then I'm also going to say uv add langchain. Now, one thing that's important: if you want to use LangChain together with some AI provider, which you usually want to do, you need to also provide it in square brackets. So, for example, if I want to install LangChain with the OpenAI dependencies, what I will do is say openai in square brackets here. So in my case, uv add "langchain[openai]". If you use pip, you do pip or pip3 install "langchain[openai]". And this will also install packages like, of course, openai. And you have to do that basically for all the providers that you want to use. So you want to do that for Anthropic, you want to do that for Mistral AI, and I'm actually just going to do it now to show you how this works, Mistral AI, and you also want to do this for Google GenAI if you plan to use all these models.
Now, what we're also going to need are API keys. If you want to use models from providers, of course, you have to authenticate yourself. So we're going to create a file called .env, and this file will contain our keys. These keys will be the OPENAI_API_KEY, the MISTRAL_API_KEY, the ANTHROPIC_API_KEY, and I'm not sure what the correct name is for the Google one, but I'm just going to say GOOGLE_API_KEY, even though that's probably not the correct one. So, what you want to do is go to the providers, if you have accounts there and you have API keys, and paste them here. So, I'm going to go to the OpenAI console, to the Mistral console, the Anthropic console, and to my Google Cloud Platform. There I'm going to take the API keys and copy-paste them here. I'm not going to show how to obtain all of them, or maybe I'll show it for one of them. So yeah, here, for example, we go to platform.openai.com/api-keys, or you just go to API keys here on the left, and here you can create a new secret key, or API key. You do that for the various providers. Then you just copy-paste the keys here like this, or you use quotation marks if you want to, and that is our .env file. So in my case, I already did this, and now I'm going to install an additional package called python-dotenv. This is going to allow us to load these API keys into our Python script. Now, to get autocompletion, I'm also going to activate this environment here. This is something I just have to do for Neovim to get autocompletion, so you can ignore that if you're not coding in the terminal. And then we're going to go into our main.py file and get started
with a first simple example. So, instead
of just covering the concepts one by
one, I'm going to show you how to build
a simple agent. And we're going to cover
a couple of concepts while we're doing
that. So, we're going to start with the
import. Let's import requests. Then, as I said, from dotenv we're going to import load_dotenv; this is for loading the API keys into the environment. And then, from langchain.agents, as we saw already in the docs,
we're going to import the create agent
function. This is like the central
function that is going to create agents
that we then use to do stuff. And in
this function we can do a lot of things.
So we can provide middleware, we can
provide tools, we can do a lot of things
in that function or with that function.
In addition to that, I'm also going to
say now from langchain.tools
import tool. This is a decorator that we
can use to annotate a function to
basically make that function a tool. We
can also provide some uh information
like a description or the name of the
function. So for example, we can say add
tool and I want to define a tool that is
called get weather. This is going to
obviously get the weather and I can
provide a description here for the agent
to know what this function is about or
what this tool can do. In my case, I'm
going to say here return weather
information for a given city.
Theoretically, you can also say return
direct is equal to true if you want to
have just the output of that tool
immediately returned to the user. In
this case, set this to true. We can also
explicitly set it to false. I think it's
the default, but just so we know it
exists, we're going to set it to false.
And then we're going to say now the
function is get weather. It's going to
get a city, which is going to be a
string as input. And we're going to make
it very simple here. Here I don't want
to use a mock function. So I'm actually
going to ping a weather API. But this
one is actually open source and free to
use or maybe not open source, but you
don't need an account to use it. So
we're going to say here: response is equal to requests.get, and we're going to use https://wttr.in/{city}?format=j1. And of course, this needs to be an f-string, otherwise we cannot format the city into the string here. But this basically gives us a JSON object with the temperature and the weather information for a given city. And all we want to do here is return the JSON object that we get here. So return response.json(), and then the model, the
agent can do whatever it wants with that
information or whatever it thinks is the
most reasonable thing to do. So to keep
it simple we're going to have just this
one tool. And now we're going to say
agent is equal to create agent. And the
first thing we want to do here is we
want to provide the model. Now the model
can just be a string. For example, I can say GPT-4o. Now, 4o doesn't support tool use, so we're not going to use that. But I can go with 4.1 mini, for example. This would be OpenAI; it is automatically recognized as OpenAI. So if I don't have langchain-openai installed, it's not going to work. So for this, a couple of things need to be present. One is an OpenAI API key in the environment, and the other thing is that langchain-openai needs to be installed. These two things need to be given so that we can actually use this model. After that, I
can just say give me a list of tools and
this list will contain just get weather
as an entity. So we're not calling the
function we're passing it. And then we
can provide a system prompt like you are
a helpful weather assistant who always
cracks jokes and is humorous while
remaining
helpful. So just to give you an example
so that you can see that the system
prompt actually affects the agent and
then in order to get something from this
agent in order to send something to the
agent I should say we can just invoke
it. So I can say agent.invoke invoke and
here what we do is we pass a dictionary
that contains a field messages and this
messages has to point so this key
messages has to point to the value which
is a list and this list will contain all
the messages as dictionaries with role and content, as we know it from the typical API. So role is going to be user, and content is going to be whatever we want to ask for. For example: what is the weather like in Vienna?
So I need to then save this as a
response. And down here I can then print
the entire response. So print response
or I can say if I want to get just the
message, I can get from response the
messages. Maybe let me do that here with
square brackets and a string. And then I
want to have the last message. So
negative one. And then from this
message, I'm interested in the content.
So I can run this now. And I get a problem, because I'm of course not loading the environment variables. So I need to import load_dotenv, but I also need to call it to load the data from the .env file. So let's go ahead and say uv run main.py. I'm doing it outside here so I can see all the output. And what
you see up here is the raw response
object which contains the entire message
history also with all the data that was
provided by the API. And as a result
down here I get the actual message. The
weather in Vienna right now is partly
cloudy with a comfortable temperature of
about 15° C and then some information
about the wind speed and humidity and so
on. So actually quite comprehensive.
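To make that concrete, here is a minimal sketch of what this first example might look like in code, assuming langchain and langchain-openai are installed and OPENAI_API_KEY is set in the .env file (the exact model string and prompts are just the ones used in this video):

```python
import requests
from dotenv import load_dotenv
from langchain.agents import create_agent
from langchain.tools import tool

load_dotenv()  # load OPENAI_API_KEY etc. from the .env file


@tool("get_weather", description="Return weather information for a given city.", return_direct=False)
def get_weather(city: str):
    # wttr.in returns a JSON object with temperature, wind, humidity, ...
    response = requests.get(f"https://wttr.in/{city}?format=j1")
    return response.json()


agent = create_agent(
    model="gpt-4.1-mini",   # recognized as an OpenAI model
    tools=[get_weather],    # pass the function itself, don't call it
    system_prompt=(
        "You are a helpful weather assistant who always cracks jokes "
        "and is humorous while remaining helpful."
    ),
)

response = agent.invoke(
    {"messages": [{"role": "user", "content": "What is the weather like in Vienna?"}]}
)
print(response["messages"][-1].content)
```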
We're going to build on top of this
example and extend it later on. But
before we go deeper into agents, let me
remove all of this. I want to show you
how to use standalone models. So maybe
you don't want to have an agent. You
just want to interact with a simple
model and that's it. And you want to do
that in a more abstract way. So you can
replace the models that you're using.
For this, you can also use LangChain, but not agents. You want to use langchain.chat_models, and you want to import the function init_chat_model. And for this
we're also not going to use tools. We're
just going to do it like this. And it's
actually quite simple and
straightforward. I just say model is
equal to init chat model. Then I can
provide basic parameters like again the
model identifier which is going to be
4.1 mini again. And theoretically if I
want to I can do stuff like temperature
0.1 for example. And once I'm done with
all this I can just say response is
equal to model.invoke and here I just
provide a prompt now like hello what is
python question mark and then I can
print the entire response object or I
can just print the response content. So
in this case, response.content since we
didn't pass a message history or
conversation history. And of course, one more time, I forgot the load_dotenv call. So we're going to add this and run this again. And there you go. We get an answer that tells us what Python is: a high-level, interpreted programming language. And we
also have again this entire response
object if we want to access different
fields like the total amount of tokens
and stuff like this. But you can see how
easily this is done in LangChain. And we can also just swap the model if we want to use a different one. So instead of using GPT, I can say I want to use Mistral Medium. I just have to provide the proper string from the website, and then I can run this and everything else in the application stays the same. I'm now just using Mistral Medium instead of GPT-4.1 mini, and I'm going to get the response, and everything's going to work in the same way. There you go. It's a quite comprehensive response, but we get the answer here from Mistral. Let me now
switch back to 4.1 mini. Now if we want
to pass a conversation history and not
just a single prompt, we can do that as
well with this list and dictionary sort
of notation that we used before. But we
can also import specific classes for
that. I can also say from langchain.messages import HumanMessage, AIMessage, and SystemMessage. This makes it then
super simple to work with. I can just
say conversation is equal to a list and
in here I can say I first have a system
message. For example, you are a helpful
assistant for questions regarding
programming. Then the second message
could be something that a human asks.
Like for example, what is Python?
Maybe let's stay consistent here with the quotation marks. Let's use single
quotations everywhere. And then we're
going to say an AI already answered
that. We're going to say that it told us Python is an interpreted programming language, ending with a period, not a question mark. And then we're going to say
the human has a follow-up question which
relates to the previous messages and
this is when was it released question
mark. So now instead of invoking on a
string I can also invoke on a
conversation and actually I'm missing a
t here and here as well and now the rest
stays the same and we have a
conversation. So: Python was first released in 1991 by Guido van Rossum.
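A small sketch of that conversation flow, roughly as described above (the message contents are just the examples from this section):

```python
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
from langchain.messages import AIMessage, HumanMessage, SystemMessage

load_dotenv()

model = init_chat_model("gpt-4.1-mini", temperature=0.1)

# A conversation history as a list of message objects instead of dictionaries
conversation = [
    SystemMessage('You are a helpful assistant for questions regarding programming.'),
    HumanMessage('What is Python?'),
    AIMessage('Python is an interpreted programming language.'),
    HumanMessage('When was it released?'),  # follow-up that only makes sense with the history
]

response = model.invoke(conversation)
print(response.content)
```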
Now, we saw with Mistral that responses
can be quite long and we need to wait
for them to be finished before we can
start reading them. If we don't want
that, if we want to read them in real
time as they're generated, we can also
stream the response. So, let us maybe go
back here to the prompt and let us also change the model back to Mistral. So, here I'm going to say Mistral Medium 2508,
and we're going to ask the same
question, but instead of just getting
the response here, we're going to stream
it. So we're not going to invoke; we're going to say model.stream, and we're going to iterate over this generator here to get the chunks. So I'm going to say: for chunk in model.stream, I'm going to print the chunk.text.
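As a sketch, the streaming loop looks roughly like this (any chat model can be swapped in for the identifier used here):

```python
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model

load_dotenv()
model = init_chat_model("gpt-4.1-mini")

# Stream the answer chunk by chunk instead of waiting for the full response
for chunk in model.stream("Hello, what is Python?"):
    print(chunk.text, end="", flush=True)  # no line break, flush so output appears immediately
print()
```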
I'm going to have no line breaks after
each print, and I'm going to say flush equal to true, so I can see
the output in real time. And we're going
to delete that and run this. And now you
can see how this is generated in real
time and I can read while it's still
generating. So let us now come back to
our initial example with the agent here
and the weather function. We're now
going to extend it to incorporate more
concepts. So on the one hand I want to
have structured output. I want to have
the output message or summary and I also
want to have some key information like
the temperature or the humidity. Also I
want the agent to be able to realize
what location I'm asking this question
from. So I don't have to specify the
city. So I just want to say what's the
weather like? And the agent should
realize that I'm asking from a specific
city based on mock database entries that
we're going to provide here. And with
this context, it's going to then
retrieve the proper information. And
finally, I would also like to add memory
to this agent so it can remember that we
had a conversation and we're now
continuing that conversation. So we're
going to add some imports for all of
this. First of all, from core Python,
we're going to add here, from dataclasses, the dataclass decorator. And for langchain here, we're going to say from langchain.chat_models import init_chat_model. In addition to tool, we're also going to import ToolRuntime. And finally, here, from LangGraph. Now, I said we're not going to do LangGraph; we're not going to cover the LangGraph framework, but we're going to use one specific class from there: from langgraph.checkpoint.memory, the InMemorySaver, which is going to be important for remembering the message history. Cool.
So now let us create two data classes.
One is going to be for the context.
We're going to keep track of the user ID
that the model is communicating with so
that we can actually look up the
location of that user from our database
which we're going to just model as a
match case statement in a function. And
then we're also going to have a data
class for the response format. So we're
going to say here at data class and then
it's going to be a class called context.
Quite simple and it's just going to have
a user ID which is a string. And then
for the response format we're going to
say data class response format. And here
I want to have a summary which is going
to be a string. I want to have a
temperature in Celsius which is going to
be a float. I want to have the same in
Fahrenheit. And I also want to have a
humidity whatever the unit is here.
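As a sketch, those two data classes might look like this:

```python
from dataclasses import dataclass


@dataclass
class Context:
    user_id: str  # used later by the locate_user tool to look up the city


@dataclass
class ResponseFormat:
    summary: str
    temperature_celsius: float
    temperature_fahrenheit: float
    humidity: float
```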
Cool. So now what we're going to do is
we're going to add an additional tool
and this tool is going to be locate
user. So the name is going to be locate
user and it's going to have the
following description. Look up a user's
now I have to use double quotations
here. Look up a user's city based on the
context. Now the interesting thing is
we're not going to pass the user ID as a
parameter. We're going to have a tool
runtime which contains context. So the
context is going to contain the user ID
and we're going to get it from this
context. How do we do that? We say def locate_user, and here we have a runtime. This runtime is going to be a
tool runtime and we're going to pass
here in square brackets the context
class the data class we just created.
And here what we're going to do is we're
going to get the user ID from
runtime.context.
And I'm going to use a match statement.
So match runtime dot context dot user
ID. And depending on the value, we're
going to return a different city. So
let's make up some cases here. If the
user ID is ABC123,
I'm just going to return Vienna. Another
case could be if the user ID is XYZ456,
then I'm going to say that we're in
London. Then another case could be HJKL
for Vim and then 111. That would return
Paris. And if it's none of these, if
it's unknown, we're going to say case
default is just going to return unknown.
Now, you can define unknown behavior in
multiple ways. You can provide it in the
description. You can even provide it in
the return value itself. You can provide
it in the system prompt. You can also
add some custom logic. But basically, we
need to somehow instruct the model that
if it's unknown, just say it's unknown.
Maybe it can do it automatically as
well. Wherever you want to put that logic, decide somewhere how you want to handle unknown values. But we're going
to look up based on the user ID in the
context. So remember the connection here
we have the runtime the tool runtime
passed to this tool which is based on
the data class context which we defined
up here which contains the user ID. So
now we're going to go down here create
the model to show you that we can also
pass a model instance. So model is going
to be init chat model GPT-4.1-
mini and the temperature not this sort
of temperature but the model temperature
is going to be 0.3 for example and then
we're going to create a checkpointer
which is going to be an in-memory saver.
As I said this is for remembering
conversations. We're going to add to the
agent invocations a thread ID and this
is going to determine the conversation
that we're focusing on. So we can keep
asking questions about the same
conversation. And now we can combine all
of this into the create agent function.
So we're going to say agent is equal to
create agent. Model is equal to model.
Tools is equal to get weather and locate
user. System prompt can stay the same.
Now new stuff here is context schema.
This is going to be the class the data
class of our context. So just context.
Then also response format. I think it's
not surprising that this will be our
class response format. And finally,
checkpointer, also not surprisingly, is going to be the checkpointer. So what our agent now does is it has access to a model, GPT-4.1 mini. It has access to two
tools. One for getting weather
information about the city, one for
locating the user, so getting the city
of the user based on the user ID. We
have a system prompt here. We now also
have the ability to pass context. So for
this we use the data class context which
again contains the user ID. Then we have
a response format which means that our
model is forced now to answer in a
specific format. This format is going to
be a summary string and then three
floats for temperature and humidity. And
finally we add memory to the model so it
can keep track of conversations based on
a thread ID. So now when we invoke
something we also need to pass context
and thread ID. For this we're going to
start by saying config is equal to
dictionary which is going to have a key
called configurable. And this
configurable is going to point to
another dictionary which contains thread
ID which itself points to one for
example. And now for the invocation I'm
going to say what is the weather like
without specifying a city but I'm going
to specify context and I'm going to pass
the configuration. So for the
configuration just config equals config
and for the context we're going to say
context is equal to context an instance
of the data class where we set the user
ID to be equal to ABC123.
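Putting those pieces together, a sketch of the full setup might look like this (building on the Context, ResponseFormat, get_weather, and locate_user definitions from above):

```python
from langchain.agents import create_agent
from langchain.chat_models import init_chat_model
from langgraph.checkpoint.memory import InMemorySaver

model = init_chat_model("gpt-4.1-mini", temperature=0.3)
checkpointer = InMemorySaver()  # keeps conversations in memory, keyed by thread_id

agent = create_agent(
    model=model,
    tools=[get_weather, locate_user],
    system_prompt=(
        "You are a helpful weather assistant who always cracks jokes "
        "and is humorous while remaining helpful."
    ),
    context_schema=Context,          # what we pass as context when invoking
    response_format=ResponseFormat,  # forces structured output
    checkpointer=checkpointer,       # adds memory across invocations
)

config = {"configurable": {"thread_id": "1"}}

response = agent.invoke(
    {"messages": [{"role": "user", "content": "What is the weather like?"}]},
    config=config,
    context=Context(user_id="ABC123"),  # resolves to Vienna via locate_user
)

structured = response["structured_response"]
print(structured.summary)
print(structured.temperature_celsius)
```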
So that would result in Vienna again. We
should get the same response. And since
we're now working with this response
format, since we're forcing structured
output, we're not going to access just messages[-1].content. We could print the entire response object if we want to. Actually, I don't want to do that; I just want to get the structured output itself. And for that, I'm going to say here response structured_response, and this will give us the structured output object. If I'm interested in specific parts of that, I can just say .summary or .temperature_celsius, for example. So now
I can run this and we have a problem
because we're not closing this curly
bracket early enough. So of course it
belongs to this message history. But
these are now just keyword arguments. So
let me run this again. And we can see
the current weather in Vienna is partly
cloudy. And then I also get 15.0 for the Celsius temperature. Now, if I
change my user ID to something else like
XYZ, what was it? 456, then I should get
the weather for London. There you go.
The weather in London is currently sunny
with a temperature of 12. Then, if I try
something completely different like
this, something it doesn't recognize,
probably it's going to tell me unknown
or it's going to do something else. I
couldn't find your location, so I can't
tell the weather, but hey, if you tell
me your city, I'll fetch the weather.
and zero is the default value for
temperature. And to show you that this
actually works with a follow-up, if I
provide here again a valid user ID like
this for Vienna here, I can also follow
up with the same config to keep track of
this conversation. So I can just copy
this here, paste it down here and I can
say and is this usual question mark. So
when I run this now and of course maybe
before running this I should also print
the result. So just copy this from up
here. print structured response summary
and now it should keep track of the
information. So we have this inmemory
saver. So we have still the information
that uh the weather in Vienna is what it
is and then yes the weather in Vienna
being partly cloudy with mild
temperatures around 15° C and so on is
usual for this time of the year.
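The follow-up question reuses the same config, so roughly this sketch:

```python
# Same thread_id ("1"), so the agent remembers the previous exchange
followup = agent.invoke(
    {"messages": [{"role": "user", "content": "And is this usual?"}]},
    config=config,
    context=Context(user_id="ABC123"),
)
print(followup["structured_response"].summary)
```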
However, you will notice that if I take
this and I change this before I do that.
So if I say now the thread ID is two,
it's no longer going to be related to
that thread. So, it doesn't know what
I'm talking about. So, here I get the
information about Vienna. And now I
would need to know the specific weather
conditions or location you're referring
to in order to determine if it's usual
or not. So, since we're in a different thread, it doesn't know what we talked about up until this point. If I may for
a second, I would like to plug myself in
as the sponsor of my own video. If you
go to my website, neuralnine.com, you will
find a tab services and a tab tutoring.
Here you can hire me for all sorts of
stuff like data science, machine
learning, web development. If you need
help with something in a project here,
you can book me for one-on-one tutoring.
If you want me to teach you personally
something that you don't understand, if
you like my teaching style on both pages
at the bottom, you can contact me via
mail and also via LinkedIn. Just wanted
to let you know about this. Next, I want
to show you how we can work with
multimodal input. So, how can we pass to
a model not just text, but for example,
image data. For this, I'm going to say here: model is equal to init_chat_model.
I'm going to use again GPT-4.1-
mini. And we're now going to create a
message in the dictionary format. Again,
I'm going to show you a different way in
a second as well. The role for this
message is going to be user. And the
content field now is going to have
multiple values. So, we're going to say
content is pointing to a list. And this
list will contain multiple pieces of
content. For example, the first one will
be of type text and we're going to say
that the actual text content. So again
text here as a key not as a value will
be describe the contents of this image.
Now we can copy that and we can say type
image and now we have two ways to
provide image content. One is by using
URL. So this basically points to an
image somewhere on the web or we can
also pass base 64 encoded image bytes.
Now we're going to do both but I'm going
to start with a URL and for that I have
here a link from my website, neuralnine.com: the logo on my website.
I'm going to pass this here as image
content. And what I'm going to do then
is I'm going to say model.invoke.
I'm going to pass a list of messages and
just my one message in here. That's
going to be the response. And then I can
just say print response.content.
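A sketch of that multimodal call; the image URL below is a placeholder, so point it at any image you like:

```python
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model

load_dotenv()
model = init_chat_model("gpt-4.1-mini")

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe the contents of this image."},
        # Image referenced by URL (placeholder); a base64 variant with a mime_type also works
        {"type": "image", "url": "https://example.com/logo.png"},
    ],
}

response = model.invoke([message])
print(response.content)
```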
So if I run this, this will take a look
at my image and tell me that this is a
logo that reads neural 9 in orange text on a black background. The text is
stylized with a number one. I think this
is just a mistake in the font. And it
explains what my logo looks like
essentially. So this is what we want.
And we can do the same thing with an
image from disk. So if I open the
sidebar here, you can see I have the logo.png. We can also load this and encode it with base64. So in this case, we use base64 as the key here, and I'm also going to say from base64 import b64encode. So the idea is: we load the bytes, we encode them with base64, and then we decode the result into a string. So we're going to say here b64encode. What are we encoding? We're opening a file from disk called logo.png in read-bytes mode. Then we're reading the content of that file, we're encoding it with base64, and then we're decoding it into a string, so we can actually pass it here to the model. Now, what we also need to pass here, if we use base64, is a MIME type. So let's actually format it like this. And the MIME type is going to be image/png. And for the key, we need to use an underscore, not a dash: mime_type. So now when I run this you can see
the image shows a logo with a text
neural 9 written in orange and basically
the same thing as before. Now we can
also do it with the message classes but
we have to do it basically in the same way. So I can say from langchain.messages
import human message and the only thing
I would change is I would get rid of the
role but I would still keep the content
as it is. So I would say here the
message is equal to human message and
then I would say content is equal to the
list. So that is the small difference here. So content would be equal to the list of these two things, and then we
would close that with an ordinary
bracket but basically the rest stays the
same. So when I run this we should get
the same response. It's just a different
way to write it. Now for the next
example, we're going to build a simple RAG use case, so retrieval augmented generation. Basically, we're going to use a vector store and an embedding model to find the most similar pieces of content. In our case, simple messages or simple statements, let's say. And for this, we're going to use LangChain in the old-school way. So, we're actually going to say up here from langchain_openai import OpenAIEmbeddings. This is for the embeddings model. And then we're also going to use a vector store, so a vector database, in my case FAISS. And for this, we're going to say from langchain_community import FAISS, but actually not from community directly, from langchain_community.vectorstores. And since I'm not getting autocompletion, I assume I have to install this separately. So I'm going to say here now uv add langchain-community. This installs some additional packages now: langchain-community, langchain-classic, langchain-text-splitters. So if I now go back into the code, hopefully, if I type dot something here, there you go, we can see all the modules here. So: from langchain_community.vectorstores import FAISS. Now, I think for FAISS we also need to install the FAISS package itself. So, let me leave this and say uv add faiss-cpu. I'm just going to play it safe here and go with the CPU version, so I don't have to care too much about GPU stuff. And now, if I go back into the code, we should be able to
use them. So, the basic idea is I'm
going to have a list of statements.
These statements will be stuff like I
love apples or I like oranges or I like
pears and then something about computers
also related to Apple but semantically
different because Apple is a company.
Apple is also a fruit. So we're going to
see if the embeddings can distinguish
the concepts and we're going to retrieve
the most similar statements from the
vector store. So let us start by saying
embeddings is going to be equal to
OpenAIEmbeddings, and we're going to provide the embedding model text-embedding-3-large. I think this was the identifier, and of course we pass it as the model. So, model is equal to text-embedding-3-large; it's 3-large, not large-3.
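As a sketch, the embeddings-plus-vector-store flow described here and in the next few steps might look like this (the statements are just the examples we use):

```python
from dotenv import load_dotenv
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

load_dotenv()

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

texts = [
    "Apple makes very good computers.",
    "I believe Apple is innovative.",
    "I love apples.",
    "I am a fan of MacBooks.",
    "I enjoy oranges.",
    "I like Lenovo ThinkPads.",
    "I think pears taste very good.",
]

# Embed all statements and store the vectors in a FAISS index
vector_store = FAISS.from_texts(texts, embedding=embeddings)

# k=7 returns all statements, ranked by similarity to the query
print(vector_store.similarity_search("Apples are my favorite food.", k=7))
```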
Then I'm going to copy paste some very
very simple statements. Nothing too
fancy. Apple makes very good computers.
Whether that's true or not, I'm just
stating it here. I believe Apple is
innovative. I love apples. I'm a fan of
MacBooks. I enjoy oranges. I like Lenovo ThinkPads. I think pears taste very good. So, theoretically, this message
here, I like Lenovo ThinkPads, should be
closer to I'm a fan of MacBooks. And
even to stuff like Apple makes very good
computers. uh it should be closer and
more related than I love apples because
that has nothing to do with apple only
syntactically only in terms of like the
name it has to do something with apple
but it's a different concept. So we're
going to create a vector store from
these texts. I'm going to say vector
store is going to be equal to FAISS.from_texts, and we're going to pass the
list of texts here. Now in order for
this to actually work we need to pass an
embedding model as well. So embedding is
going to be equal to embeddings. And
this is going to basically take all
these texts, embed them into vector
space and then store them in the vector
store. So to see how this works, we can
say print vector store dot similarity
search and then I can add something new
like apples
are my favorite
food. K equals 7 to get all of them uh
just in a specific ranking. And then I'm
going to do the same thing with Linux is
a great operating system. Of course,
Linux is just a kernel, but you get the
idea. So, I'm going to comment this out
and run the first one alone to see what
are the most similar statements in the
vector store. So, you can see here the
most similar was I love apples. Then the
second one was I think pears taste very
good. So, even though the word apple occurred... or actually, there you go: I enjoy oranges was before that, then I
think pears taste very good and then
only it gives us Apple makes very good
computers. I know this is not the best
view. Uh but we have these individual
documents the most similar the second
most similar the third most similar and
you can see that everything related to
fruits was ranked as more important and
more relevant than apple. Even though
that apple here and apple here is the
same string it recognizes that the
concepts are different. So let's see
what we get for Linux. We get I like
Lenovo ThinkPads. I'm a fan of MacBooks.
And then we get I love apples. For
whatever reason, the company Apple is
ranked lower. Whatever. So that is how
you can use the vector store as a
separate component. So just interact
with embeddings and vector stores. But
now, what we want to do is make this part of agents with LangChain. So we want to take this capability, this RAG feature of doing a similarity
search. We want to take this and build
it into our agent. For this, I want to
use a slightly different example. So,
the code is the same. We still have the
same structure here of embeddings,
texts, and then from text similarity
search, but we have different content.
We have I love apples. I enjoy oranges.
I think pears taste very good. I hate
bananas. I dislike raspberries. I
despise mangoes. I love Linux. I hate
Windows. And then we have here again the
vector store. And here we look for what
fruits does the person like and what
fruits does the person hate. So if I run
this, you can see that these two lookups
give us the information that we want. I
enjoy oranges. I love apples. I think
pears taste very good. Then I despise
mangoes. I hate bananas. I dislike
raspberries. So we have the information
by using the similarity search. What we
can do now with this vector store is we
can turn it into a retriever for our
agent. So I can say down here retriever
is equal to vector store as retriever.
And we're just going to pass the search keyword arguments. So search_kwargs is going to be equal to a dictionary
where K is going to be equal to three.
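In code, that is roughly a one-liner, assuming the vector_store from above:

```python
# Turn the vector store into a retriever that always returns the top 3 matches
retriever = vector_store.as_retriever(search_kwargs={"k": 3})
```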
So this basically tells us just give me
the top three answers. We're hard coding
this to keep it simple here. But
basically this now is a retriever that
we can use in our agent. But in order to
actually use this retriever in our
agent, we need to turn it into a tool.
So what we're going to do is another old
school import up here: from langchain_core.tools, we're going to import the create
retriever tool. So the idea is we pass a
retriever and we turn it into a tool
that the agent can use. So quite simple,
the retriever
tool is going to be equal to create
retriever tool. We pass the retriever
and then we pass the name which is going
to be our fruit search, I guess, or actually let's call it knowledge base search, so kb_search, and then the
description is search the small
product/fruit
database for information. Let me call
this knowledge base and then we can
close that. And now the cool thing is
this is now just another tool. We can
just add it to the list of tools when we
create an agent. So for this we need to
import again from langchain. This is now the modern way of doing things: from langchain.agents import create_agent. And now I can just say, as we did before, agent is equal to create_agent, the model is going to be GPT-4.1 mini
again then we can say tools is equal to
retriever tool then I'm going to copy
paste the system prompt nothing too
fancy here just explaining again your
helpful assistant and if there's any
questions about Macs apples laptops
whatever use the tool that you have
first the retriever tool basically
retrieve the context answer in a concise
way and the hint that maybe you have to
use this tool multiple times because if
I'm asking a question that needs to
combine information, you might have to
do it multiple times, which the agent
can do since it's not just a prompt.
It's an agent that can use tools, then
think, then use tools again, and so on.
So then we do what we already did
before. We pass a prompt. I'm just going
to say here, agent invoke, what three
fruits does the person like and what
three fruits does the person dislike?
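A sketch of the whole RAG agent, assuming the vector_store and retriever from above (the tool name, description, and system prompt are just the ones we use here):

```python
from langchain.agents import create_agent
from langchain_core.tools import create_retriever_tool

retriever_tool = create_retriever_tool(
    retriever,
    "kb_search",
    "Search the small product/fruit knowledge base for information.",
)

agent = create_agent(
    model="gpt-4.1-mini",
    tools=[retriever_tool],
    system_prompt=(
        "You are a helpful assistant. For any question about fruits, Macs, or laptops, "
        "first use the kb_search tool to retrieve context, possibly multiple times, "
        "and answer concisely."
    ),
)

response = agent.invoke(
    {"messages": [{
        "role": "user",
        "content": "What three fruits does the person like and what three fruits does the person dislike?",
    }]}
)
print(response["messages"][-1].content)
```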
And then we get the answer. So, I'm
asking two things in a single prompt.
And up here, we use two different
similarity searches for that. Let's see
if the agent can handle that. I'm going
to run this. And it says here, the
person likes oranges, apples, and pears.
The person dislikes mangoes,
raspberries, and bananas. You can also see the retrieval... or actually, you cannot see it up here. You can see the first retrieval that was done, and then a second call to the tool was done down here. And this gave us the information
for this final response. Now we get to a very powerful concept in LangChain: middleware. Middleware basically means
it sits between request and response.
And middleware allows us to do a lot of
different things to enhance the
capabilities of our agents. For example,
we can choose different system prompts
depending on the context. We can choose
different models based on certain
criteria. We can summarize stuff. We can
have rate limits. All sorts of things
can be done in between this window of
getting a request and sending a
response. We can do a lot of stuff
behind the scenes. And instead of
talking about this too long in a
theoretical way, what I want to do here
is I just want to show you a couple of
examples of custom middleware of already
existing middleware of different use
cases and then you can just go and
explore it yourself. So I want to get
started with a simple example right
away. I want to show you how we can swap
the system prompt based on the level of
expertise that the user has. So for this
we're going to use context again and I'm
going to start by importing from langchain.agents.middleware the following things: ModelRequest, ModelResponse, and in this case also the decorator called dynamic_prompt.
So this is middleware that is already
implemented. So we don't have to build
it and define it ourselves. But what we
can do now is we can say that if the
user has a certain role like he's an
expert or he's a beginner or maybe even
he's a child, we're going to say adjust
your answer, adjust your style based on
that. So we're going to create a data
class again called context. And actually
we need to again import here from data
classes import data class. And this
context this time will not contain the ID but the user role. So I'm going to say here, in the data class, that user_role is going to be a string. And now we're going to
write a function that takes in the
context from the model request and
returns a system prompt dynamically
based on the user role. So I'm going to
use here the decorator dynamic_prompt. And we're going to call this function user_role_prompt, for example. It will get
a request as input which is a model
request and then we return the string.
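A sketch of that structure, filled in the way it is described next (the role names and prompt texts are just the examples from this section):

```python
from dataclasses import dataclass

from langchain.agents import create_agent
from langchain.agents.middleware import ModelRequest, dynamic_prompt


@dataclass
class Context:
    user_role: str  # "expert", "beginner", "child", ...


BASE_PROMPT = "You are a helpful and very concise assistant."


@dynamic_prompt
def user_role_prompt(request: ModelRequest) -> str:
    # Pick the system prompt based on the user role carried in the context
    match request.runtime.context.user_role:
        case "expert":
            return BASE_PROMPT + " Provide detailed, technical responses."
        case "beginner":
            return BASE_PROMPT + " Keep your explanations simple and basic."
        case "child":
            return BASE_PROMPT + " Explain everything as if you were talking to a 5-year-old."
        case _:
            return BASE_PROMPT


agent = create_agent(
    model="gpt-4.1-mini",
    middleware=[user_role_prompt],
    context_schema=Context,
)

response = agent.invoke(
    {"messages": [{"role": "user", "content": "Explain PCA."}]},
    context=Context(user_role="beginner"),
)
print(response["messages"][-1].content)
```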
So that is the structure and now we can
take the context from the request. So I
can say here: the user role is going to be request.runtime.context.user_role, and we can do the same thing as before
with the user ID just a match case
statement. So match user role and then
we can have different cases. For
example, the user can be an expert. And
in this case, we will return a different
system prompt. Now, what I want to do
here is I want to define a base prompt.
So, we're going to have just a base
prompt. You are a helpful and very
concise assistant. And now, if it's an
expert, we're going to say here, give me
the base prompt. And then just add to
it: provide detailed, technical responses. On the other hand,
if we have a beginner, we're going to
say keep your explanations
simple and basic. And finally, to see
the biggest difference, we're going to
say case child. And it's going to
basically be: explain everything as if you were literally talking to a 5-year-old. And then the
default case would just be okay. If it's
none of that, then we're just going to
return the base prompt. Cool. So now
basically we do the same thing as
before, but this time we don't pass it
as context. We don't pass it as a tool.
We pass it as middleware. So I say here
create agent. The model is going to be the same as before, GPT-4.1 mini. And now I'm going to say here
middleware is equal to and that's going
to be a list and it's going to contain
the user role prompt function. But of
course I still have to pass the context
otherwise it of course doesn't work. So
context schema still has to be this data
class. So then I can say response is
equal to agent.invoke.
I'm going to have to provide the
dictionary again here with messages role
is user and content is explain PCA. And
here now I pass the context. The context
is going to be that the user role let's
say in the beginning is going to be
beginner. So user role is equal to
beginner. Print response. And we can see
here content PCA principal component
analysis is the technique used to reduce
the number of variables in data while
keeping the most important information.
It transforms the original data into new
variables and so on. So quite simple but
still not something a 5-year-old would
understand. So let's go and see what the expert explanation would be. And there
we can see principal component analysis
is a dimensionality reduction technique
used to transform a large set of
correlated variables into a smaller set
of uncorrelated variables called
principal components. Yeah. Then we get here projection and component selection. Still... I mean, eigendecomposition; okay, it's a little bit more technical.
But if I go now to the lowest level here
which is the child then I should get
something really really simple. So here
okay imagine you have a big box of
crayons with lots of colors. Sometimes
you want to choose just a few crayons
that can still help you color many
pictures nicely. PCA is like a magic
helper. Yeah, you get the idea. So this
is something we can do with middleware
here. In between the request and
response, we can have this dynamic
prompt choice based on the user role, based on the context; we can instruct the model to do different things. Now, another thing that we can
do is we can not only select a prompt
dynamically, we can also select a model
dynamically. For this here, I'm also going to import from langchain.chat_models our trusted init_chat_model
function. I'm going to get rid of
dynamic prompt and I'm going to import
wrap model call. This basically means it
happens around the model call whenever
we want to call the model that is part
of the agent. We're going to run our
middleware in that moment or I should
say in between the model request and the
model response. So the idea here would
be: I want to have a basic model, which would be init_chat_model, and then I can say, for example, the model is equal to GPT-4o mini. And then I would also have an advanced model, which could be 4.1 mini.
The idea now is I define a function
called dynamic model selection. This
function takes in a request returns a
response. So I take in a model request
and the output of that function is a
model response. In addition to that we
also get a handler as parameter here and
I have to annotate this now with wrap
model call. So this actually happens
when the model is called. In case you're
interested in that, you can also go to
the langchain documentation. And there
you can see how this basically works.
Request comes in. Then we have before
agent, before model, wrap tool call,
wrap model call, after model, after
agent, result. And we're now doing this
here, wrapping the model call with our
middleware. So what kind of logic you
want to apply here is up to you. In my
case, I'm going to keep it simple. If I
have more than three messages, I'm going
to use the stronger model. Otherwise,
I'm going to use the basic model. Not
the most intelligent choice here, but
we're going to do it like this. So
message count is going to be equal to
length of request state and then
messages and then just if the message
count is greater than three the model is
going to be equal to basic model
otherwise we're going to say it's going
to be the advanced model. And then we say request.model is equal to model, and we return handler of that request. So
basically we're just taking the request
and setting the model to something else
depending on some criteria. This could
be something more intelligent and then
we just handle the request with this
change. So that's super easy to
integrate. Again agent is going to be
equal to create agent. Model is going to
be basic model by default. And then the
middleware is going to be our dynamic
model selection. Now to keep things a
little bit more beautiful here, I'm
going to import again from langchain
messages the system message, the human
message, and the AI message. And then
down here, I'm going to say response is
equal to agent invoke. I'm going to
invoke here on the following dictionary.
Messages is going to point to a list of
messages. I'm going to start with a
system message. You are a helpful
assistant. Human message: what is one plus one? Keeping it very basic here
and essentially we can take a look at
two things. Uh on the one hand you can
get the idea if you look at the quality of the output. I mean, one plus one is something that every model should be able to do. But you can say here again: print response messages[-1].content
to just get the answer but then you can
also take a look at the actual model
that was used to produce this response.
So instead of saying content you can say
response metadata and you can target the
field called model name. And if we run
this now, we can see that we used the wrong model. Why is that? Because we have to swap this: of course, if we have more messages, we should use the advanced model, otherwise the basic model. But in theory it works. So if I run this now, I should see that we're using 4o mini. And if I add more
messages. So, for example, let's just
add the same message a couple of times
here. This is going to trigger the
change and now we're using 4.1 mini.
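To recap this part, a sketch of the dynamic model selection might look like this (the message-count rule is the deliberately simple criterion from above):

```python
from langchain.agents import create_agent
from langchain.agents.middleware import ModelRequest, ModelResponse, wrap_model_call
from langchain.chat_models import init_chat_model
from langchain.messages import HumanMessage, SystemMessage

basic_model = init_chat_model("gpt-4o-mini")
advanced_model = init_chat_model("gpt-4.1-mini")


@wrap_model_call
def dynamic_model_selection(request: ModelRequest, handler) -> ModelResponse:
    # Use the stronger model once the conversation has more than three messages
    message_count = len(request.state["messages"])
    request.model = advanced_model if message_count > 3 else basic_model
    return handler(request)


agent = create_agent(
    model=basic_model,  # default model
    middleware=[dynamic_model_selection],
)

response = agent.invoke(
    {"messages": [SystemMessage("You are a helpful assistant."),
                  HumanMessage("What is one plus one?")]}
)
print(response["messages"][-1].content)
print(response["messages"][-1].response_metadata["model_name"])  # which model actually answered
```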
Now, in general, I want to show you how
you can define your own custom
middleware by using these hooks. So,
we're going to do that with a class
here. For this, I'm going to create a
class called hooks. Let's call it hooks
demo. It's going to inherit from a class
that we need to import called agent
middleware. So, we're now defining our
own agent middleware as a class here.
And we can just override these methods
that are representing the hooks. So in
order to do that, let me get rid of all
of this. We're also going to need
something called the agent state. And
basically the constructor is going to be
the init method taking self as a
parameter here calling the constructor
of the parent class. So super init. And
what we're going to do is we're going to
set a start time equal to 0.0. So this
is not going to be an actually useful
example. I'm just going to show you how
we can trigger when the different parts
are triggered. And now we can just
overwrite before agent. For example,
before agent is obviously before we get
to the agent, before the request comes
to the agent, we're going to be running
this. It takes self and an agent state.
So state, which is agent state, and also
the runtime. And in my case, what I want
to do here now is I want to say self dot
start time is going to be equal to and
now I need to import from core python
here time. I'm just going to say time.
So we're importing core python package
time. self.start time is equal to time.
And then just so we know it happened.
I'm going to say here before agent
triggered. Now I'm going to copy paste
it here. For the other ones, it's just
before model, after model, and after
agent. Here we're going to do something
else. So I didn't copy it. But basically
here we're just printing before model,
after model. Whatever you want to do
with that. There are also the wrap hooks, so wrap_tool_call and wrap_model_call; we saw those already. We're just going to keep doing it with these four.
And what I'm going to do here is I'm now
going to calculate the time difference.
And I'm going to print the result. So
I'm going to say here print
after agent and then colon time. So the
current time minus self start time. And
now we can go and say agent is equal to create_agent, GPT-4.1 mini. And then we
can say middleware is equal to and now
we're going to pass hooks demo as an
instance here. So we're creating
instance of this class. This is another
way to use middleware. And then I can
just do the usual stuff from before just
agent invoke some question. And now when
I run this we get a problem of course
because I cannot just call the module time. Of course, I need to say time.time or time.perf_counter, whatever you want to use here, but that shouldn't be too much
of an issue. So now we get before agent
triggered before model. I forgot to add
trigger here. But then we should get at
some point after model and then we
should get after agent and the
difference here in time. And since PCA
is probably quite complicated uh for a
model to put into words. It takes some
time but you can see 14 seconds is what
we measured here. This is essentially what you can do with these hooks in LangChain when it comes to middleware.
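As a recap, a sketch of that hook-based middleware class (the timing logic is just the toy example from this section, and the hook signatures follow what we used here):

```python
import time

from langchain.agents import create_agent
from langchain.agents.middleware import AgentMiddleware, AgentState


class HooksDemo(AgentMiddleware):
    def __init__(self):
        super().__init__()
        self.start_time = 0.0

    def before_agent(self, state: AgentState, runtime) -> None:
        self.start_time = time.perf_counter()
        print("before agent triggered")

    def before_model(self, state: AgentState, runtime) -> None:
        print("before model triggered")

    def after_model(self, state: AgentState, runtime) -> None:
        print("after model triggered")

    def after_agent(self, state: AgentState, runtime) -> None:
        print(f"after agent: {time.perf_counter() - self.start_time:.2f}s")


agent = create_agent(
    model="gpt-4.1-mini",
    middleware=[HooksDemo()],  # an instance of the class, not the class itself
)

agent.invoke({"messages": [{"role": "user", "content": "Explain PCA."}]})
```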
You can customize what happens at what
point in the workflow in the cycle. So
now, last but not least, I would just
like to show you a couple of examples of
already existing middleware. We're not
going to go too deep into this. I'm
going to just copy paste one of these
examples here into this uh file, which
is going to be summarizing conversation.
We're not going to actually use this.
I'm just showing you how to use this.
Basically, you go from langchain agents
middleware summarization middleware. One
of the many choices that you have. What
this basically does is you have a couple
of parameters here. You can specify the
model for the summarization, which is
not necessarily the same as the model
that the agent uses. The idea is that
after a certain number of tokens, you
summarize the conversation. You keep the last 20 messages, and after 4,000 tokens, GPT-4o mini in this case summarizes the
important key points of the conversation
up until this point. And then you just
continue the conversation. That is one
middleware that we can use. And then on
the middleware page in the
documentation, you can also find some
examples. For example, here human in the
loop middleware. This basically means on certain events, on certain tool uses, we get an interrupt, and then the user has
to manually say okay continue, edit,
approve, reject, whatever. This is of
course important in key steps. If you
have to do some payment, if you have to
send an email or something, it makes
sense to have this human in the loop
middleware. Then we also have model call limit, so basically saying we cannot call the model too often per run or per thread. The same thing exists for tool calls: basically just a limiter for how often you can call a tool in a specific thread or run. Then
also some interesting stuff here like
model fallback. If something doesn't
work, if you cannot use a model, fall
back to a different one. This can be
quite useful here. Also PII detection.
Basically personally identifiable
information. You can make it redact
certain key pieces of information for
compliance reasons for example and much
more other stuff like retrying tools or
having a to-do list. So basically a
planning middleware. You can take a look
at these examples but basically
middleware allows you to extend the capabilities of your model, or of your agent I should say, because you can tell it what to do in certain scenarios and make it more dynamic, more reactive so to say, which is very useful. So that's
it for this video today. I hope you
enjoyed it and hope you learned
something. If so, let me know by hitting
a like button and leaving a comment in
the comment section down below. Also, if
you're interested on my website, you
will find a tutoring tab and also a
services tab. There you can hire me
basically for one-on-one tutoring, for
one-on-one teaching you something or
also for services, machine learning,
backend development, consulting,
freelancing, whatever. If you're
interested in that, check it out. You
can contact me via mail or LinkedIn. And
besides that, don't forget to subscribe
to this channel and hit the notification
bell to not miss a single future video
for free. Other than that, thank you very much for watching. See you in the next video, and bye.
This video today is a full crash course on LangChain, the number one Python framework for building and working with AI agents.

📚 Programming Books & Merch 📚
🐍 The Python Bible Book: https://www.neuralnine.com/books/
💻 The Algorithm Bible Book: https://www.neuralnine.com/books/
👕 Programming Merch: https://www.neuralnine.com/shop

💼 Services 💼
💻 Freelancing & Tutoring: https://www.neuralnine.com/services

🖥️ Setup & Gear 🖥️: https://neuralnine.com/extras/

🌐 Social Media & Contact 🌐
📱 Website: https://www.neuralnine.com/
📷 Instagram: https://www.instagram.com/neuralnine
🐦 Twitter: https://twitter.com/neuralnine
🤵 LinkedIn: https://www.linkedin.com/company/neuralnine/
📁 GitHub: https://github.com/NeuralNine
🎙 Discord: https://discord.gg/JU4xr8U3dm

Timestamps:
(0:00) Intro
(1:10) LangChain Ecosystem
(3:50) Environment Setup
(6:26) Simple AI Agent Example
(11:45) Standalone Model Inference
(13:40) Conversations
(15:04) Streaming Responses
(16:00) Advanced Agent Example (Context, Memory, Structured Output)
(25:53) Multimodal Input
(29:27) RAG Example (Embeddings, Vector Stores, Retrieval)
(38:04) Dynamic System Prompts
(43:21) Dynamic Model Choice
(47:14) Custom Agent Middleware
(50:14) LangChain Middleware Examples
(52:28) Outro