In this video, we're building an
advanced AI agent in Python using
LangGraph. Now, this isn't going to be a
basic chatbot. This is a multi-step deep
research agent that will pull live data
from sources like Google, Bing, and
Reddit. Now, this tutorial is not for
beginners. I'm going to be covering
advanced Python concepts, complex
architecture, and best practices for
building agents that go far beyond just
a single prompt response. Now, for this
video, like I said, we're using
LangGraph, Python, and I will also be using Bright Data to access real-time
web data. Either way, even if you don't
want to build a web-based agent, you can
still learn a lot from this video
because I'm going to build really a
pretty complex agent and show you how to
structure that in Python. So, by the
end, you're going to have a fully
functioning project and the skills to
build powerful production AI systems.
So, with that said, let's dive in and
let me show you a quick demo of the
project. So I'm in PyCharm here and I'm
just going to give you a quick demo of
what the finished project will look
like. Keep in mind that you can adjust
this a ton and this is really meant to
give you a solid base. Anyways, let's
have a look at what we can do. So as you
can read here, this is a multi-source
research assistant, but you can use it
for a lot of different tasks. It says
ask me anything. So I said, should I
move to Dubai from Canada? And then what
we do is we start searching Google, Bing, and Reddit in parallel. Okay, so
we can actually do this at scale and
search for like thousands of different
results. So you can see we searched Bing
for this, we searched Google for this,
we searched Reddit for this, and then
what we start doing is waiting for the
results. So essentially what Bright Data
is going to be doing is it's going to be
pulling all of these results into a
large snapshot for us. See, we get all
the results here. We download it and
then we're able to actually, for
example, parse through all of the Reddit
posts. In this case, I just did 75 of
them. We analyze the Reddit posts, we find the titles that actually make sense for our search query, and then we find those Reddit posts and download all of the comments from those two Reddit posts. You can see right here
it says it downloaded 26 Reddit
comments. We then take all of that data
and we analyze it. So we analyze the
Google results, the Bing results, and
the Reddit discussions. And we
synthesize all of that data into one
final answer. And then it gives us the
response right here. So it says, you
know, taxfree income, cost of living,
lifestyle, and environment. And if we
scroll over here, it tells us where it
actually got this data from. So from
Google and other sources. Here it found this from Reddit, r/dubai, and it gives us
kind of all of these sources and quotes
where it found this information from.
Same thing: I asked it, should I buy AMD or Nvidia stock? Same process. It goes through, finds all of this relevant web data, finds all of these different posts, and then, as we scroll down, you can see here that in this case there were a lot more comments analyzed, and then it
gave us this comprehensive response
breaking down whether we should invest
in Nvidia or AMD. And ultimately it
gives us a conclusion here and says
Nvidia appears to be a more
straightforward choice for those
prioritizing immediate and robust growth
in AI and who favor its strong financial and market position. Okay, so there you
go. And then I just stopped the agent
here which is why we got that kind of
interrupt message. So overall this is a
very powerful agent and the interesting
thing about this is the way that we're
going to write this is it's very
scalable. So I could run this agent
thousands of times in parallel. I could
scrape thousands of different posts and
nothing really changes here. You'll see
how we do that later on. But because of
the technology that we're using and the
way that we write this, we really can
run this actually in production in a
scalable format and it's going to be
pretty fast to actually execute. It
doesn't take a very long time due to the
services that we're using. Anyways, with
that said, I want to quickly talk a
little bit more about search because
that's a really important part of what
we're doing here. And then we'll get
back onto the computer and start coding
all of this out. So, now that we've
looked at a quick demo, I want to
briefly touch on the current problems that
exist with search because that's what
this agent is going to do. It's going to
be searching the web. Now, most AI agent
systems today share a pretty major
limitation. They can't access the full
range of data that's actually relevant
to the problem. So, relying on a basic
search API, or just a single plug-in
means that you're usually only seeing a
fraction of what's actually available on
the public web. So, important sources
like live social media sentiment, real
time crawling, or historical trend data
often go untapped. Now without them the
results are incomplete and the decisions
these agents make can be based on
outdated information and opportunities
are going to be easily missed. Now on
top of that many setups require
stitching together multiple APIs that
don't work smoothly together and they
leave the agent with a narrow or
fragmented view of the entire world kind
of based on the web. Now with that said
that's why for this video I mentioned
we're going to be using Bright Data. Now
Bright Data has been a long-term sponsor
of this channel. I've worked with them
on many videos now and what they have
here is a web discovery API that
provides a much easier way to access a
wide range of public data. So rather
than just basic search results, it can
pull all kinds of information: live SERP data from engines like Google, Bing, and other search engines; real-time sentiment from platforms like Twitter, Reddit, and TikTok. In this video,
we're going to be scraping Reddit, for
example. They have historical web data
that goes back years, and then insights from answer engines such as Perplexity, Gemini, ChatGPT, etc. Now, that's
because Bright Data handles the
crawling, parsing, and does this
reliably in one unified API so that we
can focus on building the agent rather
than trying to set up these really
complex web scrapers. Again, I've used
Bright Data many, many times in the past. Essentially, it's just a smarter,
easier way to be able to scrape the web.
And we're going to integrate that tool
into our agent here in this multi-step
kind of orchestrated flow so that our
agent essentially grabs all of this real
web data based on what we're asking for,
analyzes it using models like ChatGPT,
and then gives us a really competent
response that kind of follows the strict process that we've set up. So
with that said, let's get onto the
computer here. Let me start explaining
the architecture of our agent. And we're
going to start kind of scaffolding this
out, building it, and then going step by
step because I won't lie to you, there
is a lot of code here. And also, I'll
quickly mention that all of the code for
this video will be available from the
link in the description. Anyways, let's
dive in. So, we're back on the computer
now and we're going to start coding this
out. Now, first I want to quickly
explain the LangGraph architecture that
we're going to use for this agent just
so you can understand what it is that
we're about to build. And with that
said, if you're unfamiliar with
LangGraph, I do suggest you have some
background with that before following
along in this video. Otherwise, it might
be a little bit confusing. So, I'm going
to put a video on screen right now that
explains LangGraph in depth that you can
follow along with to get some context
before going into more of a complex
project. Okay. So, anyways, we're going
to be using LangGraph, and essentially
what's going to happen is the following.
The user is going to ask some kind of
question and we're going to
simultaneously in parallel go and search
Google, Bing, and Reddit for information
based on their question. Now, in theory,
we could search a lot more sources as
well, but I'm just keeping it a little
bit slim for this video, so it doesn't
take us 25 hours to code this out. Okay.
Now, after that, what we're going to do
is we're going to wait to analyze the
Reddit posts. The reason for this is
that the Google results and the Bing
results are extremely fast because
Bright Data already has them indexed.
So, we get them in like a few seconds.
Whereas the Reddit posts can take a little bit longer because we're doing actual web scraping; well, Bright Data is doing that for us and pulling all of the relevant posts. So, in this case, we have to wait for the Reddit posts, which takes a second. And then once we get the
Reddit post, what we're going to do is
retrieve all of the posts that are
actually related to the prompt that we
passed in. So, we're going to search
Reddit. It's going to return a bunch of
different results for us. Then we're
actually going to analyze those Reddit
posts. We're going to pull out the ones
that make sense. And then from those
posts, we're going to retrieve all of
the comments on those particular posts.
Hopefully that makes sense. But that's
kind of these two steps here. Now, after
that happens, we're then going to
analyze all of the results that we got.
So, we're going to analyze the Google
results, the Bing results, and the
Reddit results. This is so that we cut
down the amount of information that we
have before we synthesize that all
together into one larger prompt. So we
have these three kind of smaller prompts
that are focused on pulling information
that we need from each source. Then we
take all of that and we pass that to
kind of a synthesizer which is then
going to take all three results from
here and synthesize that into one final
answer which we'll end up getting here.
Okay. So this is kind of the
architecture or the graph that we're
going to build. And of course, when I
say LangGraph, that's referencing this
graph, right? Like we're building this
graph essentially where we're flowing
data through this and eventually getting
this final answer. Okay, so that's what
we're going to build. So what I'm going
to do now is go over to PyCharm and
start setting this up. Now, for this
video, you can use any IDE that you
want, but I do typically recommend
PyCharm for larger Python projects,
especially when you're working with AI
or modules like Langraph, Langchain,
etc. And I do actually have a long-term
partnership with PyCharm and you can
check them out and start using it for
free from the link in the description. I
always recommend at least try it. If you
don't like it, you can switch to
something else. But personally for me,
it is my favorite for larger Python
projects and it is literally designed
for Python, hence the name PyCharm.
Anyways, let's get started here. So,
what I've done is I've opened up a new
folder and I've just called this AI
search agent. You can call this anything
you want. And then from this folder,
we're going to initialize a new UV
project and create our virtual
environment in Python. So in order to do
that, I'm going to type UV init and then
dot in my terminal. This is going to
initialize a new UV project. And from
here, we're going to install the
dependencies that we need. So I'm going to type uv add langchain, then langgraph, then langchain-openai, because we need to use GPT here, and then python-dotenv. Okay, these are the four
dependencies that we're going to need to
install. So let's go ahead and press
enter here and they all get installed in
our environment. Now feel free to use
pip or any other virtual environment
that you want. But if you want to use UV
and you're not familiar with it, I'll
leave a video on screen that teaches you
how to use UV because it's kind of the
standard now. It's very fast for
managing dependencies and environments
in Python. Okay, so now that we've got
UV installed, we're going to start
setting up our project. So, there is
quite a bit of setup and there's going
to be a lot of code here. If at any
point you're getting lost or you just
want to copy something that I'm writing,
you can do that by clicking the link in
the description. There'll be a GitHub
repository that contains all of the code
for this project. So, what I'm going to
do now is I'm going to make a new file
inside of this folder called .env. Now,
this is going to store some environment
variables that we need. I'm just going
to ignore this for right now.
Specifically, our OpenAI API key and our Bright Data API key, which we're going to get in a few minutes. So,
we'll just start by writing the
variables that we're going to need. So, first is going to be BRIGHTDATA_API_KEY. Okay, we'll fill that in later. And then next is going to be OPENAI_API_KEY, which we can also get later. Okay, so we have
our environment variables defined. Now,
we're going to go into our main.py file. Just create a new one if you don't
have one here and we'll start writing
some code. Now, PyCharm is also
prompting me just to configure my
interpreter. So, let me just select the
correct one and then I'll be right back.
All right, so I've configured the
correct interpreter and now what I'm
going to do is start scaffolding my
project. Now, when I say scaffolding,
what I mean is I'm going to write all of
the functions and logic that we'll
eventually implement. But for now, I'm
going to just kind of connect it
together so that we understand the
architecture and the flow that we're
going to follow along with. And then
once we have that, it's a little bit
easier to go write each individual
function. This is typically how I plan
larger projects. So I'm going to kind of
walk you through my thought process in
this video and you'll see how we plan it
out. So first we're going to say from dotenv import load_dotenv. This is going to allow us to load in the environment variables that we've defined in this .env file. I'm going to say from typing import Annotated. Okay. I'm gonna say from langgraph.graph import StateGraph and the START and END nodes. Now again, if you're not super familiar with LangGraph, essentially
this allows us to build a graph which is
a bunch of different nodes that are
connected to each other and to flow some
state or some data through that graph
where each kind of node in the graph can
modify or update that data. So we're
going to have some state which is going
to store all of the information that our
agent needs to have access to. And as we
run through these different stages or
nodes in our graph, we'll be populating
that state where then at the end of our
graph, we have this final answer which
we can present to the user. So it's kind
of a really unique way to build AI
agents in a bit more of a kind of
predictable flow. Rather than just
giving a set of tools to a model and
letting it go crazy, we actually kind of
walk through this manual process that's
a lot more consistent where we update
this state kind of stage by stage. Okay,
so next we're going to say from langgraph.graph.message import add_messages. We're then going to say from langchain.chat_models import init_chat_model, which is a really quick way to initialize an LLM. We're then going to say from typing_extensions import TypedDict. Okay, these are just some typings that we need in Python. And then we're going to say from pydantic import BaseModel and Field. And then we're going to say from typing import List. And actually, I realize we can just put this up here, because we need List and Annotated and they come from the same package. Okay. So that's most of our imports. Later we'll import a few other things, but for now we can start with this. And then we're going to call the load_dotenv function. When you call it, it loads the .env file and makes those variables available so that we can start using them inside of our Python code. Okay. Now, next step,
we're going to say llm is equal to init_chat_model. And for the chat model, we're going to put the name of the LLM we want to initialize, which in my case is just going to be gpt-4o. Now, you can put pretty much anything that you want here. You just need to make sure that if you put a different model from another provider, you pass the correct API key in this file. So, automatically, when we try to load gpt-4o, LangChain is going to look for the presence of the OPENAI_API_KEY variable and then use that as our API key. Okay, so we just put gpt-4o.
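To make this concrete, here is a minimal sketch of roughly what the top of main.py looks like at this point. The exact environment variable names (BRIGHTDATA_API_KEY, OPENAI_API_KEY) are just what I put in the .env file, so match them to your own:

```python
from dotenv import load_dotenv
from typing import Annotated, List
from typing_extensions import TypedDict
from pydantic import BaseModel, Field
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langchain.chat_models import init_chat_model

# Reads OPENAI_API_KEY (and later BRIGHTDATA_API_KEY) from the .env file
load_dotenv()

# init_chat_model picks up OPENAI_API_KEY automatically for OpenAI models
llm = init_chat_model("gpt-4o")
```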
You can put a newer model, whatever you
want. Just make sure you have the
correct API key. Okay, so now that we've
done that, what I want to start with is
actually writing out the state that
we're going to pass through our graph.
When we have the state, we'll kind of
understand the data that we need to come
up with and find. And then I can start
creating this graph and making the
connections between these nodes. And
then we can start actually kind of
populating the graph by writing the
different implementations.
So for now we're going to say class
state and this is going to inherit from
the typed dictionary. Okay. Inside of
here we're going to start by having a
list of messages. Now these messages are
essentially the messages that our user
is sending into this graph that we'll
then process and start getting the
information for coming up with an answer
for. So we're going to say messages is Annotated with list and add_messages. When we do this, we mean that, okay, messages is of type list, and when we add a new message, the add_messages function is called, which will essentially merge the new messages in for us. Okay. Next,
we're going to have the user question.
And this is going to be string or none.
Okay. Then we're going to have the Google results, and this is going to be string or none. And you'll notice that all of
these are going to be or none because at
some point in time, we may not have
these results. and then later we'll
populate it. So then next we're going to
have the Bing results. This is going to
be string or none. Then we're going to
have the Reddit results. Again, string
or none. Okay. Then we're going to have
the selected_reddit
URLs. This is going to be list of type
string or none. Now, the reason why I
have selected Reddit URLs is because
we're going to get a bunch of results
from Reddit. And then we're going to
pass these results to an LLM where it's
going to select which of these URLs we
actually want to process further just to
avoid us looking at data that we don't
need. Then we're going to have
Reddit_post
data which is going to be the data for
those selected URLs. This is going to be
list or none. We are then going to have
the Google_analysis.
This is going to be string or none. This
is the kind of LLM analysis. After we
get our results, we're going to have the
Bing analysis which is string or none.
And then the Reddit analysis which is
string or none. And then finally the
final answer which again is string or
none. Okay. So this is essentially our
state. This is what we're going to be
flowing through the graph. And we'll
start populating these one by one as we
kind of go through all of the nodes that
we have in our graph. All right.
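Written out, the state definition described above looks roughly like this sketch (the union syntax needs Python 3.10+; use Optional[...] on older versions):

```python
class State(TypedDict):
    messages: Annotated[list, add_messages]  # chat history, merged via add_messages
    user_question: str | None
    google_results: str | None
    bing_results: str | None
    reddit_results: str | None
    selected_reddit_urls: list[str] | None   # URLs the LLM picks for deeper scraping
    reddit_post_data: list | None            # comments/data for the selected posts
    google_analysis: str | None
    bing_analysis: str | None
    reddit_analysis: str | None
    final_answer: str | None
```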
Now, that's great. What we
need to do next is we need to start
defining the nodes that we have in our
graph. So, if we look here, this is what
we need, right? We need to do Google
search, Bing search, Reddit search,
analyze the Reddit post. All of these
are essentially just functions that need
to execute in some order. So, what we're
going to do now is we're going to write
all of these empty functions without the
implementation. And then later, we're
going to implement all of these
functions. This way we can build this
flow and then later we can go and we can
actually add all of the different pieces
and kind of test it step by step. So
what I'm going to do is I'm going to
start defining a bunch of functions. So
the first function that I'm going to
have is going to be called Google
search. So I'm going to say define
Google search and then for all of these
functions what they take as a parameter
is simply the state. Okay, so we say
state and then that's going to be equal
to this right here. They take in this
state and then what they're going to do
is return something that modifies the
state. So for right now, we're just
going to say return. We don't need to
return anything right now. And we're
just going to keep making a bunch of
functions with all the different
operations that we need to perform. So
after we have Google search, then we
have the Bing search. Okay, that's the next function, or the next node. Then
we're going to have the Reddit search.
And again, we're just going to populate
all of these and then finish them later.
Okay, then we're going to have analyze
Reddit posts. So let's go analyze
Reddit posts. Okay. Next function is
going to be retrieve Reddit posts. So
let's change this. Retrieve Reddit
posts.
Okay. And I'm just going to add some
spaces between here because you can see
PyCharm is kind of linting this and
telling us that we should have two
spaces between our functions. So that's
fine. After this, we're going to have
analyze Google results. So analyze
results. We need to analyze the Bing
results as well. So in fact, let's just
copy this function
and paste this down here. Analyze
Bing results. Okay. Then we need one
more for analyzing the Reddit results.
Okay. So Reddit results are a little bit
different than the Reddit posts. So
we're going to have Reddit results and
then we're going to synthesize this. So,
we're going to say synthesize_analysis
like that. And then this is going to
actually take all of the results that we
had here and synthesize it into one larger result. Okay.
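As a sketch, the stubbed-out node functions look something like this; each one just takes the state and, for now, returns nothing (the snake_case names are how I'd spell the node names we just listed):

```python
def google_search(state: State):
    return

def bing_search(state: State):
    return

def reddit_search(state: State):
    return

def analyze_reddit_posts(state: State):
    return

def retrieve_reddit_posts(state: State):
    return

def analyze_google_results(state: State):
    return

def analyze_bing_results(state: State):
    return

def analyze_reddit_results(state: State):
    return

def synthesize_analysis(state: State):
    return
```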
So, we have a bunch of functions now, and we're almost done with them. What we're going to do now is
we are actually going to create the
graph where we connect these nodes
together. So, again, in LangGraph,
essentially we have nodes. Nodes are
really just operations or functions. So
we've defined now what all of our
operations are going to be. What we need
to do, though, is we need to connect them to one another. So we
know that we start for example with
Google search then we do the Bing search
then we go to Reddit search then we
analyze this and we need to just create
these connections or create the graph.
So what we're going to do is we're going
to say our graph builder is equal to and
this is going to be state graph. And for
the state graph we simply pass our
state. Okay. And now that we have that,
what we're going to do is we're going to
add all of these nodes into our graph.
Now, the way that you do that is the
following. You say graph_builder.add_node. And then what we're going to
do is give the node a name. So we're
going to say something like Google
search. And then this is going to be
pointing to the Google search function.
But make sure you don't call the
function. All right. Now, this is a
little bit tedious, but essentially you
just need to give a name to every
function or every node that we have. So
you see like the next autocomplete is
we're going to add the Bing search and
that's going to be the Bing search
function. We have the Reddit search
that's going to be the Reddit search
function. And I'm just going to keep
going here and using my autocomplete. So
we're going to have analyze Reddit post.
That's going to be to analyze Reddit
post function. Analyze Google results.
And you get the idea. And I'm just going
to go through here and... analyze results results is definitely not the name of a function, is it? Okay, if it is, then I definitely spelled something wrong. We don't want analyze_results_result; we want analyze_reddit_results. Okay, so let's fix that, and let's go here to analyze_reddit_results and analyze_reddit_results. Okay, good job. We caught that.
And then we have the last one, which is
the synthesis. So, let's add that here.
Okay, it looks like we're actually just
missing one here, which is retrieving
the Reddit post. So, let me add this.
We're going to say graph_builder.add_node, and this is going to be retrieve_reddit_posts, like that, pointing to the retrieve_reddit_posts function. So we analyze the Reddit posts, we
retrieve the Reddit post and then we
have the rest of them here which I think
is all good. Okay. And ignore the yellow
highlight. We're going to fix that later
on. It's just because we're returning
none from these functions. All right.
All right. So at this point, what we've
done now is we've created the nodes. So
they exist, but they're not yet
connected. So now that we've created
them, we need to connect them to one another. So the way that we do this is we can say graph_builder.add_edge, like this. And an edge is a connection between the
nodes. And we're going to connect the
start to the first node, which is going
to be our Google search. So what this
says is okay when we start the graph the
first thing that we're going to do is
we're going to go to Google search. Now
what we're going to do is we're going to
connect the start to multiple of these.
So we're going to connect to Google
search as well as Bing search as well as
the Reddit search. So this way what will
happen is at the exact same time we'll
execute all three of these operations.
So we'll run them in parallel. So as
soon as we start or as soon as the user
gives us some message or some request,
we searched on Google, we searched on
Bing and we searched on Reddit at the
same time. Okay, cool. So next thing
we're going to do here is we're going to
connect the next steps. So after Google
search, what do we do? After Bing
search, what do we do? You get the idea.
Okay, so in order to do this, we're going to say graph_builder.add_edge.
And the first edge that we're going to
add might seem a little bit weird, but
from Google search, we're actually going
to connect the analyze Reddit posts.
Okay, let me make this a little bit
smaller so you guys can see this. Now,
the reason we're doing this is because
if we follow our architecture, right,
after we do all of the searches, we need
to wait a second to get all of the
Reddit results before we can move any
further. So, all of these connect to the
analyze Reddit post, which is what we're
doing right now. Then, we'll retrieve
the Reddit post. then we'll go and do
all of this analysis. Now, there's
actually some ways that we can make this
a little bit more efficient, but for
right now, this is just a simpler
architecture that I want to follow. Um,
you probably know what I mean if you're
thinking like, okay, how can I make this
a bit more efficient? But for right now,
I don't want to make it too complex. So,
we're just going to go with this. Okay,
so after the Google search, we go here.
After the Bing search, we're going to go
here as well. So, we kind of wait at
this stage. And then, of course, after
the Reddit search, we need to go here,
too.
Okay, so we've now made this next
connection. So we have start to Google
search, start to Bing search, start to
Reddit search, and then from each of
these we wait at the analyze Reddit
posts. Okay, now after we analyze the
Reddit post, what we need to do, so
let's put one more in here, is we need
to retrieve the Reddit post. So we're
going to go analyze Reddit post to
retrieve Reddit post and add that edge
between those two. All right. Now, the
next thing that we're going to do is
we're going to say graph_builder.add_edge. Let's get rid of all of this. And
we're going to start now from the
retrieve Reddit post. And after the
retrieve Reddit post, what we're going
to do is we're going to start analyzing
all of our results. So we're going to
say analyze. And then the first thing
that we're going to do is the Google
results. So underscore Google results
like that. Let's spell analyze
correctly. Okay. Now let's copy this and
go down here. Now, after retrieve Reddit
post, we are also going to go and
analyze the Bing results. Okay? And then
we're going to go one more down here.
After we retrieve the Reddit post, we're
also going to analyze the Reddit
results. All right? So, hopefully this
is making sense. But again, if we go
back to the architecture, we wait for
the Reddit post, we retrieve them. Then
after we retrieve them, we go and we
analyze the Google results, the Bing
results, and the Reddit results. That's
what I've just written, right? We go
Google, Bing, Red. So, analyze those
three at once. Okay. Now, after those
three, what we need to do is we need to
synthesize our analysis. So, we're going
to do this again. We're going to add
another edge. Now, we're going to start
from analyze Google results. And the end
key here is going to be the synthesize
analysis.
Then, we're going to copy this. And I'm
just going to copy it twice because
we're going to need it here. After we
analyze the Bing results, we're also
going to go there. And after we analyze
the Reddit results, we're going to go
here as well. Okay, so those three then
connect to this next node where we're
synthesizing everything. And then
lastly, we add one more. So we say graph_builder.add_edge, and we're going to
say from the synthesize analysis, we're
going to go to the end. Okay, so this is
how you set it up. You need to always
have a start key, which we do right
here, and an end key or an end node. And
then this creates this graph that you
just saw in this diagram. Okay. Now,
after we create the graph, what we can
do is compile it. So, we're going to say graph is equal to graph_builder.compile, not build, but compile. This compiles the graph into something we can actually run. And then what we can do is essentially pass a message to this graph; it will run through all of the different nodes, the state will get updated, and then we can print out that state.
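Here's roughly how that wiring and compilation look in code, a sketch assuming the node names and stub functions from above:

```python
graph_builder = StateGraph(State)

# Register each node: a name mapped to the function that implements it
graph_builder.add_node("google_search", google_search)
graph_builder.add_node("bing_search", bing_search)
graph_builder.add_node("reddit_search", reddit_search)
graph_builder.add_node("analyze_reddit_posts", analyze_reddit_posts)
graph_builder.add_node("retrieve_reddit_posts", retrieve_reddit_posts)
graph_builder.add_node("analyze_google_results", analyze_google_results)
graph_builder.add_node("analyze_bing_results", analyze_bing_results)
graph_builder.add_node("analyze_reddit_results", analyze_reddit_results)
graph_builder.add_node("synthesize_analysis", synthesize_analysis)

# Fan out: the three searches run in parallel from START
graph_builder.add_edge(START, "google_search")
graph_builder.add_edge(START, "bing_search")
graph_builder.add_edge(START, "reddit_search")

# All three searches join at analyze_reddit_posts
graph_builder.add_edge("google_search", "analyze_reddit_posts")
graph_builder.add_edge("bing_search", "analyze_reddit_posts")
graph_builder.add_edge("reddit_search", "analyze_reddit_posts")

graph_builder.add_edge("analyze_reddit_posts", "retrieve_reddit_posts")

# After retrieving Reddit data, run the three analyses in parallel
graph_builder.add_edge("retrieve_reddit_posts", "analyze_google_results")
graph_builder.add_edge("retrieve_reddit_posts", "analyze_bing_results")
graph_builder.add_edge("retrieve_reddit_posts", "analyze_reddit_results")

# Join again at the synthesizer, then finish
graph_builder.add_edge("analyze_google_results", "synthesize_analysis")
graph_builder.add_edge("analyze_bing_results", "synthesize_analysis")
graph_builder.add_edge("analyze_reddit_results", "synthesize_analysis")
graph_builder.add_edge("synthesize_analysis", END)

graph = graph_builder.compile()
```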
So what I'm going to do is write the function
that would allow us to execute this
graph. And then of course before we can
do that, we need to start writing all of
these different functions. So we're
going to make a function here. I'm going
to call this run_chatbot. Okay, what this is going to do is start executing our agent so that we can actually run through this graph. So,
what we're going to do is we're going to
do a print statement and we're just
going to say Multi-Source Research Agent, like that. I'm going to print a line and I'm going to say, you know, type 'exit' to quit, and then a newline (\n).
All right, that's good. And then what
I'm going to do is have a while loop.
So, I'm going to say while true. And
we're going to keep asking the user to
give us some input. So we're going to
say user_input is equal to input with the prompt "Ask me anything: ", like that. We're going to say if the user input is equal to exit, then print bye and we can break the loop. Okay. Otherwise, what
we're going to do is we're going to say
state is equal to and we need to
initialize a kind of starting state. So
in order to do that, we could say state
is equal to the following and we can say
messages and then for the message we
just need to put the message that we
want the uh bot to reply to. So we're
going to say role is user, and we're going to say content is the user input.
Okay, we're then going to say that the
user question is the user input. We're going to say the Google results key (which we need to write as a string) is just equal to None.
And then we're going to do the same for
all the rest. So the Bing results is
equal to none. The Reddit results is
none. The Google analysis is none. The
Bing analysis is none. The Reddit
analysis is none. Final answer is none.
And then of course there's a few other
ones that we missed. So let's go here.
After Google results, Bing results,
Reddit results. We also have the
selected
underscore Reddit URLs. Okay, this is
going to be none. We then have the
Reddit post data. So let's do this.
Reddit_post
data and that is none as well. Okay. And
I think that's all of the state that we
need. So again, kind of at the
beginning, we need to initialize the
state that we're going to be passing
through the graph. So that's what we've
just done. We've plugged in the user
input. So the question that they've
asked us and then what we're able to do
is start running the graph. So what we
can do here is we can do a print statement and say a newline (\n) followed by something like "Starting parallel research process...". Then we can do another print and say something like "Launching Google, Bing, and Reddit searches..." (this is just for logging, by the way, but I think it looks nice), with another \n as well to kind of separate this out. And then I'm just going to print "-" * 80, just so that we get some separation here between what is
appearing. Okay, so actually I just
missed something. So up here what I'm
going to do is I'm going to say the
final state is equal to graph.invoke.
I'm going to invoke the graph with my
state. And then what I'm going to do
down here is I'm going to say if the final state dot get of the final answer. So if that does exist, then I'm
going to print out the answer. So I'm going to say print with an f-string: a newline, then FINAL ANSWER, another newline, then final_state.get of the final answer, followed by one more newline.
Okay, so essentially what I'm saying is
all right, we're going to invoke the
graph. This is how you invoke it. We
pass our initial state. Again, don't
worry too much about that. We'll fix
that later on. We say if the final state
does contain a final answer, then we'll
print out the final answer and it will
just print kind of a separation here so
that if we run this again, we know, you
know, which run was which. Then lastly,
we need to execute this function. So we can say if __name__ == "__main__", then we call run_chatbot. And then that would actually be a finished program, assuming that all of the nodes were completed, which of course they're not yet; we still need to write them.
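Put together, the run_chatbot loop and entry point described above look something like this sketch (the exact log strings are just what I'd print; adjust them to taste):

```python
def run_chatbot():
    print("Multi-Source Research Agent")
    print("Type 'exit' to quit\n")

    while True:
        user_input = input("Ask me anything: ")
        if user_input == "exit":
            print("Bye")
            break

        # Initial state: everything except the question starts out empty
        state = {
            "messages": [{"role": "user", "content": user_input}],
            "user_question": user_input,
            "google_results": None,
            "bing_results": None,
            "reddit_results": None,
            "selected_reddit_urls": None,
            "reddit_post_data": None,
            "google_analysis": None,
            "bing_analysis": None,
            "reddit_analysis": None,
            "final_answer": None,
        }

        print("\nStarting parallel research process...")
        print("Launching Google, Bing, and Reddit searches...\n")

        final_state = graph.invoke(state)

        if final_state.get("final_answer"):
            print(f"\nFINAL ANSWER:\n{final_state.get('final_answer')}\n")
        print("-" * 80)


if __name__ == "__main__":
    run_chatbot()
```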
So let me quickly just kind of summarize what
we've done here. I just want to zoom out
a little bit so you guys can read this.
Essentially we started with all of our
imports. We loaded the environment
variable file. We initialized our LLM
which we still need the API key for
which we're going to get in one second.
We created our initial state. Okay. And
then what we did is we kind of stubbed
all of these different operations or
nodes that we're going to have in our
graph. So Google search, Bing search,
Reddit search, analyze Reddit post,
retrieve the Reddit post, analyze Google
results, analyze Bing results, analyze
Reddit results, synthesize the analysis,
and then we created the graph. So we
added all of the nodes where we
connected functions to the node name.
That's essentially what we did here. We
then added all of the edges. So we
started with these three running in
parallel. We then kind of connected them
to this analyze Reddit post node. We get
the Reddit post here after we analyze
them. And then we go through the rest of
the flow until we eventually synthesize
all of the results. We then have this uh
while loop that just allows the user to type something in and essentially run it through our graph. And that is the LangGraph component kind of done. What we
need to do now is we need to start
updating the state as we go through
these various nodes. So let's get into
that. And that's going to allow us of
course to start searching the web using
the SERP API, all of that kind of stuff
that I'm going to show you. So for now,
because I want to be able to test this
step by step, I'm going to start filling
in some of the outputs that we're going
to have from these functions so that
even if they're not fully complete,
we'll be able to execute the graph and
kind of test the Google search first,
then the Bing search and see what
results we're getting. So from our
Google search function, I'm going to say
the user question is equal to state.get
and this is going to be the user
question or an empty string. Okay, so
because I have state in all of these
functions, I can pull out the state.
Then what I'm going to do is I'm going
to have a print and I'm going to say f
string. So, and this is going to be
searching Google for and then we'll do a
colon. So, searching Google for the user
question. Okay, I'm then going to say
the Google results
are equal to an empty list and later
we'll actually get the uh Google
results, but for now we'll just make it
an empty list. And then I'm going to
return the following which is Google
results is equal to the Google results.
Now whenever you return something from
these functions it needs to match what
you have in the state. So in this case
we have Google results matching with our
Google results. So this Google results
will get updated to be equal to whatever
this is. And then in the next function
we have access to these updated Google
results. That's how the state kind of
flows through here: you return a partial update to the state from one of these nodes, it gets merged into the state, and the updated state is continually passed to all of the next nodes in the sequence. Hopefully that makes sense.
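For example, the stubbed Google search node looks something like this; notice it returns only the keys of the state it wants to update:

```python
def google_search(state: State):
    user_question = state.get("user_question", "")
    print(f"Searching Google for: {user_question}")

    google_results = []  # later this will come from the Bright Data SERP API
    # Returning a dict with matching state keys is a partial state update
    return {"google_results": google_results}
```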
That's kind of the idea. Now for the Bing search, it's effectively the same thing. So I'm just going to paste this here and, rather than searching Google, I'm going to search Bing. And then rather
than the Google results, this is just
going to be the Bing results. So change
this
change this and change this here.
Okay. And then for the Reddit search,
again, it's pretty much going to be the
same thing except just named Reddit. So,
we're going to paste this here. We're
going to say searching Reddit.
And then just update all these variables
to say Reddit.
Okay. Reddit and
Reddit, like that. Okay. Now, we're going to go to the analyze Reddit posts node.
From here, essentially what we're going
to do right now is just return some fake
data. So we're going to say the
selected_reddit
urls. And for now, this is just going to
be equal to an empty list. Later, we can
populate that, but for now, that's all
we need. Now, same thing when we talk
about the Reddit post data, we're just
going to say return Reddit post data is
equal to an empty list. And then for the
analysis, same thing. We're going to
return. Okay. And this is going to be
the Google analysis. And for now, just
going to return an empty string.
And then we can do the same thing for
Bing. So return the Bing analysis empty
string.
Okay. Return the
Reddit analysis empty string and then
the final answer. So we're going to say
return
final answer.
Okay. And then empty string. just so
that all of these functions work
properly and they return the correct
format. Okay, cool. So, I just saved
this now and what we can actually do
just to test and make sure that the
logic is set up correctly is we can just
run this file. It should ask us to type
something in and it should just give us
no response. So, what I'm going to do is
just press on run here and we'll see if
we get any errors. It should just prompt
us to type something. So, it says ask me
anything. So just go hello and then you
can see that it just kind of gives us
this output and then doesn't say
anything and says ask me anything. So that, to me, means that this is working. Again, type hello, and you can see that it just doesn't give us anything, but we get some output here, and that's actually exactly what we were looking for. So I'm going to stop this here
because it means the flow is working
properly. And I did notice that we have
a few spacing issues. So let me kind of
fix that here. Looks like yeah we
randomly printed out a quote. So, let's
get rid of that and fix the quote here.
And I think we're kind of good to go.
So, the next step is going to be to set
up our LLM as well as to set up Bright
Data to start doing these search
operations. I want to start by searching
Google, then searching Bing, and kind of
walk through these and do one search
operation at a time so you understand
how they work. So, what we're going to
do is we're going to go and get these
API tokens. So, we're going to create a
Bright Data account in an OpenAI
account. Let me go to my browser and
let's set that up. All right. All right.
So, let's start by getting our OpenAI
API key so we can use GPT and then we'll
get the bright data one. So, what we're
going to do is go over to
platform.openai.com.
From here, we can just go to our
settings, then to our API keys, and we
can create a new key. So, from here, I'm
just going to go with AI agent as the
name. Okay. And then obviously, you
don't want to leak this key. So, I'm
going to copy it and I'm going to paste
it here in my environment variable file.
Okay. And now that we have that, we need
the Bright Data credentials. So, if you
don't already have an account, you're
going to need to create a new one on
Bright Data. I'll leave a link below in
the description and you should be able
to get some free credits. So, you do not
need to pay to use this for the
tutorial. Now, I quickly want to show
you a few of the services that we're
going to use here from Bright Data
because they have a lot of options when
it comes to getting web data
specifically for AI agents. So, for
example, they have a ChatGPT scraper, right? Where you can actually scrape the conversations from ChatGPT, the responses, the user queries, etc. We're
not going to use that here because we
don't really need it for this specific
tutorial, but in other ones, it's quite
useful. Now, we also have, let's go
here, social media scraper. So, this is
the one that we're specifically going to
use to scrape Reddit data. You can also
get stuff from Facebook, Instagram, TikTok, YouTube, which is notoriously very
difficult to scrape. If you've ever
tried to build your own scrapers before,
you've likely seen that it's very
complicated to actually get this data
and you get blocked by captchas, IP bans, etc. Whereas Bright Data can
actually overcome and bypass all of that
for you and just give you the data in a
very easy format. So, for example, you
can get Instagram profiles, posts, X,
LinkedIn. In our case, we're using
Reddit, which I think makes a lot of
sense for this particular agent, but
obviously you can pick pretty much
anything you want. And then we have the
SERP API, or search engine API, where we can really quickly scrape all of the major search engines like DuckDuckGo, Google, Bing, etc. This also works for
things like Google flights, right? Uh,
and all of those other services that
come from those search engines. Yeah,
like maps, images, hotels. Pretty cool.
I've done some other projects in the
past where I've used this. And again,
for this one, we're just going to use
the standard kind of Google Bing search
engines. They also have things like a
web archive. So, for example, if I go to
the documentation here, you can see that
you can actually scrape all of the
previous web data. So, you can get like
years back and you can kind of see
trends and historical data. Again, not
going to use that for this video, but we
could add that if we wanted to make it
more complex. Okay, so for now, we need
to make a new account or log into our
existing account. So, go to the link
that I have in the description. I'm
going to log in because I already have
an account here. For you, you are likely
going to create a new one. So, once
you've signed into your account, you
should be brought to a page that looks
like this. They actually recently added
this feature where you can just ask the
AI here and it can tell you how to do
what you want to do. Uh I'm not going to
use that though. What I'm going to do is
go to proxies and scraping from the left
hand side here. And what we're going to
do is create a new SERP API. Now of
course there's a lot of other features
here as well like web scrapers. We'll
use this later to actually collect the
Reddit comments and the posts as you can
see that I was kind of doing already.
But for now, we go back to proxies and
scraping and we're going to go to add up
here and we're going to create a new SERP API, the search engine API. Okay. So, press
on this. From here, we can give it a
name. I'm just going to call this AI
agent uh two because I already have one
called AI agent. Can give it a
description if you want. And then in
this case, I'm just going to leave this
at standard, but you could go maximized
if you care about actually retrieving
the ads. Okay. There's a few advanced
settings as well. We don't really need
to modify any of those currently. And we
can just go ahead and press on add
again. You should have some free credit.
So this should be free to use for you.
And then later obviously you can pay for
it if you want to use the service. Okay.
So I'm going to go yes create this new
zone. Once this is created it should
give us access to an API key which we
can actually see here and show us how we
can call this API. So notice here we
have method API and then we can do for
example Python and it gives us an
example of how to call this. What I'm
going to do for now is I am just going
to copy the API key which is right here.
And we're going to take that and put
that into our uh file into our
environment variable file. So let's go
here and paste the API key. And
obviously don't leak that to anyone. I'm
going to delete that after this video.
And then we'll be able to start using
this service, the search engine API. Now
if you go to the playground, you can
actually mess around with it here and
you can uh kind of test out your
different searches. So for example, we
could search all of these different
engines here. We can choose the keyword
that we want to search. And then there's
a bunch of other information that we can
add. So we can search a specific Google domain, you know, .fr, .ae, whatever, right? And then we can add all these other settings, like do we want to look on desktop or mobile? Do we want to add specific headers? Do we want to actually get a paginated response? Do we want geolocation? There's all these
different parameters that we can add.
And there's some examples that you can
view here as well on exactly how to do
this. In our case, it's going to be
pretty simple. We're just going to
search Google. Um, and that's kind of
it. So, what we'll do from now is we'll
go back to main.py. And I'm actually going to make a new file, which I'm going to call web_operations.py. And inside of this file, which will be a
Python file, I'm going to start
implementing all of the operations
related to the web scraping and to using
the bright data service. So, inside of
here, we're going to start with just our
basic search. So, we're going to be
searching Google. But in order to do
that, I'm going to set up some reusable
functions that will make our life a
little bit easier in the future because
we'll be sending quite a few requests over here to the Bright Data SERP API. So
what I'm going to do is I'm going to say from dotenv import load_dotenv again, because we need to import our environment variables. I'm going to say import os and import requests, and I'm going to say from urllib.parse import quote_plus, which is going to allow us to turn a normal string into a string that we can include as a query parameter in our URL, which you will see in a minute. For now, I'm going to call load_dotenv to load the .env file. And
I'm going to make a simple function here
called make API request which will just
be a reusable function that we can use
anytime we want to send a request to
bright data so we automatically can
include the correct headers for our
authentication. So I'm going to define _make_api_request, like that, taking in the URL and **kwargs. Here I'm going to say api_key is equal to os.getenv, and we're going to get the Bright Data API key, like that.
Okay. So we're going to get the API key
and we're going to create a set of
headers because we need to send these
headers to tell bright data who we are.
So we're going to say Authorization is equal to an f-string: Bearer, a space, and then our API key. And then we're going to say the Content-Type is going to be application/json, since we're sending JSON. Okay.
From here we're going to do a simple try/except block where we send a request to whatever URL was provided. So we're going to say try, and this is going to be response is equal to requests.post. We're going to post to the URL with headers equal to headers, like that, and pass our **kwargs. We're then going to say response.raise_for_status. What this means is we're going to raise an exception if we don't get an OK status. And then otherwise we're going to return response.json(). Okay. Then we're going to say except requests.exceptions.RequestException as e, and we're going to print an f-string saying API request failed, with e inside of the braces, and then we can return None because this didn't give us a response. Then we can have another except for any general Exception as e. We can print an f-string saying unknown error with e, and again return None. I'm just doing some more
advanced exception handling here just so that if the error is related to the network request, we handle it there, and if it's not, we deal with it in the general case. That way we know which exception or error was actually causing the problem. Okay.
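Here is a sketch of that helper in web_operations.py, assuming the environment variable is named BRIGHTDATA_API_KEY (use whatever name you put in your .env file):

```python
import os

import requests
from dotenv import load_dotenv
from urllib.parse import quote_plus  # used by the search function below

load_dotenv()


def _make_api_request(url, **kwargs):
    api_key = os.getenv("BRIGHTDATA_API_KEY")  # assumed .env variable name
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    try:
        response = requests.post(url, headers=headers, **kwargs)
        response.raise_for_status()  # raise if we didn't get a 2xx status
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"API request failed: {e}")
        return None
    except Exception as e:
        print(f"Unknown error: {e}")
        return None
```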
So now we have a general function that can send a request to Bright Data. What we need to do next is implement our SERP search function. Okay. So we're going to define serp_search, and what this is going to do is
take in a query and an engine which by
default right now is going to be equal
to Google. Now I'm going to write this
kind of dynamically because this will
allow us to actually search any search
engine that Bright Data supports, so something like Bing or Google. So we can
reuse this function multiple times. All
right. What I'm going to do is I'm going
to say if the engine is equal to google, then I'm going to say the base URL is equal to https://www.google.com/search. We're going to say elif the engine is equal to bing, then the base URL is https://www.bing.com/search, and then else we're going to raise an
error and we're going to tell them hey
this engine is not supported. So we're
going to say raise value error and then
unknown engine and then whatever engine
they passed. Okay. Now we're going to say the URL is equal to https://api.brightdata.com/request. Okay, because this is where
we're going to send the request and
we're going to pass essentially the
search URL that we want to search and
get the data back from. Now, we're going
to say our payload is equal to zone and
the zone is going to be the name of the
zone that we created, which is AI agent
2. So, if we go back here and we look at
our overview, we should be able to see
the zone name. You can see it's right
here, AI agent 2. Okay, so that's our uh
zone name and you can kind of see the
information down here as well. So,
anyways, we need to pass the zone. We
also need to pass the URL. So the URL is
going to be the following. We're going
to put an F string. We're going to put
our base URL, which is the search engine
that we want to search, essentially. And then we're going to say ?q= followed by quote_plus of the query.
Now what this is going to do is it's
going to take whatever the user typed
in. That's going to be our search
string. It's going to turn it into a
format that we can actually pass
correctly in a query parameter for this
URL. And then we're going to put an and
we're going to say BRD_JSON
equals 1. Now what this means is bright
data JSON enabled. So essentially we
want to get our responses back in JSON
format. So Bright Data is actually able
to parse all of the responses from the
search engines and then return it to us
in a digestible format. In this case,
JSON. And then we're going to say the
format is raw like that. Okay. So this
is the payload which is essentially how
we send this search request. This will
hit the search engine API and then it
will give us back a response. So what
we're going to do here is we're going to say full_response is equal to _make_api_request. We're going to pass our URL, right, which is the URL right here, so that's where we're sending the request to. And then we're going to say json is equal to payload, which will be sent as the JSON body along with our request. We're going to say if
not
full response then we're going to return
none. Otherwise what we're going to do
is extract data out of this response. So
the bright data response is going to
give us a ton of information. It's going
to give us the sponsored post, the
organic post. It's gonna give us a bunch
of stuff, but we only care about a few
sections of that response. Now, if you
want to look at this response, you can
just mess with it right here. You can go
to the playground, right? And we can
kind of run this request and see the
result that we get. But what I want to
do is I just want to pull out a few
pieces of information. So, right, it's
giving us the full kind of preview of
the page. We can actually look at the
JSON format and you'll see it has like
general, input, navigation, it has all
of these other fields, right? It's a
very long response that we get, but I
only care about part of the response.
Now, the parts of the response that I care about are the organic results and the knowledge panel that Google pulls here. The knowledge panel is like a quick summary of the information that you searched for, which you've probably seen
before if you've done, you know, a
Google search. So, what I'm going to say is extracted_data is equal to a dictionary where knowledge is full_response.get of the knowledge field; now, if that doesn't exist, I'm just going to fall back to an empty set of braces, an empty dictionary. Then I'm going to get organic, and this is the same thing with full_response.get, except here the fallback is actually going to be an empty list. Now, the reason I
know this is because before the video, I
was obviously preparing. I looked
through the response structure and these
are the two fields that I care about. If
you want to see the entire response
structure, then feel free just to print
it out and you can see all the
information that it gives you. Here, we're just narrowing down the data so that we only get the important stuff. But if
you wanted all of the data or something
else for a different use case, then of
course you can get that from this API
request as well. And what's interesting
here is that you can run this as many
times as you want. It's very scalable.
So you can run this, you know, hundreds
of times, thousands of times with
different requests. And it returns very,
very quickly because it's already
indexed by Bright Data. So I'm going to
return the extracted data here from this
function. And then that should be it for
this first function where we're
essentially just calling this kind of
search function. All right, so that's
pretty much it, at least for right now.
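Putting that together, serp_search looks roughly like this; the zone name ai_agent_2 and the payload keys follow what we just set up in the dashboard, so adjust them to match your own zone. It uses quote_plus and _make_api_request from above:

```python
def serp_search(query, engine="google"):
    if engine == "google":
        base_url = "https://www.google.com/search"
    elif engine == "bing":
        base_url = "https://www.bing.com/search"
    else:
        raise ValueError(f"Unknown engine: {engine}")

    url = "https://api.brightdata.com/request"
    payload = {
        "zone": "ai_agent_2",  # the SERP API zone name you created
        "url": f"{base_url}?q={quote_plus(query)}&brd_json=1",  # brd_json=1 -> parsed JSON back
        "format": "raw",
    }

    full_response = _make_api_request(url, json=payload)
    if not full_response:
        return None

    # Keep only the parts we care about: the knowledge panel and organic results
    extracted_data = {
        "knowledge": full_response.get("knowledge", {}),
        "organic": full_response.get("organic", []),
    }
    return extracted_data
```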
What I'm going to do is I'm going to go
back to main. I'm going to import what
we just wrote and then I'm going to call
that from one of our functions so we can
test it out and make sure it works. So I'm going to say from web_operations import serp_search. So now, in the Google search node, the Google results are actually going to be serp_search with the user question, and we're just going to say the engine is equal to google, and then that should give us back the Google
results. Okay. So for now what we can do
is we can just print the Google results
so we can see if we're actually getting
anything at all. And while we're at it
we might as well just do the same thing
for Bing because it's going to be the
same thing just with a different engine.
So let's call serp_search again, passing the user
question with the engine equal to "bing", and
then same thing, we can print the Bing
results and do an initial test
to see if this is working.
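For reference, the quick test wiring in main.py might look roughly like this, condensed into a standalone snippet (the user question normally comes from the graph state set up earlier in the video):

```python
# Quick sanity check of the SERP helper.
from web_operations import serp_search

user_question = "invest in Nvidia"

google_results = serp_search(user_question, engine="google")
print(google_results)

bing_results = serp_search(user_question, engine="bing")
print(bing_results)
```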
Okay, so let's run the file and search for
"invest in Nvidia". Okay, and it says searching
Bing, searching Google. Wait a second
and then we should get the results. And
you can see no knowledge popped up for
this one. That's okay. And then for
organic there's a bunch of links, right?
So it gives us kind of all of this data
popping up related to those results, and
you get descriptions about Nvidia stock, etc. Now of
course there's a lot more stuff that we
can extract from here but for now that
is good and it gives us kind of the top
results on Google and we can read
through the descriptions the links etc.
Okay, so let's exit out of that and
let's continue because now we have the
SERP API functioning. And I'm just going
to remove the print statements because
now that we know we're getting the
correct results, we don't really need
anything more. All right, so we have
Google search, we have Bing search. Now
what we want to do is we want to
implement the Reddit search. So for the
Reddit search, it's a little bit
different and that's going to require us
going to web scrapers. So from here,
we're going to go to new. We're going to
go to browse scraper marketplace and
then we're going to search for Reddit.
Okay, so it's going to take a second.
We're going to press on the result for
scrapers and then this is going to give
us a few options that we can use to
actually scrape Reddit and get real-time
data. Now you can build your
own scraper if you want, but a lot of
them that you need are already built for
you and you can just call them like an
API which makes it very easy to actually
download and get the data quite quickly.
And in our case, there's two main things
that we want to use here from Reddit. We
want to discover posts by their keywords
because we want to search essentially on
Reddit. And then after we search, we
want to get all of the relevant post
URLs and download all of the comments.
So you can see we have Reddit comments -
collect by URL. Then we have Reddit posts -
discover by keyword. So we're going to
use both of them. First, we collect the
post. Then we get the comments from the
post that we care about. So let's go to
Reddit post here. It's going to say
scraper API. So I'm going to go ahead
and press on next. And it's going to
create this scraper for us. Now here we
have collect by URL, collect by keyword,
collect by subreddit URL, right? Like
and we can get the comments as well. And
we can kind of run through this and see
how we use this scraper, all of the
fields that we can pass to it and all of
the fields that we'll get back. So it
shows us what the response structure
looks like. And then if you go to the
API request builder, it shows us how to
build this API. Now our API key will be
the same. The thing that's going to
change is the URL that we need to hit
here. And this is where I want to go in
and talk about our management API. Okay,
so when we use this scraper,
essentially what's going to happen is
we're going to send a request and then
bright data is going to go using its
scalable network and start collecting
all of that data for us. Now, it's not
going to be available instantly because
it needs to actually access it in real
time from Reddit. So what's going to
happen is we're going to create
something called a snapshot. Now the
snapshot is going to be generating. So
when we first send a request, it's going
to take a second. It's going to start
generating. So what we need to do is we
need to monitor the progress of this
particular snapshot and wait until it's
ready. Now, as soon as the snapshot is
ready, we can then download the snapshot
and we can access the data, but we need
to wait for it to be finished. So
essentially, there's these multiple API
endpoints that we're going to hit. The
first endpoint is going to be to
actually start the collection process.
Then we're going to use this monitor
endpoint to wait for when the snapshot
is ready. And then as soon as it's
ready, we're going to download the
snapshot. So we need to kind of write
some code in Python here that is going
to allow us to do this process where we
hit the API, we wait for it to be ready,
and then we download the snapshot. Okay,
so I'm going to start writing this out.
I've written it based on myself reading
the documentation here, and you'll kind
of see how it works as we code this out.
Again, we just need to make this scraper
to start, and then we need to get access
to the data set ID, which I'm going to
show you in one second. So in the left
hand side here, we're going to go to
discover by keyword. We're going to go
to the management APIs and we're going
to scroll down here until we see this
data set ID. Okay, so we need to copy
this data set ID because this is
something that we're going to need to
use when we actually perform this
scraping operation so we can identify
what data set we're talking about. All
right, so we have that data set ID. What
I'm going to do for now is I'm just
going to put it in a comment in my web
operations file so I don't forget it. So
I'm going to say data set ID is equal to
that. And now I'm going to start writing
the Reddit search function. Okay, so I'm
going to say define Reddit search. This
is going to take in the following. It's
going to take in the keyword that we
want to search for. It's going to take
in the date, which in this case I'm
going to say is all time. It's going to
take in the sort by. I'm going to sort
by hot posts, but you could sort by
rating or, you know, upvotes or
whatever you want. So let's go hot. And
then we're going to say the num of
posts. And in this case, I'm going to do
75. You can do as many as you want.
Okay, so let's zoom out a little bit so
you guys can read this and let's
continue. So for Reddit search, the
first thing we're going to do is define
our trigger URL, which is going to be
the bright data API. So we're going to
say
https://api.brightdata.com/datasets/v3/trigger.
Okay, then I'm going to say my params
are equal to a dictionary, and I'm going to put my
dataset_id, and this is going to be
equal to the ID that we had up here. So
I'm just going to copy it and paste it
inside of here. Okay, then we're going
to have include errors and this is going
to be true inside of quotation marks.
I'm going to have type, and this is going
to be "discover_new". Again, I'm getting all this from
the Bright Data documentation. And I'm
going to say discover_by,
and then "keyword". Okay, so this
indicates what type of kind of search
we're doing essentially. Next, I need to
indicate the data that we're searching
for. So, I'm going to say data is equal
to and this is going to be a list. And
then inside of here, I'm going to put
all the keywords that I want to search.
Now, you'll notice that because this is
a list, I can actually put multiple sets
of keywords at once and Bright Data will
go and asynchronously scrape all of them
for us. So, what that means essentially
is that if you wanted to do a 100
different search strings or a thousand
different search strings, you can do
that in one API request rather than
having to send multiple of them because
this is set up to obviously scale. So,
we're going to say keyword is equal to
keyword.
We're going to say date and then this is
going to be equal to the date. We're
going to say sort by
okay and this is going to be sort by and
then we're going to say the num of posts
is equal to the number of posts and then
again you could write this multiple
times for multiple search strings. So
this is going to start setting it up.
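As a rough sketch, the trigger setup inside this reddit_search function might look like the following. The dataset ID is a placeholder (copy yours from the management API page), and the exact accepted values for fields like date depend on the Bright Data dataset, so treat them as illustrative:

```python
# Sketch of the trigger setup for the Reddit "discover by keyword" scraper.
def reddit_search(keyword, date="All time", sort_by="hot", num_of_posts=75):
    trigger_url = "https://api.brightdata.com/datasets/v3/trigger"

    params = {
        "dataset_id": "gd_xxxxxxxxxxxxxx",  # placeholder - use your own dataset ID
        "include_errors": "true",
        "type": "discover_new",
        "discover_by": "keyword",
    }

    # One dict per search - add more entries here to scrape many
    # keywords in a single request.
    data = [
        {
            "keyword": keyword,
            "date": date,            # exact value/casing matters (see the fix later)
            "sort_by": sort_by,
            "num_of_posts": num_of_posts,
        }
    ]

    # Next step: a helper that triggers the scrape, waits for the
    # snapshot, and downloads the results.
    ...
```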
We're now going to say raw data is equal
to and I'm going to call a function here
that I haven't yet defined. So right now
we're going to say none. We're going to
say if not raw data then we're going to
return
none. Otherwise, we're going to go and
we're going to parse this raw data. So,
I'm going to say to-do parse raw data
and then we are going to return the
information right here, which we'll
write in one second. Okay. So, essentially
this is how we're going to set
up the trigger. But what we
need to do is write a function that will
allow us to download the snapshot, which
is how we're going to get the data,
which is what I'm going to write now.
So, I'm going to put another function
here. I'm going to say define
_trigger_and_download_snapshot,
like this. I'm going to take in
the trigger URL. I take in the params
the data and the operation name which in
this case I'm just going to call
operation. Okay. Now here what I'm going
to do is I'm going to make an API
request to Bright Data. I'm then
going to get the snapshot information,
and I'm going to poll that snapshot
until it's ready and then download it.
This is going to be a little bit of
code, so just bear with me here. I'm
going to say trigger result is equal to
_make_api_request. So the
function that we wrote before, we're
going to pass our trigger URL, our
params, which is equal to the
parameters, and our JSON, which is equal
to the data. And then what we're going
to do is say if not trigger result then
return none because of course if it
didn't give us anything then we can't
pull it. Otherwise we're going to say
the snapshot_id is
equal to trigger_result.get("snapshot_id").
We're going to say if not snapshot ID, then
same thing, return none, because we don't
have any snapshot to retrieve.
Otherwise, what we're going to do here,
I'm going to write a to-do, is poll the
snapshot.
Okay, so that's what we need to do. Now,
we need to write some more functions
to essentially poll the snapshot and
download the snapshot. Now, to make this
a little bit cleaner, I'm going to make
a new file here. For this new file, I'm
going to call this
snapshot_operations.py.
And I'm just going to copy in this
file just to save us a little bit of
time because it's pretty kind of
redundant code and it's not super
valuable for you to write all of this
manually. So what I'm going to do is
paste it in. It's about 70 lines. I'm
going to walk through exactly what it's
doing, but you can just simply download
this code by going to the link in the
description for the GitHub repository,
finding this file, just copying it, and
pasting it in here. Anyways, let me walk
through what we're doing here. So you
can see we're importing os, time, requests,
and typing. What I'm doing is I'm
polling the snapshot status. So, I'm
getting my Bright Data API key. I'm
setting up the progress URL. I'm setting
up my headers. And what I'm saying here
is, okay, I want to keep sending
requests to this endpoint until
eventually it tells me that it's ready.
So, I'm going to do this a maximum of 60
times. I'm going to delay by 5 seconds
in between each of those so that this
takes me a maximum of 5 minutes. I'm
going to say, okay, checking snapshot
progress. This is the attempt. We're
going to get the response from this URL.
We're going to check the status. If it's
ready, we return true. If it's failed,
we return false. If it's still running,
then we just add a time delay. And we
keep doing this. Okay? And we keep going
and we keep going and we keep going
until eventually it fails or it's ready.
Now, we have another function called
download snapshot. And we only call this
function once the snapshot is ready. So,
same thing. We set up our API key and
our download URL. And then we simply
send a request where it downloads a
snapshot and then returns the data to
us. Okay, so that's all that I put
inside of this file.
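Here's a condensed sketch of what that file contains; the full version is in the repo. The progress and snapshot endpoints follow Bright Data's datasets v3 API, and the BRIGHTDATA_API_KEY environment variable name is an assumption, so adjust it to whatever your setup actually reads the key from:

```python
# snapshot_operations.py - condensed sketch of the polling helpers.
import os
import time
from typing import Optional

import requests


def poll_snapshot_status(snapshot_id: str,
                         max_attempts: int = 60,
                         delay: int = 5) -> bool:
    """Poll the snapshot progress endpoint until it is ready or has failed."""
    api_key = os.getenv("BRIGHTDATA_API_KEY")  # env var name is an assumption
    progress_url = f"https://api.brightdata.com/datasets/v3/progress/{snapshot_id}"
    headers = {"Authorization": f"Bearer {api_key}"}

    for attempt in range(max_attempts):
        print(f"Checking snapshot progress... attempt {attempt + 1}")
        response = requests.get(progress_url, headers=headers)
        status = response.json().get("status")
        if status == "ready":
            return True
        if status == "failed":
            return False
        time.sleep(delay)  # still running - wait and try again
    return False


def download_snapshot(snapshot_id: str) -> Optional[list]:
    """Download the finished snapshot and return its records."""
    api_key = os.getenv("BRIGHTDATA_API_KEY")
    download_url = f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}?format=json"
    headers = {"Authorization": f"Bearer {api_key}"}

    response = requests.get(download_url, headers=headers)
    if response.status_code != 200:
        return None
    return response.json()
```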
So now, from web_operations, we're going to import
those functions. We're going to say from snapshot_operations
import, and then we are going to import
the download_snapshot and the
poll_snapshot_status functions. Okay, so now let's go
to our to-dos. So we have a to-do here
where we need to poll the snapshot. So
for polling the snapshot, we're going to
do the following. We are going to say if
not poll_snapshot_status with the
snapshot ID, then we are going to return
none. What this is going to do is it's
going to continually poll the snapshot
until it eventually gets a result of
true or false. True means we can
download it. False means there's an
error in which case we return none. So
we're now going to say the raw data is
equal to the download snapshot and we're
going to download the snapshot with this
snapshot ID which will contain our
scraped data and then we can return raw
data. Okay, so this function,
_trigger_and_download_snapshot, is going to do
exactly that.
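Before moving on, here is a rough sketch of the finished helper in web_operations.py, with the polling step filled in. Back in the Reddit search function, we can then call it to get the raw data:

```python
# Sketch of the trigger-and-download helper, assuming _make_api_request
# and the snapshot_operations helpers shown above.
from snapshot_operations import download_snapshot, poll_snapshot_status


def _trigger_and_download_snapshot(trigger_url, params, data, operation_name="operation"):
    # Kick off the collection run.
    trigger_result = _make_api_request(trigger_url, params=params, json=data)
    if not trigger_result:
        return None

    snapshot_id = trigger_result.get("snapshot_id")
    if not snapshot_id:
        return None

    # Poll until Bright Data reports the snapshot is ready, then download it.
    if not poll_snapshot_status(snapshot_id):
        return None

    raw_data = download_snapshot(snapshot_id)
    return raw_data
```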
From this we can get the raw data now.
So we can say the raw data is equal to
_trigger_and_download_snapshot.
We are going to pass the
trigger URL. We're going to pass the
params our data and an operation name
which I'll just call Reddit in case we
want to do some logging later on. Same
thing if there's no raw data we'll
return none. Otherwise we're going to
parse this data. So I'm going to say
parsed data is equal to an empty list.
What I want to do is I want to take
all of the data that was returned to us
and I just want to get the information
from this data that I care about. That's
because I don't want to pass all this
unnecessary data to my LLM when I start
checking which post we actually want to
download or want to get the information
from. So I just want to get for example
the description of the post and the
title of the post or the title of the
post and the URL of the post just the
data that I actually need. Okay. So I'm
going to say parsed data is equal to a
list. I'm going to say for post in raw
data. Then I'm going to say the parsed
post is equal to and I'm going to say my
title is equal to the post.get
and then title and I'm going to say my
URL is equal to the post.get and then
URL. Now each post in my raw data is
going to have a ton of information,
right? It's going to have the number of
likes, number of upvotes, the number of
comments. It's going to have a
description, and it's going to have the
date it was posted. It's going to have the
author. I don't care about all that
information. So, I'm just parsing
through it, getting the information I do
care about. And then I'm going to say
parsed_data.append, and I'm going to add
this parsed post to that list. Then here I can
return my parsed
data like that. Okay. So, this function
now should actually work where if we do
Reddit search, it should essentially
trigger this scrape operation to start
happening. So, Bright Data will go to
Reddit, it will do the search, and it
will start collecting all of the
relevant posts. Then, we're going to
poll that snapshot because it takes a
second to run. As soon as the snapshot
is ready, we're going to download the
snapshot. We're going to parse through
the results and then we're going to
return that parsed data. Now, the next
step after this would then be to get the
URLs from this parsed data that we want
to explore further and then to download
all of their comments. So, we're going
to do another operation in a second here
that's going to get all of the comments
from a list of posts. But for now, let's
test this one out by going back to
main.py and actually calling this
function now from our Reddit search.
Okay, so we're going to go to Reddit
search now. And we're just going to
change this to call the Reddit search
function. So what did we call this?
Actually, we called this Reddit search.
And actually, let's call this Reddit
search API, because if we name it the
same thing as our function here, that's
going to be an error. So we're going to
say Reddit search API like that. And
then we're going to import this. So,
let's go up here and let's import the
Reddit search API.
Okay, cool. Come back here and same
thing. We'll just pass the user's
question. And then what we can do is we
can print out the Reddit results. Okay,
so now we've tested that function. So,
let's run this and see if it works. And
we're going to say, should I buy AMD
stock? Okay, and it says it's starting
the search. And it gave us a bad request
for this URL. So, I probably just
typed something incorrectly, and I will
check what that problem is. Okay, I was
just checking here, and it was kind of a silly
mistake: I accidentally had a
capital T when I typed "all time" here, and
this needs to be a lowercase t. That
should fix the problem for us now in
this function. So, if we come here, we
can run this and we can say, you know,
invest in AMD and then we should be
good. And you can see it starts checking
the snapshot progress. Okay. Now, while
it does that, we're just going to make a
small change to the code as well, because
the way that I'm returning this parsed
data from here is not actually how I
want to return it. What I want to return
instead is a format that makes a little
bit more sense. So, I'm going to say
return. So, I'm going to put a set of
braces and I'm going to say parsed posts
and this is going to be equal to the
parsed data. And then I'm going to say
total_found is equal to the len of
the parsed data. Okay, cool. So that's
that.
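Putting the parsing and the new return shape together, the tail end of the Reddit search function looks roughly like this. It's shown here as a hypothetical _parse_reddit_posts helper for readability; in the video the same code lives inline:

```python
def _parse_reddit_posts(raw_data):
    """Trim the raw snapshot records down to just the fields we care about."""
    parsed_data = []
    for post in raw_data:
        parsed_data.append({
            "title": post.get("title"),
            "url": post.get("url"),
        })

    return {
        "parsed_posts": parsed_data,
        "total_found": len(parsed_data),
    }
```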
And if we go back to the terminal, it looks like
it finished running. And you can see
that we get some posts and some titles
from Reddit. Now, these don't seem to
make a ton of sense to me. So I'm just
going to quit this and try again because
I think I may have messed something up
in the search string here specifically
because I think I spelled invest
incorrectly. But let's just search
Nvidia here. And let's see if we get
some posts that make a little bit more
sense here from Reddit. Okay. And there
we go. So, these make a lot more sense,
right? Nvidia is actually in the post
title because I didn't spell it
incorrectly this time. And there's 75
posts as we go through here. And the
next step is going to be to narrow those
down so we can grab all of the comments
that we need from them. Okay, so that is
working. And we've got this first
function where we're doing the Reddit
search. Next, we want to get all of the
Reddit posts. So, what I'm going to do
is write another function here. And then
this will wrap up all of the search
operations, and we'll go back into
Langraph and start doing some of the
kind of prompting with the LLM. I just
like to get the data first. Then once we
have the data, we can pass it to the
LLM, and we can kind of analyze it. So,
here we're going to go
reddit_post_retrieval.
Okay? And we should spell retrieval
correctly. What we're going to do is
we're going to take a list of URLs.
We're going to say days back. So, this
is how many days back we want to
get the comments from. We're going to
say load all replies. For now, this is
going to be equal to false. But if you
wanted to get all of the nested replies,
then you could go with true. And we're
going to say comment limit. And for now,
we'll just make this an empty string.
And then later, we can add a limit if
we're getting too many results. And what
we're going to do is we're going to say
if not URLs, then we're just going to
return none because if you don't pass me
any URLs, well then there's no reason to
do this search. And then what we need to
do is set up a similar thing to before.
So I'm going to copy this trigger URL.
And I'm going to put this here and we're
going to say the trigger URL is equal to
the following. Then we're going to say
the params are equal to and we're going
to say the data set ID. The data set ID
here is going to be different. I'll show
you where to get that from in a second.
And we're going to say include
underscore errors
is true. Again, we're then going to
create our data. So, we're going to say
data is equal to same thing. You could
run this at scale if you want. And we're
going to say URL is equal to URL. We're
going to say the days back is days back.
We're going to say load_all_replies
is equal to load all replies. And we're going
to say comment limit is equal to comment
limit. And then this is going to be for
URL in URLs. So essentially we're
creating one of these entries for every
single URL, passing that all inside of
here, and then we'll get all of the
comments for all of these URLs with
these parameters. Okay, so now we need
to find the data set ID. So what we're
going to do is go back to Bright Data
here. We're going to go to where it says
collect by URL. We're going to go to the
management API. And then if you scroll
here, you'll see this new data set ID,
which is the one that we're going to
copy. We'll come back here and we'll
paste that updated dataset ID. So this is
the one for getting the comments. All
right. Now, we're effectively going to
do the same thing that we did before. So
here we're going to say the raw data is
equal to and it's going to be underscore
trigger and download snapshot. We're
going to pass the trigger URL. We're
going to pass our params data and the
operation name is going to be Reddit
comments. Okay, we're going to say if
not raw data, then we're going to return
none. And then if we do have raw data,
we are going to parse the comments. So
we're going to say parsed comments
is equal to an empty list. We're going
to say for comment in the raw data.
We're going to say the parsed comment is
equal to and then we're going to start
writing the comments. We're going to say
the comment ID is equal to comment.get.
And then this is going to be the comment
id. We're going to say the content is
equal to the comment.get
and this is going to be the content.
We're going to say the date and this is
going to be the comment.get and then of
course the date. We're going to say the
parent comment ID because this will be
important for the linkage. So parent
comment ID is equal to comment.get,
and then this is the parent comment ID.
Okay. And then lastly, we're going to
say the post_title
and this is going to be the comment.get
post title. And make sure we don't
forget to put that inside of quotes.
Okay. Then we're going to say the parsed
comments.append the parsed comment. And
then lastly from here we're going to say
return and we're going to return the
comments which is the parsed comments.
and we're going to say the total
underscore
retrieved.
Okay. And this is going to be equal to
the len of the parsed comments. Okay, so
that's it for getting the comments. Again,
it's essentially the same pattern as getting
the posts, except we're changing the dataset
ID and a few of the parameters.
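For reference, here is a sketch of the whole function as described so far. The dataset ID is a placeholder for the "collect by URL" dataset, the days_back default is illustrative, and a couple of the field names get corrected a bit later in the video:

```python
def reddit_post_retrieval(urls, days_back=10, load_all_replies=False, comment_limit=""):
    if not urls:
        return None

    trigger_url = "https://api.brightdata.com/datasets/v3/trigger"
    params = {
        "dataset_id": "gd_yyyyyyyyyyyyyy",  # placeholder - comments dataset ID
        "include_errors": "true",
    }
    # One entry per post URL we want comments for.
    data = [
        {
            "url": url,
            "days_back": days_back,
            "load_all_replies": load_all_replies,
            "comment_limit": comment_limit,
        }
        for url in urls
    ]

    raw_data = _trigger_and_download_snapshot(
        trigger_url, params, data, operation_name="reddit comments"
    )
    if not raw_data:
        return None

    parsed_comments = []
    for comment in raw_data:
        parsed_comments.append({
            # Note: some of these keys get adjusted later in the video.
            "comment_id": comment.get("comment_id"),
            "content": comment.get("content"),
            "date": comment.get("date"),
            "parent_comment_id": comment.get("parent_comment_id"),
            "post_title": comment.get("post_title"),
        })

    return {
        "comments": parsed_comments,
        "total_retrieved": len(parsed_comments),
    }
```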
So now we have the ability to get all of
the comments for a
particular post. Now, before we can test
this, we need to know what posts we want
to get the comments for. So, what we're
going to do is close out of this. We're
going to close out of snapshot operations, we're
going to close out of web operations. And now, we're
just going to be working inside of this
main file. So, the Reddit search works,
the Bing search works, and the Google
search works. Now, the next step is to
analyze the Reddit search, pull out the
relevant URLs, right? So, that's what
this is doing right here. And then to
retrieve those comments or really the
post data from those particular posts.
So let's move on and let's handle that.
All right, so let's move on to the next
step here where we are going to analyze
the Reddit post and then pull out the
ones that are relevant. Now, in order to
do that, we're going to need some LLM
operations here. And I'm going to make a
new file and I'm going to call this
prompts.py. Now, similarly to before,
I'm not going to write all of this out
from scratch because it is a good amount
of code and it's not super valuable to
do that. But what I'm going to do is
paste in all of the prompts that I've
already written that we're going to use
for this video. Again, you can get these
from the link in the description. Just
go to the GitHub repository and download
them. So, I'm going to paste it in. It's
going to look like a lot of code, but
really most of it is just prompts that
I've already written that I've tested
that work well here. Now, you see that I
have this class called PromptTemplate.
I have a few static methods inside of
here, where when you call one of these methods,
it essentially just returns the
prompt to you. So, for example, the Reddit URL
analysis system. That's the one we're
about to use. You're an expert at
analyzing social media content. Your
task is to examine Reddit search results
and identify the most relevant post that
would provide valuable additional
information. You get the idea. Okay. And
then we tell it do the following. You
know, find this information blah blah
blah. And then return a structured
response with the selected URLs. Then there's the
Reddit URL analysis user method, the user prompt for
analyzing Reddit URLs. So same thing:
user question that we pass the user
question here. Pass the Reddit results
analyze these Reddit results. Same thing
for the Google Analysis. Okay. Pull this
in. Google analysis user. same thing the
user prompt. So we have the system
prompt and the user prompt and all these
functions or methods that contain the
prompts and allow us to pass some
variables and have it kind of embedded
inside of here. So don't worry too much
about this, but there are a few
functions that of course we are going to
use from this file.
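To make the pattern concrete, here is an illustrative, heavily shortened sketch of what prompts.py looks like. The real prompt text is much longer, so grab the actual file from the repo; the wording and signatures here are assumptions based on what's described above:

```python
# prompts.py - illustrative sketch of the prompt-template pattern.
class PromptTemplate:
    @staticmethod
    def reddit_url_analysis_system() -> str:
        return (
            "You are an expert at analyzing social media content. "
            "Examine Reddit search results and identify the most relevant "
            "posts that would provide valuable additional information."
        )

    @staticmethod
    def reddit_url_analysis_user(user_question: str, reddit_results: str) -> str:
        return (
            f"User question: {user_question}\n\n"
            f"Reddit results: {reddit_results}\n\n"
            "Analyze these Reddit results and return the most relevant URLs."
        )


def get_reddit_url_analysis_messages(user_question, reddit_results):
    # Each "get_*_messages" helper just returns a system + user message pair.
    return [
        {"role": "system", "content": PromptTemplate.reddit_url_analysis_system()},
        {"role": "user", "content": PromptTemplate.reddit_url_analysis_user(user_question, reddit_results)},
    ]
```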
So now we're going to go to main.py
and we are going to import them. We're going to say
from prompts import,
and we're essentially just going to
import all of the functions that we have
written there. So what this is going to
be is get_reddit_url_analysis_messages,
get_google_analysis_messages,
get_bing_analysis_messages,
get_reddit_analysis_messages, and
get_synthesis_messages. Okay? And then we can format
this. So we can just put a set of
parentheses here. Okay? So like that.
And then we can move this down to the
next line and kind of put all of them
like this. So they're all getting
imported from the same place. Okay. So I
think that is good. We have the prompts.
Now what we're going to do is go over to
analyze Reddit posts and we're going to
start using some of these prompts when
we call the LLM. Okay. So first things
first, we're going to get the user
question. So we're going to say user
question is state.get user question and
we're going to get the Reddit results.
So, we're going to say the Reddit
results is equal to state.get
and then you guessed it, this is going
to be the Reddit results or an empty
string. Okay, so we're going to say if
not Reddit results. So, for some reason
we don't have any, which can happen,
then we're going to say return and then
we're just going to return the selected
URLs equal to an empty list because we
won't have any to select. Next, we're
going to say structured_llm
is equal to llm.with_structured_output.
And what I'm going to pass here
is something called a Pydantic model,
which will force the LLM to give me an
output in a particular format. So, we're
going to write that now, and you're
going to see how useful this actually
is. So, I'm going to make a class, and
this is going to be the Reddit URL analysis.
So we're going to say RedditURLAnalysis,
and this is going to inherit
from BaseModel, which we imported
here from Pydantic. Now what we're able
to do is define a python class and then
pass that to an LLM and tell the llm it
needs to give us an output that's in
this particular format. This allows us
to ensure that we always get something
in the same format. And in this case the
format that we want is just a list of
URLs. So what we can say is selected_urls.
We can say this is a list of type
string, and we can make this equal to a
Field. This comes from Pydantic, and we
can say the description is equal to the
following. And then I'm just going to
paste in the description. So let me copy
it from my other code file here. But
essentially we just describe what we
want the model to populate this field
with. So I've said this is a list of
Reddit URLs that contain valuable
information for answering the user's
question. So now what will happen is
when I initialize the LLM I can give it
this model and I can say hey you need to
give me an output that's always in this
format and then every time we run the
LLM we're going to get selected URLs
it'll be a list and it will contain the
URLs that we need right that are
strings. So if we come back here, we can
just pass this class, RedditURLAnalysis, into
with_structured_output, and that's it. We've created
this structured output model, and it's
very useful for getting content back in the
correct format.
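Here is what that model and the structured-output setup look like (llm is the chat model defined at the top of main.py earlier in the video):

```python
from typing import List

from pydantic import BaseModel, Field


class RedditURLAnalysis(BaseModel):
    selected_urls: List[str] = Field(
        description=(
            "List of Reddit URLs that contain valuable information "
            "for answering the user's question"
        )
    )


# llm is the chat model created at the start of main.py
structured_llm = llm.with_structured_output(RedditURLAnalysis)
```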
We're then going to say messages is equal to
get_reddit_analysis_messages. We're
going to pass the user question and
we're going to pass the Reddit results.
Okay, so these are the messages now that
we're going to pass to the LLM. And now
what we're going to do is we're going to
invoke the LLM and we're going to
attempt to get this kind of structured
output. So, we're going to say try and
we're going to set analysis is equal to
the structured_llm, if we could spell
structured correctly. Not sure why it's
not giving me the autocomplete. Okay.
Dot invoke. And then we're going to pass
these list of messages. Now, the
messages if we go and look at prompts
here is essentially just two messages,
right? So, we have like the system
prompt and then we have the user prompt.
So, that's all that we're passing. And
if we look at the Reddit one, so it's
right here. We get the Reddit analysis
system prompt from the prompt template
and then we get the user prompt. So we
just pass that here. Those are the two
messages. And then what we're going to
do is say the selected URLs is equal to
analysis. Okay. Dot selected URLs
because it's going to give us a Python
object. Let's fix the spelling. Okay. So
now what we can do is we can print out
the URLs just to make sure that it's
actually correct. We're getting the
proper URLs. So we can say for i,
url in enumerate, and we can enumerate
over the selected URLs. From here what
we can do is say print, and then
we can use an f-string: maybe a few
spaces, then i in braces, a dot, and then
url in braces. And if
we want, I can just pass one to enumerate here so
that we start the index at one.
Okay, so now this is just going to print
out the selected URLs. Before that, we
can also just say selected URLs just so
we have some logs and then we'll be able
to see what those are. Now down here, we
need the except. So we're going to say
except Exception as e. We're going to
say print and we'll just print out E.
And then we'll say selected
URLs is equal to an empty list. And then
when we return the selected URLs, we'll
just return the selected URLs. Okay, so
that's all that we're doing. We
essentially said all right we're going
to create this structured output LLM.
What we do is we tell it that we need
something in this format which we
defined above. We generate the messages
that we need and then we pass that to
the LLM. So we invoke the LLM. We get
the selected URLs from the response. We
print that out. There's some error then
we print E and we say there's no
selected URLs and then we keep going
from there. Okay, so now this should
actually just work.
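Putting the whole node together, a sketch of analyze_reddit_posts might look like this. Note that it already uses get_reddit_url_analysis_messages (the function-name fix that comes up in a moment) and the selected_reddit_urls state key used later; llm, RedditURLAnalysis, and the prompt helpers are the ones defined above:

```python
def analyze_reddit_posts(state):
    user_question = state.get("user_question", "")
    reddit_results = state.get("reddit_results", "")

    if not reddit_results:
        return {"selected_reddit_urls": []}

    structured_llm = llm.with_structured_output(RedditURLAnalysis)
    messages = get_reddit_url_analysis_messages(user_question, reddit_results)

    try:
        analysis = structured_llm.invoke(messages)
        selected_urls = analysis.selected_urls
        print("Selected URLs:")
        for i, url in enumerate(selected_urls, 1):
            print(f"  {i}. {url}")
    except Exception as e:
        print(e)
        selected_urls = []

    return {"selected_reddit_urls": selected_urls}
```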
What we can do is test this and see if it
selects some URLs for us. So let's run
this and say "Nvidia". Okay, it's going to start
searching for this and let's wait for
the snapshot and then see what URLs end
up getting selected. Okay, so we just
got an error here essentially saying
that we forgot to pass one of the
parameters to our functions get Reddit
analysis messages. So if we go here, you
can see that we have to pass the user
question, the Reddit results, and the
Reddit post data. So I think we probably
are calling the wrong function,
because that's not the one we want. We
want the get Reddit URL analysis
messages which just takes in two
parameters and then we have another one
later. Yes. So this one here that takes
in four. So are these called the same
thing? No, this should be
get_reddit_url_analysis_messages. So we just misnamed this
function call, essentially. So what we'll
do is we will rename this to
get_reddit_url_analysis_messages. That should
be fixed. And then we're going to rerun
this and same thing. Let's go invest
Nvidia and see what pops up. Okay, cool.
So, that just finished and you can see
that it actually selected four URLs here
and these all seem to be relevant in
terms of investing: investing in
Nvidia, why are you investing in Nvidia,
Nvidia is rising today, DCA Nvidia
Tesla - I don't know what that is. You
get the idea. So, let's exit out of
that. That stage is completed. And the
next thing that we need to do now is we
need to actually retrieve all of the
comments and then get those comments and
again continue to pass those to the LLM.
So we already have the function to do
that, but we need to now write it inside
of main.py. So we're going to go to
retrieve Reddit posts. And what we're
going to do here is just do a simple
print statement and we're going to say
getting
Reddit post comments. Okay, like that.
And then we can say the selected urls is
equal to state.get selected reddit
urls. And then we can continue from
here. So we're going to say if not
selected urls again it's possible then
we're going to return and this is going
to say reddit_post
data. This will be equal to an empty
list. And then what we'll do down here
is we will start to collect that data.
So we're going to say print and we'll do
an fstring. We're going to say
processing and then we'll say len of
selected urls and we can say reddit
urls like that and then we can say here
the reddit post data is equal to
reddit_post_retrieval, which we need to import. So
let's go import that from the top of our
program. So we can import that here.
Reddit post retrieval, the function that
we wrote. Scroll back down. Okay, so
Reddit post retrieval. From here, we're
just going to pass the selected URLs.
And that should be pretty much all that
we need to do. Now, down here, we're
going to say if Reddit post data, then
we can say print. Now, successfully
got and then we can say something like
this. Let's do an fstring
successfully got len of Reddit post data
posts.
Okay, let's fix the spelling here.
All right, so we successfully got those
posts and then otherwise we're going to
say else: print failed to get post data,
and we can set the Reddit post data
equal to an empty list, and then
here we will go and say
return the Reddit post data. Okay, so
that should retrieve the Reddit post
data for us. We're saying, okay, get the
Reddit post, get the selected URLs, make
sure we have some, obviously. Uh, if we
do, then we call this function, which
should go and grab all of the comment
data from that. And then if we want, we
can, of course, print this out. So now
we can print the Reddit post data and
make sure that's working before we move
into the analysis and synthesis steps,
which will be pretty straightforward.
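For reference, here is a sketch of the finished retrieve_reddit_posts node. One small liberty: since reddit_post_retrieval returns a dict with a total_retrieved field, the success message prints the comment count rather than calling len on the dict itself:

```python
def retrieve_reddit_posts(state):
    print("Getting Reddit post comments...")

    selected_urls = state.get("selected_reddit_urls", [])
    if not selected_urls:
        return {"reddit_post_data": []}

    print(f"Processing {len(selected_urls)} Reddit URLs")
    reddit_post_data = reddit_post_retrieval(selected_urls)

    if reddit_post_data:
        print(f"Successfully got {reddit_post_data.get('total_retrieved', 0)} comments")
    else:
        print("Failed to get post data")
        reddit_post_data = []

    return {"reddit_post_data": reddit_post_data}
```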
So let's make the terminal bigger and run this
again. Let's go "invest in Tesla". I'm sure that's
going to be all over Reddit. So, let's run
that and let's see what we get. Okay, so
I was just doing a little bit of
debugging here because the results I was
getting from the comments weren't great
and I realized it's because I made a
small mistake in the way that we are
parsing this. So, if we go back into web
operations and we go to where we're
parsing the comments, we need to change
some of these fields because they're not
actually correct based on the response
that we're getting here from the API. So
the major change is that where we have
content, we're going to change this to
say comment. So we're getting the
comment because that's actually where
the comment is stored. Same thing for
the date. We're going to change this to
date posted. And then I think we can
just remove the parent comment ID
because that doesn't seem to actually be
working. And for the post title, I
believe that we don't need that either
because, again, it wasn't populating
properly. So, let's remove those and
just stick with the comment ID, the
comment content, and the date. Again, we're mostly
just changing "content" to say "comment". And
then I'm going to run this again and
give it another test uh to make sure
we're getting the right data. Okay. And
there we go. It just loaded a bunch of
comments for me. And you can see now the
data is actually filling in. If we go
here, there is a lot of data that we
pulled because we pulled a bunch of
comments from a bunch of different
posts. All right. So, that is it for
that phase. So now we've got in the post
and we've got the comments from the
post. The next step is really to
synthesize all of this data together,
which is going to be pretty
straightforward. We just need to write
these four functions. So let's get
started here with our analyze Google
results. And then we can just go through
the rest of them. Again, it's pretty
much going to be copy and paste, but
just changing a few things and changing
kind of the prompt that we're using. So
we're going to do a print statement
here. We're going to say print analyzing
Google search results like that. We're
going to get the user's question. So,
we're going to say user question
state.get user question. We're going to
say Google results is equal to state.get
and then get the Google results. Okay.
Then from here, we're going to say
messages is equal to get Google analysis
messages. And then we're going to pass
the user question and the Google
results. We're then going to say the
reply is equal to llm.invoke.
and we're just going to invoke the
messages and we're going to go here and
we're going to say the Google analysis
is equal to the reply content. All
right, so the llm is just the one that
we defined right at the beginning,
right? So if we go here, the chat model.
So we're just calling it raw without
doing anything else and essentially just
getting whatever response it has based
on our prompt. Again, you can go read
the prompt from in here, but essentially
we're just creating a prompt that says,
hey, you know, analyze these Google
results and give us the interesting uh
output. Okay, so let's copy the exact
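As a sketch, the Google analysis node looks roughly like this; the Bing and Reddit versions below follow the same shape, with llm and the prompt helpers being the ones imported earlier:

```python
def analyze_google_results(state):
    print("Analyzing Google search results...")

    user_question = state.get("user_question", "")
    google_results = state.get("google_results", "")

    messages = get_google_analysis_messages(user_question, google_results)
    reply = llm.invoke(messages)

    return {"google_analysis": reply.content}
```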
Okay, so let's copy the exact same thing
for Bing, except we're just going to change
everything to say Bing. So rather than the Google
results, this is going to be the Bing.
Change this to Bing. Same thing. This is
going to be Bing.
Okay. And then this needs to be Bing as
well, for the return. We can change the
print statement as well. Okay, cool. So,
that's pretty much it there. And rather
than get Google, this is going to be get
Bing. All right, so let's copy the same
thing and we're going to do it again.
This time for Reddit. Okay. Now, for
Reddit, it's actually going to look a
little bit different. So, we'll modify
this a bit more. So, let's paste this in
here. I'm going to say analyze Reddit
search results. Rather than just getting
the Bing results, we're going to say
Reddit results,
this is going to be Reddit results. But
then not just the Reddit results, we
also need to get the Reddit post data.
So we're going to say reddit_post_data,
and it's equal to the reddit_post_data
from the state. For the messages, this is going to
be get Reddit analysis messages. This
takes in three things. So the results,
the post data, and the user question. So
let's go to Reddit results and then
change this to say Reddit post data like
that. Okay. And then same thing. This is
just going to say Reddit like that.
Let's make sure everything else has
changed. So analyzing Reddit results.
Okay. Get the user question Reddit
results. Reddit post data. Invoke the
LLM. And then there we go. Okay. And
then the last thing that we need to do
is synthesize all of our analysis. This
is going to be quite a bit different. So
we'll just write this manually. We're
going to say print combine all results
together. We're going to say the user
question is state.get user question.
We're going to get the Google analysis
first. So state.get Google analysis.
We're going to say the Bing analysis is
the state.get Bing analysis. And then
the Reddit analysis is going to be the
same thing for the Reddit analysis.
We're going to say messages is equal to
get the synthesis messages. And we're
going to pass the user question, the
Google analysis, the Bing analysis, and
the Reddit analysis. We're then going to
say the reply is equal to llm.invoke
the messages. We're going to say the
final answer,
okay, is equal to the reply.content.
And then what we're going to do is we're
going to pass the final answer, which
will be the final answer. We also need
to pass messages because this is kind of
how LangGraph works. And we're going to
pass this where we say role, and this is
going to be assistant. Okay. And then
we're going to say content is the final
answer. All right. And let me zoom out a
little bit and kind of close this
sidebar so you guys can see what's going
on. Let's close the terminal as well. So
again, what we've done here is we said,
okay, we're going to get all the results
that we analyzed previously, right?
We're going to combine that into a
message, pass that to the LLM again, and
then it's going to synthesize all of
that together and return to us a final
answer and also just a final message. We
need this message again for the LangGraph
message chain to operate properly.
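A sketch of that final synthesis node is below; the node name here is just illustrative, but the state keys match the ones used above:

```python
def synthesize_analyses(state):
    print("Combine all results together...")

    user_question = state.get("user_question", "")
    google_analysis = state.get("google_analysis", "")
    bing_analysis = state.get("bing_analysis", "")
    reddit_analysis = state.get("reddit_analysis", "")

    messages = get_synthesis_messages(
        user_question, google_analysis, bing_analysis, reddit_analysis
    )
    reply = llm.invoke(messages)
    final_answer = reply.content

    # Return both the final answer and an assistant message so the
    # LangGraph message chain keeps working.
    return {
        "final_answer": final_answer,
        "messages": [{"role": "assistant", "content": final_answer}],
    }
```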
So that's pretty much it. I mean, I know
that's a lot of code and we went through
a bunch of stuff in this video. Again,
all the code will be available from the
link in the description, but of course,
we need to test this and make sure it
works. So, let's bring this up and say,
"Tell me if Elon Musk
is a good person." Okay. And let's go
ahead and see what that tells us. Okay.
And after a minute here, we've gone
through this whole process and we get
this general response here telling us
what the sentiment is on if Elon Musk is
a good person or not. And if we scroll
over here, we can see all of the sources
where it was getting this information
from, you know, Reddit comments, etc.
Okay, so pretty cool. Now, obviously, we
can make this a lot better and we can
search more things and we don't just
have to have one search string and we
could actually have the LLM searching
multiple things and giving us a really
detailed response. I just wanted to show
you this to give you kind of the sense
of how you create this more complex
orchestration with an AI agent that's
pulling in a bunch of relevant data. In
our case, our LangGraph graph is relatively
simple, right? We have the architecture
that I discussed before and we have
what, seven, eight nodes, something
along those lines. But if we added more
nodes, we added more LLM interaction, we
allowed this to run a little bit longer,
we can get significantly better
responses. So I think with that said,
guys, that's going to wrap up this
video. If you made it to the end, give
yourself a pat on the back because this
is very complicated and was a long video
to go through. Again, all the code will
be available from the link in the
description. Massive thanks to Bright
Data for sponsoring this video and I
look forward to seeing you in another
one.
Get started with BrightData and get $20 in credits for free: https://brdta.com/twt_websearch
Check out PyCharm, the Python IDE for data and web professionals: https://jb.gg/check-pycharm-now

DevLaunch is my mentorship program where I personally help developers go beyond tutorials, build real-world projects, and actually land jobs. No fluff. Just real accountability, proven strategies, and hands-on guidance. Learn more here - https://training.devlaunch.us/tim

Video Resources
Code in this video: https://github.com/techwithtim/Advanced-Langflow-Web-Agent
Learn LangGraph: https://www.youtube.com/watch?v=1w5cCXlh7JQ&ab_channel=TechWithTim
UV Tutorial: https://www.youtube.com/watch?v=6pttmsBSi8M&t=1s&ab_channel=TechWithTim

Timestamps
00:00:00 | Overview
00:00:55 | Project Demo
00:03:34 | Understanding Web Search
00:06:04 | Understanding the Architecture
00:08:35 | Project Setup
00:11:36 | Langflow Structure
00:37:44 | BrightData Setup
00:51:12 | Web Operations/Scraping/Searching
01:08:49 | LLM Calls & Prompting

Hashtags: #LangGraph #Python #AIAgents