In this video, we're building an
advanced AI agent in Python using
LangGraph. Now, this isn't going to be a
basic chatbot. This is a multi-step deep
research agent that will pull live data
from sources like Google, Bing, and
Reddit. Now, this tutorial is not for
beginners. I'm going to be covering
advanced Python concepts, complex
architecture, and best practices for
building agents that go far beyond just
a single prompt response. Now, for this
video, like I said, we're using
LangGraph, Python, and I will also be using Bright Data to access real-time
web data. Either way, even if you don't
want to build a web-based agent, you can
still learn a lot from this video
because I'm going to build really a
pretty complex agent and show you how to
structure that in Python. So, by the
end, you're going to have a fully
functioning project and the skills to
build powerful production AI systems.
So, with that said, let's dive in and
let me show you a quick demo of the
project. So I'm in PyCharm here and I'm
just going to give you a quick demo of
what the finished project will look
like. Keep in mind that you can adjust
this a ton and this is really meant to
give you a solid base. Anyways, let's
have a look at what we can do. So as you
can read here, this is a multi-source
research assistant, but you can use it
for a lot of different tasks. It says
ask me anything. So I said, should I
move to Dubai from Canada? And then what
we do is we start searching Google, Bing, and Reddit in parallel. Okay, so
we can actually do this at scale and
search for like thousands of different
results. So you can see we searched Bing
for this, we searched Google for this,
we searched Reddit for this, and then
what we start doing is waiting for the
results. So essentially what Bright Data
is going to be doing is it's going to be
pulling all of these results into a
large snapshot for us. See, we get all
the results here. We download it and
then we're able to actually, for
example, parse through all of the Reddit
posts. In this case, I just did 75 of
them. We analyze the Reddit posts, we find the titles that actually make sense for our search query, and then we find those Reddit posts and download all of the comments from those two Reddit posts. You can see right here
it says it downloaded 26 Reddit
comments. We then take all of that data
and we analyze it. So we analyze the
Google results, the Bing results, and
the Reddit discussions. And we
synthesize all of that data into one
final answer. And then it gives us the
response right here. So it says, you
know, taxfree income, cost of living,
lifestyle, and environment. And if we
scroll over here, it tells us where it
actually got this data from. So from
Google and other sources. Here it found this from Reddit, r/dubai, and it gives us
kind of all of these sources and quotes
where it found this information from.
Same thing: I asked it, should I buy AMD or Nvidia stock? Same process. It goes through, finds all of this relevant web data, finds all of these different posts, and then, as we scroll down, you can see here that in this case there were a lot more comments analyzed, and then it
gave us this comprehensive response
breaking down whether we should invest
in Nvidia or AMD. And ultimately it
gives us a conclusion here and says
Nvidia appears to be a more
straightforward choice for those
prioritizing immediate and robust growth
in AI and who favor its strong financial and market position. Okay, so there you
go. And then I just stopped the agent
here which is why we got that kind of
interrupt message. So overall this is a
very powerful agent and the interesting
thing about this is the way that we're
going to write this is it's very
scalable. So I could run this agent
thousands of times in parallel. I could
scrape thousands of different posts and
nothing really changes here. You'll see
how we do that later on. But because of
the technology that we're using and the
way that we write this, we really can
run this actually in production in a
scalable format and it's going to be
pretty fast to actually execute. It
doesn't take a very long time due to the
services that we're using. Anyways, with
that said, I want to quickly talk a
little bit more about search because
that's a really important part of what
we're doing here. And then we'll get
back onto the computer and start coding
all of this out. So, now that we've
looked at a quick demo, I want to
briefly touch on the current problems that
exist with search because that's what
this agent is going to do. It's going to
be searching the web. Now, most AI agent
systems today share a pretty major
limitation. They can't access the full
range of data that's actually relevant
to the problem. So, relying on a basic
search API, or just a single plug-in
means that you're usually only seeing a
fraction of what's actually available on
the public web. So, important sources
like live social media sentiment, real
time crawling, or historical trend data
often go untapped. Now without them the
results are incomplete and the decisions
these agents make can be based on
outdated information and opportunities
are going to be easily missed. Now on
top of that many setups require
stitching together multiple APIs that
don't work smoothly together and they
leave the agent with a narrow or
fragmented view of the entire world kind
of based on the web. Now with that said
that's why for this video I mentioned
we're going to be using Bright Data. Now
Bright Data has been a long-term sponsor
of this channel. I've worked with them
on many videos now and what they have
here is a web discovery API that
provides a much easier way to access a
wide range of public data. So rather
than just basic search results, it can
pull all kinds of information: live SERP data from engines like Google, Bing, and other search engines; real-time sentiment from platforms like Twitter, Reddit, and TikTok. In this video,
we're going to be scraping Reddit, for
example. They have historical web data
that goes back years, and then insights from answer engines such as Perplexity, Gemini, ChatGPT, etc. Now, that's
because Bright Data handles the
crawling, parsing, and does this
reliably in one unified API so that we
can focus on building the agent rather
than trying to set up these really
complex web scrapers. Again, I've used
Bright Data many, many times in the past. Essentially, it's just a smarter,
easier way to be able to scrape the web.
And we're going to integrate that tool
into our agent here in this multi-step
kind of orchestrated flow so that our
agent essentially grabs all of this real
web data based on what we're asking for,
analyzes it using models like ChatGPT,
and then gives us a really competent
response that kind of follows the strict process that we've set up. So
with that said, let's get onto the
computer here. Let me start explaining
the architecture of our agent. And we're
going to start kind of scaffolding this
out, building it, and then going step by
step because I won't lie to you, there
is a lot of code here. And also, I'll
quickly mention that all of the code for
this video will be available from the
link in the description. Anyways, let's
dive in. So, we're back on the computer
now and we're going to start coding this
out. Now, first I want to quickly
explain the LangGraph architecture that
we're going to use for this agent just
so you can understand what it is that
we're about to build. And with that
said, if you're unfamiliar with
LangGraph, I do suggest you have some
background with that before following
along in this video. Otherwise, it might
be a little bit confusing. So, I'm going
to put a video on screen right now that
explains LangGraph in depth that you can
follow along with to get some context
before going into more of a complex
project. Okay. So, anyways, we're going
to be using LangGraph, and essentially
what's going to happen is the following.
The user is going to ask some kind of
question and we're going to
simultaneously in parallel go and search
Google, Bing, and Reddit for information
based on their question. Now, in theory,
we could search a lot more sources as
well, but I'm just keeping it a little
bit slim for this video, so it doesn't
take us 25 hours to code this out. Okay.
Now, after that, what we're going to do
is we're going to wait to analyze the
Reddit posts. The reason for this is
that the Google results and the Bing
results are extremely fast because
Bright Data already has them indexed.
So, we get them in like a few seconds.
Whereas the Reddit posts can take a little bit longer because we're doing actual web scraping; well, Bright Data is doing that for us and pulling all of the relevant posts. So, in this case, we have to wait for the Reddit posts, which takes a second. And then once we get the
Reddit post, what we're going to do is
retrieve all of the posts that are
actually related to the prompt that we
passed in. So, we're going to search
Reddit. It's going to return a bunch of
different results for us. Then we're
actually going to analyze those Reddit
posts. We're going to pull out the ones
that make sense. And then from those
posts, we're going to retrieve all of
the comments on those particular posts.
Hopefully that makes sense. But that's
kind of these two steps here. Now, after
that happens, we're then going to
analyze all of the results that we got.
So, we're going to analyze the Google
results, the Bing results, and the
Reddit results. This is so that we cut
down the amount of information that we
have before we synthesize that all
together into one larger prompt. So we
have these three kind of smaller prompts
that are focused on pulling information
that we need from each source. Then we
take all of that and we pass that to
kind of a synthesizer which is then
going to take all three results from
here and synthesize that into one final
answer which we'll end up getting here.
Okay. So this is kind of the
architecture or the graph that we're
going to build. And of course, when I
say LangGraph, that's referencing this
graph, right? Like we're building this
graph essentially where we're flowing
data through this and eventually getting
this final answer. Okay, so that's what
we're going to build. So what I'm going
to do now is go over to PyCharm and
start setting this up. Now, for this
video, you can use any IDE that you
want, but I do typically recommend
PyCharm for larger Python projects,
especially when you're working with AI
or modules like Langraph, Langchain,
etc. And I do actually have a long-term
partnership with PyCharm and you can
check them out and start using it for
free from the link in the description. I
always recommend at least try it. If you
don't like it, you can switch to
something else. But personally for me,
it is my favorite for larger Python
projects and it is literally designed
for Python, hence the name PyCharm.
Anyways, let's get started here. So,
what I've done is I've opened up a new
folder and I've just called this AI
search agent. You can call this anything
you want. And then from this folder,
we're going to initialize a new UV
project and create our virtual
environment in Python. So in order to do
that, I'm going to type UV init and then
dot in my terminal. This is going to
initialize a new UV project. And from
here, we're going to install the
dependencies that we need. So I'm going to type uv add langchain, then langgraph, then langchain-openai, because we need to use GPT here, and then python-dotenv. Okay, these are the four
dependencies that we're going to need to
install. So let's go ahead and press
enter here and they all get installed in
our environment. Now feel free to use
pip or any other virtual environment
that you want. But if you want to use UV
and you're not familiar with it, I'll
leave a video on screen that teaches you
how to use UV because it's kind of the
standard now. It's very fast for
managing dependencies and environments
in Python. Okay, so now that we've got
UV installed, we're going to start
setting up our project. So, there is
quite a bit of setup and there's going
to be a lot of code here. If at any
point you're getting lost or you just
want to copy something that I'm writing,
you can do that by clicking the link in
the description. There'll be a GitHub
repository that contains all of the code
for this project. So, what I'm going to
do now is I'm going to make a new file
inside of this folder called .env. Now,
this is going to store some environment
variables that we need. I'm just going
to ignore this for right now.
Specifically, our OpenAI API key and our Bright Data API key, which we're going to get in a few minutes. So,
we'll just start by writing the
variables that we're going to need. So, first is going to be BRIGHTDATA_API_KEY. Okay, we'll fill that in later. And then next is going to be OPENAI_API_KEY, which we can also get later. Okay, so we have
our environment variables defined. Now,
we're going to go into our main.py file. Just create a new one if you don't
have one here and we'll start writing
some code. Now, PyCharm is also
prompting me just to configure my
interpreter. So, let me just select the
correct one and then I'll be right back.
All right, so I've configured the
correct interpreter and now what I'm
going to do is start scaffolding my
project. Now, when I say scaffolding,
what I mean is I'm going to write all of
the functions and logic that we'll
eventually implement. But for now, I'm
going to just kind of connect it
together so that we understand the
architecture and the flow that we're
going to follow along with. And then
once we have that, it's a little bit
easier to go write each individual
function. This is typically how I plan
larger projects. So I'm going to kind of
walk you through my thought process in
this video and you'll see how we plan it
out. So first we're going to say from dotenv import load_dotenv. This is going to allow us to load in the environment variables that we've defined in this .env file. I'm going to say from typing import Annotated. Okay. I'm gonna say from langgraph.graph import StateGraph and the START and END nodes. Now again, if you're not super familiar with LangGraph, essentially
this allows us to build a graph which is
a bunch of different nodes that are
connected to each other and to flow some
state or some data through that graph
where each kind of node in the graph can
modify or update that data. So we're
going to have some state which is going
to store all of the information that our
agent needs to have access to. And as we
run through these different stages or
nodes in our graph, we'll be populating
that state where then at the end of our
graph, we have this final answer which
we can present to the user. So it's kind
of a really unique way to build AI
agents in a bit more of a kind of
predictable flow. Rather than just
giving a set of tools to a model and
letting it go crazy, we actually kind of
walk through this manual process that's
a lot more consistent where we update
this state kind of stage by stage. Okay,
so next we're going to say from langgraph.graph.message import add_messages. We're then going to say from langchain.chat_models import init_chat_model, which is a really quick way to initialize an LLM. We're then going to say from typing_extensions import TypedDict. Okay, these are just some typings that we need in Python. And then we're going to say from pydantic import BaseModel and Field. And then we're going to say from typing import List. And actually, I realize we can just put this up here, because we need List and Annotated and they come from the same package. Okay. So that's most of our imports. Later we'll import a few other things, but for now we can start with this. And then we're going to call the load_dotenv function. When you call it, it loads the .env file and makes those variables available so that we can start using them inside of our Python code. Okay. Now, next step,
we're going to say llm is equal to init_chat_model. And for the chat model, we're going to put the name of the LLM we want to initialize, which in my case is just going to be gpt-4o. Now, you can put pretty much anything that you want here. You just need to make sure that if you put a different model from another provider, you pass the correct API key in this file. So, automatically, when we try to load gpt-4o, LangChain is going to look for the presence of the OPENAI_API_KEY variable and then use that as our API key. Okay, so we just put gpt-4o.
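To make this concrete, here is a minimal sketch of roughly what the top of main.py looks like at this point. The exact environment variable names (BRIGHTDATA_API_KEY, OPENAI_API_KEY) are just what I put in the .env file, so match them to your own:

```python
from dotenv import load_dotenv
from typing import Annotated, List
from typing_extensions import TypedDict
from pydantic import BaseModel, Field
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langchain.chat_models import init_chat_model

# Reads OPENAI_API_KEY (and later BRIGHTDATA_API_KEY) from the .env file
load_dotenv()

# init_chat_model picks up OPENAI_API_KEY automatically for OpenAI models
llm = init_chat_model("gpt-4o")
```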
You can put a newer model, whatever you
want. Just make sure you have the
correct API key. Okay, so now that we've
done that, what I want to start with is
actually writing out the state that
we're going to pass through our graph.
When we have the state, we'll kind of
understand the data that we need to come
up with and find. And then I can start
creating this graph and making the
connections between these nodes. And
then we can start actually kind of
populating the graph by writing the
different implementations.
So for now we're going to say class
state and this is going to inherit from
the typed dictionary. Okay. Inside of
here we're going to start by having a
list of messages. Now these messages are
essentially the messages that our user
is sending into this graph that we'll
then process and start getting the
information for coming up with an answer
for. So we're going to say messages is Annotated with list and add_messages. When we do this, we mean that, okay, messages is of type list, and when we add a new message, the add_messages function is called, which will essentially merge the new messages in for us. Okay. Next,
we're going to have the user question.
And this is going to be string or none.
Okay. Then we're going to have the Google results, and this is going to be string or none. And you'll notice that all of
these are going to be or none because at
some point in time, we may not have
these results. and then later we'll
populate it. So then next we're going to
have the Bing results. This is going to
be string or none. Then we're going to
have the Reddit results. Again, string
or none. Okay. Then we're going to have
the selected_reddit
URLs. This is going to be list of type
string or none. Now, the reason why I
have selected Reddit URLs is because
we're going to get a bunch of results
from Reddit. And then we're going to
pass these results to an LLM where it's
going to select which of these URLs we
actually want to process further just to
avoid us looking at data that we don't
need. Then we're going to have
Reddit_post
data which is going to be the data for
those selected URLs. This is going to be
list or none. We are then going to have
the Google_analysis.
This is going to be string or none. This
is the kind of LLM analysis. After we
get our results, we're going to have the
Bing analysis which is string or none.
And then the Reddit analysis which is
string or none. And then finally the
final answer which again is string or
none. Okay. So this is essentially our
state. This is what we're going to be
flowing through the graph. And we'll
start populating these one by one as we
kind of go through all of the nodes that
we have in our graph. All right.
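Written out, the state definition described above looks roughly like this sketch (the union syntax needs Python 3.10+; use Optional[...] on older versions):

```python
class State(TypedDict):
    messages: Annotated[list, add_messages]  # chat history, merged via add_messages
    user_question: str | None
    google_results: str | None
    bing_results: str | None
    reddit_results: str | None
    selected_reddit_urls: list[str] | None   # URLs the LLM picks for deeper scraping
    reddit_post_data: list | None            # comments/data for the selected posts
    google_analysis: str | None
    bing_analysis: str | None
    reddit_analysis: str | None
    final_answer: str | None
```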
Now, that's great. What we
need to do next is we need to start
defining the nodes that we have in our
graph. So, if we look here, this is what
we need, right? We need to do Google
search, Bing search, Reddit search,
analyze the Reddit post. All of these
are essentially just functions that need
to execute in some order. So, what we're
going to do now is we're going to write
all of these empty functions without the
implementation. And then later, we're
going to implement all of these
functions. This way we can build this
flow and then later we can go and we can
actually add all of the different pieces
and kind of test it step by step. So
what I'm going to do is I'm going to
start defining a bunch of functions. So
the first function that I'm going to
have is going to be called Google
search. So I'm going to say define
Google search and then for all of these
functions what they take as a parameter
is simply the state. Okay, so we say
state and then that's going to be equal
to this right here. They take in this
state and then what they're going to do
is return something that modifies the
state. So for right now, we're just
going to say return. We don't need to
return anything right now. And we're
just going to keep making a bunch of
functions with all the different
operations that we need to perform. So
after we have Google search, then we
have the Bing search. Okay, that's the next function, or the next node. Then
we're going to have the Reddit search.
And again, we're just going to populate
all of these and then finish them later.
Okay, then we're going to have analyze
Reddit posts. So let's go analyze
Reddit posts. Okay. Next function is
going to be retrieve Reddit posts. So
let's change this. Retrieve Reddit
posts.
Okay. And I'm just going to add some
spaces between here because you can see
PyCharm is kind of linting this and
telling us that we should have two
spaces between our functions. So that's
fine. After this, we're going to have
analyze Google results. So analyze
results. We need to analyze the Bing
results as well. So in fact, let's just
copy this function
and paste this down here. Analyze
Bing results. Okay. Then we need one
more for analyzing the Reddit results.
Okay. So Reddit results are a little bit
different than the Reddit posts. So
we're going to have Reddit results and
then we're going to synthesize this. So,
we're going to say synthesize_analysis
like that. And then this is going to
actually take all of the results that we
had here and synthesize it into one larger result. Okay.
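As a sketch, the stubbed-out node functions look something like this; each one just takes the state and, for now, returns nothing (the snake_case names are how I'd spell the node names we just listed):

```python
def google_search(state: State):
    return

def bing_search(state: State):
    return

def reddit_search(state: State):
    return

def analyze_reddit_posts(state: State):
    return

def retrieve_reddit_posts(state: State):
    return

def analyze_google_results(state: State):
    return

def analyze_bing_results(state: State):
    return

def analyze_reddit_results(state: State):
    return

def synthesize_analysis(state: State):
    return
```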
So, we have a bunch of functions now, and we're almost done with them. What we're going to do now is
we are actually going to create the
graph where we connect these nodes
together. So, again, in LangGraph,
essentially we have nodes. Nodes are
really just operations or functions. So
we've defined now what all of our
operations are going to be. What we need
to do, though, is we need to connect them to one another. So we
know that we start for example with
Google search then we do the Bing search
then we go to Reddit search then we
analyze this and we need to just create
these connections or create the graph.
So what we're going to do is we're going
to say our graph builder is equal to and
this is going to be state graph. And for
the state graph we simply pass our
state. Okay. And now that we have that,
what we're going to do is we're going to
add all of these nodes into our graph.
Now, the way that you do that is the
following. You say graph_builder.add_node. And then what we're going to
do is give the node a name. So we're
going to say something like Google
search. And then this is going to be
pointing to the Google search function.
But make sure you don't call the
function. All right. Now, this is a
little bit tedious, but essentially you
just need to give a name to every
function or every node that we have. So
you see like the next autocomplete is
we're going to add the Bing search and
that's going to be the Bing search
function. We have the Reddit search
that's going to be the Reddit search
function. And I'm just going to keep
going here and using my autocomplete. So
we're going to have analyze Reddit post.
That's going to be to analyze Reddit
post function. Analyze Google results.
And you get the idea. And I'm just going
to go through here and... analyze results results is definitely not the name of a function, is it? Okay, if it is, then I definitely spelled something wrong. We don't want analyze_results_result; we want analyze_reddit_results. Okay, so let's fix that, and let's go here to analyze_reddit_results and analyze_reddit_results. Okay, good job. We caught that.
And then we have the last one, which is
the synthesis. So, let's add that here.
Okay, it looks like we're actually just
missing one here, which is retrieving
the Reddit post. So, let me add this.
We're going to say graph_builder.add_node, and this is going to be retrieve_reddit_posts, like that, pointing to the retrieve_reddit_posts function. So we analyze the Reddit posts, we
retrieve the Reddit post and then we
have the rest of them here which I think
is all good. Okay. And ignore the yellow
highlight. We're going to fix that later
on. It's just because we're returning
none from these functions. All right.
All right. So at this point, what we've
done now is we've created the nodes. So
they exist, but they're not yet
connected. So now that we've created
them, we need to connect them to one another. So the way that we do this is we can say graph_builder.add_edge, like this. And an edge is a connection between the
nodes. And we're going to connect the
start to the first node, which is going
to be our Google search. So what this
says is okay when we start the graph the
first thing that we're going to do is
we're going to go to Google search. Now
what we're going to do is we're going to
connect the start to multiple of these.
So we're going to connect to Google
search as well as Bing search as well as
the Reddit search. So this way what will
happen is at the exact same time we'll
execute all three of these operations.
So we'll run them in parallel. So as
soon as we start or as soon as the user
gives us some message or some request,
we searched on Google, we searched on
Bing and we searched on Reddit at the
same time. Okay, cool. So next thing
we're going to do here is we're going to
connect the next steps. So after Google
search, what do we do? After Bing
search, what do we do? You get the idea.
Okay, so in order to do this, we're going to say graph_builder.add_edge.
And the first edge that we're going to
add might seem a little bit weird, but
from Google search, we're actually going
to connect the analyze Reddit posts.
Okay, let me make this a little bit
smaller so you guys can see this. Now,
the reason we're doing this is because
if we follow our architecture, right,
after we do all of the searches, we need
to wait a second to get all of the
Reddit results before we can move any
further. So, all of these connect to the
analyze Reddit post, which is what we're
doing right now. Then, we'll retrieve
the Reddit post. then we'll go and do
all of this analysis. Now, there's
actually some ways that we can make this
a little bit more efficient, but for
right now, this is just a simpler
architecture that I want to follow. Um,
you probably know what I mean if you're
thinking like, okay, how can I make this
a bit more efficient? But for right now,
I don't want to make it too complex. So,
we're just going to go with this. Okay,
so after the Google search, we go here.
After the Bing search, we're going to go
here as well. So, we kind of wait at
this stage. And then, of course, after
the Reddit search, we need to go here,
too.
Okay, so we've now made this next
connection. So we have start to Google
search, start to Bing search, start to
Reddit search, and then from each of
these we wait at the analyze Reddit
posts. Okay, now after we analyze the
Reddit post, what we need to do, so
let's put one more in here, is we need
to retrieve the Reddit post. So we're
going to go analyze Reddit post to
retrieve Reddit post and add that edge
between those two. All right. Now, the
next thing that we're going to do is
we're going to say graph_builder.add_edge. Let's get rid of all of this. And
we're going to start now from the
retrieve Reddit post. And after the
retrieve Reddit post, what we're going
to do is we're going to start analyzing
all of our results. So we're going to
say analyze. And then the first thing
that we're going to do is the Google
results. So underscore Google results
like that. Let's spell analyze
correctly. Okay. Now let's copy this and
go down here. Now, after retrieve Reddit
post, we are also going to go and
analyze the Bing results. Okay? And then
we're going to go one more down here.
After we retrieve the Reddit post, we're
also going to analyze the Reddit
results. All right? So, hopefully this
is making sense. But again, if we go
back to the architecture, we wait for
the Reddit post, we retrieve them. Then
after we retrieve them, we go and we
analyze the Google results, the Bing
results, and the Reddit results. That's
what I've just written, right? We go
Google, Bing, Red. So, analyze those
three at once. Okay. Now, after those
three, what we need to do is we need to
synthesize our analysis. So, we're going
to do this again. We're going to add
another edge. Now, we're going to start
from analyze Google results. And the end
key here is going to be the synthesize
analysis.
Then, we're going to copy this. And I'm
just going to copy it twice because
we're going to need it here. After we
analyze the Bing results, we're also
going to go there. And after we analyze
the Reddit results, we're going to go
here as well. Okay, so those three then
connect to this next node where we're
synthesizing everything. And then
lastly, we add one more. So we say graph_builder.add_edge, and we're going to
say from the synthesize analysis, we're
going to go to the end. Okay, so this is
how you set it up. You need to always
have a start key, which we do right
here, and an end key or an end node. And
then this creates this graph that you
just saw in this diagram. Okay. Now,
after we create the graph, what we can
do is compile it. So, we're going to say graph is equal to graph_builder.compile, not build, but compile. This compiles the graph into something we can actually run. And then what we can do is essentially pass a message to this graph; it will run through all of the different nodes, the state will get updated, and then we can print out that state.
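Here's roughly how that wiring and compilation look in code, a sketch assuming the node names and stub functions from above:

```python
graph_builder = StateGraph(State)

# Register each node: a name mapped to the function that implements it
graph_builder.add_node("google_search", google_search)
graph_builder.add_node("bing_search", bing_search)
graph_builder.add_node("reddit_search", reddit_search)
graph_builder.add_node("analyze_reddit_posts", analyze_reddit_posts)
graph_builder.add_node("retrieve_reddit_posts", retrieve_reddit_posts)
graph_builder.add_node("analyze_google_results", analyze_google_results)
graph_builder.add_node("analyze_bing_results", analyze_bing_results)
graph_builder.add_node("analyze_reddit_results", analyze_reddit_results)
graph_builder.add_node("synthesize_analysis", synthesize_analysis)

# Fan out: the three searches run in parallel from START
graph_builder.add_edge(START, "google_search")
graph_builder.add_edge(START, "bing_search")
graph_builder.add_edge(START, "reddit_search")

# All three searches join at analyze_reddit_posts
graph_builder.add_edge("google_search", "analyze_reddit_posts")
graph_builder.add_edge("bing_search", "analyze_reddit_posts")
graph_builder.add_edge("reddit_search", "analyze_reddit_posts")

graph_builder.add_edge("analyze_reddit_posts", "retrieve_reddit_posts")

# After retrieving Reddit data, run the three analyses in parallel
graph_builder.add_edge("retrieve_reddit_posts", "analyze_google_results")
graph_builder.add_edge("retrieve_reddit_posts", "analyze_bing_results")
graph_builder.add_edge("retrieve_reddit_posts", "analyze_reddit_results")

# Join again at the synthesizer, then finish
graph_builder.add_edge("analyze_google_results", "synthesize_analysis")
graph_builder.add_edge("analyze_bing_results", "synthesize_analysis")
graph_builder.add_edge("analyze_reddit_results", "synthesize_analysis")
graph_builder.add_edge("synthesize_analysis", END)

graph = graph_builder.compile()
```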
So what I'm going to do is write the function
that would allow us to execute this
graph. And then of course before we can
do that, we need to start writing all of
these different functions. So we're
going to make a function here. I'm going
to call this run_chatbot. Okay, what this is going to do is start executing our agent so that we can actually run through this graph. So,
what we're going to do is we're going to
do a print statement and we're just
going to say Multi-Source Research Agent, like that. I'm going to print a line and I'm going to say, you know, type 'exit' to quit, and then a newline (\n).
All right, that's good. And then what
I'm going to do is have a while loop.
So, I'm going to say while true. And
we're going to keep asking the user to
give us some input. So we're going to
say user_input is equal to input with the prompt "Ask me anything: ", like that. We're going to say if the user input is equal to exit, then print bye and we can break the loop. Okay. Otherwise, what
we're going to do is we're going to say
state is equal to and we need to
initialize a kind of starting state. So
in order to do that, we could say state
is equal to the following and we can say
messages and then for the message we
just need to put the message that we
want the uh bot to reply to. So we're
going to say role is user, and we're going to say content is the user input.
Okay, we're then going to say that the
user question is the user input. We're going to say the Google results key (which we need to write as a string) is just equal to None.
And then we're going to do the same for
all the rest. So the Bing results is
equal to none. The Reddit results is
none. The Google analysis is none. The
Bing analysis is none. The Reddit
analysis is none. Final answer is none.
And then of course there's a few other
ones that we missed. So let's go here.
After Google results, Bing results,
Reddit results. We also have the
selected
underscore Reddit URLs. Okay, this is
going to be none. We then have the
Reddit post data. So let's do this.
Reddit_post
data and that is none as well. Okay. And
I think that's all of the state that we
need. So again, kind of at the
beginning, we need to initialize the
state that we're going to be passing
through the graph. So that's what we've
just done. We've plugged in the user
input. So the question that they've
asked us and then what we're able to do
is start running the graph. So what we
can do here is we can do a print statement and say a newline (\n) followed by something like "Starting parallel research process...". Then we can do another print and say something like "Launching Google, Bing, and Reddit searches..." (this is just for logging, by the way, but I think it looks nice), with another \n as well to kind of separate this out. And then I'm just going to print "-" * 80, just so that we get some separation here between what is
appearing. Okay, so actually I just
missed something. So up here what I'm
going to do is I'm going to say the
final state is equal to graph.invoke.
I'm going to invoke the graph with my
state. And then what I'm going to do
down here is I'm going to say if the final state dot get of the final answer. So if that does exist, then I'm
going to print out the answer. So I'm going to say print with an f-string: a newline, then FINAL ANSWER, another newline, then final_state.get of the final answer, followed by one more newline.
Okay, so essentially what I'm saying is
all right, we're going to invoke the
graph. This is how you invoke it. We
pass our initial state. Again, don't
worry too much about that. We'll fix
that later on. We say if the final state
does contain a final answer, then we'll
print out the final answer and it will
just print kind of a separation here so
that if we run this again, we know, you
know, which run was which. Then lastly,
we need to execute this function. So we can say if __name__ == "__main__", then we call run_chatbot. And then that would actually be a finished program, assuming that all of the nodes were completed, which of course they're not yet; we still need to write them.
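Put together, the run_chatbot loop and entry point described above look something like this sketch (the exact log strings are just what I'd print; adjust them to taste):

```python
def run_chatbot():
    print("Multi-Source Research Agent")
    print("Type 'exit' to quit\n")

    while True:
        user_input = input("Ask me anything: ")
        if user_input == "exit":
            print("Bye")
            break

        # Initial state: everything except the question starts out empty
        state = {
            "messages": [{"role": "user", "content": user_input}],
            "user_question": user_input,
            "google_results": None,
            "bing_results": None,
            "reddit_results": None,
            "selected_reddit_urls": None,
            "reddit_post_data": None,
            "google_analysis": None,
            "bing_analysis": None,
            "reddit_analysis": None,
            "final_answer": None,
        }

        print("\nStarting parallel research process...")
        print("Launching Google, Bing, and Reddit searches...\n")

        final_state = graph.invoke(state)

        if final_state.get("final_answer"):
            print(f"\nFINAL ANSWER:\n{final_state.get('final_answer')}\n")
        print("-" * 80)


if __name__ == "__main__":
    run_chatbot()
```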
So let me quickly just kind of summarize what
we've done here. I just want to zoom out
a little bit so you guys can read this.
Essentially we started with all of our
imports. We loaded the environment
variable file. We initialized our LLM
which we still need the API key for
which we're going to get in one second.
We created our initial state. Okay. And
then what we did is we kind of stubbed
all of these different operations or
nodes that we're going to have in our
graph. So Google search, Bing search,
Reddit search, analyze Reddit post,
retrieve the Reddit post, analyze Google
results, analyze Bing results, analyze
Reddit results, synthesize the analysis,
and then we created the graph. So we
added all of the nodes where we
connected functions to the node name.
That's essentially what we did here. We
then added all of the edges. So we
started with these three running in
parallel. We then kind of connected them
to this analyze Reddit post node. We get
the Reddit post here after we analyze
them. And then we go through the rest of
the flow until we eventually synthesize
all of the results. We then have this uh
while loop that just allows the user to type something in and essentially run it through our graph. And that is the LangGraph component kind of done. What we
need to do now is we need to start
updating the state as we go through
these various nodes. So let's get into
that. And that's going to allow us of
course to start searching the web using
the SERP API, all of that kind of stuff
that I'm going to show you. So for now,
because I want to be able to test this
step by step, I'm going to start filling
in some of the outputs that we're going
to have from these functions so that
even if they're not fully complete,
we'll be able to execute the graph and
kind of test the Google search first,
then the Bing search and see what
results we're getting. So from our
Google search function, I'm going to say
the user question is equal to state.get
and this is going to be the user
question or an empty string. Okay, so
because I have state in all of these
functions, I can pull out the state.
Then what I'm going to do is I'm going
to have a print and I'm going to say f
string. So, and this is going to be
searching Google for and then we'll do a
colon. So, searching Google for the user
question. Okay, I'm then going to say
the Google results
are equal to an empty list and later
we'll actually get the uh Google
results, but for now we'll just make it
an empty list. And then I'm going to
return the following which is Google
results is equal to the Google results.
Now whenever you return something from
these functions it needs to match what
you have in the state. So in this case
we have Google results matching with our
Google results. So this Google results
will get updated to be equal to whatever
this is. And then in the next function
we have access to these updated Google
results. That's how the state kind of
flows through here: you return a partial update to the state from one of these nodes, it gets merged into the state, and the updated state is continually passed to all of the next nodes in the sequence. Hopefully that makes sense.
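For example, the stubbed Google search node looks something like this; notice it returns only the keys of the state it wants to update:

```python
def google_search(state: State):
    user_question = state.get("user_question", "")
    print(f"Searching Google for: {user_question}")

    google_results = []  # later this will come from the Bright Data SERP API
    # Returning a dict with matching state keys is a partial state update
    return {"google_results": google_results}
```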
That's kind of the idea. Now for the Bing search, it's effectively the same thing. So I'm just going to paste this here and, rather than searching Google, I'm going to search Bing. And then rather
than the Google results, this is just
going to be the Bing results. So change
this
change this and change this here.
Okay. And then for the Reddit search,
again, it's pretty much going to be the
same thing except just named Reddit. So,
we're going to paste this here. We're
going to say searching Reddit.
And then just update all these variables
to say Reddit.
Okay. Reddit and
Reddit, like that. Okay. Now, we're going to go to the analyze Reddit posts node.
From here, essentially what we're going
to do right now is just return some fake
data. So we're going to say the
selected_reddit
urls. And for now, this is just going to
be equal to an empty list. Later, we can
populate that, but for now, that's all
we need. Now, same thing when we talk
about the Reddit post data, we're just
going to say return Reddit post data is
equal to an empty list. And then for the
analysis, same thing. We're going to
return. Okay. And this is going to be
the Google analysis. And for now, just
going to return an empty string.
And then we can do the same thing for
Bing. So return the Bing analysis empty
string.
Okay. Return the
Reddit analysis empty string and then
the final answer. So we're going to say
return
final answer.
Okay. And then empty string. just so
that all of these functions work
properly and they return the correct
format. Okay, cool. So, I just saved
this now and what we can actually do
just to test and make sure that the
logic is set up correctly is we can just
run this file. It should ask us to type
something in and it should just give us
no response. So, what I'm going to do is
just press on run here and we'll see if
we get any errors. It should just prompt
us to type something. So, it says ask me
anything. So just go hello and then you
can see that it just kind of gives us
this output and then doesn't say
anything and says ask me anything. So that, to me, means that this is working. Again, type hello, and you can see that it just doesn't give us anything, but we get some output here, and that's actually exactly what we were looking for. So I'm going to stop this here
because it means the flow is working
properly. And I did notice that we have
a few spacing issues. So let me kind of
fix that here. Looks like yeah we
randomly printed out a quote. So, let's
get rid of that and fix the quote here.
And I think we're kind of good to go.
So, the next step is going to be to set
up our LLM as well as to set up Bright
Data to start doing these search
operations. I want to start by searching
Google, then searching Bing, and kind of
walk through these and do one search
operation at a time so you understand
how they work. So, what we're going to
do is we're going to go and get these
API tokens. So, we're going to create a
Bright Data account in an OpenAI
account. Let me go to my browser and
let's set that up. All right. All right.
So, let's start by getting our OpenAI
API key so we can use GPT and then we'll
get the bright data one. So, what we're
going to do is go over to
platform.openai.com.
From here, we can just go to our
settings, then to our API keys, and we
can create a new key. So, from here, I'm
just going to go with AI agent as the
name. Okay. And then obviously, you
don't want to leak this key. So, I'm
going to copy it and I'm going to paste
it here in my environment variable file.
Okay. And now that we have that, we need
the Bright Data credentials. So, if you
don't already have an account, you're
going to need to create a new one on
Bright Data. I'll leave a link below in
the description and you should be able
to get some free credits. So, you do not
need to pay to use this for the
tutorial. Now, I quickly want to show
you a few of the services that we're
going to use here from Bright Data
because they have a lot of options when
it comes to getting web data
specifically for AI agents. So, for
example, they have a ChatGPT scraper, right? Where you can actually scrape the conversations from ChatGPT, the responses, the user queries, etc. We're
not going to use that here because we
don't really need it for this specific
tutorial, but in other ones, it's quite
useful. Now, we also have, let's go
here, social media scraper. So, this is
the one that we're specifically going to
use to scrape Reddit data. You can also
get stuff from Facebook, Instagram, TikTok, YouTube, which is notoriously very
difficult to scrape. If you've ever
tried to build your own scrapers before,
you've likely seen that it's very
complicated to actually get this data
and you get blocked by captchas, IP bans, etc. Whereas Bright Data can
actually overcome and bypass all of that
for you and just give you the data in a
very easy format. So, for example, you
can get Instagram profiles, posts, X,
LinkedIn. In our case, we're using
Reddit, which I think makes a lot of
sense for this particular agent, but
obviously you can pick pretty much
anything you want. And then we have the
SERP API, or search engine API, where we can really quickly scrape all of the major search engines like DuckDuckGo, Google, Bing, etc. This also works for
things like Google flights, right? Uh,
and all of those other services that
come from those search engines. Yeah,
like maps, images, hotels. Pretty cool.
I've done some other projects in the
past where I've used this. And again,
for this one, we're just going to use
the standard kind of Google Bing search
engines. They also have things like a
web archive. So, for example, if I go to
the documentation here, you can see that
you can actually scrape all of the
previous web data. So, you can get like
years back and you can kind of see
trends and historical data. Again, not
going to use that for this video, but we
could add that if we wanted to make it
more complex. Okay, so for now, we need
to make a new account or log into our
existing account. So, go to the link
that I have in the description. I'm
going to log in because I already have
an account here. For you, you are likely
going to create a new one. So, once
you've signed into your account, you
should be brought to a page that looks
like this. They actually recently added
this feature where you can just ask the
AI here and it can tell you how to do
what you want to do. Uh I'm not going to
use that though. What I'm going to do is
go to proxies and scraping from the left
hand side here. And what we're going to
do is create a new SERP API. Now of
course there's a lot of other features
here as well like web scrapers. We'll
use this later to actually collect the
Reddit comments and the posts as you can
see that I was kind of doing already.
But for now, we go back to proxies and
scraping and we're going to go to add up
here and we're going to create a new SERP API, the search engine API. Okay. So, press
on this. From here, we can give it a
name. I'm just going to call this AI
agent uh two because I already have one
called AI agent. Can give it a
description if you want. And then in
this case, I'm just going to leave this
at standard, but you could go maximized
if you care about actually retrieving
the ads. Okay. There's a few advanced
settings as well. We don't really need
to modify any of those currently. And we
can just go ahead and press on add
again. You should have some free credit.
So this should be free to use for you.
And then later obviously you can pay for
it if you want to use the service. Okay.
So I'm going to go yes create this new
zone. Once this is created it should
give us access to an API key which we
can actually see here and show us how we
can call this API. So notice here we
have method API and then we can do for
example Python and it gives us an
example of how to call this. What I'm
going to do for now is I am just going
to copy the API key which is right here.
And we're going to take that and put
that into our uh file into our
environment variable file. So let's go
here and paste the API key. And
obviously don't leak that to anyone. I'm
going to delete that after this video.
And then we'll be able to start using
this service, the search engine API. Now
if you go to the playground, you can
actually mess around with it here and
you can uh kind of test out your
different searches. So for example, we
could search all of these different
engines here. We can choose the keyword
that we want to search. And then there's
a bunch of other information that we can
add. So we can search a specific Google domain, you know, .fr, .ae, whatever, right? And then we can add all these other settings, like do we want to look on desktop or mobile? Do we want to add specific headers? Do we want to actually get a paginated response? Do we want geolocation? There's all these
different parameters that we can add.
And there's some examples that you can
view here as well on exactly how to do
this. In our case, it's going to be
pretty simple. We're just going to
search Google. Um, and that's kind of
it. So, what we'll do from now is we'll
go back to main.py. And I'm actually going to make a new file, which I'm going to call web_operations.py. And inside of this file, which will be a
Python file, I'm going to start
implementing all of the operations
related to the web scraping and to using
the bright data service. So, inside of
here, we're going to start with just our
basic search. So, we're going to be
searching Google. But in order to do
that, I'm going to set up some reusable
functions that will make our life a
little bit easier in the future because
we'll be sending quite a few requests over here to the Bright Data SERP API. So
what I'm going to do is I'm going to say from dotenv import load_dotenv again, because we need to import our environment variables. I'm going to say import os and import requests, and I'm going to say from urllib.parse import quote_plus, which is going to allow us to turn a normal string into a string that we can include as a query parameter in our URL, which you will see in a minute. For now, I'm going to call load_dotenv to load the .env file. And
I'm going to make a simple function here
called make API request which will just
be a reusable function that we can use
anytime we want to send a request to
bright data so we automatically can
include the correct headers for our
authentication. So I'm going to define _make_api_request, like that, taking in the URL and **kwargs. Here I'm going to say api_key is equal to os.getenv, and we're going to get the Bright Data API key, like that.
Okay. So we're going to get the API key
and we're going to create a set of
headers because we need to send these
headers to tell bright data who we are.
So we're going to say Authorization is equal to an f-string: Bearer, a space, and then our API key. And then we're going to say the Content-Type is going to be application/json, since we're sending JSON. Okay.
From here we're going to do a simple try/except block where we send a request to whatever URL was provided. So we're going to say try, and this is going to be response is equal to requests.post. We're going to post to the URL with headers equal to headers, like that, and pass our **kwargs. We're then going to say response.raise_for_status. What this means is we're going to raise an exception if we don't get an OK status. And then otherwise we're going to return response.json(). Okay. Then we're going to say except requests.exceptions.RequestException as e, and we're going to print an f-string saying API request failed, with e inside of the braces, and then we can return None because this didn't give us a response. Then we can have another except for any general Exception as e. We can print an f-string saying unknown error with e, and again return None. I'm just doing some more
advanced exception handling here just so that if the error is related to the network request, we handle it there, and if it's not, we deal with it in the general case. That way we know which exception or error was actually causing the problem. Okay.
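Here is a sketch of that helper in web_operations.py, assuming the environment variable is named BRIGHTDATA_API_KEY (use whatever name you put in your .env file):

```python
import os

import requests
from dotenv import load_dotenv
from urllib.parse import quote_plus  # used by the search function below

load_dotenv()


def _make_api_request(url, **kwargs):
    api_key = os.getenv("BRIGHTDATA_API_KEY")  # assumed .env variable name
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    try:
        response = requests.post(url, headers=headers, **kwargs)
        response.raise_for_status()  # raise if we didn't get a 2xx status
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"API request failed: {e}")
        return None
    except Exception as e:
        print(f"Unknown error: {e}")
        return None
```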
So now we have a general function that can send a request to Bright Data. What we need to do next is implement our SERP search function. Okay. So we're going to define serp_search, and what this is going to do is
take in a query and an engine which by
default right now is going to be equal
to Google. Now I'm going to write this
kind of dynamically because this will
allow us to actually search any search
engine that Bright Data supports, so something like Bing or Google. So we can
reuse this function multiple times. All
right. What I'm going to do is I'm going
to say if the engine is equal to google, then I'm going to say the base URL is equal to https://www.google.com/search. We're going to say elif the engine is equal to bing, then the base URL is https://www.bing.com/search, and then else we're going to raise an
error and we're going to tell them hey
this engine is not supported. So we're
going to say raise value error and then
unknown engine and then whatever engine
they passed. Okay. Now we're going to say the URL is equal to https://api.brightdata.com/request. Okay, because this is where
we're going to send the request and
we're going to pass essentially the
search URL that we want to search and
get the data back from. Now, we're going
to say our payload is equal to zone and
the zone is going to be the name of the
zone that we created, which is AI agent
2. So, if we go back here and we look at
our overview, we should be able to see
the zone name. You can see it's right
here, AI agent 2. Okay, so that's our uh
zone name and you can kind of see the
information down here as well. So,
anyways, we need to pass the zone. We
also need to pass the URL. So the URL is
going to be the following. We're going
to put an F string. We're going to put
our base URL, which is the search engine
that we want to search, essentially. And then we're going to say ?q= followed by quote_plus of the query.
Now what this is going to do is it's
going to take whatever the user typed
in. That's going to be our search
string. It's going to turn it into a
format that we can actually pass
correctly in a query parameter for this
URL. And then we're going to put an and
we're going to say BRD_JSON
equals 1. Now what this means is bright
data JSON enabled. So essentially we
want to get our responses back in JSON
format. So Bright Data is actually able
to parse all of the responses from the
search engines and then return it to us
in a digestible format. In this case,
JSON. And then we're going to say the
format is raw like that. Okay. So this
is the payload which is essentially how
we send this search request. This will
hit the search engine API and then it
will give us back a response. So what
we're going to do here is we're going to say full_response is equal to _make_api_request. We're going to pass our URL, right, which is the URL right here, so that's where we're sending the request to. And then we're going to say json is equal to payload, which will be sent as the JSON body along with our request. We're going to say if
not
full response then we're going to return
none. Otherwise what we're going to do
is extract data out of this response. So
the bright data response is going to
give us a ton of information. It's going
to give us the sponsored post, the
organic post. It's gonna give us a bunch
of stuff, but we only care about a few
sections of that response. Now, if you
want to look at this response, you can
just mess with it right here. You can go
to the playground, right? And we can
kind of run this request and see the
result that we get. But what I want to
do is I just want to pull out a few
pieces of information. So, right, it's
giving us the full kind of preview of
the page. We can actually look at the
JSON format and you'll see it has like
general, input, navigation, it has all
of these other fields, right? It's a
very long response that we get, but I
only care about part of the response.
Now, the parts of the response that I care about are the organic results and the knowledge panel that Google pulls here. The knowledge panel is like a quick summary of the information that you searched for, which you've probably seen
before if you've done, you know, a
Google search. So, what I'm going to say is extracted_data is equal to a dictionary where knowledge is full_response.get of the knowledge field; now, if that doesn't exist, I'm just going to fall back to an empty set of braces, an empty dictionary. Then I'm going to get organic, and this is the same thing with full_response.get, except here the fallback is actually going to be an empty list. Now, the reason I
know this is because before the video, I
was obviously preparing. I looked
through the response structure and these
are the two fields that I care about. If
you want to see the entire response
structure, then feel free just to print
it out and you can see all the
information that it gives you. Here, we're just narrowing down the data so that we only get the important stuff. But if
you wanted all of the data or something
else for a different use case, then of
course you can get that from this API
request as well. And what's interesting
here is that you can run this as many
times as you want. It's very scalable.
So you can run this, you know, hundreds
of times, thousands of times with
different requests. And it returns very,
very quickly because it's already
indexed by Bright Data. So I'm going to
return the extracted data here from this
function. And then that should be it for
this first function where we're
essentially just calling this kind of
search function. All right, so that's
pretty much it, at least for right now.
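Putting that together, serp_search looks roughly like this; the zone name ai_agent_2 and the payload keys follow what we just set up in the dashboard, so adjust them to match your own zone. It uses quote_plus and _make_api_request from above:

```python
def serp_search(query, engine="google"):
    if engine == "google":
        base_url = "https://www.google.com/search"
    elif engine == "bing":
        base_url = "https://www.bing.com/search"
    else:
        raise ValueError(f"Unknown engine: {engine}")

    url = "https://api.brightdata.com/request"
    payload = {
        "zone": "ai_agent_2",  # the SERP API zone name you created
        "url": f"{base_url}?q={quote_plus(query)}&brd_json=1",  # brd_json=1 -> parsed JSON back
        "format": "raw",
    }

    full_response = _make_api_request(url, json=payload)
    if not full_response:
        return None

    # Keep only the parts we care about: the knowledge panel and organic results
    extracted_data = {
        "knowledge": full_response.get("knowledge", {}),
        "organic": full_response.get("organic", []),
    }
    return extracted_data
```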
What I'm going to do is I'm going to go
back to main. I'm going to import what
we just wrote and then I'm going to call
that from one of our functions so we can
test it out and make sure it works. So I'm going to say from web_operations import serp_search. So now, in the Google search node, the Google results are actually going to be serp_search with the user question, and we're just going to say the engine is equal to google, and then that should give us back the Google
results. Okay. So for now what we can do
is we can just print the Google results
so we can see if we're actually getting
anything at all. And while we're at it
we might as well just do the same thing
for Bing because it's going to be the
same thing just with a different engine.
So let's call serp_search again, passing the user
question with the engine equal to "bing", and
then same thing, we can print the Bing
results and do an initial test
to see if this is working.
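For reference, the quick test wiring in main.py might look roughly like this, condensed into a standalone snippet (the user question normally comes from the graph state set up earlier in the video):

```python
# Quick sanity check of the SERP helper.
from web_operations import serp_search

user_question = "invest in Nvidia"

google_results = serp_search(user_question, engine="google")
print(google_results)

bing_results = serp_search(user_question, engine="bing")
print(bing_results)
```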
Okay, so let's run the file and search for
"invest in Nvidia". Okay, and it says searching
Bing, searching Google. Wait a second
and then we should get the results. And
you can see no knowledge popped up for
this one. That's okay. And then for
organic there's a bunch of links, right?
So it gives us kind of all of this data
popping up related to those results, and
you get descriptions about Nvidia stock, etc. Now of
course there's a lot more stuff that we
can extract from here but for now that
is good and it gives us kind of the top
results on Google and we can read
through the descriptions the links etc.
Okay, so let's exit out of that and
let's continue because now we have the
SERP API functioning. And I'm just going
to remove the print statements because
now that we know we're getting the
correct results, we don't really need
anything more. All right, so we have
Google search, we have Bing search. Now
what we want to do is we want to
implement the Reddit search. So for the
Reddit search, it's a little bit
different and that's going to require us
going to web scrapers. So from here,
we're going to go to new. We're going to
go to browse scraper marketplace and
then we're going to search for Reddit.
Okay, so it's going to take a second.
We're going to press on the result for
scrapers and then this is going to give
us a few options that we can use to
actually scrape Reddit and get real-time
data. Now you can build your
own scraper if you want, but a lot of
them that you need are already built for
you and you can just call them like an
API which makes it very easy to actually
download and get the data quite quickly.
And in our case, there's two main things
that we want to use here from Reddit. We
want to discover posts by their keywords
because we want to search essentially on
Reddit. And then after we search, we
want to get all of the relevant post
URLs and download all of the comments.
So you can see we have Reddit comments -
collect by URL. Then we have Reddit posts -
discover by keyword. So we're going to
use both of them. First, we collect the
post. Then we get the comments from the
post that we care about. So let's go to
Reddit post here. It's going to say
scraper API. So I'm going to go ahead
and press on next. And it's going to
create this scraper for us. Now here we
have collect by URL, collect by keyword,
collect by subreddit URL, right? Like
and we can get the comments as well. And
we can kind of run through this and see
how we use this scraper, all of the
fields that we can pass to it and all of
the fields that we'll get back. So it
shows us what the response structure
looks like. And then if you go to the
API request builder, it shows us how to
build this API. Now our API key will be
the same. The thing that's going to
change is the URL that we need to hit
here. And this is where I want to go in
and talk about our management API. Okay,
so when we use this scraper,
essentially what's going to happen is
we're going to send a request and then
bright data is going to go using its
scalable network and start collecting
all of that data for us. Now, it's not
going to be available instantly because
it needs to actually access it in real
time from Reddit. So what's going to
happen is we're going to create
something called a snapshot. Now the
snapshot is going to be generating. So
when we first send a request, it's going
to take a second. It's going to start
generating. So what we need to do is we
need to monitor the progress of this
particular snapshot and wait until it's
ready. Now, as soon as the snapshot is
ready, we can then download the snapshot
and we can access the data, but we need
to wait for it to be finished. So
essentially, there's these multiple API
endpoints that we're going to hit. The
first endpoint is going to be to
actually start the collection process.
Then we're going to use this monitor
endpoint to wait for when the snapshot
is ready. And then as soon as it's
ready, we're going to download the
snapshot. So we need to kind of write
some code in Python here that is going
to allow us to do this process where we
hit the API, we wait for it to be ready,
and then we download the snapshot. Okay,
so I'm going to start writing this out.
I've written it based on myself reading
the documentation here, and you'll kind
of see how it works as we code this out.
Again, we just need to make this scraper
to start, and then we need to get access
to the data set ID, which I'm going to
show you in one second. So in the left
hand side here, we're going to go to
discover by keyword. We're going to go
to the management APIs and we're going
to scroll down here until we see this
data set ID. Okay, so we need to copy
this data set ID because this is
something that we're going to need to
use when we actually perform this
scraping operation so we can identify
what data set we're talking about. All
right, so we have that data set ID. What
I'm going to do for now is I'm just
going to put it in a comment in my web
operations file so I don't forget it. So
I'm going to say data set ID is equal to
that. And now I'm going to start writing
the Reddit search function. Okay, so I'm
going to say define Reddit search. This
is going to take in the following. It's
going to take in the keyword that we
want to search for. It's going to take
in the date, which in this case I'm
going to say is all time. It's going to
take in the sort by. I'm going to sort
by hot posts, but you could sort by
rating or, you know, upvotes or
whatever you want. So let's go hot. And
then we're going to say the num of
posts. And in this case, I'm going to do
75. You can do as many as you want.
Okay, so let's zoom out a little bit so
you guys can read this and let's
continue. So for Reddit search, the
first thing we're going to do is define
our trigger URL, which is going to be
the bright data API. So we're going to
say
https://api.brightdata.com/datasets/v3/trigger.
Okay, then I'm going to say my params
are equal to a dictionary, and I'm going to put my
dataset_id, and this is going to be
equal to the ID that we had up here. So
I'm just going to copy it and paste it
inside of here. Okay, then we're going
to have include errors and this is going
to be true inside of quotation marks.
I'm going to have type, and this is going
to be "discover_new". Again, I'm getting all this from
the Bright Data documentation. And I'm
going to say discover_by,
and then "keyword". Okay, so this
indicates what type of kind of search
we're doing essentially. Next, I need to
indicate the data that we're searching
for. So, I'm going to say data is equal
to and this is going to be a list. And
then inside of here, I'm going to put
all the keywords that I want to search.
Now, you'll notice that because this is
a list, I can actually put multiple sets
of keywords at once and Bright Data will
go and asynchronously scrape all of them
for us. So, what that means essentially
is that if you wanted to do a 100
different search strings or a thousand
different search strings, you can do
that in one API request rather than
having to send multiple of them because
this is set up to obviously scale. So,
we're going to say keyword is equal to
keyword.
We're going to say date and then this is
going to be equal to the date. We're
going to say sort by
okay and this is going to be sort by and
then we're going to say the num of posts
is equal to the number of posts and then
again you could write this multiple
times for multiple search strings. So
this is going to start setting it up.
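As a rough sketch, the trigger setup inside this reddit_search function might look like the following. The dataset ID is a placeholder (copy yours from the management API page), and the exact accepted values for fields like date depend on the Bright Data dataset, so treat them as illustrative:

```python
# Sketch of the trigger setup for the Reddit "discover by keyword" scraper.
def reddit_search(keyword, date="All time", sort_by="hot", num_of_posts=75):
    trigger_url = "https://api.brightdata.com/datasets/v3/trigger"

    params = {
        "dataset_id": "gd_xxxxxxxxxxxxxx",  # placeholder - use your own dataset ID
        "include_errors": "true",
        "type": "discover_new",
        "discover_by": "keyword",
    }

    # One dict per search - add more entries here to scrape many
    # keywords in a single request.
    data = [
        {
            "keyword": keyword,
            "date": date,            # exact value/casing matters (see the fix later)
            "sort_by": sort_by,
            "num_of_posts": num_of_posts,
        }
    ]

    # Next step: a helper that triggers the scrape, waits for the
    # snapshot, and downloads the results.
    ...
```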
We're now going to say raw data is equal
to and I'm going to call a function here
that I haven't yet defined. So right now
we're going to say none. We're going to
say if not raw data then we're going to
return
none. Otherwise, we're going to go and
we're going to parse this raw data. So,
I'm going to say to-do parse raw data
and then we are going to return the
information right here, which we'll
write in one second. Okay. So, essentially
this is how we're going to set
up the trigger. But what we
need to do is write a function that will
allow us to download the snapshot, which
is how we're going to get the data,
which is what I'm going to write now.
So, I'm going to put another function
here. I'm going to say define
_trigger_and_download_snapshot,
like this. I'm going to take in
the trigger URL. I take in the params
the data and the operation name which in
this case I'm just going to call
operation. Okay. Now here what I'm going
to do is I'm going to make an API
request to Bright Data. I'm then
going to get the snapshot information,
and I'm going to poll that snapshot
until it's ready and then download it.
This is going to be a little bit of
code, so just bear with me here. I'm
going to say trigger result is equal to
_make_api_request. So the
function that we wrote before, we're
going to pass our trigger URL, our
params, which is equal to the
parameters, and our JSON, which is equal
to the data. And then what we're going
to do is say if not trigger result then
return none because of course if it
didn't give us anything then we can't
pull it. Otherwise we're going to say
the snapshot_id is
equal to trigger_result.get("snapshot_id").
We're going to say if not snapshot ID, then
same thing, return none, because we don't
have any snapshot to retrieve.
Otherwise, what we're going to do here,
I'm going to write a to-do, is poll the
snapshot.
Okay, so that's what we need to do. Now,
we need to write some more functions
to essentially poll the snapshot and
download the snapshot. Now, to make this
a little bit cleaner, I'm going to make
a new file here. For this new file, I'm
going to call this
snapshot_operations.py.
And I'm just going to copy in this
file just to save us a little bit of
time because it's pretty kind of
redundant code and it's not super
valuable for you to write all of this
manually. So what I'm going to do is
paste it in. It's about 70 lines. I'm
going to walk through exactly what it's
doing, but you can just simply download
this code by going to the link in the
description for the GitHub repository,
finding this file, just copying it, and
pasting it in here. Anyways, let me walk
through what we're doing here. So you
can see we're importing os, time, requests,
and typing. What I'm doing is I'm
polling the snapshot status. So, I'm
getting my Bright Data API key. I'm
setting up the progress URL. I'm setting
up my headers. And what I'm saying here
is, okay, I want to keep sending
requests to this endpoint until
eventually it tells me that it's ready.
So, I'm going to do this a maximum of 60
times. I'm going to delay by 5 seconds
in between each of those so that this
takes me a maximum of 5 minutes. I'm
going to say, okay, checking snapshot
progress. This is the attempt. We're
going to get the response from this URL.
We're going to check the status. If it's
ready, we return true. If it's failed,
we return false. If it's still running,
then we just add a time delay. And we
keep doing this. Okay? And we keep going
and we keep going and we keep going
until eventually it fails or it's ready.
Now, we have another function called
download snapshot. And we only call this
function once the snapshot is ready. So,
same thing. We set up our API key and
our download URL. And then we simply
send a request where it downloads a
snapshot and then returns the data to
us. Okay, so that's all that I put
inside of this file.
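Here's a condensed sketch of what that file contains; the full version is in the repo. The progress and snapshot endpoints follow Bright Data's datasets v3 API, and the BRIGHTDATA_API_KEY environment variable name is an assumption, so adjust it to whatever your setup actually reads the key from:

```python
# snapshot_operations.py - condensed sketch of the polling helpers.
import os
import time
from typing import Optional

import requests


def poll_snapshot_status(snapshot_id: str,
                         max_attempts: int = 60,
                         delay: int = 5) -> bool:
    """Poll the snapshot progress endpoint until it is ready or has failed."""
    api_key = os.getenv("BRIGHTDATA_API_KEY")  # env var name is an assumption
    progress_url = f"https://api.brightdata.com/datasets/v3/progress/{snapshot_id}"
    headers = {"Authorization": f"Bearer {api_key}"}

    for attempt in range(max_attempts):
        print(f"Checking snapshot progress... attempt {attempt + 1}")
        response = requests.get(progress_url, headers=headers)
        status = response.json().get("status")
        if status == "ready":
            return True
        if status == "failed":
            return False
        time.sleep(delay)  # still running - wait and try again
    return False


def download_snapshot(snapshot_id: str) -> Optional[list]:
    """Download the finished snapshot and return its records."""
    api_key = os.getenv("BRIGHTDATA_API_KEY")
    download_url = f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}?format=json"
    headers = {"Authorization": f"Bearer {api_key}"}

    response = requests.get(download_url, headers=headers)
    if response.status_code != 200:
        return None
    return response.json()
```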
So now, from web_operations, we're going to import
those functions. We're going to say from snapshot_operations
import, and then we are going to import
the download_snapshot and the
poll_snapshot_status functions. Okay, so now let's go
to our to-dos. So we have a to-do here
where we need to poll the snapshot. So
for polling the snapshot, we're going to
do the following. We are going to say if
not poll_snapshot_status with the
snapshot ID, then we are going to return
none. What this is going to do is it's
going to continually poll the snapshot
until it eventually gets a result of
true or false. True means we can
download it. False means there's an
error in which case we return none. So
we're now going to say the raw data is
equal to the download snapshot and we're
going to download the snapshot with this
snapshot ID which will contain our
scraped data and then we can return raw
data. Okay, so this function,
_trigger_and_download_snapshot, is going to do
exactly that.
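Before moving on, here is a rough sketch of the finished helper in web_operations.py, with the polling step filled in. Back in the Reddit search function, we can then call it to get the raw data:

```python
# Sketch of the trigger-and-download helper, assuming _make_api_request
# and the snapshot_operations helpers shown above.
from snapshot_operations import download_snapshot, poll_snapshot_status


def _trigger_and_download_snapshot(trigger_url, params, data, operation_name="operation"):
    # Kick off the collection run.
    trigger_result = _make_api_request(trigger_url, params=params, json=data)
    if not trigger_result:
        return None

    snapshot_id = trigger_result.get("snapshot_id")
    if not snapshot_id:
        return None

    # Poll until Bright Data reports the snapshot is ready, then download it.
    if not poll_snapshot_status(snapshot_id):
        return None

    raw_data = download_snapshot(snapshot_id)
    return raw_data
```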
From this we can get the raw data now.
So we can say the raw data is equal to
_trigger_and_download_snapshot.
We are going to pass the
trigger URL. We're going to pass the
params our data and an operation name
which I'll just call Reddit in case we
want to do some logging later on. Same
thing if there's no raw data we'll
return none. Otherwise we're going to
parse this data. So I'm going to say
parsed data is equal to an empty list.
What I want to do is I want to take
all of the data that was returned to us
and I just want to get the information
from this data that I care about. That's
because I don't want to pass all this
unnecessary data to my LLM when I start
checking which post we actually want to
download or want to get the information
from. So I just want to get for example
the description of the post and the
title of the post or the title of the
post and the URL of the post just the
data that I actually need. Okay. So I'm
going to say parsed data is equal to a
list. I'm going to say for post in raw
data. Then I'm going to say the parsed
post is equal to and I'm going to say my
title is equal to the post.get
and then title and I'm going to say my
URL is equal to the post.get and then
URL. Now each post in my raw data is
going to have a ton of information,
right? It's going to have the number of
likes, number of upvotes, the number of
comments. It's going to have a
description, and it's going to have the
date it was posted. It's going to have the
author. I don't care about all that
information. So, I'm just parsing
through it, getting the information I do
care about. And then I'm going to say
parsed_data.append, and I'm going to add
this parsed post to that list. Then here I can
return my parsed
data like that. Okay. So, this function
now should actually work where if we do
Reddit search, it should essentially
trigger this scrape operation to start
happening. So, Bright Data will go to
Reddit, it will do the search, and it
will start collecting all of the
relevant posts. Then, we're going to
poll that snapshot because it takes a
second to run. As soon as the snapshot
is ready, we're going to download the
snapshot. We're going to parse through
the results and then we're going to
return that parsed data. Now, the next
step after this would then be to get the
URLs from this parsed data that we want
to explore further and then to download
all of their comments. So, we're going
to do another operation in a second here
that's going to get all of the comments
from a list of posts. But for now, let's
test this one out by going back to
main.py and actually calling this
function now from our Reddit search.
Okay, so we're going to go to Reddit
search now. And we're just going to
change this to call the Reddit search
function. So what did we call this?
Actually, we called this Reddit search.
And actually, let's call this Reddit
search API, because if we name it the
same thing as our function here, that's
going to be an error. So we're going to
say Reddit search API like that. And
then we're going to import this. So,
let's go up here and let's import the
Reddit search API.
Okay, cool. Come back here and same
thing. We'll just pass the user's
question. And then what we can do is we
can print out the Reddit results. Okay,
so now we've tested that function. So,
let's run this and see if it works. And
we're going to say, should I buy AMD
stock? Okay, and it says it's starting
the search. And it gave us a bad request
for this URL. So, I probably just
typed something incorrectly, and I will
check what that problem is. Okay, I was
just checking here, and it was kind of a silly
mistake: I accidentally had a
capital T when I typed "all time" here, and
this needs to be a lowercase t. That
should fix the problem for us now in
this function. So, if we come here, we
can run this and we can say, you know,
invest in AMD and then we should be
good. And you can see it starts checking
the snapshot progress. Okay. Now, while
it does that, we're just going to make a
small change to the code as well, because
the way that I'm returning this parsed
data from here is not actually how I
want to return it. What I want to return
instead is a format that makes a little
bit more sense. So, I'm going to say
return. So, I'm going to put a set of
braces and I'm going to say parsed posts
and this is going to be equal to the
parsed data. And then I'm going to say
total_found is equal to the len of
the parsed data. Okay, cool. So that's
that.
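Putting the parsing and the new return shape together, the tail end of the Reddit search function looks roughly like this. It's shown here as a hypothetical _parse_reddit_posts helper for readability; in the video the same code lives inline:

```python
def _parse_reddit_posts(raw_data):
    """Trim the raw snapshot records down to just the fields we care about."""
    parsed_data = []
    for post in raw_data:
        parsed_data.append({
            "title": post.get("title"),
            "url": post.get("url"),
        })

    return {
        "parsed_posts": parsed_data,
        "total_found": len(parsed_data),
    }
```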
And if we go back to the terminal, it looks like
it finished running. And you can see
that we get some posts and some titles
from Reddit. Now, these don't seem to
make a ton of sense to me. So I'm just
going to quit this and try again because
I think I may have messed something up
in the search string here specifically
because I think I spelled invest
incorrectly. But let's just search
Nvidia here. And let's see if we get
some posts that make a little bit more
sense here from Reddit. Okay. And there
we go. So, these make a lot more sense,
right? Nvidia is actually in the post
title because I didn't spell it
incorrectly this time. And there's 75
posts as we go through here. And the
next step is going to be to narrow those
down so we can grab all of the comments
that we need from them. Okay, so that is
working. And we've got this first
function where we're doing the Reddit
search. Next, we want to get all of the
Reddit posts. So, what I'm going to do
is write another function here. And then
this will wrap up all of the search
operations, and we'll go back into
Langraph and start doing some of the
kind of prompting with the LLM. I just
like to get the data first. Then once we
have the data, we can pass it to the
LLM, and we can kind of analyze it. So,
here we're going to go
reddit_post_retrieval.
Okay? And we should spell retrieval
correctly. What we're going to do is
we're going to take a list of URLs.
We're going to say days back. So, this
is how many days back we want to
get the comments from. We're going to
say load all replies. For now, this is
going to be equal to false. But if you
wanted to get all of the nested replies,
then you could go with true. And we're
going to say comment limit. And for now,
we'll just make this an empty string.
And then later, we can add a limit if
we're getting too many results. And what
we're going to do is we're going to say
if not URLs, then we're just going to
return none because if you don't pass me
any URLs, well then there's no reason to
do this search. And then what we need to
do is set up a similar thing to before.
So I'm going to copy this trigger URL.
And I'm going to put this here and we're
going to say the trigger URL is equal to
the following. Then we're going to say
the params are equal to and we're going
to say the data set ID. The data set ID
here is going to be different. I'll show
you where to get that from in a second.
And we're going to say include
underscore errors
is true. Again, we're then going to
create our data. So, we're going to say
data is equal to same thing. You could
run this at scale if you want. And we're
going to say URL is equal to URL. We're
going to say the days back is days back.
We're going to say load_all_replies
is equal to load all replies. And we're going
to say comment limit is equal to comment
limit. And then this is going to be for
URL in URLs. So essentially we're
creating one of these entries for every
single URL, passing that all inside of
here, and then we'll get all of the
comments for all of these URLs with
these parameters. Okay, so now we need
to find the data set ID. So what we're
going to do is go back to Bright Data
here. We're going to go to where it says
collect by URL. We're going to go to the
management API. And then if you scroll
here, you'll see this new data set ID,
which is the one that we're going to
copy. We'll come back here and we'll
paste that updated dataset ID. So this is
the one for getting the comments. All
right. Now, we're effectively going to
do the same thing that we did before. So
here we're going to say the raw data is
equal to and it's going to be underscore
trigger and download snapshot. We're
going to pass the trigger URL. We're
going to pass our params data and the
operation name is going to be Reddit
comments. Okay, we're going to say if
not raw data, then we're going to return
none. And then if we do have raw data,
we are going to parse the comments. So
we're going to say parsed comments
is equal to an empty list. We're going
to say for comment in the raw data.
We're going to say the parsed comment is
equal to and then we're going to start
writing the comments. We're going to say
the comment ID is equal to comment.get.
And then this is going to be the comment
id. We're going to say the content is
equal to the comment.get
and this is going to be the content.
We're going to say the date and this is
going to be the comment.get and then of
course the date. We're going to say the
parent comment ID because this will be
important for the linkage. So parent
comment ID is equal to comment.get,
and then this is the parent comment ID.
Okay. And then lastly, we're going to
say the post_title
and this is going to be the comment.get
post title. And make sure we don't
forget to put that inside of quotes.
Okay. Then we're going to say the parsed
comments.append the parsed comment. And
then lastly from here we're going to say
return and we're going to return the
comments which is the parsed comments.
and we're going to say the total
underscore
retrieved.
Okay. And this is going to be equal to
the len of the parsed comments. Okay, so
that's it for getting the comments. Again,
it's essentially the same pattern as getting
the posts, except we're changing the dataset
ID and a few of the parameters.
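For reference, here is a sketch of the whole function as described so far. The dataset ID is a placeholder for the "collect by URL" dataset, the days_back default is illustrative, and a couple of the field names get corrected a bit later in the video:

```python
def reddit_post_retrieval(urls, days_back=10, load_all_replies=False, comment_limit=""):
    if not urls:
        return None

    trigger_url = "https://api.brightdata.com/datasets/v3/trigger"
    params = {
        "dataset_id": "gd_yyyyyyyyyyyyyy",  # placeholder - comments dataset ID
        "include_errors": "true",
    }
    # One entry per post URL we want comments for.
    data = [
        {
            "url": url,
            "days_back": days_back,
            "load_all_replies": load_all_replies,
            "comment_limit": comment_limit,
        }
        for url in urls
    ]

    raw_data = _trigger_and_download_snapshot(
        trigger_url, params, data, operation_name="reddit comments"
    )
    if not raw_data:
        return None

    parsed_comments = []
    for comment in raw_data:
        parsed_comments.append({
            # Note: some of these keys get adjusted later in the video.
            "comment_id": comment.get("comment_id"),
            "content": comment.get("content"),
            "date": comment.get("date"),
            "parent_comment_id": comment.get("parent_comment_id"),
            "post_title": comment.get("post_title"),
        })

    return {
        "comments": parsed_comments,
        "total_retrieved": len(parsed_comments),
    }
```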
So now we have the ability to get all of
the comments for a
particular post. Now, before we can test
this, we need to know what posts we want
to get the comments for. So, what we're
going to do is close out of this. We're
going to close out of snapshot operations, we're
going to close out of web operations. And now, we're
just going to be working inside of this
main file. So, the Reddit search works,
the Bing search works, and the Google
search works. Now, the next step is to
analyze the Reddit search, pull out the
relevant URLs, right? So, that's what
this is doing right here. And then to
retrieve those comments or really the
post data from those particular posts.
So let's move on and let's handle that.
All right, so let's move on to the next
step here where we are going to analyze
the Reddit post and then pull out the
ones that are relevant. Now, in order to
do that, we're going to need some LLM
operations here. And I'm going to make a
new file and I'm going to call this
prompts.py. Now, similarly to before,
I'm not going to write all of this out
from scratch because it is a good amount
of code and it's not super valuable to
do that. But what I'm going to do is
paste in all of the prompts that I've
already written that we're going to use
for this video. Again, you can get these
from the link in the description. Just
go to the GitHub repository and download
them. So, I'm going to paste it in. It's
going to look like a lot of code, but
really most of it is just prompts that
I've already written that I've tested
that work well here. Now, you see that I
have this class called PromptTemplate.
I have a few static methods inside of
here, where when you call one of these methods,
it essentially just returns the
prompt to you. So, for example, the Reddit URL
analysis system. That's the one we're
about to use. You're an expert at
analyzing social media content. Your
task is to examine Reddit search results
and identify the most relevant post that
would provide valuable additional
information. You get the idea. Okay. And
then we tell it do the following. You
know, find this information blah blah
blah. And then return a structured
response with the selected URLs. Then there's the
Reddit URL analysis user method, the user prompt for
analyzing Reddit URLs. So same thing:
user question that we pass the user
question here. Pass the Reddit results
analyze these Reddit results. Same thing
for the Google Analysis. Okay. Pull this
in. Google analysis user. same thing the
user prompt. So we have the system
prompt and the user prompt and all these
functions or methods that contain the
prompts and allow us to pass some
variables and have it kind of embedded
inside of here. So don't worry too much
about this, but there are a few
functions that of course we are going to
use from this file.
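To make the pattern concrete, here is an illustrative, heavily shortened sketch of what prompts.py looks like. The real prompt text is much longer, so grab the actual file from the repo; the wording and signatures here are assumptions based on what's described above:

```python
# prompts.py - illustrative sketch of the prompt-template pattern.
class PromptTemplate:
    @staticmethod
    def reddit_url_analysis_system() -> str:
        return (
            "You are an expert at analyzing social media content. "
            "Examine Reddit search results and identify the most relevant "
            "posts that would provide valuable additional information."
        )

    @staticmethod
    def reddit_url_analysis_user(user_question: str, reddit_results: str) -> str:
        return (
            f"User question: {user_question}\n\n"
            f"Reddit results: {reddit_results}\n\n"
            "Analyze these Reddit results and return the most relevant URLs."
        )


def get_reddit_url_analysis_messages(user_question, reddit_results):
    # Each "get_*_messages" helper just returns a system + user message pair.
    return [
        {"role": "system", "content": PromptTemplate.reddit_url_analysis_system()},
        {"role": "user", "content": PromptTemplate.reddit_url_analysis_user(user_question, reddit_results)},
    ]
```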
So now we're going to go to main.py
and we are going to import them. We're going to say
from prompts import,
and we're essentially just going to
import all of the functions that we have
written there. So what this is going to
be is get_reddit_url_analysis_messages,
get_google_analysis_messages,
get_bing_analysis_messages,
get_reddit_analysis_messages, and
get_synthesis_messages. Okay? And then we can format
this. So we can just put a set of
parentheses here. Okay? So like that.
And then we can move this down to the
next line and kind of put all of them
like this. So they're all getting
imported from the same place. Okay. So I
think that is good. We have the prompts.
Now what we're going to do is go over to
analyze Reddit posts and we're going to
start using some of these prompts when
we call the LLM. Okay. So first things
first, we're going to get the user
question. So we're going to say user
question is state.get user question and
we're going to get the Reddit results.
So, we're going to say the Reddit
results is equal to state.get
and then you guessed it, this is going
to be the Reddit results or an empty
string. Okay, so we're going to say if
not Reddit results. So, for some reason
we don't have any, which can happen,
then we're going to say return and then
we're just going to return the selected
URLs equal to an empty list because we
won't have any to select. Next, we're
going to say structured_llm
is equal to llm.with_structured_output.
And what I'm going to pass here
is something called a Pydantic model,
which will force the LLM to give me an
output in a particular format. So, we're
going to write that now, and you're
going to see how useful this actually
is. So, I'm going to make a class, and
this is going to be the Reddit URL analysis.
So we're going to say RedditURLAnalysis,
and this is going to inherit
from BaseModel, which we imported
here from Pydantic. Now what we're able
to do is define a python class and then
pass that to an LLM and tell the llm it
needs to give us an output that's in
this particular format. This allows us
to ensure that we always get something
in the same format. And in this case the
format that we want is just a list of
URLs. So what we can say is selected_urls.
We can say this is a list of type
string, and we can make this equal to a
Field. This comes from Pydantic, and we
can say the description is equal to the
following. And then I'm just going to
paste in the description. So let me copy
it from my other code file here. But
essentially we just describe what we
want the model to populate this field
with. So I've said this is a list of
Reddit URLs that contain valuable
information for answering the user's
question. So now what will happen is
when I initialize the LLM I can give it
this model and I can say hey you need to
give me an output that's always in this
format and then every time we run the
LLM we're going to get selected URLs
it'll be a list and it will contain the
URLs that we need right that are
strings. So if we come back here, we can
just pass this class, RedditURLAnalysis, into
with_structured_output, and that's it. We've created
this structured output model, and it's
very useful for getting content back in the
correct format.
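Here is what that model and the structured-output setup look like (llm is the chat model defined at the top of main.py earlier in the video):

```python
from typing import List

from pydantic import BaseModel, Field


class RedditURLAnalysis(BaseModel):
    selected_urls: List[str] = Field(
        description=(
            "List of Reddit URLs that contain valuable information "
            "for answering the user's question"
        )
    )


# llm is the chat model created at the start of main.py
structured_llm = llm.with_structured_output(RedditURLAnalysis)
```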
We're then going to say messages is equal to
get_reddit_analysis_messages. We're
going to pass the user question and
we're going to pass the Reddit results.
Okay, so these are the messages now that
we're going to pass to the LLM. And now
what we're going to do is we're going to
invoke the LLM and we're going to
attempt to get this kind of structured
output. So, we're going to say try and
we're going to set analysis is equal to
the structured_llm, if we could spell
structured correctly. Not sure why it's
not giving me the autocomplete. Okay.
Dot invoke. And then we're going to pass
these list of messages. Now, the
messages if we go and look at prompts
here is essentially just two messages,
right? So, we have like the system
prompt and then we have the user prompt.
So, that's all that we're passing. And
if we look at the Reddit one, so it's
right here. We get the Reddit analysis
system prompt from the prompt template
and then we get the user prompt. So we
just pass that here. Those are the two
messages. And then what we're going to
do is say the selected URLs is equal to
analysis. Okay. Dot selected URLs
because it's going to give us a Python
object. Let's fix the spelling. Okay. So
now what we can do is we can print out
the URLs just to make sure that it's
actually correct. We're getting the
proper URLs. So we can say for i,
url in enumerate, and we can enumerate
over the selected URLs. From here what
we can do is say print, and then
we can use an f-string: maybe a few
spaces, then i in braces, a dot, and then
url in braces. And if
we want, I can just pass one to enumerate here so
that we start the index at one.
Okay, so now this is just going to print
out the selected URLs. Before that, we
can also just say selected URLs just so
we have some logs and then we'll be able
to see what those are. Now down here, we
need the except. So we're going to say
except Exception as e. We're going to
say print and we'll just print out E.
And then we'll say selected
URLs is equal to an empty list. And then
when we return the selected URLs, we'll
just return the selected URLs. Okay, so
that's all that we're doing. We
essentially said all right we're going
to create this structured output LLM.
What we do is we tell it that we need
something in this format which we
defined above. We generate the messages
that we need and then we pass that to
the LLM. So we invoke the LLM. We get
the selected URLs from the response. We
print that out. There's some error then
we print E and we say there's no
selected URLs and then we keep going
from there. Okay, so now this should
actually just work.
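Putting the whole node together, a sketch of analyze_reddit_posts might look like this. Note that it already uses get_reddit_url_analysis_messages (the function-name fix that comes up in a moment) and the selected_reddit_urls state key used later; llm, RedditURLAnalysis, and the prompt helpers are the ones defined above:

```python
def analyze_reddit_posts(state):
    user_question = state.get("user_question", "")
    reddit_results = state.get("reddit_results", "")

    if not reddit_results:
        return {"selected_reddit_urls": []}

    structured_llm = llm.with_structured_output(RedditURLAnalysis)
    messages = get_reddit_url_analysis_messages(user_question, reddit_results)

    try:
        analysis = structured_llm.invoke(messages)
        selected_urls = analysis.selected_urls
        print("Selected URLs:")
        for i, url in enumerate(selected_urls, 1):
            print(f"  {i}. {url}")
    except Exception as e:
        print(e)
        selected_urls = []

    return {"selected_reddit_urls": selected_urls}
```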
What we can do is test this and see if it
selects some URLs for us. So let's run
this and say "Nvidia". Okay, it's going to start
searching for this and let's wait for
the snapshot and then see what URLs end
up getting selected. Okay, so we just
got an error here essentially saying
that we forgot to pass one of the
parameters to our functions get Reddit
analysis messages. So if we go here, you
can see that we have to pass the user
question, the Reddit results, and the
Reddit post data. So I think we probably
are calling the wrong function,
because that's not the one we want. We
want the get Reddit URL analysis
messages which just takes in two
parameters and then we have another one
later. Yes. So this one here that takes
in four. So are these called the same
thing? No, this should be
get_reddit_url_analysis_messages. So we just misnamed this
function call, essentially. So what we'll
do is we will rename this to
get_reddit_url_analysis_messages. That should
be fixed. And then we're going to rerun
this and same thing. Let's go invest
Nvidia and see what pops up. Okay, cool.
So, that just finished and you can see
that it actually selected four URLs here
and these all seem to be relevant in
terms of investing: investing in
Nvidia, why are you investing in Nvidia,
Nvidia is rising today, DCA Nvidia
Tesla - I don't know what that is. You
get the idea. So, let's exit out of
that. That stage is completed. And the
next thing that we need to do now is we
need to actually retrieve all of the
comments and then get those comments and
again continue to pass those to the LLM.
So we already have the function to do
that, but we need to now write it inside
of main.py. So we're going to go to
retrieve Reddit posts. And what we're
going to do here is just do a simple
print statement and we're going to say
getting
Reddit post comments. Okay, like that.
And then we can say the selected urls is
equal to state.get selected reddit
urls. And then we can continue from
here. So we're going to say if not
selected urls again it's possible then
we're going to return and this is going
to say reddit_post
data. This will be equal to an empty
list. And then what we'll do down here
is we will start to collect that data.
So we're going to say print and we'll do
an fstring. We're going to say
processing and then we'll say len of
selected urls and we can say reddit
urls like that and then we can say here
the reddit post data is equal to
reddit_post_retrieval, which we need to import. So
let's go import that from the top of our
program. So we can import that here.
Reddit post retrieval, the function that
we wrote. Scroll back down. Okay, so
Reddit post retrieval. From here, we're
just going to pass the selected URLs.
And that should be pretty much all that
we need to do. Now, down here, we're
going to say if Reddit post data, then
we can say print. Now, successfully
got and then we can say something like
this. Let's do an fstring
successfully got len of Reddit post data
posts.
Okay, let's fix the spelling here.
All right, so we successfully got those
posts and then otherwise we're going to
say else: print failed to get post data,
and we can set the Reddit post data
equal to an empty list, and then
here we will go and say
return the Reddit post data. Okay, so
that should retrieve the Reddit post
data for us. We're saying, okay, get the
Reddit post, get the selected URLs, make
sure we have some, obviously. Uh, if we
do, then we call this function, which
should go and grab all of the comment
data from that. And then if we want, we
can, of course, print this out. So now
we can print the Reddit post data and
make sure that's working before we move
into the analysis and synthesis steps,
which will be pretty straightforward.
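For reference, here is a sketch of the finished retrieve_reddit_posts node. One small liberty: since reddit_post_retrieval returns a dict with a total_retrieved field, the success message prints the comment count rather than calling len on the dict itself:

```python
def retrieve_reddit_posts(state):
    print("Getting Reddit post comments...")

    selected_urls = state.get("selected_reddit_urls", [])
    if not selected_urls:
        return {"reddit_post_data": []}

    print(f"Processing {len(selected_urls)} Reddit URLs")
    reddit_post_data = reddit_post_retrieval(selected_urls)

    if reddit_post_data:
        print(f"Successfully got {reddit_post_data.get('total_retrieved', 0)} comments")
    else:
        print("Failed to get post data")
        reddit_post_data = []

    return {"reddit_post_data": reddit_post_data}
```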
So let's make the terminal bigger and run this
again. Let's go "invest in Tesla". I'm sure that's
going to be all over Reddit. So, let's run
that and let's see what we get. Okay, so
I was just doing a little bit of
debugging here because the results I was
getting from the comments weren't great
and I realized it's because I made a
small mistake in the way that we are
parsing this. So, if we go back into web
operations and we go to where we're
parsing the comments, we need to change
some of these fields because they're not
actually correct based on the response
that we're getting here from the API. So
the major change is that where we have
content, we're going to change this to
say comment. So we're getting the
comment because that's actually where
the comment is stored. Same thing for
the date. We're going to change this to
date posted. And then I think we can
just remove the parent comment ID
because that doesn't seem to actually be
working. And for the post title, I
believe that we don't need that either
because, again, it wasn't populating
properly. So, let's remove those and
just stick with the comment ID, the
comment content, and the date. Again, we're mostly
just changing "content" to say "comment". And
then I'm going to run this again and
give it another test uh to make sure
we're getting the right data. Okay. And
there we go. It just loaded a bunch of
comments for me. And you can see now the
data is actually filling in. If we go
here, there is a lot of data that we
pulled because we pulled a bunch of
comments from a bunch of different
posts. All right. So, that is it for
that phase. So now we've got in the post
and we've got the comments from the
post. The next step is really to
synthesize all of this data together,
which is going to be pretty
straightforward. We just need to write
these four functions. So let's get
started here with our analyze Google
results. And then we can just go through
the rest of them. Again, it's pretty
much going to be copy and paste, but
just changing a few things and changing
kind of the prompt that we're using. So
we're going to do a print statement
here. We're going to say print analyzing
Google search results like that. We're
going to get the user's question. So,
we're going to say user question
state.get user question. We're going to
say Google results is equal to state.get
and then get the Google results. Okay.
Then from here, we're going to say
messages is equal to get Google analysis
messages. And then we're going to pass
the user question and the Google
results. We're then going to say the
reply is equal to llm.invoke.
and we're just going to invoke the
messages and we're going to go here and
we're going to say the Google analysis
is equal to the reply content. All
right, so the llm is just the one that
we defined right at the beginning,
right? So if we go here, the chat model.
So we're just calling it raw without
doing anything else and essentially just
getting whatever response it has based
on our prompt. Again, you can go read
the prompt from in here, but essentially
we're just creating a prompt that says,
hey, you know, analyze these Google
results and give us the interesting uh
output. Okay, so let's copy the exact
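As a sketch, the Google analysis node looks roughly like this; the Bing and Reddit versions below follow the same shape, with llm and the prompt helpers being the ones imported earlier:

```python
def analyze_google_results(state):
    print("Analyzing Google search results...")

    user_question = state.get("user_question", "")
    google_results = state.get("google_results", "")

    messages = get_google_analysis_messages(user_question, google_results)
    reply = llm.invoke(messages)

    return {"google_analysis": reply.content}
```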
Okay, so let's copy the exact same thing
for Bing, except we're just going to change
everything to say Bing. So rather than the Google
results, this is going to be the Bing.
Change this to Bing. Same thing. This is
going to be Bing.
Okay. And then this needs to be Bing as
well, for the return. We can change the
print statement as well. Okay, cool. So,
that's pretty much it there. And rather
than get Google, this is going to be get
Bing. All right, so let's copy the same
thing and we're going to do it again.
This time for Reddit. Okay. Now, for
Reddit, it's actually going to look a
little bit different. So, we'll modify
this a bit more. So, let's paste this in
here. I'm going to say analyze Reddit
search results. Rather than just getting
the Bing results, we're going to say
Reddit results,
this is going to be Reddit results. But
then not just the Reddit results, we
also need to get the Reddit post data.
So we're going to say reddit_post_data,
and it's equal to the reddit_post_data
from the state. For the messages, this is going to
be get Reddit analysis messages. This
takes in three things. So the results,
the post data, and the user question. So
let's go to Reddit results and then
change this to say Reddit post data like
that. Okay. And then same thing. This is
just going to say Reddit like that.
Let's make sure everything else has
changed. So analyzing Reddit results.
Okay. Get the user question Reddit
results. Reddit post data. Invoke the
LLM. And then there we go. Okay. And
then the last thing that we need to do
is synthesize all of our analysis. This
is going to be quite a bit different. So
we'll just write this manually. We're
going to say print combine all results
together. We're going to say the user
question is state.get user question.
We're going to get the Google analysis
first. So state.get Google analysis.
We're going to say the Bing analysis is
the state.get Bing analysis. And then
the Reddit analysis is going to be the
same thing for the Reddit analysis.
We're going to say messages is equal to
get the synthesis messages. And we're
going to pass the user question, the
Google analysis, the Bing analysis, and
the Reddit analysis. We're then going to
say the reply is equal to llm.invoke
the messages. We're going to say the
final answer,
okay, is equal to the reply.content.
And then what we're going to do is we're
going to pass the final answer, which
will be the final answer. We also need
to pass messages because this is kind of
how LangGraph works. And we're going to
pass this where we say role, and this is
going to be assistant. Okay. And then
we're going to say content is the final
answer. All right. And let me zoom out a
little bit and kind of close this
sidebar so you guys can see what's going
on. Let's close the terminal as well. So
again, what we've done here is we said,
okay, we're going to get all the results
that we analyzed previously, right?
We're going to combine that into a
message, pass that to the LLM again, and
then it's going to synthesize all of
that together and return to us a final
answer and also just a final message. We
need this message again for the LangGraph
message chain to operate properly.
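A sketch of that final synthesis node is below; the node name here is just illustrative, but the state keys match the ones used above:

```python
def synthesize_analyses(state):
    print("Combine all results together...")

    user_question = state.get("user_question", "")
    google_analysis = state.get("google_analysis", "")
    bing_analysis = state.get("bing_analysis", "")
    reddit_analysis = state.get("reddit_analysis", "")

    messages = get_synthesis_messages(
        user_question, google_analysis, bing_analysis, reddit_analysis
    )
    reply = llm.invoke(messages)
    final_answer = reply.content

    # Return both the final answer and an assistant message so the
    # LangGraph message chain keeps working.
    return {
        "final_answer": final_answer,
        "messages": [{"role": "assistant", "content": final_answer}],
    }
```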
So that's pretty much it. I mean, I know
that's a lot of code and we went through
a bunch of stuff in this video. Again,
all the code will be available from the
link in the description, but of course,
we need to test this and make sure it
works. So, let's bring this up and say,
"Tell me if Elon Musk
is a good person." Okay. And let's go
ahead and see what that tells us. Okay.
And after a minute here, we've gone
through this whole process and we get
this general response here telling us
what the sentiment is on if Elon Musk is
a good person or not. And if we scroll
over here, we can see all of the sources
where it was getting this information
from, you know, Reddit comments, etc.
Okay, so pretty cool. Now, obviously, we
can make this a lot better and we can
search more things and we don't just
have to have one search string and we
could actually have the LLM searching
multiple things and giving us a really
detailed response. I just wanted to show
you this to give you kind of the sense
of how you create this more complex
orchestration with an AI agent that's
pulling in a bunch of relevant data. In
our case, our LangGraph graph is relatively
simple, right? We have the architecture
that I discussed before and we have
what, seven, eight nodes, something
along those lines. But if we added more
nodes, we added more LLM interaction, we
allowed this to run a little bit longer,
we can get significantly better
responses. So I think with that said,
guys, that's going to wrap up this
video. If you made it to the end, give
yourself a pat on the back because this
is very complicated and was a long video
to go through. Again, all the code will
be available from the link in the
description. Massive thanks to Bright
Data for sponsoring this video and I
look forward to seeing you in another
one.
Get started with BrightData and get $20 in credits for free: https://brdta.com/twt_websearch
Check out PyCharm, the Python IDE for data and web professionals: https://jb.gg/check-pycharm-now

DevLaunch is my mentorship program where I personally help developers go beyond tutorials, build real-world projects, and actually land jobs. No fluff. Just real accountability, proven strategies, and hands-on guidance. Learn more here - https://training.devlaunch.us/tim

Video Resources
Code in this video: https://github.com/techwithtim/Advanced-Langflow-Web-Agent
Learn LangGraph: https://www.youtube.com/watch?v=1w5cCXlh7JQ&ab_channel=TechWithTim
UV Tutorial: https://www.youtube.com/watch?v=6pttmsBSi8M&t=1s&ab_channel=TechWithTim

Timestamps
00:00:00 | Overview
00:00:55 | Project Demo
00:03:34 | Understanding Web Search
00:06:04 | Understanding the Architecture
00:08:35 | Project Setup
00:11:36 | Langflow Structure
00:37:44 | BrightData Setup
00:51:12 | Web Operations/Scraping/Searching
01:08:49 | LLM Calls & Prompting

Hashtags: #LangGraph #Python #AIAgents