Welcome to this comprehensive course
where we will build three cutting edge
AI agents from scratch. First, you'll
create a sophisticated sales agent that
can engage in natural real-time
conversations with customers, leveraging
the power of LiveKit and Cartesia. Next,
you'll learn to build a powerful deep
research agent using Exa capable of
searching the web, analyzing multiple
sources, and delivering structured
insights in seconds. Finally, you'll
learn how to construct a user research
agent with LangChain that can automate
the entire research process from
generating user personas to conducting
simulated interviews and synthesizing
feedback. Each project is designed to be
hands-on, providing you with the
practical skills to develop your own
functional AI agents. Thanks to Cerebras
for providing a grant to make this
course possible. Hi, I'm Sarah Chang, a growth engineer at Cerebras, and I'm so
excited to welcome you to our three-part
hands-on workshop series designed to
help you build powerful AI applications.
Across these workshops, we'll build
everything you need to get started with
real-world AI use cases, from voice agents
to deep research assistants to
multi-agent workflows. You'll get sample
code, practical exercises, and
open-source example repos so you can
follow along and build. We'll also dive
into today's most popular AI tools and
frameworks, showing you how to
incorporate fast, cutting edge inference
into your projects. Let's get into it. I
can't wait to see what you build.
Welcome everyone to build your own sales
agent. I'm Sarah Chang from Cerebras and
I'm so excited to be joined by Russ d'Sa, CEO of LiveKit, today.
Thanks Sarah. Today we're going to be
walking through how to build a voice
sales agent that can have natural
conversations with customers. Our sales
agent will pull product context from an
external source and can respond in real
time.
This isn't just a simple chatbot. We're
building a full-featured AI agent that
can speak, listen, and respond
intelligently using your company's sales
materials. By the end of this workshop,
you'll have your very own working sales
agent that can handle customer inquiries
just like a human would.
We have a complete code notebook for you
to follow along with, and you'll be able
to keep experimenting and building even
after today's workshop is over. Before
we get started, let's go over what
you'll get out of this workshop. Free
API credits for LiveKit, Cerebras, and Cartesia, a complete quick-start guide
to build apps with our technologies, and
your very own functional sales agent
that you can customize. Before we dive
in, please make sure you have the
notebook open. If you haven't already,
go ahead and click on the Colab link we
shared.
Let me start by showing you what we're
building toward. In the future, most
customer interaction will be AI powered,
but instead of just typing back and
forth with a chatbot, you're going to
have a real conversation using your
voice.
Voice agents are becoming the new
frontier because they're more natural,
more engaging, and frankly more human
than traditional chatbots. When someone
calls your business, they want to talk
to someone who understands them, not
through a phone tree. So, Russ, taking a
step back, what exactly are AI voice
agents? AI voice agents are stateful,
intelligent systems that can
simultaneously run inference while
constantly listening to you speak. They
can engage in real natural conversation.
They have four key capabilities. The
first is that they understand and
respond to spoken language. They don't
just spit out answers based on string
matching or keywords. They understand
the meaning behind what people are
saying. This means they can handle
complex tasks and questions. Someone
might ask, "I'm looking for a product
recommendation." The agent can look at
the user's purchase history, the shop's
current stock levels, and recommend
something they'd like. You might see
this referred to in some places as
multi-agent systems or agentic workflows. Speech is the
fastest way to communicate your intent
to any system. When you can just say
what you want, there's no typing, no
clicking through menus, and no learning
curve. People have been speaking to one
another their whole lives. So a computer
system with the same interface can take
advantage of that familiarity. Lastly,
none of this is possible unless the
agent can keep track of the state of the
conversation. Communication is highly
contextual and your agents need to have
state so they can hold a coherent
conversation across time.
I see. And I imagine this makes them
perfect for things like customer
service, sales conversations like
qualifying leads and answering product
questions, technical support, walking
people through solutions step by step,
or even information retrieval, finding
exactly what someone needs from your
knowledge base.
Now, let's talk about what's actually
happening inside the voice agent when
you're having a conversation. The first
step in this pipeline is a transcription
phase or ASR, automatic speech
recognition. Ideally, we only want to
send speech, not silence or background
noise, to our speech-to-text model. To help with that, we add a small VAD (voice activity detection) model in front of STT, which detects whether what's being picked up by our microphone contains human speech. We'll use VAD to filter out any audio that isn't human speech before it reaches STT. This improves the accuracy of our STT transcription, helps the system understand when you're done speaking so the LLM can start responding, and it also saves you a ton of money by not having to constantly stream audio to the STT model. Once speech is detected, the voice data is forwarded to STT. This
model listens to and converts your words
to text in real time. The last step in
the layer is end of utterance or end of
turn detection. Being interrupted by AI
every time you pause is annoying. While
VAD helps the system to know when you
are and aren't speaking, it's also
important to analyze what you're saying,
the content of your speech to predict
whether you're done sharing your
thought. We have another small model
here that runs quickly on the CPU. It
will instruct the system to wait if it
predicts you're still speaking. Once
your turn is done, the final text
transcription is then forwarded to the
next layer. Then comes the thinking
phase. Your complete question gets sent
to a large language model. Think of this
as a brain that understands what you're
asking. The brain might need to look
things up like checking your product
catalog or calling other services to
give you the right answer. Once it
figures out what to say, it starts
generating a response sentence by
sentence.
The third and final step is the speaking
phase. As the LLM streams a response
back to the agent, the agent immediately
starts forwarding those LLM tokens to
the TTS engine. This generated audio
from TTS streams back to your client
application in real time. That's why the
agent can start speaking while it's
thinking.
And the final result, a conversation
that feels natural and immediate even
though there's a lot of complex
processing happening behind the scenes.
There are a lot of moving pieces here, but
LiveKit's agents SDK is going to handle
all this orchestration and data
management for us. It manages the audio
streams, keeps track of context, and
coordinates all these different AI
services, so you don't have to worry
about the technical plumbing.
Awesome. And now that we have our basis
covered, let's get everyone set up. You
can access the starter code here. This
will take you directly to our Google
Colab notebook where you can see the
starter code for today. First, we need
to install all the necessary packages in
your notebook. Find the first code cell
and run it. You'll see this command. Go
ahead and run that cell now. Click on it
and press shift and enter or click the
play button. While that's installing,
you should see some output showing the
packages being downloaded. This installs LiveKit Agents with support for Cartesia, Silero for voice activity detection, and OpenAI compatibility. While you're
pulling that up, let me explain the key
technologies that make this project
possible. First, let's talk about the
brain of our operation, the LLM. For
today's workshop, we're using Llama
3.3 70B, which is from Meta's latest family of open-source AI models, running on Cerebras for lightning-fast inference.
Speed is always critical here. You can
have the most sophisticated LLM
available, but if the inference is slow,
the conversation feels slow and broken.
Exactly. And that's the challenge most
voice agents face today. If you're using
traditional GPUs, you're looking at several seconds per response. For a phone
conversation, that's just painful.
Nobody's going to wait around for that.
They'll just hang up.
So, that's where Cerebras comes in. We're about 50 times faster than traditional GPUs. So, instead of those multi-second delays, you'll get responses
in milliseconds. When someone asks the
AI sales agent, "What's your pricing?"
They expect an immediate answer just
like they would from a human. For voice
agents, speed isn't just a nice to have.
It's table stakes. When people talk to
each other in everyday conversation,
they have less than 500 milliseconds of
total latency between turns. When we
stretch an AI agent's response times too far
past that, it stops feeling natural.
And as a final note on Cerebras, this is the AI processor running these models: the Cerebras Wafer-Scale Engine, the WSE-3. It's a massive AI chip that delivers the fastest inference in the world. As you can see in the benchmark, we're delivering 2,591
tokens per second with Llama 3.3. That's
five times faster than the next best
provider.
First, let's install the LiveKit CLI.
This is optional for our workshop today,
but if you want to use LiveKit beyond
this, here are the commands depending on
your system. Today we're using a Python
notebook so that nobody needs to battle
with their environment when they're
getting started. But if you want to
build and deploy any agents that other
people can interact with, the CLI is by
far the easiest way to do it. Just type lk app create and instantly clone a
pre-built agent like the one we're about
to build here. Let's talk a bit about
what exactly LiveKit is and why we need
it for our voice agent. The existing
internet wasn't really designed for
building voice AI applications. HTTP
stands for hypertext transfer protocol.
It was designed for transferring text
over a network. For what we're building,
we need to transfer voice data over a
network with low latency. LiveKit is a
real time infrastructure platform for
doing just that. Instead of HTTP, it
uses a different protocol, WebRTC, to
transport voice data between your client
application and an AI model with less
than 100 milliseconds of latency
anywhere in the world. It's resilient,
handles any number of concurrent
sessions, and is fully open-source, so
you can dig into the code and see how it
works, or even host it yourself if you
want. You can use LiveKit to build any
type of voice agent, including ones that
can join meetings, answer phones in call centers, and, in our case today, an
agent that can speak to a prospective
customer on your website.
Here's the key part for voice agents.
LiveKit acts as middleware between your
AI and your customers. When someone
wants to talk to your agent, LiveKit
makes sure the audio gets from their
phone or computer to your LLM and then
gets the LLM's response back to them,
all in real time. We take care of the
hard parts so you can focus on your
application. Connection management,
routing information between data
centers, traversing firewalls, or
adapting to spotty cellular connections.
You don't have to worry about any of it.
Our goal is to make building a voice
agent as simple as building a website.
Here you can see those boxes labeled
LLM, TTS, and STT. Those are the AI
components we talked about earlier that
help the agent listen, think, and speak.
LiveKit is the real time layer ensuring
data flows smoothly between your
customers and all of your AI components.
In addition to Cerebras and LiveKit, we will also be using Cartesia. The final
component we need is the actual speech
processing to turn voice into text and
text back into voice. We need
specialized models. That's where our
partner Cartesia comes in. Cartesia has their own flavor of Whisper large-v3-turbo, called Ink-Whisper, that's focused on real-time accuracy and time-to-first-byte latency. When you talk to our sales agent, Ink-Whisper converts your speech into text that the AI can read and understand. Ink-Whisper is pretty fast: transcriptions come back within 60 milliseconds of you finishing speaking.
Then when the AI has something to say
back, Cartesia takes that text and converts it into speech that sounds natural and human-like. It's like having
a really good interpreter who works both
ways. They also handle all the messy
parts of human speech, like when people
pause or interrupt each other or say a
lot. Now, let's take a moment to set up our API keys for Cerebras, LiveKit, and Cartesia. In the second code cell,
you'll need to replace the placeholder
API keys with your actual keys. The
links to get these free API keys are all
in your notebook. Now that our API keys
are set up, step two is about teaching
our AI sales agent about your business.
Think of it like training a new
employee. You wouldn't put someone on the phone without telling them what you sell, right?
The challenge with LLMs is that they
know a lot about everything, but they
might not know many specifics about your
company. LLMs are only as good as their
training data set. If we wanted to
respond with information that isn't
common public knowledge, we should try
to load it into the LLM's context window
to minimize hallucinations or the "I can't help with that" responses. This is where something called RAG comes in: retrieval-augmented generation.
This process is simple in concept. We
feed the LLM a document containing
additional information. For example, if
we load our pricing details into the
LLM's context window, then when someone
asks about pricing, it's easy for the
LLM to look up that information and
return an accurate answer.
So, for this demo, we'll load in
information like product descriptions,
pricing info, key benefits, even
pre-written responses to common
questions like, "It's too expensive."
That way, our agent always stays on
message and has the context it needs to
generate accurate responses. Let's take
a look at our notebook and see what that
added context looks like in practice.
Under step two, we've organized all the
information our sales agent needs into a
simple, structured format that's easy
for the AI to understand and reference.
You can see we have everything a good
salesperson would need. A clear product
description that explains what we're
selling, a list of key benefits that
highlight why customers should care, and
specific pricing for each tier. But
here's the really fun part. Those
objection handlers at the bottom, these
are pre-written responses to the most
common things customers say when they're
hesitant to buy.
When someone says, "It's too expensive,"
the agent already has a proven answer
ready. For example, I understand the
cost is important. Let's look at the
ROI. Most clients see 3x return in the
first six months. When they say, "I need
time to think," the agent knows to ask,
"what specific concerns can I address
right now?" Rather than just saying,
"Okay, call us back."
This gives your agent a loose script.
You can fill this with the responses
that have converted best for your sales
team. LLMs are non-deterministic,
so it's not going to say the exact same
thing every time, but it provides a
solid framework for the agent to follow
while it's talking to prospects. Now,
let's fill out our load context function
and load this information into our agent
and see how it uses this knowledge
during conversations. Next is the
exciting part, step three, where we
actually create our sales agent. This is
where we take all of those components we
just talked about and wire them together
into a working system.
Before you run anything, let's walk
through what's happening in the sales
agent class. We start by loading our
context using the load context function
we defined earlier. This gives our agent
access to all the product information,
pricing, and objection handlers that we
set up. Then we configure the four
components that are part of our voice
pipeline. The LLM is Llama 3.3 70B running on Cerebras. Llama 3.3 70B is a
good balance between speed and quality,
and it's great at tool calling, which
we're going to need later. For
speech-to-text, we're using Cartesia's Ink-Whisper engine, which is really fast. On the text-to-speech side, we're using Cartesia again, partially because it means you only need one API key, but also because their TTS engine, Sonic, is also really fast. For voice activity detection, we're using Silero, which is
the default option for the agents
framework and has a light footprint and
really fast performance.
Now, let's look at how we actually
implement this in code. The instructions
are important, but if we tried to show
the whole prompt here, the text would be
really small. The full version is in
your notebook. We start by telling the
agent, you are a sales agent
communicating by voice and give it
important rules like don't use bullet
points because everything will be spoken
aloud and most importantly only use
context from the information provided.
If someone asks about something not in
the context, say you don't have that
information.
The super call initializes our agent and
passes all of our configurations to the
parent agent class. This sets up the
agent with our LLM, STT, TTS, VAD, and
instructions all working together.
We also define an on_enter method to
start the conversation. This is
triggered as soon as someone joins a
conversation with the agent. Instead of
sitting in silence, it immediately
generates a greeting and offers to help,
just like a good salesperson would. Now,
go ahead and let's run the step three
cell to define our sales agent class.
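For reference, here's a condensed sketch of what that class can look like with the LiveKit Agents SDK. It follows the structure described above but isn't copied from the notebook: the plugin options, model id, and instruction text are assumptions, it reuses the load_context helper sketched earlier, and the step-three cell is the source of truth.

```python
import os

from livekit.agents import Agent
from livekit.plugins import cartesia, openai, silero

class SalesAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            # Abbreviated instructions; the notebook's full prompt also covers tone rules.
            instructions=(
                "You are a sales agent communicating by voice. Do not use bullet points. "
                "Only use the context provided below; if something isn't in it, say you "
                "don't have that information.\n\n" + load_context()
            ),
            # Llama 3.3 70B served by Cerebras through an OpenAI-compatible endpoint.
            llm=openai.LLM(
                model="llama-3.3-70b",
                base_url="https://api.cerebras.ai/v1",
                api_key=os.environ["CEREBRAS_API_KEY"],
            ),
            stt=cartesia.STT(),      # Cartesia Ink-Whisper speech-to-text
            tts=cartesia.TTS(),      # Cartesia Sonic text-to-speech
            vad=silero.VAD.load(),   # Silero voice activity detection
        )

    async def on_enter(self) -> None:
        # Greet the visitor as soon as they join, instead of waiting in silence.
        self.session.generate_reply(
            instructions="Greet the visitor warmly and ask how you can help."
        )
```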
Step four is our launch sequence. This
is how we actually get our agent up and
running so people can talk to it.
Think of this entry point as the start
button for our agent. When someone wants
to have a conversation, this is what
kicks everything into gear and gets the
agent ready to talk. The entry point
function does three main things. First,
it connects the agent to a virtual room
where the conversation will happen, like
dialing into a conference call. Then, it
creates an instance of our sales agent
with all the setup we just configured.
Finally, it starts a session that
manages the back and forth conversation.
This session glues together all of our
model configurations, media streams,
tool configurations, and maintains the
conversation history.
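Sketched out, the entry point might look roughly like this, assuming the same LiveKit Agents API used in the class sketch above:

```python
from livekit.agents import AgentSession, JobContext

async def entrypoint(ctx: JobContext) -> None:
    # 1. Connect this agent worker to the LiveKit room for the conversation.
    await ctx.connect()

    # 2. Create the session that glues together models, media streams, and history.
    session = AgentSession()

    # 3. Start the back-and-forth with our SalesAgent in that room.
    await session.start(agent=SalesAgent(), room=ctx.room)

# Outside a notebook you'd normally hand this to the CLI runner, e.g.:
#   from livekit import agents
#   agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```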
Usually, you'd have a front end like a
mobile app or a web page where you're
speaking to your agent. But today, to
make it easy, we're going to create a
minimal web interface right here in the
notebook.
Before we run this, let me set
expectations. When you execute this
cell, it's going to load several AI
models and establish connections to
multiple services. The first time might
take 30 to 60 seconds, so be patient.
Once you start your agent up, you'll see
some initial log output. You can think
of this as similar to booting up a
Next.js app or a web server. Your agent
server waits for new conversation
requests from your customers. When a
request comes in, LiveKit will connect
your agent and front end together so you
can start speaking. Go ahead and run the
step four cell now. Watch the output and
wait for the interface to load.
If you see any errors about expired tokens, just stop the cell and run it again. The cell will request a new room join token from the Jupyter proxy and
you'll be able to connect to the room.
Great. Now we have a fully working sales
agent, but let's keep going to make it
even more robust. This part here is
completely optional, but here are a few
ways you can expand your sales agent.
First, let's stop the current agent by
clicking the stop button on the cell
that's running or pressing the interrupt
button in your notebook.
Now, one thing we can do is expand our
single agent into a multi-agent system. Why would we want to do that, instead of just having a single agent answer every
question.
Great question. Think about how real
sales teams work. You don't have one
person who's an expert at everything.
You have specialists. If someone calls
asking a deep technical question about
some API integration, you want them
talking to your best technical person,
not your pricing specialist. LLMs have
limited context windows, which means,
similar to people, they have limits on
the number of things they can specialize in. We can also tailor the conversational style to the topics that
we're talking about. If there's a
conversation about technical issues,
it's less important for us to talk about
value props and ROI. In our case, we're
not actually pushing in that much
context. But this is an important lesson
to learn when you're creating larger,
more complex agents for production. For
a more complex system, here are three
agents that we've defined. A greeting
agent is our main sales agent who
qualifies leads. A technical specialist
agent is specialized on solving
technical issues and a pricing
specialist agent handles budgets, ROI,
and deal negotiations.
The magic is in the handoffs. Our
greeting agent figures out what the
customer actually needs, then smoothly
transfers them: "Let me connect you with our technical specialist who can dive deeper into those integration questions."
Each specialist has its own voice,
personality, and specialized knowledge.
The technical agent can go deep on specs
without getting wrapped up in trying to
sell anything, while the pricing agent
can focus purely on ROI and budget
decisions.
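Here's a hedged sketch of what a handoff tool can look like with the function_tool decorator. The class names and instructions are stand-ins, and the return-a-new-Agent handoff follows LiveKit's documented workflow pattern; check the step-five cells for the exact implementation.

```python
from livekit.agents import Agent, function_tool

# Minimal specialist stubs; the notebook's versions carry their own voices and context.
class TechnicalSpecialistAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You answer deep technical and integration questions.")

class PricingSpecialistAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You handle budgets, ROI, and deal negotiations.")

class GreetingAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions=(
                "You are the main sales agent. Qualify leads, and hand off deep "
                "technical or pricing questions to the right specialist."
            ),
            # llm / stt / tts / vad configured as in the SalesAgent sketch above
        )

    @function_tool()
    async def transfer_to_technical(self):
        """Transfer the caller to the technical specialist for integration or API questions."""
        # Returning a new Agent from a tool hands the session off to that agent.
        return TechnicalSpecialistAgent()

    @function_tool()
    async def transfer_to_pricing(self):
        """Transfer the caller to the pricing specialist for budget and ROI questions."""
        return PricingSpecialistAgent()
```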
It's like having your best sales team
available 24/7. Go ahead and run the
enhanced sales agent cell. Now run the
technical specialist agent cell followed
by the pricing specialist agent cell.
Finally, let's run the multi-agent entry point. This will start our new system with agent transfer capability. This multi-agent system also uses tool
calling. When a customer asks about
technical details, our agent can
transfer them to a technical specialist
who has a different voice and
specialized knowledge. Let's implement
this enhanced system. Scroll down to the
challenges section and find step five.
First, run the cell that imports function_tool. This gives us the ability to define tools the agent can call to hand off the conversation.
let's look at our enhanced sales agent.
You'll see we add function tools to
allow agent transfers. And that's it. In
less than an hour, you've built a
sophisticated multi-agent voice sales
system that can have natural
conversations with customers, transfer
between specialized agents, use your
actual product information, and handle
objections professionally.
Remember, you have free API credits for
all three platforms, so keep
experimenting. Try adding your own
product data or customize or expand the
agent personalities and voices. maybe
integrate with your existing systems or
external APIs.
The code notebook will remain available
to you and we've included links to get
free API keys for Cerebrus, LiveKit, and
Cartisia. If your agent isn't working
perfectly right now, don't worry.
Sometimes it takes a few tries to get
the microphone permissions set up
correctly or you might need to refresh
and run the cells again.
The most important thing is that you now
understand the architecture and you have
the complete working code to take with
you.
Happy building.
Welcome, everyone, to Build Your Own Deep Research. I'm Sarah Chang from Cerebras and I'm here today with Will Bryk, CEO of Exa. We're super excited to have you
all here.
Thanks Sarah. Today we're going to build
something pretty cool, a deep research
style assistant that can automatically
search the web, analyze multiple
sources, and provide structured insights
in under 30 seconds. We'll code it up
with you.
That's right, Will. By the end of today,
we'll learn how modern AI powered deep
research systems work under the hood. So
before we get started, let's do a quick
overview of what we're walking away with
today. First, free API credits for both
Cerebras and Exa so you can continue developing after this workshop ends. Second, a complete quick-start guide to build apps with both APIs so you can jump
start your future projects. And third,
your very own functional research agent
that goes way beyond basic search.
Now, let's see what we're building. This
is Cerebras's deep research interface.
It looks clean and simple, but there's
some serious intelligence under the
hood. Our coding today will build the
functionality behind this. If you
notice, it's not just returning search
results. It's actually doing real
research, searching multiple sources,
analyzing the content, identifying
knowledge gaps, and then doing follow-up
searches to fill those gaps. You might
take an hour to do all this work, but
this app does it in less than a minute.
Your deep research will be 10x faster
than ChatGPT's deep research.
Before we dive into AI powered research,
let's talk about how research was
traditionally done. In the old days
before 2025, researchers somehow used to
write their own research papers and
reports. Let me walk you through each
step as we build this diagram together.
So, first it all starts with a research
question. Once you have your question,
you need to figure out where to look for answers. So, you'd branch out in
multiple directions. You'd go to the
library to search through physical books
and journals. You'd do Google searches.
And if you were lucky, you'd interview
experts in the field. After gathering
all these sources from libraries, search
engines, and expert interviews, what's
the next step? You'd sit down and take
notes,
right? That means reading through
literally everything you can,
highlighting key passages, writing
summaries, and keeping track of where
each piece of information came from. You
can imagine doing this with hundreds of
sources.
And this is where human limitations
really showed up. You could only read so
much, process so much, and remember so
much. Important connections between
sources might be missed completely
because of information overload.
Finally, you take all these notes and
compile them into an answer. This is
where the real intellectual work
happened, synthesizing information from
multiple sources into coherent insights.
In reality, research is a recursive
process. You never actually know if you
found all the relevant information in
the first pass, so you have to go back
and do it again with your new knowledge.
Each loop through this process could
take hours or days and there's always
this nagging feeling that you might have
missed something important. Luckily, AI
can do days of research in 30 seconds.
Now, what do we mean when we talk about
deep research powered by AI? AI powered
deep research incorporates all those
research steps like gathering sources
and analyzing notes, but uses LLMs to
dramatically accelerate each step.
For starters, instead of going to the
library or searching on Google, we use
tool calling with a web search API. In
this case, Exa. Our AI system can
autonomously break down the initial
research question into the web search
API calls that would answer the
question.
Next, instead of the take-notes step from traditional research, we have retrieval-augmented generation, or RAG. RAG just means that the LLM gets to retrieve from the web before generating, because LLMs don't have the whole web memorized and often make stuff up. With RAG, our web search API gives the LLM fully informed
context, so it's super accurate.
Finally, instead of us manually reading
and analyzing all of these notes, we let
the large language model do it for us.
And like traditional research, there's a
feedback loop. Our AI system identifies
gaps in its RAG output and triggers new searches if necessary. This loop
repeats multiple times until the LLM
decides that the answer has been
reached.
And because we're doing so many LLM
calls, speed becomes absolutely
critical. This is why we need Cerebras.
When you're chaining together 10 or 15
LLM calls to complete a research task,
even small delays can add up really
quickly.
So, now that we understand what AI
powered deep research is, when should
you actually use it? Deep research takes
a lot longer than a typical search, but
it's perfect for things like answering
really hard questions like what project
to work on or what's the meaning of life
or keeping up with fast-moving fields
like AI or biotech.
Now, let's get everyone set up to build
our own deep research systems. You can
access the starter code here. This will
take you directly to our Google Colab
notebook where you can see the starter
code for today. The notebook contains
all the code we'll be working with plus
some additional examples and extensions
you can explore after the workshop.
First, we need to install our
dependencies and set up the environment.
In your Colab notebook, you'll see we're installing exa-py and the Cerebras Cloud SDK. Run that first cell to get
everything installed.
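Concretely, that first cell and the client setup amount to something like this (package and environment-variable names follow the SDKs' docs; defer to the notebook if it differs):

```python
# In Colab you'd run the install as a shell command in its own cell:
#   !pip install exa_py cerebras_cloud_sdk

import os

from cerebras.cloud.sdk import Cerebras
from exa_py import Exa

# Both clients just need an API key; here we pull them from environment variables.
cerebras_client = Cerebras(api_key=os.environ["CEREBRAS_API_KEY"])
exa_client = Exa(api_key=os.environ["EXA_API_KEY"])
```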
While that's running, let's go ahead and
grab our API keys. You'll need a
Cerebras API key, and you can get started for free at cloud.cerebras.ai, and an Exa API key, which you can also get for free at exa.ai. While you're
installing these packages and getting
your API keys, let me explain a bit more
about Cerebras and Exa. First, let's
talk about the brain of our operation,
the LLM. For today's workshop, we're
using Llama 4 Scout, which is from Meta's latest family of open-source AI
models running on Cerebras hardware.
Cerebras' speed is a huge unlock here.
The faster the language model can run
inference, the less time our users have
to spend waiting for the research report
to come back. Especially with deep
research where we're chaining together
multiple LLM calls, speed is super
important.
It's a great example of something you
can only build with fast inference.
Today, most deep research products take
multiple minutes to return a report.
Cerebras unlocks a completely different
experience. We serve the top open source
models in the industry with 50 times
faster inference than traditional GPUs.
This is the actual hardware making it
all possible: the Cerebras Wafer-Scale Engine, the WSE-3. It's a massive AI chip that
delivers the fastest inference in the
world. As you can see in the benchmark,
we're delivering 2,800 tokens per second
with Llama 4 Scout, five times faster
than the next best provider.
Now that we've covered how Cerebras will
run our language model, the second
component is to give our LLM search over
the web. That's where Exa comes in. We
built our search engine from scratch to
connect LLMs to the web. Exa is actually built specifically for AIs, and we've
built all the features to make it super
simple. For example, we don't just
return search results, we return the
full content of each result page so that
your LLM has full context. In fact,
we're actually also the fastest search
API in the world.
Now that our API keys are set up, step
two is all about building our core
search function using exa-py. Looking at
our research flow diagram here, this is
the first red box where we take a
question and find relevant sources on
the web. So, let's build this search
function and see how it works. In your
notebook, you'll see step two where we
define our search function. Here's what
the function looks like in practice. On
the left, we're creating a function
called search web. Let me walk you
through the actual code. Notice in our
search web function, we're using the
search and contents endpoint. This means
that not only are we doing a web search,
but for each URL returned, we're also
getting the crawled content from that
page. We also specify type="auto". This is really powerful. It automatically chooses between keyword and neural search based on your query. You don't
have to decide which approach to use.
Think of it this way. If you search for
something like Python programming
tutorials, that's clearly a keyword
search. But if you search for companies
that are disrupting traditional banking,
that's more conceptual and benefits from
neural search. Exa figures this out
automatically. The text parameter
controls how much content we get from
each source. For deep research, we want
substantial content, not just snippets.
This gives our LLM rich content to work
with instead of trying to piece together
tiny fragments. Go ahead and run this
function now. Try searching for
something like space companies in
America and see what you get back. You
should see both the search results and
the full content from each page rendered
as clean markdown.
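Here's a minimal version of that search function, matching the parameters described above (search_and_contents, type set to auto, and a text limit); the exact values in the notebook may differ:

```python
import os

from exa_py import Exa

exa = Exa(api_key=os.environ["EXA_API_KEY"])

def search_web(query: str, num_results: int = 5):
    """Search the web and return results that include the crawled page content."""
    response = exa.search_and_contents(
        query,
        type="auto",                        # let Exa pick keyword vs. neural search
        num_results=num_results,
        text={"max_characters": 3000},      # substantial content, not just snippets
    )
    return response.results                 # each result has .title, .url, and .text

# Example:
# for r in search_web("space companies in America"):
#     print(r.title, r.url)
```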
On the right side of the screen, you can
see the kind of results we get back.
Notice we're not just getting titles and
snippets. We're getting the full content
of relevant pages. This is crucial for
the LLM to do deep analysis rather than
surface level summarization.
Now let's move to the next step in our
flow. Taking all those relevant sources
we found and feeding them to our LLM for
analysis. This is the next step in our
diagram, powered by Cerebras. This is where the magic of RAG really happens. We're passing the LLM fresh, relevant information from the Exa API web search to work with. And this is where Cerebras' speed becomes critical.
We're about to start chaining multiple
LLM calls together. So each individual
call needs to be fast. A few seconds
here and there quickly adds up to
minutes of waiting.
Now let's look at the actual
implementation. Here's what's happening
under the hood. We're taking all that
rich web content from Exa and feeding it directly to the Cerebras LLM along with our original question. In our ask_ai
function, we're using a low temperature
of 0.2. Think of temperature like a
creativity dial. Lower numbers give you
more focused deterministic responses.
Since we want factual analysis, not
creative writing, we keep it low. The
model we're using, Llama 4 Scout, is
specifically optimized for reasoning and
analysis tasks like this. And because
it's running on Cerebras' WSE-3 chip,
you'll see responses in seconds, not
minutes.
Look at how we structure the prompt.
We're not just asking, "What do you
think?" We're giving the LM specific
instructions. Analyze these sources,
answer this question, and format your
response clearly. The LM reads through
all the sources simultaneously,
something that would take a human hours
and synthesizes insights in just
seconds. This is the power of combining
fast inference with comprehensive source
material. You can now go ahead and run
the ask_ai function with some sample
content. You'll see how quickly it
processes even large amounts of text and
returns structured insights.
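As a reference point, an ask_ai helper along those lines might look like this; the model id and prompt wording are approximations of what's in the notebook:

```python
import os

from cerebras.cloud.sdk import Cerebras

client = Cerebras(api_key=os.environ["CEREBRAS_API_KEY"])

def ask_ai(question: str, sources_text: str) -> str:
    """Ask the Cerebras-hosted LLM to analyze the gathered sources and answer the question."""
    completion = client.chat.completions.create(
        model="llama-4-scout-17b-16e-instruct",  # Llama 4 Scout on Cerebras (check the exact id)
        temperature=0.2,                         # low temperature: factual analysis, not creative writing
        messages=[
            {
                "role": "user",
                "content": (
                    "Analyze these sources and answer the question. "
                    "Format your response clearly with a short summary and key insights.\n\n"
                    f"Question: {question}\n\nSources:\n{sources_text}"
                ),
            },
        ],
    )
    return completion.choices[0].message.content
```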
Now, let's put it all together into a
function called research topic. This is
our basic research function that follows
a classic flow. Ask a question, do an
Exa web search, pass the relevant
sources to the LLM, then return a
response. If you walk through the
research topic function in your
notebook, you'll see it searches for
sources using our Exa search function, filters
for substantial content over 200
characters, creates context for the LLM,
then asks the LLM to analyze and
synthesize an answer.
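Putting those pieces together, a bare-bones research_topic could look like the sketch below. It reuses the search_web and ask_ai sketches above, so the same caveats apply:

```python
def research_topic(question: str) -> str:
    """One round of research: search the web, filter thin results, and synthesize an answer."""
    results = search_web(question)

    # Keep only sources with substantial content (over 200 characters).
    substantial = [r for r in results if r.text and len(r.text) > 200]

    # Build a single context block the LLM can read end to end.
    context = "\n\n".join(
        f"Source: {r.title} ({r.url})\n{r.text}" for r in substantial
    )

    return ask_ai(question, context)

# Example:
# print(research_topic("climate change solutions 2025"))
```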
The key is in the prompt structure.
We're asking for both a summary and
specific insights in a structured
format. This makes the output much more
useful than just a wall of text. Try
running this on a topic you're curious
about, something like climate change
solutions 2025 or quantum computing
advances. You should see it finds
multiple sources and gives you a real
analysis.
In the next and final step, we're going
to expand on our basic research model.
This mirrors how human experts actually
research. Instead of just doing one
round of web search and LLM synthesis,
after the first web search, the LLM
identifies the most important gaps in
understanding and does a targeted second
web search before producing an answer.
Let's take a look at step five where we
implement a basic recursive version of
our deep research. If you look at the
deeper research topic function, it does
everything the basic version does but
then adds this intelligent follow-up
layer. The implementation has two key
steps. One: after the first analysis, we ask the LLM, based on these sources, what is the most important follow-up question that would deepen our understanding of our query? Two: we then search for that specific missing topic and combine both layers for our final analysis.
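A rough sketch of that recursive layer, building on the helpers above (the follow-up prompt wording is illustrative):

```python
def deeper_research_topic(question: str) -> str:
    """Two-pass research: answer, identify the biggest gap, search again, then re-synthesize."""
    # Pass 1: the basic search-and-analyze flow.
    first_results = search_web(question)
    first_context = "\n\n".join(r.text for r in first_results if r.text)
    first_analysis = ask_ai(question, first_context)

    # Ask the LLM to identify the single most important gap.
    follow_up = ask_ai(
        "Based on these sources, what is the single most important follow-up "
        "question that would deepen our understanding? Reply with only the question.",
        first_context,
    )

    # Pass 2: search the gap and combine both layers for the final analysis.
    second_results = search_web(follow_up)
    second_context = "\n\n".join(r.text for r in second_results if r.text)

    return ask_ai(
        question,
        f"Initial analysis:\n{first_analysis}\n\n"
        f"Follow-up question: {follow_up}\n\n"
        f"Additional sources:\n{second_context}",
    )
```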
Again, look at the prompt engineering.
We're not just asking for more
information. We're specifically asking
the LLM to identify gaps and formulate
targeted search queries. This is much
more sophisticated than just doing
random additional searches. You can go
ahead and run the enhanced version on
the same topic you tried before. You
should see much richer, comprehensive
results.
Now that you have the core system
working, let's talk about where you can
take this next. The beauty of what we've
built is that it's modular. You can
enhance any piece independently. Some
ideas for expansion include adding more
search layers, integrating with
specialized databases of academic
papers, patents, etc. or adding
different types of analysis and
sentiment, trend detection or
competitive intelligence.
You could also experiment with different
LLM models, add memory between searches,
or even build domain specific research
agents for things like market research,
academic research, or technical due
diligence. We could stop here, but for
those who want to go deeper, we're going
to look at the approach Anthropic uses
for their deep research agent. This is
completely optional, but it will give
you an idea of what production systems
look like. In the Anthropic approach, a lead agent breaks down complex queries into specialized subtasks, with multiple sub-agents working simultaneously. Then, the
lead agent synthesizes everything
together, and decides whether to kick
off more sub agents or generate the
final report. By splitting the task
across agents, you keep the context
manageable for each individual agent.
Think of it like a research team where
everyone has their own specialty.
This is huge for reliability. It makes
sense, right? Humans also function
better when they're given a well scoped
task versus trying to manage 10 things
at once. Also, by running sub-agents in
parallel, you can run dozens of searches
while keeping the response time
manageable. Instead of doing searches
sequentially, you're doing them
simultaneously. The Anthropic
multi-agent research function in your
notebook shows a simplified version of
this approach. It's more complex, but it
is better at handling difficult topics
that require stringing together multiple
sources.
And that's a wrap. You've just built a
sophisticated research system that
combines the best of web search and AI
analysis. You now understand how modern
AI powered research works under the
hood.
Remember, you have free API credits for
both Cerebras and Exa, plus the complete
code guide to keep experimenting. You
should build something cool. The
techniques you learned today (RAG, multi-layer search, intelligent follow-up) are the same patterns used by production research systems at places like Exa and Cerebras. Now go build some
amazing research agents.
Welcome everyone to automate user
research with AI. I'm Sarah Chang from
Cerebras and I'm so excited to be joined by Lance Martin from LangChain
today.
Thanks Sarah. Today we're going to walk
through how to build an AI powered user
research system that can automatically
generate user personas, conduct
interviews, and synthesize product
feedback in under 60 seconds.
This isn't just about automating surveys
or forms. We're building a sophisticated
AI research system that can generate
end-to-end simulated interviews using LangGraph workflows. By the end of this
workshop, you'll have your very own
working user research system that can
compress weeks of work into minutes. We
have a complete code notebook for you to
follow along with, and you'll be able to
keep experimenting and building even
after today's workshop's over.
Before we get started, let's go over
what you'll get out of this workshop.
Free API credits for Cerebras, a complete quick-start guide to build apps with Cerebras, LangChain, and LangGraph,
and your very own functional AI user
research system that you can customize.
Before we dive in, please make sure you
have the notebook open. If you haven't
already, go ahead and click on the
Colab link we shared.
You can scan the QR code on the screen
to access the starter code. Let's make
sure that everyone has this open before
we continue. So Sarah, taking a step
back, what exactly is user research?
User research is a systematic process of
gathering and analyzing information
about your target audience. The goal is
learning user behaviors and needs,
validating product ideas, and making
better product decisions. Everyone from
startups to big companies like Google,
Netflix, or Dropbox all have teams that
conduct user research before launching a
new feature or creating a product
roadmap. It's something that all
companies invest heavily in because it's
essential for building products people
actually want. It also takes an
extremely long time. So now
let me walk you through how this is
traditionally done. We always start with
a user question. Then for each user you
create interview questions. This takes
time to get right because you need
open-ended questions that actually get
useful insights. Bad questions lead to
bad data. You need questions that reveal
motivations, not just surface level
preferences.
That's right. Next comes recruiting
participants. And this is where things
get very expensive. You need to find the
right people who match your target
demographic. This is often the biggest
bottleneck, finding qualified
participants who represent your actual
users.
Exactly. That's two to three weeks just
to find the right people to talk to.
Then you're scheduling interviews around
everyone's availability. Coordinating
schedules, sending reminders, dealing
with no-shows is a logistical nightmare.
Then conducting interviews, another one
to two weeks of actual conversations,
transcribing, and organizing responses.
Each interview might be 30 to 60 minutes
plus transcription time plus organizing
all of that qualitative data.
And finally, analyze responses, pulling
out themes, identifying patterns,
writing up actual insights. That's
another week. And this is really where
most research projects die. You have all
this data, but turning it into actual
insights is a lot of hard work. And
here's the complete timeline. User
questions all the way through to final
insights. All in all, we're looking at 6
weeks total, and that's if everything
goes smoothly. This timeline really
kills innovation speed. Startups really
can't afford to wait six weeks to get
user feedback.
By the time you get results, your
product might have already changed or
your competitors might have moved ahead.
In the era of AI-assisted coding, engineering teams can build products before
PM and design teams can validate if
they're a good idea to build. This 6
week research timeline is simply too
long for modern organizations. And
that's what's so exciting about what
we're building today. What if we could
automate this entire process? Instead of
recruiting real people, we can create AI
personas that represent your target
users. AI user research creates multiple
AI personas and runs hundreds of
simulated interviews in minutes. We're
talking about genuine AI agents that can
think and respond like real users. So
now instead of scheduling interviews, we
can simulate them instantly. The AI can
roleplay as different types of users and
give you realistic responses. Look at
this. It's the same process, but every
step is automated and happens in
seconds, not weeks. Our system has four
AI-powered components. First, the AI generates the interview questions based on your
research topic. Then the AI creates
diverse personas that match your target
demographic. Next, the AI runs simulated
interviews between the researcher and
the persona. And finally, the AI
analyzes all the responses to give you
actionable insights.
That's right. Each step that took weeks
can now take seconds. And you can
iterate on research or product questions
in real time. This is the magic. 6 weeks
becomes 60 seconds. You can test
multiple research approaches instantly
and get feedback immediately. When you
can test product feedback or ideas that
quickly, suddenly you can iterate very
quickly or instantaneously. Now, let's
switch over to the code. You can access
the starter code here and we'll be
walking through the notebook step by
step. If you haven't already, make sure
you have the starter code open. We're
about to dive into the technical
implementation. First, we need to
install all the necessary packages in
your notebook. Find the first code cell
and run it. You'll see this command. Go
ahead and run that cell now. Click on it
or press shift enter or click the play
button. While that's installing, you
should see some output showing the
packages being downloaded. This installs
Cerebras and LangGraph. First, let's talk
about the brain of our operation, the
LLM. For today's workshop, we're using
Llama 3.3 70B, which is from Meta's latest family of open-source AI models running on Cerebras for lightning-fast
inference. Speed is always critical
here, especially when we're conducting
hundreds of AI interviews with multiple
AI users and questions. Every
millisecond counts. With Cerebras
delivering over 2,500 tokens per second,
we can simulate an entire research
study, 10 personas, five questions each,
plus analysis in under 60 seconds. With
traditional inference, this could take
upwards of 20 minutes. So with Cerebras,
you could run 20 user simulations in the
same time it would take you to run one
with other platforms. And as a final
note on Cerebras, this is the AI
processor running these models, the
Cerebras Wafer-Scale Engine, the WSE-3. It's
a massive AI chip that delivers the
fastest inference in the world. As you
can see in the benchmark, we're
delivering over 2,500 tokens per second
with Llama 3.3, five times faster than
the next best provider.
And the second platform we're going to
be using here is Langchain and Langraph.
Today, we're focused on Langraph for
workflow orchestration and Langchain for
integrations as well as structured
output. And we use Langmith for tracing
and observability. This gives us the
building blocks to create complex
stateful AI systems that can handle
multi-step research processes. We're
going to be using three key LangChain
features today. Model abstraction so we
can easily swap different providers,
standard interfaces that make our code
clean and maintainable, and structured
outputs to ensure that AI responses are
properly formatted. These features let
us focus on the research logic instead
of wrestling with AI SDKs. Now, LangGraph
gives you low-level components for
building many types of AI applications,
including agents or workflows. And we
can lay out these applications in any
way we want as a series of steps or
nodes. Each node is just a Python
function, giving you full control over
the logic within each step. Think of it
like having a team of AI researchers,
each with their own expertise working
together seamlessly.
We connect those nodes or steps into a
workflow. We start with configuration.
This just generates the personas, runs
interviews in a loop, and then
synthesizes results. LangGraph handles
all that orchestration and state
management for us.
The beauty is that each node can focus
on one job really well and LangGraph connects them intelligently. For the second step, let's set up our LLM running on Cerebras. Our LLM is the brain of the operation and handles question generation, persona creation, simulated responses, and analysis. This function is our interface to Cerebras, and we'll be calling it multiple times throughout our code. For step
three, let's talk about state. As we
conduct all these simulated interviews,
we want some way to track the shared
information across each of these steps
and nodes. LangGraph has a state object
which we can access within each node and
update once we finish running the node.
So because every node can read from and
update this state object, it can track
our questions, personas, or anything
else we want over the course of this
research process. Now, without proper
state management, AI systems can't
always maintain context across time.
Look at how the state flows through our
system. Each node reads what it needs and
updates what others will use. This
shared state is what makes our multi-agent system coherent instead of just a bunch of disconnected AI calls. Now look at the InterviewState TypedDict. It
defines exactly what information flows
through our system. Research questions,
personas, interview tracking, and final
results. This structure keeps everything
organized.
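For orientation, here's roughly what the LLM helper and the state definition can look like. The field names and model id are assumptions; the notebook's versions are authoritative:

```python
import os

from cerebras.cloud.sdk import Cerebras
from typing_extensions import TypedDict

client = Cerebras(api_key=os.environ["CEREBRAS_API_KEY"])

def call_llm(prompt: str) -> str:
    """Single interface to Cerebras; every node funnels its prompts through here."""
    completion = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content

class InterviewState(TypedDict):
    """Shared state every LangGraph node can read from and write to."""
    research_topic: str    # what we're studying
    target_users: str      # who we want to interview
    questions: list[str]   # generated interview questions
    personas: list[str]    # generated persona descriptions
    current_persona: int   # index of the persona being interviewed
    transcripts: list[str] # completed interview transcripts
    insights: str          # final synthesized analysis
```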
Now, for our next step, we need to build
our specialized agents. So, each node is
a specialist that performs one specific
task and updates the shared state, which
other nodes can then use.
This is where the magic happens. Each AI
agent has a clear job and does it really
well. The key insight is that nodes
don't just process data, they update a
shared state. This lets subsequent
nodes build on the previous work. It's
like a relay race where each runner
hands off exactly what the next runner
needs. For our workshop today, there are
four main nodes. Each has a specific
role in the research pipeline. The
configuration node is our first node and
is our entry point. It gets the research
question from the user and generates
interview questions automatically. This
node will initialize our research
process. It prompts the user for the research topic and the types of users we
want to interview. It then generates a
configurable number of questions about
the research topic to seed our
interviews. The second node is the
personas node which creates diverse user
profiles with rich characteristics.
Different ages, backgrounds,
communication styles, everything that
makes interviews realistic. This is
where we get the diversity that makes
our research valuable. Each persona
brings a different perspective.
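A node is just a Python function that reads the state and returns the keys it wants to update. Here's a hedged sketch of a personas node, using the state and call_llm helper sketched earlier:

```python
def generate_personas(state: InterviewState) -> dict:
    """Create a handful of diverse user personas for the research topic."""
    prompt = (
        f"Research topic: {state['research_topic']}\n"
        f"Target users: {state['target_users']}\n\n"
        "Write 3 distinct user personas (age, background, communication style, "
        "and attitude toward the product), one per line."
    )
    personas = [line.strip() for line in call_llm(prompt).splitlines() if line.strip()]
    # Returning a partial dict updates only these keys in the shared state.
    return {"personas": personas, "current_persona": 0}
```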
So once we have our personas, the third node, which is our interview node, conducts the interviews: the actual Q&A simulation. Each persona responds in character, maintaining their personality throughout. And this is really the most complex node, because it has to manage the conversation flow between personas and keep them
consistent. The last node is our
synthesis node. It analyzes all the
completed interviews and generates
actionable insights. It looks for
patterns, themes, and practical
recommendations. This is where raw
interview data becomes business
intelligence.
For the next step, we need to set up our
router. In our flow, two things can
happen after an interview is completed.
Our program can move on to the next
persona or if we've gone through every
single interview, we can end the
program. The interview router decides
what happens next. Continue interviewing
the next persona or move to synthesis.
This is what makes the workflow
intelligent and adaptive. It's simple
logic, but it's what makes our system
autonomous. No human intervention needed
once it starts.
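The router itself is a tiny function: LangGraph calls it after each interview and uses its return value to pick the next edge. A sketch under the same assumptions:

```python
def interview_router(state: InterviewState) -> str:
    """Decide whether to interview the next persona or move on to synthesis."""
    if state["current_persona"] < len(state["personas"]):
        return "conduct_interview"   # still have personas left to talk to
    return "synthesize"              # every interview is done
```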
And now we connect everything together,
add the nodes, define the connections,
set up conditional routing, look at the
build interview workflow. It's
surprisingly clean for such a powerful
system. So, LangGraph handles all the
complexity for us. We define the
structure and it manages the execution.
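Wired together, the build step looks roughly like this. Node names mirror the sketches above, with small stubs standing in for the nodes we haven't sketched; the notebook's build_interview_workflow is the real version:

```python
from langgraph.graph import StateGraph, START, END

# Stubs for the remaining nodes, just so the wiring below is self-contained;
# the notebook defines richer versions of these.
def configure(state: InterviewState) -> dict:
    return {"questions": ["What do you struggle with most today?"]}

def conduct_interview(state: InterviewState) -> dict:
    persona = state["personas"][state["current_persona"]]
    answer = call_llm(f"You are this user: {persona}. Answer: {state['questions'][0]}")
    return {
        "transcripts": state["transcripts"] + [answer],
        "current_persona": state["current_persona"] + 1,
    }

def synthesize(state: InterviewState) -> dict:
    return {"insights": call_llm("Summarize the key themes:\n" + "\n\n".join(state["transcripts"]))}

def build_interview_workflow():
    graph = StateGraph(InterviewState)
    graph.add_node("configure", configure)
    graph.add_node("generate_personas", generate_personas)
    graph.add_node("conduct_interview", conduct_interview)
    graph.add_node("synthesize", synthesize)

    # Linear start: configuration, then persona generation, then the first interview.
    graph.add_edge(START, "configure")
    graph.add_edge("configure", "generate_personas")
    graph.add_edge("generate_personas", "conduct_interview")

    # After each interview, the router decides whether to loop or finish.
    graph.add_conditional_edges(
        "conduct_interview",
        interview_router,
        {"conduct_interview": "conduct_interview", "synthesize": "synthesize"},
    )
    graph.add_edge("synthesize", END)
    return graph.compile()

# app = build_interview_workflow()
# result = app.invoke({
#     "research_topic": "note-taking apps", "target_users": "busy grad students",
#     "questions": [], "personas": [], "current_persona": 0,
#     "transcripts": [], "insights": "",
# })
```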
Finally, the moment of truth. Let's see it in action. We can input a
question and our target persona. Watch
how it generates personas with distinct
personalities, conducts interviews where
each persona responds differently and
synthesizes insights.
So, from research question to actual
insights in under a minute. That's
really the power of AI automated
research.
Great. Now we have a fully working AI
powered user research system, but let's
keep going and make it even more robust.
This part is completely optional, but
here are a few ways that you can expand
your code. A worthwhile expansion is to
implement multi-question interviews.
Instead of asking a series of individual questions, the agent is able to follow up and dig deeper into an interviewee's response, creating more natural conversations and uncovering deeper insights. Here are the key changes: we've implemented enhanced
state. So, we've added follow-up
tracking and conversation context. We've
added a follow-up generator, so the AI
creates contextual follow-up questions
based on the responses. We've added a
smart interview flow, so it decides when
to follow up or move to the next
question. Conversation memory, so
personas remember and build on previous
answers. And finally, enhanced
synthesis, so analyzing the conversation
patterns for deeper insights.
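To give a flavor of the follow-up piece, here's a tiny, hypothetical generator node built on the same call_llm helper; the notebook's enhanced version tracks more state than this:

```python
def generate_follow_up(state: InterviewState) -> dict:
    """Turn the persona's latest answer into one contextual follow-up question."""
    last_answer = state["transcripts"][-1]
    follow_up = call_llm(
        "You are a user researcher. Based on this answer, write ONE short, open-ended "
        f"follow-up question that digs deeper into the why:\n\n{last_answer}"
    )
    return {"questions": state["questions"] + [follow_up]}
```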
Another thing you can do here is use
LangSmith to understand what's happening
under the hood. You can look in detail
at the interviews between your personas
and the responses. And this is really
essential for auditing quality, finding
bugs or tracking costs. Building
effective AI applications often requires
two central things: looking at your data, and setting up evaluations to test performance as you update, for example, models over time. And LangSmith is a great way to do
both of these.
You can see exactly how each node
performed and optimize your prompts and
logic. And that's it. In less than an
hour, you've built a sophisticated user
research system that can generate
personas, conduct hundreds of user
interviews, and synthesize actionable
insights in seconds.
Remember, you all have the starter code. Keep
experimenting. Try adding your own
research questions, customizing persona
generation for your target demographics,
or expanding the interview flow with
follow-up questions. You can also think
about using new models as they come out
and are available on Cerebras.
If your research system isn't working
perfectly right now, don't worry.
Sometimes it takes a few tries to get
the API connection set up correctly or
you might need to refresh and run the
cells again.
The most important thing is that you now
understand the LangGraph architecture
and have complete working code to take
with you. You can adapt this for any
type of user research like product
validation or market research. So happy
researching.
Learn how to build real-world AI apps in this 3-part workshop series. You'll learn to build voice agents, deep research tools, multi-agent workflows, and more. You'll get hands-on with today's most popular tools, sample code, and open-source repos so you can follow along and build fast. This workshop series leverages Cerebras, the world's fastest AI inference provider. Get a free Cerebras API key (w/ increased rate limits) at https://cloud.cerebras.ai?referral_code=freecodecamp

⭐️ Workshop 1: Building Voice Agents with LiveKit and Cerebras ⭐️
Learn how to build a sophisticated real-time voice sales agent that can have natural conversations with potential customers. You'll create both single-agent and multi-agent systems where specialized AI assistants handle sales, technical support, and pricing inquiries.

⭐️ Workshop 2: Creating Research Assistants with Exa and Cerebras ⭐️
Build your own AI-powered research assistant that can intelligently search the web, analyze information, and provide comprehensive answers with proper citations. You'll create a "Perplexity-style" tool that rivals commercial AI search platforms.

⭐️ Workshop 3: Developing Multi-Agent Workflows with LangChain and Cerebras ⭐️
Build an AI-powered user research system that automatically generates user personas, conducts interviews, and synthesizes insights using LangGraph's multi-agent workflow. You'll create a complete research automation system that can deliver comprehensive user insights in under 60 seconds.

⭐️ Code ⭐️
https://inference-docs.cerebras.ai/cookbook/agents/sales-agent-cerebras-livekit
https://inference-docs.cerebras.ai/cookbook/agents/build-your-own-perplexity
https://inference-docs.cerebras.ai/cookbook/agents/automate-user-research

🏗️ Thanks to Cerebras for providing a grant to make this course possible.

⭐️ Contents ⭐️
00:00 Introduction
01:31 Build a sales agent with LiveKit
23:32 Build your own deep research with Exa
37:05 Automate user research with LangChain