Welcome to this comprehensive course
where we will build three cutting edge
AI agents from scratch. First, you'll
create a sophisticated sales agent that
can engage in natural real-time
conversations with customers, leveraging
the power of LiveKit and Cartesia. Next,
you'll learn to build a powerful deep
research agent using Exa capable of
searching the web, analyzing multiple
sources, and delivering structured
insights in seconds. Finally, you'll
learn how to construct a user research
agent with LangChain that can automate
the entire research process from
generating user personas to conducting
simulated interviews and synthesizing
feedback. Each project is designed to be
hands-on, providing you with the
practical skills to develop your own
functional AI agents. Thanks to Cerebras
for providing a grant to make this
course possible. Hi, I'm Sarah Chang, a growth engineer at Cerebras, and I'm so
excited to welcome you to our three-part
hands-on workshop series designed to
help you build powerful AI applications.
Across these workshops, we'll build
everything you need to get started with
real-world AI use cases, from voice agents
to deep research assistants to
multi-agent workflows. You'll get sample
code, practical exercises, and
open-source example repos so you can
follow along and build. We'll also dive
into today's most popular AI tools and
frameworks, showing you how to
incorporate fast, cutting edge inference
into your projects. Let's get into it. I
can't wait to see what you build.
Welcome everyone to build your own sales
agent. I'm Sarah Chang from Cerebras and
I'm so excited to be joined by Russ d'Sa, CEO of LiveKit, today.
Thanks Sarah. Today we're going to be
walking through how to build a voice
sales agent that can have natural
conversations with customers. Our sales
agent will pull product context from an
external source and can respond in real
time.
This isn't just a simple chatbot. We're
building a full-featured AI agent that
can speak, listen, and respond
intelligently using your company's sales
materials. By the end of this workshop,
you'll have your very own working sales
agent that can handle customer inquiries
just like a human would.
We have a complete code notebook for you
to follow along with, and you'll be able
to keep experimenting and building even
after today's workshop is over. Before
we get started, let's go over what
you'll get out of this workshop. Free
API credits for LiveKit, Cerebras, and Cartesia, a complete quick-start guide
to build apps with our technologies, and
your very own functional sales agent
that you can customize. Before we dive
in, please make sure you have the
notebook open. If you haven't already,
go ahead and click on the Colab link we
shared.
Let me start by showing you what we're
building toward. In the future, most
customer interaction will be AI powered,
but instead of just typing back and
forth with a chatbot, you're going to
have a real conversation using your
voice.
Voice agents are becoming the new
frontier because they're more natural,
more engaging, and frankly more human
than traditional chatbots. When someone
calls your business, they want to talk
to someone who understands them, not
through a phone tree. So, Russ, taking a
step back, what exactly are AI voice
agents? AI voice agents are stateful,
intelligent systems that can
simultaneously run inference while
constantly listening to you speak. They
can engage in real natural conversation.
They have four key capabilities. The
first is that they understand and
respond to spoken language. They don't
just spit out answers based on string
matching or keywords. They understand
the meaning behind what people are
saying. This means they can handle
complex tasks and questions. Someone
might ask, "I'm looking for a product
recommendation." The agent can look at
the user's purchase history, the shop's
current stock levels, and recommend
something they'd like. You might see
this referred to in some places as
multi-agent systems or agentic workflows. Speech is the
fastest way to communicate your intent
to any system. When you can just say
what you want, there's no typing, no
clicking through menus, and no learning
curve. People have been speaking to one
another their whole lives. So a computer
system with the same interface can take
advantage of that familiarity. Lastly,
none of this is possible unless the
agent can keep track of the state of the
conversation. Communication is highly
contextual and your agents need to have
state so they can hold a coherent
conversation across time.
I see. And I imagine this makes them
perfect for things like customer
service, sales conversations like
qualifying leads and answering product
questions, technical support, walking
people through solutions step by step,
or even information retrieval, finding
exactly what someone needs from your
knowledge base.
Now, let's talk about what's actually
happening inside the voice agent when
you're having a conversation. The first
step in this pipeline is a transcription
phase or ASR, automatic speech
recognition. Ideally, we only want to
send speech, not silence or background
noise, to our speech-to-text model. To help with that, we add a small VAD (voice activity detection) model in front of STT, which detects whether what's being picked up by our microphone contains human speech. We'll use VAD to filter out any audio that isn't human speech before it reaches STT. This improves the accuracy of our STT transcription, helps the system understand when you're done speaking so the LLM can start responding, and it also saves you a ton of money by not having to constantly stream audio to the STT model. Once speech is detected, the voice data is forwarded to STT. This
model listens to and converts your words
to text in real time. The last step in
the layer is end of utterance or end of
turn detection. Being interrupted by AI
every time you pause is annoying. While
VAD helps the system to know when you
are and aren't speaking, it's also
important to analyze what you're saying,
the content of your speech to predict
whether you're done sharing your
thought. We have another small model
here that runs quickly on the CPU. It
will instruct the system to wait if it
predicts you're still speaking. Once
your turn is done, the final text
transcription is then forwarded to the
next layer. Then comes the thinking
phase. Your complete question gets sent
to a large language model. Think of this
as a brain that understands what you're
asking. The brain might need to look
things up like checking your product
catalog or calling other services to
give you the right answer. Once it
figures out what to say, it starts
generating a response sentence by
sentence.
The third and final step is the speaking
phase. As the LLM streams a response
back to the agent, the agent immediately
starts forwarding those LLM tokens to
the TTS engine. This generated audio
from TTS streams back to your client
application in real time. That's why the
agent can start speaking while it's
thinking.
And the final result, a conversation
that feels natural and immediate even
though there's a lot of complex
processing happening behind the scenes.
There are a lot of moving pieces here, but
LiveKit's agents SDK is going to handle
all this orchestration and data
management for us. It manages the audio
streams, keeps track of context, and
coordinates all these different AI
services, so you don't have to worry
about the technical plumbing.
Awesome. And now that we have our basis
covered, let's get everyone set up. You
can access the starter code here. This
will take you directly to our Google
Colab notebook where you can see the
starter code for today. First, we need
to install all the necessary packages in
your notebook. Find the first code cell
and run it. You'll see this command. Go
ahead and run that cell now. Click on it
and press shift and enter or click the
play button. While that's installing,
you should see some output showing the
packages being downloaded. This installs LiveKit Agents with support for Cartesia, Silero for voice activity detection, and OpenAI compatibility. While you're
pulling that up, let me explain the key
technologies that make this project
possible. First, let's talk about the
brain of our operation, the LLM. For
today's workshop, we're using Llama
3.3 70B, which is from Meta's latest family of open-source AI models, running on Cerebras for lightning-fast inference.
Speed is always critical here. You can
have the most sophisticated LLM
available, but if the inference is slow,
the conversation feels slow and broken.
Exactly. And that's the challenge most
voice agents face today. If you're using
traditional GPUs, you're looking at several seconds per response. For a phone
conversation, that's just painful.
Nobody's going to wait around for that.
They'll just hang up.
So, that's where Cerebras comes in. We're about 50 times faster than traditional GPUs. So, instead of those multi-second delays, you'll get responses
in milliseconds. When someone asks the
AI sales agent, "What's your pricing?"
They expect an immediate answer just
like they would from a human. For voice
agents, speed isn't just a nice to have.
It's table stakes. When people talk to
each other in everyday conversation,
they have less than 500 milliseconds of
total latency between turns. When we
stretch an AI agent's response times too far
past that, it stops feeling natural.
And as a final note on Cerebras, this is the AI processor running these models: the Cerebras Wafer-Scale Engine, the WSE-3. It's a massive AI chip that delivers the fastest inference in the world. As you can see in the benchmark, we're delivering 2,591
tokens per second with Llama 3.3. That's
five times faster than the next best
provider.
First, let's install the LiveKit CLI.
This is optional for our workshop today,
but if you want to use LiveKit beyond
this, here are the commands depending on
your system. Today we're using a Python
notebook so that nobody needs to battle
with their environment when they're
getting started. But if you want to
build and deploy any agents that other
people can interact with, the CLI is by
far the easiest way to do it. Just type lk app create and instantly clone a
pre-built agent like the one we're about
to build here. Let's talk a bit about
what exactly LiveKit is and why we need
it for our voice agent. The existing
internet wasn't really designed for
building voice AI applications. HTTP
stands for hypertext transfer protocol.
It was designed for transferring text
over a network. For what we're building,
we need to transfer voice data over a
network with low latency. LiveKit is a
real time infrastructure platform for
doing just that. Instead of HTTP, it
uses a different protocol, WebRTC, to
transport voice data between your client
application and an AI model with less
than 100 milliseconds of latency
anywhere in the world. It's resilient,
handles any number of concurrent
sessions, and is fully open-source, so
you can dig into the code and see how it
works, or even host it yourself if you
want. You can use LiveKit to build any
type of voice agent, including ones that
can join meetings, answer phones in call centers, and, in our case today, an
agent that can speak to a prospective
customer on your website.
Here's the key part for voice agents.
LiveKit acts as middleware between your
AI and your customers. When someone
wants to talk to your agent, LiveKit
makes sure the audio gets from their
phone or computer to your LLM and then
gets the LLM's response back to them,
all in real time. We take care of the
hard parts so you can focus on your
application. Connection management,
routing information between data
centers, traversing firewalls, or
adapting to spotty cellular connections.
You don't have to worry about any of it.
Our goal is to make building a voice
agent as simple as building a website.
Here you can see those boxes labeled
LLM, TTS, and STT. Those are the AI
components we talked about earlier that
help the agent listen, think, and speak.
LiveKit is the real time layer ensuring
data flows smoothly between your
customers and all of your AI components.
In addition to Cerebras and LiveKit, we will also be using Cartesia. The final
component we need is the actual speech
processing to turn voice into text and
text back into voice. We need
specialized models. That's where our
partner Cartesia comes in. Cartesia has their own flavor of Whisper large-v3-turbo, called Ink-Whisper, that's focused on real-time accuracy and time-to-first-byte latency. When you talk to our sales agent, Ink-Whisper converts your speech into text that the AI can read and understand. Ink-Whisper is pretty fast: transcriptions come back within 60 milliseconds of you finishing speaking.
Then when the AI has something to say
back, Cartesia takes that text and converts it into speech that sounds natural and human-like. It's like having
a really good interpreter who works both
ways. They also handle all the messy
parts of human speech, like when people
pause or interrupt each other or say a
lot. Now, let's take a moment to set up our API keys for Cerebras, LiveKit, and Cartesia. In the second code cell,
you'll need to replace the placeholder
API keys with your actual keys. The
links to get these free API keys are all
in your notebook. Now that our API keys
are set up, step two is about teaching
our AI sales agent about your business.
Think of it like training a new
employee. You wouldn't put someone on the phone without telling them what you sell, right?
The challenge with LLMs is that they
know a lot about everything, but they
might not know many specifics about your
company. LLMs are only as good as their
training data set. If we wanted to
respond with information that isn't
common public knowledge, we should try
to load it into the LLM's context window
to minimize hallucinations or the "I can't help with that" responses. This is where something called RAG comes in: retrieval-augmented generation.
This process is simple in concept. We
feed the LLM a document containing
additional information. For example, if
we load our pricing details into the
LLM's context window, then when someone
asks about pricing, it's easy for the
LLM to look up that information and
return an accurate answer.
So, for this demo, we'll load in
information like product descriptions,
pricing info, key benefits, even
pre-written responses to common
questions like, "It's too expensive."
That way, our agent always stays on
message and has the context it needs to
generate accurate responses. Let's take
a look at our notebook and see what that
added context looks like in practice.
Under step two, we've organized all the
information our sales agent needs into a
simple, structured format that's easy
for the AI to understand and reference.
You can see we have everything a good
salesperson would need. A clear product
description that explains what we're
selling, a list of key benefits that
highlight why customers should care, and
specific pricing for each tier. But
here's the really fun part. Those
objection handlers at the bottom, these
are pre-written responses to the most
common things customers say when they're
hesitant to buy.
When someone says, "It's too expensive,"
the agent already has a proven answer
ready. For example, I understand the
cost is important. Let's look at the
ROI. Most clients see 3x return in the
first six months. When they say, "I need
time to think," the agent knows to ask,
"what specific concerns can I address
right now?" Rather than just saying,
"Okay, call us back."
This gives your agent a loose script.
You can fill this with the responses
that have converted best for your sales
team. LLMs are non-deterministic,
so it's not going to say the exact same
thing every time, but it provides a
solid framework for the agent to follow
while it's talking to prospects. Now,
let's fill out our load context function
and load this information into our agent
and see how it uses this knowledge
during conversations. Next is the
exciting part, step three, where we
actually create our sales agent. This is
where we take all of those components we
just talked about and wire them together
into a working system.
Before you run anything, let's walk
through what's happening in the sales
agent class. We start by loading our
context using the load context function
we defined earlier. This gives our agent
access to all the product information,
pricing, and objection handlers that we
set up. Then we configure the four
components that are part of our voice
pipeline. The LLM is Llama 3.3 70B running on Cerebras. Llama 3.3 70B is a
good balance between speed and quality,
and it's great at tool calling, which
we're going to need later. For
speech-to-text, we're using Cartesia's Ink-Whisper engine, which is really fast. On the text-to-speech side, we're using Cartesia again, partially because it means you only need one API key, but also because their TTS engine, Sonic, is also really fast. For voice activity detection, we're using Silero, which is
the default option for the agents
framework and has a light footprint and
really fast performance.
Now, let's look at how we actually
implement this in code. The instructions
are important, but if we tried to show
the whole prompt here, the text would be
really small. The full version is in
your notebook. We start by telling the
agent, you are a sales agent
communicating by voice and give it
important rules like don't use bullet
points because everything will be spoken
aloud and most importantly only use
context from the information provided.
If someone asks about something not in
the context, say you don't have that
information.
The super call initializes our agent and
passes all of our configurations to the
parent agent class. This sets up the
agent with our LLM, STT, TTS, VAD, and
instructions all working together.
We also define an on_enter method to
start the conversation. This is
triggered as soon as someone joins a
conversation with the agent. Instead of
sitting in silence, it immediately
generates a greeting and offers to help,
just like a good salesperson would. Now,
go ahead and let's run the step three
cell to define our sales agent class.
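For reference, here's a condensed sketch of what that class can look like with the LiveKit Agents SDK. It follows the structure described above but isn't copied from the notebook: the plugin options, model id, and instruction text are assumptions, it reuses the load_context helper sketched earlier, and the step-three cell is the source of truth.

```python
import os

from livekit.agents import Agent
from livekit.plugins import cartesia, openai, silero

class SalesAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            # Abbreviated instructions; the notebook's full prompt also covers tone rules.
            instructions=(
                "You are a sales agent communicating by voice. Do not use bullet points. "
                "Only use the context provided below; if something isn't in it, say you "
                "don't have that information.\n\n" + load_context()
            ),
            # Llama 3.3 70B served by Cerebras through an OpenAI-compatible endpoint.
            llm=openai.LLM(
                model="llama-3.3-70b",
                base_url="https://api.cerebras.ai/v1",
                api_key=os.environ["CEREBRAS_API_KEY"],
            ),
            stt=cartesia.STT(),      # Cartesia Ink-Whisper speech-to-text
            tts=cartesia.TTS(),      # Cartesia Sonic text-to-speech
            vad=silero.VAD.load(),   # Silero voice activity detection
        )

    async def on_enter(self) -> None:
        # Greet the visitor as soon as they join, instead of waiting in silence.
        self.session.generate_reply(
            instructions="Greet the visitor warmly and ask how you can help."
        )
```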
Step four is our launch sequence. This
is how we actually get our agent up and
running so people can talk to it.
Think of this entry point as the start
button for our agent. When someone wants
to have a conversation, this is what
kicks everything into gear and gets the
agent ready to talk. The entry point
function does three main things. First,
it connects the agent to a virtual room
where the conversation will happen, like
dialing into a conference call. Then, it
creates an instance of our sales agent
with all the setup we just configured.
Finally, it starts a session that
manages the back and forth conversation.
This session glues together all of our
model configurations, media streams,
tool configurations, and maintains the
conversation history.
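Sketched out, the entry point might look roughly like this, assuming the same LiveKit Agents API used in the class sketch above:

```python
from livekit.agents import AgentSession, JobContext

async def entrypoint(ctx: JobContext) -> None:
    # 1. Connect this agent worker to the LiveKit room for the conversation.
    await ctx.connect()

    # 2. Create the session that glues together models, media streams, and history.
    session = AgentSession()

    # 3. Start the back-and-forth with our SalesAgent in that room.
    await session.start(agent=SalesAgent(), room=ctx.room)

# Outside a notebook you'd normally hand this to the CLI runner, e.g.:
#   from livekit import agents
#   agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```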
Usually, you'd have a front end like a
mobile app or a web page where you're
speaking to your agent. But today, to
make it easy, we're going to create a
minimal web interface right here in the
notebook.
Before we run this, let me set
expectations. When you execute this
cell, it's going to load several AI
models and establish connections to
multiple services. The first time might
take 30 to 60 seconds, so be patient.
Once you start your agent up, you'll see
some initial log output. You can think
of this as similar to booting up a
Next.js app or a web server. Your agent
server waits for new conversation
requests from your customers. When a
request comes in, LiveKit will connect
your agent and front end together so you
can start speaking. Go ahead and run the
step four cell now. Watch the output and
wait for the interface to load.
If you see any errors about expired tokens, just stop the cell and run it again. The cell will request a new room join token from the Jupyter proxy and
you'll be able to connect to the room.
Great. Now we have a fully working sales
agent, but let's keep going to make it
even more robust. This part here is
completely optional, but here are a few
ways you can expand your sales agent.
First, let's stop the current agent by
clicking the stop button on the cell
that's running or pressing the interrupt
button in your notebook.
Now, one thing we can do is expand our
single agent into a multi-agent system. Why would we want to do that, instead of just having a single agent answer every
question.
Great question. Think about how real
sales teams work. You don't have one
person who's an expert at everything.
You have specialists. If someone calls
asking a deep technical question about
some API integration, you want them
talking to your best technical person,
not your pricing specialist. LLMs have
limited context windows, which means,
similar to people, they have limits on
the number of things they can specialize in. We can also tailor the conversational style to the topics that
we're talking about. If there's a
conversation about technical issues,
it's less important for us to talk about
value props and ROI. In our case, we're
not actually pushing in that much
context. But this is an important lesson
to learn when you're creating larger,
more complex agents for production. For
a more complex system, here are three
agents that we've defined. A greeting
agent is our main sales agent who
qualifies leads. A technical specialist
agent is specialized on solving
technical issues and a pricing
specialist agent handles budgets, ROI,
and deal negotiations.
The magic is in the handoffs. Our
greeting agent figures out what the
customer actually needs, then smoothly
transfers them: "Let me connect you with our technical specialist who can dive deeper into those integration questions."
Each specialist has its own voice,
personality, and specialized knowledge.
The technical agent can go deep on specs
without getting wrapped up in trying to
sell anything, while the pricing agent
can focus purely on ROI and budget
decisions.
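Here's a hedged sketch of what a handoff tool can look like with the function_tool decorator. The class names and instructions are stand-ins, and the return-a-new-Agent handoff follows LiveKit's documented workflow pattern; check the step-five cells for the exact implementation.

```python
from livekit.agents import Agent, function_tool

# Minimal specialist stubs; the notebook's versions carry their own voices and context.
class TechnicalSpecialistAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You answer deep technical and integration questions.")

class PricingSpecialistAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You handle budgets, ROI, and deal negotiations.")

class GreetingAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions=(
                "You are the main sales agent. Qualify leads, and hand off deep "
                "technical or pricing questions to the right specialist."
            ),
            # llm / stt / tts / vad configured as in the SalesAgent sketch above
        )

    @function_tool()
    async def transfer_to_technical(self):
        """Transfer the caller to the technical specialist for integration or API questions."""
        # Returning a new Agent from a tool hands the session off to that agent.
        return TechnicalSpecialistAgent()

    @function_tool()
    async def transfer_to_pricing(self):
        """Transfer the caller to the pricing specialist for budget and ROI questions."""
        return PricingSpecialistAgent()
```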
It's like having your best sales team
available 24/7. Go ahead and run the
enhanced sales agent cell. Now run the
technical specialist agent cell followed
by the pricing specialist agent cell.
Finally, let's run the multi-agent entry point. This will start our new system with agent transfer capability. This multi-agent system also uses tool
calling. When a customer asks about
technical details, our agent can
transfer them to a technical specialist
who has a different voice and
specialized knowledge. Let's implement
this enhanced system. Scroll down to the
challenges section and find step five.
First, run the cell that imports function_tool. This gives us the ability to define tools the agent can call to hand off the conversation.
let's look at our enhanced sales agent.
You'll see we add function tools to
allow agent transfers. And that's it. In
less than an hour, you've built a
sophisticated multi-agent voice sales
system that can have natural
conversations with customers, transfer
between specialized agents, use your
actual product information, and handle
objections professionally.
Remember, you have free API credits for
all three platforms, so keep
experimenting. Try adding your own
product data or customize or expand the
agent personalities and voices. maybe
integrate with your existing systems or
external APIs.
The code notebook will remain available
to you and we've included links to get
free API keys for Cerebrus, LiveKit, and
Cartisia. If your agent isn't working
perfectly right now, don't worry.
Sometimes it takes a few tries to get
the microphone permissions set up
correctly or you might need to refresh
and run the cells again.
The most important thing is that you now
understand the architecture and you have
the complete working code to take with
you.
Happy building.
Welcome, everyone, to Build Your Own Deep Research. I'm Sarah Chang from Cerebras and I'm here today with Will Bryk, CEO of Exa. We're super excited to have you
all here.
Thanks Sarah. Today we're going to build
something pretty cool, a deep research
style assistant that can automatically
search the web, analyze multiple
sources, and provide structured insights
in under 30 seconds. We'll code it up
with you.
That's right, Will. By the end of today,
we'll learn how modern AI powered deep
research systems work under the hood. So
before we get started, let's do a quick
overview of what we're walking away with
today. First, free API credits for both
Cerebras and Exa so you can continue developing after this workshop ends. Second, a complete quick-start guide to build apps with both APIs so you can jump
start your future projects. And third,
your very own functional research agent
that goes way beyond basic search.
Now, let's see what we're building. This
is Cerebras's deep research interface.
It looks clean and simple, but there's
some serious intelligence under the
hood. Our coding today will build the
functionality behind this. If you
notice, it's not just returning search
results. It's actually doing real
research, searching multiple sources,
analyzing the content, identifying
knowledge gaps, and then doing follow-up
searches to fill those gaps. You might
take an hour to do all this work, but
this app does it in less than a minute.
Your deep research will be 10x faster
than ChatGPT's deep research.
Before we dive into AI powered research,
let's talk about how research was
traditionally done. In the old days
before 2025, researchers somehow used to
write their own research papers and
reports. Let me walk you through each
step as we build this diagram together.
So, first it all starts with a research
question. Once you have your question,
you need to figure out where to look for answers. So, you'd branch out in
multiple directions. You'd go to the
library to search through physical books
and journals. You'd do Google searches.
And if you were lucky, you'd interview
experts in the field. After gathering
all these sources from libraries, search
engines, and expert interviews, what's
the next step? You'd sit down and take
notes,
right? That means reading through
literally everything you can,
highlighting key passages, writing
summaries, and keeping track of where
each piece of information came from. You
can imagine doing this with hundreds of
sources.
And this is where human limitations
really showed up. You could only read so
much, process so much, and remember so
much. Important connections between
sources might be missed completely
because of information overload.
Finally, you take all these notes and
compile them into an answer. This is
where the real intellectual work
happened, synthesizing information from
multiple sources into coherent insights.
In reality, research is a recursive
process. You never actually know if you
found all the relevant information in
the first pass, so you have to go back
and do it again with your new knowledge.
Each loop through this process could
take hours or days and there's always
this nagging feeling that you might have
missed something important. Luckily, AI
can do days of research in 30 seconds.
Now, what do we mean when we talk about
deep research powered by AI? AI powered
deep research incorporates all those
research steps like gathering sources
and analyzing notes, but uses LLMs to
dramatically accelerate each step.
For starters, instead of going to the
library or searching on Google, we use
tool calling with a web search API. In
this case, Exa. Our AI system can
autonomously break down the initial
research question into the web search
API calls that would answer the
question.
Next, instead of the take-notes step from traditional research, we have retrieval-augmented generation, or RAG. RAG just means that the LLM gets to retrieve from the web before generating, because LLMs don't have the whole web memorized and often make stuff up. With RAG, our web search API gives the LLM fully informed
context, so it's super accurate.
Finally, instead of us manually reading
and analyzing all of these notes, we let
the large language model do it for us.
And like traditional research, there's a
feedback loop. Our AI system identifies
gaps in its RAG output and triggers new searches if necessary. This loop
repeats multiple times until the LLM
decides that the answer has been
reached.
And because we're doing so many LLM
calls, speed becomes absolutely
critical. This is why we need Cerebras.
When you're chaining together 10 or 15
LLM calls to complete a research task,
even small delays can add up really
quickly.
So, now that we understand what AI
powered deep research is, when should
you actually use it? Deep research takes
a lot longer than a typical search, but
it's perfect for things like answering
really hard questions like what project
to work on or what's the meaning of life
or keeping up with fast-moving fields
like AI or biotech.
Now, let's get everyone set up to build
our own deep research systems. You can
access the starter code here. This will
take you directly to our Google Colab
notebook where you can see the starter
code for today. The notebook contains
all the code we'll be working with plus
some additional examples and extensions
you can explore after the workshop.
First, we need to install our
dependencies and set up the environment.
In your Colab notebook, you'll see we're installing exa-py and the Cerebras Cloud SDK. Run that first cell to get
everything installed.
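Concretely, that first cell and the client setup amount to something like this (package and environment-variable names follow the SDKs' docs; defer to the notebook if it differs):

```python
# In Colab you'd run the install as a shell command in its own cell:
#   !pip install exa_py cerebras_cloud_sdk

import os

from cerebras.cloud.sdk import Cerebras
from exa_py import Exa

# Both clients just need an API key; here we pull them from environment variables.
cerebras_client = Cerebras(api_key=os.environ["CEREBRAS_API_KEY"])
exa_client = Exa(api_key=os.environ["EXA_API_KEY"])
```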
While that's running, let's go ahead and
grab our API keys. You'll need a
Cerebras API key, and you can get started for free at cloud.cerebras.ai, and an Exa API key, which you can also get for free at exa.ai. While you're
installing these packages and getting
your API keys, let me explain a bit more
about Cerebras and Exa. First, let's
talk about the brain of our operation,
the LLM. For today's workshop, we're
using Llama 4 Scout, which is from Meta's latest family of open-source AI
models running on Cerebras hardware.
Cerebras' speed is a huge unlock here.
The faster the language model can run
inference, the less time our users have
to spend waiting for the research report
to come back. Especially with deep
research where we're chaining together
multiple LLM calls, speed is super
important.
It's a great example of something you
can only build with fast inference.
Today, most deep research products take
multiple minutes to return a report.
Cerebras unlocks a completely different
experience. We serve the top open source
models in the industry with 50 times
faster inference than traditional GPUs.
This is the actual hardware making it
all possible: the Cerebras Wafer-Scale Engine, the WSE-3. It's a massive AI chip that
delivers the fastest inference in the
world. As you can see in the benchmark,
we're delivering 2,800 tokens per second
with Llama 4 Scout, five times faster
than the next best provider.
Now that we've covered how Cerebras will
run our language model, the second
component is to give our LLM search over
the web. That's where Exa comes in. We
built our search engine from scratch to
connect LLMs to the web. Exa is actually built specifically for AIs, and we've
built all the features to make it super
simple. For example, we don't just
return search results, we return the
full content of each result page so that
your LLM has full context. In fact,
we're actually also the fastest search
API in the world.
Now that our API keys are set up, step
two is all about building our core
search function using exa-py. Looking at
our research flow diagram here, this is
the first red box where we take a
question and find relevant sources on
the web. So, let's build this search
function and see how it works. In your
notebook, you'll see step two where we
define our search function. Here's what
the function looks like in practice. On
the left, we're creating a function
called search web. Let me walk you
through the actual code. Notice in our
search web function, we're using the
search and contents endpoint. This means
that not only are we doing a web search,
but for each URL returned, we're also
getting the crawled content from that
page. We also specify type="auto". This is really powerful. It automatically chooses between keyword and neural search based on your query. You don't
have to decide which approach to use.
Think of it this way. If you search for
something like Python programming
tutorials, that's clearly a keyword
search. But if you search for companies
that are disrupting traditional banking,
that's more conceptual and benefits from
neural search. Exa figures this out
automatically. The text parameter
controls how much content we get from
each source. For deep research, we want
substantial content, not just snippets.
This gives our LLM rich content to work
with instead of trying to piece together
tiny fragments. Go ahead and run this
function now. Try searching for
something like space companies in
America and see what you get back. You
should see both the search results and
the full content from each page rendered
as clean markdown.
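Here's a minimal version of that search function, matching the parameters described above (search_and_contents, type set to auto, and a text limit); the exact values in the notebook may differ:

```python
import os

from exa_py import Exa

exa = Exa(api_key=os.environ["EXA_API_KEY"])

def search_web(query: str, num_results: int = 5):
    """Search the web and return results that include the crawled page content."""
    response = exa.search_and_contents(
        query,
        type="auto",                        # let Exa pick keyword vs. neural search
        num_results=num_results,
        text={"max_characters": 3000},      # substantial content, not just snippets
    )
    return response.results                 # each result has .title, .url, and .text

# Example:
# for r in search_web("space companies in America"):
#     print(r.title, r.url)
```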
On the right side of the screen, you can
see the kind of results we get back.
Notice we're not just getting titles and
snippets. We're getting the full content
of relevant pages. This is crucial for
the LLM to do deep analysis rather than
surface level summarization.
Now let's move to the next step in our
flow. Taking all those relevant sources
we found and feeding them to our LLM for
analysis. This is the next step in our
diagram, powered by Cerebras. This is where the magic of RAG really happens. We're passing the LLM fresh, relevant information from the Exa API web search to work with. And this is where Cerebras' speed becomes critical.
We're about to start chaining multiple
LLM calls together. So each individual
call needs to be fast. A few seconds
here and there quickly adds up to
minutes of waiting.
Now let's look at the actual
implementation. Here's what's happening
under the hood. We're taking all that
rich web content from Exa and feeding it directly to the Cerebras LLM along with our original question. In our ask_ai
function, we're using a low temperature
of 0.2. Think of temperature like a
creativity dial. Lower numbers give you
more focused deterministic responses.
Since we want factual analysis, not
creative writing, we keep it low. The
model we're using, Llama 4 Scout, is
specifically optimized for reasoning and
analysis tasks like this. And because
it's running on Cerebras' WSE-3 chip,
you'll see responses in seconds, not
minutes.
Look at how we structure the prompt.
We're not just asking, "What do you
think?" We're giving the LM specific
instructions. Analyze these sources,
answer this question, and format your
response clearly. The LM reads through
all the sources simultaneously,
something that would take a human hours
and synthesizes insights in just
seconds. This is the power of combining
fast inference with comprehensive source
material. You can now go ahead and run
the ask_ai function with some sample
content. You'll see how quickly it
processes even large amounts of text and
returns structured insights.
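As a reference point, an ask_ai helper along those lines might look like this; the model id and prompt wording are approximations of what's in the notebook:

```python
import os

from cerebras.cloud.sdk import Cerebras

client = Cerebras(api_key=os.environ["CEREBRAS_API_KEY"])

def ask_ai(question: str, sources_text: str) -> str:
    """Ask the Cerebras-hosted LLM to analyze the gathered sources and answer the question."""
    completion = client.chat.completions.create(
        model="llama-4-scout-17b-16e-instruct",  # Llama 4 Scout on Cerebras (check the exact id)
        temperature=0.2,                         # low temperature: factual analysis, not creative writing
        messages=[
            {
                "role": "user",
                "content": (
                    "Analyze these sources and answer the question. "
                    "Format your response clearly with a short summary and key insights.\n\n"
                    f"Question: {question}\n\nSources:\n{sources_text}"
                ),
            },
        ],
    )
    return completion.choices[0].message.content
```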
Now, let's put it all together into a
function called research topic. This is
our basic research function that follows
a classic flow. Ask a question, do an
Exa web search, pass the relevant
sources to the LLM, then return a
response. If you walk through the
research topic function in your
notebook, you'll see it searches for
sources using our Exa search function, filters
for substantial content over 200
characters, creates context for the LLM,
then asks the LLM to analyze and
synthesize an answer.
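Putting those pieces together, a bare-bones research_topic could look like the sketch below. It reuses the search_web and ask_ai sketches above, so the same caveats apply:

```python
def research_topic(question: str) -> str:
    """One round of research: search the web, filter thin results, and synthesize an answer."""
    results = search_web(question)

    # Keep only sources with substantial content (over 200 characters).
    substantial = [r for r in results if r.text and len(r.text) > 200]

    # Build a single context block the LLM can read end to end.
    context = "\n\n".join(
        f"Source: {r.title} ({r.url})\n{r.text}" for r in substantial
    )

    return ask_ai(question, context)

# Example:
# print(research_topic("climate change solutions 2025"))
```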
The key is in the prompt structure.
We're asking for both a summary and
specific insights in a structured
format. This makes the output much more
useful than just a wall of text. Try
running this on a topic you're curious
about, something like climate change
solutions 2025 or quantum computing
advances. You should see it finds
multiple sources and gives you a real
analysis.
In the next and final step, we're going
to expand on our basic research model.
This mirrors how human experts actually
research. Instead of just doing one
round of web search and LLM synthesis,
after the first web search, the LLM
identifies the most important gaps in
understanding and does a targeted second
web search before producing an answer.
Let's take a look at step five where we
implement a basic recursive version of
our deep research. If you look at the
deeper research topic function, it does
everything the basic version does but
then adds this intelligent follow-up
layer. The implementation has two key
steps. One: after the first analysis, we ask the LLM, based on these sources, what is the most important follow-up question that would deepen our understanding of our query? Two: we then search for that specific missing topic and combine both layers for our final analysis.
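A rough sketch of that recursive layer, building on the helpers above (the follow-up prompt wording is illustrative):

```python
def deeper_research_topic(question: str) -> str:
    """Two-pass research: answer, identify the biggest gap, search again, then re-synthesize."""
    # Pass 1: the basic search-and-analyze flow.
    first_results = search_web(question)
    first_context = "\n\n".join(r.text for r in first_results if r.text)
    first_analysis = ask_ai(question, first_context)

    # Ask the LLM to identify the single most important gap.
    follow_up = ask_ai(
        "Based on these sources, what is the single most important follow-up "
        "question that would deepen our understanding? Reply with only the question.",
        first_context,
    )

    # Pass 2: search the gap and combine both layers for the final analysis.
    second_results = search_web(follow_up)
    second_context = "\n\n".join(r.text for r in second_results if r.text)

    return ask_ai(
        question,
        f"Initial analysis:\n{first_analysis}\n\n"
        f"Follow-up question: {follow_up}\n\n"
        f"Additional sources:\n{second_context}",
    )
```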
Again, look at the prompt engineering.
We're not just asking for more
information. We're specifically asking
the LLM to identify gaps and formulate
targeted search queries. This is much
more sophisticated than just doing
random additional searches. You can go
ahead and run the enhanced version on
the same topic you tried before. You
should see much richer, comprehensive
results.
Now that you have the core system
working, let's talk about where you can
take this next. The beauty of what we've
built is that it's modular. You can
enhance any piece independently. Some
ideas for expansion include adding more
search layers, integrating with
specialized databases of academic
papers, patents, etc. or adding
different types of analysis and
sentiment, trend detection or
competitive intelligence.
You could also experiment with different
LLM models, add memory between searches,
or even build domain specific research
agents for things like market research,
academic research, or technical due
diligence. We could stop here, but for
those who want to go deeper, we're going
to look at the approach Anthropic uses
for their deep research agent. This is
completely optional, but it will give
you an idea of what production systems
look like. In the Anthropic approach, a lead agent breaks down complex queries into specialized subtasks, with multiple sub-agents working simultaneously. Then, the
lead agent synthesizes everything
together, and decides whether to kick
off more sub agents or generate the
final report. By splitting the task
across agents, you keep the context
manageable for each individual agent.
Think of it like a research team where
everyone has their own specialty.
This is huge for reliability. It makes
sense, right? Humans also function
better when they're given a well scoped
task versus trying to manage 10 things
at once. Also, by running sub-agents in
parallel, you can run dozens of searches
while keeping the response time
manageable. Instead of doing searches
sequentially, you're doing them
simultaneously. The Anthropic
multi-agent research function in your
notebook shows a simplified version of
this approach. It's more complex, but it
is better at handling difficult topics
that require stringing together multiple
sources.
And that's a wrap. You've just built a
sophisticated research system that
combines the best of web search and AI
analysis. You now understand how modern
AI powered research works under the
hood.
Remember, you have free API credits for
both Cerebras and Exa, plus the complete
code guide to keep experimenting. You
should build something cool. The
techniques you learned today (RAG, multi-layer search, intelligent follow-up) are the same patterns used by production research systems at places like Exa and Cerebras. Now go build some
amazing research agents.
Welcome everyone to automate user
research with AI. I'm Sarah Chang from
Cerebras and I'm so excited to be joined by Lance Martin from LangChain
today.
Thanks Sarah. Today we're going to walk
through how to build an AI powered user
research system that can automatically
generate user personas, conduct
interviews, and synthesize product
feedback in under 60 seconds.
This isn't just about automating surveys
or forms. We're building a sophisticated
AI research system that can generate
end-to-end simulated interviews using LangGraph workflows. By the end of this
workshop, you'll have your very own
working user research system that can
compress weeks of work into minutes. We
have a complete code notebook for you to
follow along with, and you'll be able to
keep experimenting and building even
after today's workshop's over.
Before we get started, let's go over
what you'll get out of this workshop.
Free API credits for Cerebras, a complete quick-start guide to build apps with Cerebras, LangChain, and LangGraph,
and your very own functional AI user
research system that you can customize.
Before we dive in, please make sure you
have the notebook open. If you haven't
already, go ahead and click on the
Colab link we shared.
You can scan the QR code on the screen
to access the starter code. Let's make
sure that everyone has this open before
we continue. So Sarah, taking a step
back, what exactly is user research?
User research is a systematic process of
gathering and analyzing information
about your target audience. The goal is
learning user behaviors and needs,
validating product ideas, and making
better product decisions. Everyone from
startups to big companies like Google,
Netflix, or Dropbox all have teams that
conduct user research before launching a
new feature or creating a product
roadmap. It's something that all
companies invest heavily in because it's
essential for building products people
actually want. It also takes an
extremely long time. So now
let me walk you through how this is
traditionally done. We always start with
a user question. Then for each user you
create interview questions. This takes
time to get right because you need
open-ended questions that actually get
useful insights. Bad questions lead to
bad data. You need questions that reveal
motivations, not just surface level
preferences.
That's right. Next comes recruiting
participants. And this is where things
get very expensive. You need to find the
right people who match your target
demographic. This is often the biggest
bottleneck, finding qualified
participants who represent your actual
users.
Exactly. That's two to three weeks just
to find the right people to talk to.
Then you're scheduling interviews around
everyone's availability. Coordinating
schedules, sending reminders, dealing
with no-shows is a logistical nightmare.
Then conducting interviews, another one
to two weeks of actual conversations,
transcribing, and organizing responses.
Each interview might be 30 to 60 minutes
plus transcription time plus organizing
all of that qualitative data.
And finally, analyze responses, pulling
out themes, identifying patterns,
writing up actual insights. That's
another week. And this is really where
most research projects die. You have all
this data, but turning it into actual
insights is a lot of hard work. And
here's the complete timeline. User
questions all the way through to final
insights. All in all, we're looking at 6
weeks total, and that's if everything
goes smoothly. This timeline really
kills innovation speed. Startups really
can't afford to wait six weeks to get
user feedback.
By the time you get results, your
product might have already changed or
your competitors might have moved ahead.
In the era of AI-assisted coding, engineering teams can build products before
PM and design teams can validate if
they're a good idea to build. This 6
week research timeline is simply too
long for modern organizations. And
that's what's so exciting about what
we're building today. What if we could
automate this entire process? Instead of
recruiting real people, we can create AI
personas that represent your target
users. AI user research creates multiple
AI personas and runs hundreds of
simulated interviews in minutes. We're
talking about genuine AI agents that can
think and respond like real users. So
now instead of scheduling interviews, we
can simulate them instantly. The AI can
roleplay as different types of users and
give you realistic responses. Look at
this. It's the same process, but every
step is automated and happens in
seconds, not weeks. Our system has four
AI-powered components. First, the AI generates the interview questions based on your
research topic. Then the AI creates
diverse personas that match your target
demographic. Next, the AI runs simulated
interviews between the researcher and
the persona. And finally, the AI
analyzes all the responses to give you
actionable insights.
That's right. Each step that took weeks
can now take seconds. And you can
iterate on research or product questions
in real time. This is the magic. 6 weeks
becomes 60 seconds. You can test
multiple research approaches instantly
and get feedback immediately. When you
can test product feedback or ideas that
quickly, suddenly you can iterate very
quickly or instantaneously. Now, let's
switch over to the code. You can access
the starter code here and we'll be
walking through the notebook step by
step. If you haven't already, make sure
you have the starter code open. We're
about to dive into the technical
implementation. First, we need to
install all the necessary packages in
your notebook. Find the first code cell
and run it. You'll see this command. Go
ahead and run that cell now. Click on it
or press shift enter or click the play
button. While that's installing, you
should see some output showing the
packages being downloaded. This installs
Cerebras and LangGraph. First, let's talk
about the brain of our operation, the
LLM. For today's workshop, we're using
Llama 3.3 70B, which is from Meta's latest family of open-source AI models running on Cerebras for lightning-fast
inference. Speed is always critical
here, especially when we're conducting
hundreds of AI interviews with multiple
AI users and questions. Every
millisecond counts. With Cerebras
delivering over 2,500 tokens per second,
we can simulate an entire research
study, 10 personas, five questions each,
plus analysis in under 60 seconds. With
traditional inference, this could take
upwards of 20 minutes. So with Cerebras,
you could run 20 user simulations in the
same time it would take you to run one
with other platforms. And as a final
note on Cerebras, this is the AI
processor running these models, the
Cerebras Wafer-Scale Engine, the WSE-3. It's
a massive AI chip that delivers the
fastest inference in the world. As you
can see in the benchmark, we're
delivering over 2,500 tokens per second
with Llama 3.3, five times faster than
the next best provider.
And the second platform we're going to
be using here is Langchain and Langraph.
Today, we're focused on Langraph for
workflow orchestration and Langchain for
integrations as well as structured
output. And we use Langmith for tracing
and observability. This gives us the
building blocks to create complex
stateful AI systems that can handle
multi-step research processes. We're
going to be using three key LangChain
features today. Model abstraction so we
can easily swap different providers,
standard interfaces that make our code
clean and maintainable, and structured
outputs to ensure that AI responses are
properly formatted. These features let
us focus on the research logic instead
of wrestling with AI SDKs. Now, LangGraph
gives you low-level components for
building many types of AI applications,
including agents or workflows. And we
can lay out these applications in any
way we want as a series of steps or
nodes. Each node is just a Python
function, giving you full control over
the logic within each step. Think of it
like having a team of AI researchers,
each with their own expertise working
together seamlessly.
We connect those nodes or steps into a
workflow. We start with configuration.
This just generates the personas, runs
interviews in a loop, and then
synthesizes results. LangGraph handles
all that orchestration and state
management for us.
The beauty is that each node can focus
on one job really well and LangGraph connects them intelligently. For the second step, let's set up our LLM running on Cerebras. Our LLM is the brain of the operation and handles question generation, persona creation, simulated responses, and analysis. This function is our interface to Cerebras, and we'll be calling it multiple times throughout our code. For step
three, let's talk about state. As we
conduct all these simulated interviews,
we want some way to track the shared
information across each of these steps
and nodes. LangGraph has a state object
which we can access within each node and
update once we finish running the node.
So because every node can read from and
update this state object, it can track
our questions, personas, or anything
else we want over the course of this
research process. Now, without proper
state management, AI systems can't
always maintain context across time.
Look at how the state flows through our
system. Each node reads what it needs and
updates what others will use. This
shared state is what makes our multi-agent system coherent instead of just a bunch of disconnected AI calls. Now look at the InterviewState TypedDict. It
defines exactly what information flows
through our system. Research questions,
personas, interview tracking, and final
results. This structure keeps everything
organized.
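For orientation, here's roughly what the LLM helper and the state definition can look like. The field names and model id are assumptions; the notebook's versions are authoritative:

```python
import os

from cerebras.cloud.sdk import Cerebras
from typing_extensions import TypedDict

client = Cerebras(api_key=os.environ["CEREBRAS_API_KEY"])

def call_llm(prompt: str) -> str:
    """Single interface to Cerebras; every node funnels its prompts through here."""
    completion = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content

class InterviewState(TypedDict):
    """Shared state every LangGraph node can read from and write to."""
    research_topic: str    # what we're studying
    target_users: str      # who we want to interview
    questions: list[str]   # generated interview questions
    personas: list[str]    # generated persona descriptions
    current_persona: int   # index of the persona being interviewed
    transcripts: list[str] # completed interview transcripts
    insights: str          # final synthesized analysis
```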
Now, for our next step, we need to build
our specialized agents. So, each node is
a specialist that performs one specific
task and updates the shared state, which
other nodes can then use.
This is where the magic happens. Each AI
agent has a clear job and does it really
well. The key insight is that nodes
don't just process data, they update a
shared state. This lets subsequent
nodes build on the previous work. It's
like a relay race where each runner
hands off exactly what the next runner
needs. For our workshop today, there are
four main nodes. Each has a specific
role in the research pipeline. The
configuration node is our first node and
is our entry point. It gets the research
question from the user and generates
interview questions automatically. This
node will initialize our research
process. It prompts the user for the research topic and the types of users we
want to interview. It then generates a
configurable number of questions about
the research topic to seed our
interviews. The second node is the
personas node which creates diverse user
profiles with rich characteristics.
Different ages, backgrounds,
communication styles, everything that
makes interviews realistic. This is
where we get the diversity that makes
our research valuable. Each persona
brings a different perspective.
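A node is just a Python function that reads the state and returns the keys it wants to update. Here's a hedged sketch of a personas node, using the state and call_llm helper sketched earlier:

```python
def generate_personas(state: InterviewState) -> dict:
    """Create a handful of diverse user personas for the research topic."""
    prompt = (
        f"Research topic: {state['research_topic']}\n"
        f"Target users: {state['target_users']}\n\n"
        "Write 3 distinct user personas (age, background, communication style, "
        "and attitude toward the product), one per line."
    )
    personas = [line.strip() for line in call_llm(prompt).splitlines() if line.strip()]
    # Returning a partial dict updates only these keys in the shared state.
    return {"personas": personas, "current_persona": 0}
```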
So once we have our personas, the third node, which is our interview node, conducts the interviews: the actual Q&A simulation. Each persona responds in character, maintaining their personality throughout. And this is really the most complex node, because it has to manage the conversation flow between personas and keep them
consistent. The last node is our
synthesis node. It analyzes all the
completed interviews and generates
actionable insights. It looks for
patterns, themes, and practical
recommendations. This is where raw
interview data becomes business
intelligence.
For the next step, we need to set up our
router. In our flow, two things can
happen after an interview is completed.
Our program can move on to the next
persona or if we've gone through every
single interview, we can end the
program. The interview router decides
what happens next. Continue interviewing
the next persona or move to synthesis.
This is what makes the workflow
intelligent and adaptive. It's simple
logic, but it's what makes our system
autonomous. No human intervention needed
once it starts.
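The router itself is a tiny function: LangGraph calls it after each interview and uses its return value to pick the next edge. A sketch under the same assumptions:

```python
def interview_router(state: InterviewState) -> str:
    """Decide whether to interview the next persona or move on to synthesis."""
    if state["current_persona"] < len(state["personas"]):
        return "conduct_interview"   # still have personas left to talk to
    return "synthesize"              # every interview is done
```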
And now we connect everything together,
add the nodes, define the connections,
set up conditional routing, look at the
build interview workflow. It's
surprisingly clean for such a powerful
system. So, LangGraph handles all the
complexity for us. We define the
structure and it manages the execution.
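Wired together, the build step looks roughly like this. Node names mirror the sketches above, with small stubs standing in for the nodes we haven't sketched; the notebook's build_interview_workflow is the real version:

```python
from langgraph.graph import StateGraph, START, END

# Stubs for the remaining nodes, just so the wiring below is self-contained;
# the notebook defines richer versions of these.
def configure(state: InterviewState) -> dict:
    return {"questions": ["What do you struggle with most today?"]}

def conduct_interview(state: InterviewState) -> dict:
    persona = state["personas"][state["current_persona"]]
    answer = call_llm(f"You are this user: {persona}. Answer: {state['questions'][0]}")
    return {
        "transcripts": state["transcripts"] + [answer],
        "current_persona": state["current_persona"] + 1,
    }

def synthesize(state: InterviewState) -> dict:
    return {"insights": call_llm("Summarize the key themes:\n" + "\n\n".join(state["transcripts"]))}

def build_interview_workflow():
    graph = StateGraph(InterviewState)
    graph.add_node("configure", configure)
    graph.add_node("generate_personas", generate_personas)
    graph.add_node("conduct_interview", conduct_interview)
    graph.add_node("synthesize", synthesize)

    # Linear start: configuration, then persona generation, then the first interview.
    graph.add_edge(START, "configure")
    graph.add_edge("configure", "generate_personas")
    graph.add_edge("generate_personas", "conduct_interview")

    # After each interview, the router decides whether to loop or finish.
    graph.add_conditional_edges(
        "conduct_interview",
        interview_router,
        {"conduct_interview": "conduct_interview", "synthesize": "synthesize"},
    )
    graph.add_edge("synthesize", END)
    return graph.compile()

# app = build_interview_workflow()
# result = app.invoke({
#     "research_topic": "note-taking apps", "target_users": "busy grad students",
#     "questions": [], "personas": [], "current_persona": 0,
#     "transcripts": [], "insights": "",
# })
```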
Finally, the moment of truth. Let's see it in action. We can input a
question and our target persona. Watch
how it generates personas with distinct
personalities, conducts interviews where
each persona responds differently and
synthesizes insights.
So, from research question to actual
insights in under a minute. That's
really the power of AI automated
research.
Great. Now we have a fully working AI
powered user research system, but let's
keep going and make it even more robust.
This part is completely optional, but
here are a few ways that you can expand
your code. A worthwhile expansion is to
implement multi-question interviews.
Instead of asking a series of individual questions, the agent is able to follow up and dig deeper into an interviewee's response, creating more natural conversations and uncovering deeper insights. Here are the key changes: we've implemented enhanced
state. So, we've added follow-up
tracking and conversation context. We've
added a follow-up generator, so the AI
creates contextual follow-up questions
based on the responses. We've added a
smart interview flow, so it decides when
to follow up or move to the next
question. Conversation memory, so
personas remember and build on previous
answers. And finally, enhanced
synthesis, so analyzing the conversation
patterns for deeper insights.
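To give a flavor of the follow-up piece, here's a tiny, hypothetical generator node built on the same call_llm helper; the notebook's enhanced version tracks more state than this:

```python
def generate_follow_up(state: InterviewState) -> dict:
    """Turn the persona's latest answer into one contextual follow-up question."""
    last_answer = state["transcripts"][-1]
    follow_up = call_llm(
        "You are a user researcher. Based on this answer, write ONE short, open-ended "
        f"follow-up question that digs deeper into the why:\n\n{last_answer}"
    )
    return {"questions": state["questions"] + [follow_up]}
```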
Another thing you can do here is use
LangSmith to understand what's happening
under the hood. You can look in detail
at the interviews between your personas
and the responses. And this is really
essential for auditing quality, finding
bugs or tracking costs. Building
effective AI applications often requires
two central things: looking at your data, and setting up evaluations to test performance as you update, for example, models over time. And LangSmith is a great way to do
both of these.
You can see exactly how each node
performed and optimize your prompts and
logic. And that's it. In less than an
hour, you've built a sophisticated user
research system that can generate
personas, conduct hundreds of user
interviews, and synthesize actionable
insights in seconds.
Remember, you all have the starter code. Keep
experimenting. Try adding your own
research questions, customizing persona
generation for your target demographics,
or expanding the interview flow with
follow-up questions. You can also think
about using new models as they come out
and are available on Cerebras.
If your research system isn't working
perfectly right now, don't worry.
Sometimes it takes a few tries to get
the API connection set up correctly or
you might need to refresh and run the
cells again.
The most important thing is that you now
understand the LangGraph architecture
and have complete working code to take
with you. You can adapt this for any
type of user research like product
validation or market research. So happy
researching.
Learn how to build real-world AI apps in this 3-part workshop series. You'll learn to build voice agents, deep research tools, multi-agent workflows, and more. You'll get hands-on with today's most popular tools, sample code, and open-source repos so you can follow along and build fast. This workshop series leverages Cerebras, the world's fastest AI inference provider. Get a free Cerebras API key (w/ increased rate limits) at https://cloud.cerebras.ai?referral_code=freecodecamp

⭐️ Workshop 1: Building Voice Agents with LiveKit and Cerebras ⭐️
Learn how to build a sophisticated real-time voice sales agent that can have natural conversations with potential customers. You'll create both single-agent and multi-agent systems where specialized AI assistants handle sales, technical support, and pricing inquiries.

⭐️ Workshop 2: Creating Research Assistants with Exa and Cerebras ⭐️
Build your own AI-powered research assistant that can intelligently search the web, analyze information, and provide comprehensive answers with proper citations. You'll create a "Perplexity-style" tool that rivals commercial AI search platforms.

⭐️ Workshop 3: Developing Multi-Agent Workflows with LangChain and Cerebras ⭐️
Build an AI-powered user research system that automatically generates user personas, conducts interviews, and synthesizes insights using LangGraph's multi-agent workflow. You'll create a complete research automation system that can deliver comprehensive user insights in under 60 seconds.

⭐️ Code ⭐️
https://inference-docs.cerebras.ai/cookbook/agents/sales-agent-cerebras-livekit
https://inference-docs.cerebras.ai/cookbook/agents/build-your-own-perplexity
https://inference-docs.cerebras.ai/cookbook/agents/automate-user-research

🏗️ Thanks to Cerebras for providing a grant to make this course possible.

⭐️ Contents ⭐️
00:00 Introduction
01:31 Build a sales agent with LiveKit
23:32 Build your own deep research with Exa
37:05 Automate user research with LangChain