Voice is one of the most natural ways to interact with AI. And as the models get better, I'm excited about the new use cases and interaction patterns they're going to unlock, especially in industries like education and customer service. It's surprisingly easy to get started building a voice agent, so let's go through that in this video. I'm Tannushri, and I'm going to show you how to build a voice agent, specifically a French tutor, with a framework called Pipecat. I'm going to walk through how it works end to end, and we've also hooked up observability into LangSmith so we can peel back the layers and show you what happens in each step of your voice agent.
So let's start with an overview of how this voice agent works. There are three main steps in the voice agent. There's speech-to-text, or STT. There's the LLM call, which is text in and text out; it's just a regular text-based model. And lastly there's the text-to-speech step, which takes the model's text reply and turns it into audio.
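To make that loop concrete, here's a minimal sketch of one conversational turn in plain Python. The helper functions (transcribe, chat, synthesize) are hypothetical placeholders for whatever STT, LLM, and TTS services you wire in; this is just the shape of a turn, not Pipecat's actual API.

```python
# Conceptual sketch of one voice-agent turn. The three helpers below are
# hypothetical placeholders, not Pipecat APIs.

def transcribe(audio_bytes: bytes) -> str:
    """Speech-to-text: turn the user's audio into a transcript."""
    ...

def chat(history: list[dict], user_text: str) -> str:
    """LLM call: plain text in, plain text out."""
    ...

def synthesize(text: str) -> bytes:
    """Text-to-speech: turn the reply into audio to play back."""
    ...

def one_turn(history: list[dict], audio_bytes: bytes) -> bytes:
    user_text = transcribe(audio_bytes)       # 1. STT
    reply_text = chat(history, user_text)     # 2. LLM (text in, text out)
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": reply_text})
    return synthesize(reply_text)             # 3. TTS
```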
So I'll show you a quick demo of an agent I've built. I'm learning French, and this is a French tutor. Let's give it a whirl, and I can show you what it looks like.
Cool. Let's take a look at the resulting trace in LangSmith so we can see exactly what happened in each step.
All right. What's really nice is that these traces are laid out exactly as I showed in the diagram earlier. You can see there's one turn of this conversation. This is the speech-to-text node. Interestingly enough, it didn't quite understand what I was saying here. I'm using a local model just for the sake of the demo, and that's probably why the transcription step didn't go as expected.
This is the LLM call, and the system prompt helps guide the LLM on what the context is and how I want it to respond. It looks like it saw enough context here: it said it was doing well and asked me how I was doing.
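For a tutor like this, the system prompt doesn't need to be elaborate. The wording below is a hypothetical example of the idea, not the exact prompt from the demo:

```python
# Hypothetical system prompt for the French tutor -- illustrative only,
# not the exact text used in the demo.
SYSTEM_PROMPT = (
    "You are a friendly French tutor on a voice call. "
    "Speak mostly in simple French suited to a beginner, gently correct "
    "mistakes, and keep replies to one or two short sentences since they "
    "will be read aloud by a text-to-speech voice."
)

messages = [{"role": "system", "content": SYSTEM_PROMPT}]
```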
And then finally there are the text-to-speech steps. The reason there are multiple here is that the audio is actually streamed back to me, the user, which is great UX compared to waiting for the entire audio clip.
And so it's pretty cool that you can uncover and see all of the layers here. One thing I've been doing a bunch of testing with is, instead of the local model I'm using, trying out various models and seeing which transcription service works best for my use case. So I'll show you really quick that if I switch from the local model to an OpenAI model directly, the transcription works much better. Let's give it a try.
Okay. And you can see that this transcription step was way better. I have all of the debug logs streaming in here too, and if I pulled up the trace for the new model, it would show the same thing. So let's peel back the layers a little bit and go into how this works. I'm using Pipecat to build this agent. Pipecat is a real-time voice and multimodal open-source framework, and what I've really liked about it is that it's easy to swap out different models. We tested with two speech-to-text models in this demo, and it was really just a line of code to swap them.
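For example, here's roughly what that swap looks like in the script. The class names and import paths are assumptions about Pipecat's service modules (check the Pipecat docs for the version you have installed), but the shape of the change is just one constructor call:

```python
import os

# Before: a locally-run Whisper model (assumed class/module names).
# from pipecat.services.whisper import WhisperSTTService
# stt = WhisperSTTService(model="base")

# After: OpenAI's hosted transcription service -- also an assumed
# class/module name; verify against the Pipecat docs.
from pipecat.services.openai import OpenAISTTService

stt = OpenAISTTService(api_key=os.environ["OPENAI_API_KEY"])
```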
And so really, the core logic of this script is in a couple of places. This is the area in the script where I declare which models I want to use for each step of the pipeline.
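That declaration block looks roughly like the sketch below. Again, the class names, import paths, and constructor arguments are assumptions about Pipecat's service wrappers rather than the demo script verbatim; the point is the pattern, one service object per stage, each independently swappable.

```python
import os

# One service object per pipeline stage (assumed class/module names --
# check the Pipecat docs for your installed version).
from pipecat.services.openai import OpenAISTTService, OpenAILLMService
from pipecat.services.cartesia import CartesiaTTSService

stt = OpenAISTTService(api_key=os.environ["OPENAI_API_KEY"])      # speech to text
llm = OpenAILLMService(api_key=os.environ["OPENAI_API_KEY"],
                       model="gpt-4o-mini")                        # LLM: text in, text out
tts = CartesiaTTSService(api_key=os.environ["CARTESIA_API_KEY"],
                         voice_id="your-voice-id")                 # text to speech
```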
This is the system prompt, which we saw in LangSmith; it gets sent with the LLM call. And then this is the meat of it, where the pipeline gets constructed: it takes audio input from my microphone, runs it through the various steps of the pipeline, and I also have some additional information coming through here that I'll go over. A sketch of this wiring follows.
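Here's the sketch, following Pipecat's general pattern of chaining processors from audio in to audio out. The transport, the context aggregator call, and the exact class names are assumptions, so treat it as the shape of the code rather than the demo script itself.

```python
# Sketch of the pipeline wiring (assumed imports/APIs; not verbatim).
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask

# `transport` is assumed to be one of Pipecat's audio transports (microphone
# in, speakers out); `stt`, `llm`, and `tts` are the services declared above.
messages = [{"role": "system", "content": SYSTEM_PROMPT}]      # tutor prompt from earlier
context_aggregator = llm.create_context_aggregator(messages)   # hypothetical: tracks chat history

pipeline = Pipeline([
    transport.input(),               # audio frames from my microphone
    stt,                             # speech to text
    context_aggregator.user(),       # append the transcript to the conversation
    llm,                             # LLM call: text in, text out
    tts,                             # text to speech
    transport.output(),              # streamed audio back to the speakers
    context_aggregator.assistant(),  # record the assistant's reply
])

task = PipelineTask(pipeline)
```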
So, a couple of things. Namely, I have some span processors here. The reason for these is that I wanted to record the audio of the conversation so that I could upload it along with my LangSmith traces. This is a great best practice when you're tracing voice agents: you want to see the transcription, but having the audio side by side is really helpful too. So I have the full audio of the conversation as well as the audio for each turn, which makes debugging really great. You can also send something like this to an eval pipeline, and it has all the information you need.
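The span processors themselves are small. Here's a hedged sketch of the idea using OpenTelemetry's SpanProcessor hooks: when a turn's span ends, ship the recorded audio for that turn keyed by the trace and span IDs so it can sit next to the LangSmith trace. The two helper functions are hypothetical stand-ins for however you capture and upload the audio; only the SpanProcessor interface itself is standard OpenTelemetry API.

```python
from opentelemetry.sdk.trace import ReadableSpan, SpanProcessor


def audio_path_for(span: ReadableSpan) -> str | None:
    """Hypothetical lookup: map a finished turn span to its recorded WAV file."""
    ...


def upload_turn_audio(trace_id: str, span_id: str, wav_path: str) -> None:
    """Hypothetical helper: upload the turn's audio keyed by trace/span IDs
    so it can be viewed alongside the LangSmith trace."""
    ...


class AudioUploadProcessor(SpanProcessor):
    """Uploads per-turn audio whenever a traced span finishes."""

    def on_start(self, span, parent_context=None) -> None:
        pass  # nothing to do when a span starts

    def on_end(self, span: ReadableSpan) -> None:
        path = audio_path_for(span)
        if path is not None:
            ctx = span.get_span_context()
            upload_turn_audio(format(ctx.trace_id, "032x"),
                              format(ctx.span_id, "016x"),
                              path)

    def shutdown(self) -> None:
        pass

    def force_flush(self, timeout_millis: int = 30_000) -> bool:
        return True
```

A processor like this gets registered on the tracer provider with add_span_processor, alongside the exporter that actually ships spans to LangSmith.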
And then the last big chunk of logic in this app is that I've set up tracing to LangSmith. We use OpenTelemetry to send data from Pipecat to LangSmith, and it's all handled for you with the import.
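Concretely, that setup amounts to pointing a standard OTLP exporter at LangSmith's OpenTelemetry endpoint and handing it to Pipecat's tracing setup. The endpoint and header format below follow LangSmith's documented OTLP ingestion; the setup_tracing helper and the enable_tracing flag are assumptions based on Pipecat's tracing docs, so verify both against the docs link below for your versions.

```python
import os

# Point a standard OTLP exporter at LangSmith's OpenTelemetry endpoint.
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

exporter = OTLPSpanExporter(
    endpoint="https://api.smith.langchain.com/otel/v1/traces",
    headers={
        "x-api-key": os.environ["LANGSMITH_API_KEY"],
        "Langsmith-Project": "french-tutor",  # project name is illustrative
    },
)

# Hand the exporter to Pipecat's tracing setup. The helper name and the
# `enable_tracing` flag are assumptions -- check the Pipecat/LangSmith docs.
from pipecat.utils.tracing.setup import setup_tracing

setup_tracing(service_name="french-tutor", exporter=exporter)
# ...and later: task = PipelineTask(pipeline, enable_tracing=True)
```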
So that was the demo of building a voice agent. Check out Pipecat and LangSmith and give them a try. I think there are some really fun types of applications to build, so share what you build with us.
Learn how to debug and improve an AI voice agent using LangSmith. We'll walk through tracing conversations, spotting failures, and iterating on your agent. In this demo we use LangChain and Pipecat, an open-source framework for voice and multimodal conversations.
LangChain repo: https://github.com/langchain-ai/langchain
LangChain docs: https://docs.langchain.com/langsmith/trace-with-pipecat
Pipecat repo: https://github.com/pipecat-ai/pipecat