In this video, a full voice agent is built end-to-end using the LangChain voice-agent framework, AssemblyAI, OpenAI, and Cartesia. The focus is on the STT → Agent → TTS “Sandwich” architecture:

- Browser audio streamed over WebSockets
- AssemblyAI for low-latency speech-to-text
- A LangChain agent on top of an OpenAI model for reasoning and tool use
- Cartesia for natural, real-time text-to-speech

You’ll learn:

- Why voice agents are becoming the next UX layer for AI assistants and automations
- The trade-offs between the Sandwich and speech-to-speech architectures
- How to wire microphone PCM → AssemblyAI → LangChain agent → Cartesia
- Where latency, streaming, and control points actually live in the stack

By the end, you’ll have a clear blueprint for building and customizing your own production-ready voice agent.
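The STT → Agent → TTS flow can be pictured as a chain of async streams. The following is a minimal Python sketch, not the code from the video: `sandwich_pipeline`, `float32_to_pcm16`, and the `fake_*` stages are hypothetical placeholders standing in for the AssemblyAI, LangChain/OpenAI, and Cartesia clients, and no real SDK calls are shown. It also assumes the browser captures Float32 samples in [-1, 1] that are converted to 16-bit little-endian PCM before speech-to-text.

```python
import asyncio
import struct
from typing import AsyncIterator, Awaitable, Callable

# Hypothetical stage signatures: a real build would wrap AssemblyAI streaming STT,
# a LangChain agent on an OpenAI model, and Cartesia TTS behind these shapes.
SpeechToText = Callable[[AsyncIterator[bytes]], AsyncIterator[str]]
Agent = Callable[[str], Awaitable[str]]
TextToSpeech = Callable[[str], AsyncIterator[bytes]]


def float32_to_pcm16(samples: list[float]) -> bytes:
    """Convert Web Audio-style float samples in [-1, 1] to 16-bit little-endian PCM."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    return struct.pack(f"<{len(clipped)}h", *(int(s * 32767) for s in clipped))


async def sandwich_pipeline(
    mic_pcm: AsyncIterator[bytes],
    stt: SpeechToText,
    agent: Agent,
    tts: TextToSpeech,
) -> AsyncIterator[bytes]:
    """STT -> Agent -> TTS: each finished transcript becomes one agent turn,
    and the agent's reply is streamed back out as synthesized audio chunks."""
    async for transcript in stt(mic_pcm):
        reply = await agent(transcript)  # waits for the full reply before TTS starts
        async for audio_chunk in tts(reply):
            yield audio_chunk


# --- Fake stages so the sketch runs without any network services ---

async def fake_mic() -> AsyncIterator[bytes]:
    # Two tiny "microphone frames", converted to PCM as a browser client might.
    for frame in ([0.0, 0.5], [-0.5, 1.0]):
        yield float32_to_pcm16(frame)


async def fake_stt(audio: AsyncIterator[bytes]) -> AsyncIterator[str]:
    # Pretend each PCM frame decodes to one finished utterance.
    count = 0
    async for _frame in audio:
        count += 1
        yield f"utterance {count}"


async def fake_agent(text: str) -> str:
    return text.upper()


async def fake_tts(text: str) -> AsyncIterator[bytes]:
    # Emit one "audio" chunk per word to mimic incremental synthesis.
    for word in text.split():
        yield word.encode()


async def main() -> list[bytes]:
    stream = sandwich_pipeline(fake_mic(), fake_stt, fake_agent, fake_tts)
    return [chunk async for chunk in stream]


out = asyncio.run(main())
print(out)  # [b'UTTERANCE', b'1', b'UTTERANCE', b'2']
```

Because each stage only consumes and produces a stream, real clients could be swapped in one at a time. The sketch also makes two latency points explicit: waiting for a finished transcript, and waiting for the complete agent reply before synthesis begins; streaming agent tokens into TTS would be one way to shorten the second.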