At VapiCon 2025, Justin Uberti, Head of Realtime at OpenAI, addresses the three most common questions he receives about building voice AI. Uberti, who leads the team developing ChatGPT Voice and the Realtime API, breaks down what actually defines a voice agent, compares chained architecture versus speech-to-speech models, and tackles the notoriously difficult problem of turn detection. He argues that while we're still in the scaffolding phase of voice AI, the path forward lies in end-to-end training rather than human-engineered solutions, drawing on lessons from AI history to predict where the technology is headed next.

Topics Covered:
* Defining voice agents
* Chained vs. speech-to-speech architecture
* Turn detection challenges
* End-to-end training vs. engineered solutions
* Future of natural conversation AI

Speaker: Justin Uberti - Head of Real-Time, OpenAI

Subscribe to our calendar to see upcoming events: https://lu.ma/vapi_events

Our website: https://vapi.ai