In this episode of Roo Cast, host Hannes Rudolph is joined by Roo Code's lead developer Matt and CEO Danny, along with special guest Brian Fiocca from OpenAI. Brian kicks things off by sharing his extensive background in the startup world (including YC and Rescue Time) and explains his current role on the Applied Startups team at OpenAI. He then tells the surprising story of how he first discovered Roo Code: while looking for open-source evaluation suites to test how models like GPT-5 perform inside different coding tools, he found Roo Code's eval harness. The conversation then dives into a detailed comparison of GPT-5 and GPT-5 Codex, with Brian explaining the key architectural differences and why one is more adaptable while the other is hyper-optimized for a specific harness. The team also explores the new frontier of "Evals as a Service," discussing how to move beyond simple correctness benchmarks (like SWE-bench) and create "performance review"-style evals for agentic tasks. The discussion also covers the critical importance of the Responses API for preserving chain-of-thought, the future of context memory, and Brian's top recommendations for the Roo Code codebase.

Resources Mentioned:
OpenAI: https://openai.com/
Y Combinator: https://www.ycombinator.com/
Zed (Editor): https://zed.dev/
Minimax: https://minimax.chat/
Anthropic: https://www.anthropic.com/
Vercel AI SDK: https://sdk.vercel.ai/
Slack: https://slack.com/
Linear: https://linear.app/

CHAPTERS
0:00 - Welcome Brian Fiocca from OpenAI
0:51 - Brian's startup background (Rescue Time, YC)
2:23 - How OpenAI uses Roo Code's open-source evals
4:02 - Brian's daily workflow: Testing all the tools
5:38 - Evaluating GPT-5 with the Responses API
8:14 - The "vibe test": A new eval rubric
9:52 - Using "LLM as a judge" for evals
10:32 - The big debate: GPT-5 vs. GPT-5 Codex
14:20 - Why GPT-5's instruction following is "100%"
16:14 - "Evals are all you need"
18:35 - When new models saturate old evals
19:51 - Evals as "performance reviews" for agents
22:32 - Using job descriptions to create agent personas
24:20 - Understanding model reasoning via "status updates"
28:28 - Prompt engineering: Just ask the model
30:05 - Why the Responses API is critical (vs. Chat Completions)
33:05 - Stateful responses and SWE-bench scores
36:27 - Using GPT-5 Pro for architecture planning
38:53 - The "Team Sonnet" vs. "Team GPT-5" speed debate
43:17 - Solving context rot with "retrospection"
47:42 - Brian's top 2 recommendations for Roo Code
47:52 - Parallel tool calling
49:03 - Use the apply patch format
50:48 - The Zed editor's agent harness approach
52:44 - How can the Roo Code community help OpenAI?
54:16 - Lightning round: What's next in AI?
55:43 - Lightning round: Editor or no editor?
56:53 - Lightning round: The future of API vs. subscription models
58:59 - Closing remarks

Welcome to Roo Cast, the official podcast of Roo Code, your AI-powered development team integrated directly into VS Code. Whether you're a solo developer, part of a startup, or contributing to open-source projects, this podcast is your resource for insights, updates, and community stories.

Each week, we explore topics such as:
Feature Deep Dives: Learn how to maximize Roo Code's capabilities in your workflow.
Community Spotlights: Hear from developers enhancing their productivity with Roo Code.
Behind-the-Scenes: Exclusive insights into upcoming developments and community contributions.
Live Q&A Sessions: Real-time discussions, feedback, and support from our Discord community.

Subscribe now and join a community redefining software development with AI.

Website: https://roocode.com/
GitHub: https://github.com/RooVetGit/Roo-Code
Discord: https://discord.gg/roocode
Reddit: https://www.reddit.com/r/RooCode/
LinkedIn: https://www.linkedin.com/company/roo-code
X: https://x.com/roo_code
Bluesky: https://bsky.app/profile/roocode.bsky.social
TikTok: https://www.tiktok.com/@roo.code