In this video, I’ll be talking about the new GPT‑5.1, GPT‑5.1 Codex, and Mini Codex models: their performance, pricing, and benchmarks, plus results from real‑world coding and agentic tests across multiple tasks.

Key Takeaways:

🚀 OpenAI has released GPT‑5.1, GPT‑5.1 Codex, and Mini Codex with major improvements in instruction following and tool usage.
⚡ GPT‑5.1 Codex performs strongly, ranking 9th overall, a major improvement over the previous generation.
🐼 Creative tests like SVG generation, 3D graphics, and simulations show mixed results: some great, some not so great.
💻 Coding performance varies: the Rust CLI and Go TUI work well, but the Godot, Nuxt, and Rust apps fail to run.
📉 Mini Codex performs poorly, placing 32nd, which makes it unsuitable for complex tasks.
💬 GPT‑5.1 High Reasoning ranks 16th, offering solid but slower performance.
🧪 KiloCode testing shows the model is good for planning and debugging but too slow for pair programming.
💸 Pricing remains unchanged, and caching via the Responses API is improved, with 24‑hour retention.
🏎️ Speed remains a major downside: only ~18 tokens per second, far slower than competitors.
📊 Overall, the new Codex models are improved but still finicky and slow for real development work.