In this video, I'll be talking about the alleged new OpenAI GPT-5.1 models that have quietly appeared under stealth names like Firefly, Chrysalis, Cicada, and Caterpillar, and how they compare to GPT-5, Claude, GLM, and Gemini 3 in real-world tests.

Key Takeaways:

🤖 Four new alleged GPT-5.1 derivative models (Firefly, Chrysalis, Cicada, and Caterpillar) are now appearing on Design Arena and LM Arena.
⚡ Each model seems to have a different reasoning "budget," scaling from 16 to 256 reasoning units.
🧠 Caterpillar performs best among them, but still underperforms compared to Claude and GLM in coding and reasoning benchmarks.
🎮 Benchmark testing includes tasks like 3D Minecraft, Chessboard logic, SVG generation, and math reasoning.
📉 GPT-5 Codex seems degraded, possibly signaling new model rollouts or inference optimizations happening behind the scenes.
🏗️ OpenAI's newer strategy and nonprofit structure raise community concerns about transparency and performance trade-offs.
🌐 Meanwhile, competitors like Google, MiniMax, and Z-AI are quietly building smarter, smaller, and more reliable ecosystems.
📊 Overall, GPT-5.1 (Caterpillar) feels like a modest upgrade: decent for reasoning, but not groundbreaking.