Timestamps:
00:00 - Intro
01:27 - Testing Introduction
02:52 - Task 1 Codebase Understanding
04:49 - Task 1 Result Overview
07:38 - Task 2 Control Modification
09:11 - Claude Task 2 Result
09:58 - Codex Task 2 Result
10:31 - Gemini Task 2 Result
10:49 - Task 2 Result Overview
11:25 - Task 3 Bug Fix & Feature Add
15:20 - Codex Task 3 Result
16:57 - Gemini Task 3 Result
18:28 - Claude Task 3 Result
20:19 - Task 3 Result Overview
23:08 - Task 4 Multimodal-Based Fix
24:53 - Gemini Task 4 Result
26:01 - Codex Task 4 Result
27:18 - Claude Task 4 Result
28:43 - Task 4 Game Overhaul Test
30:21 - Claude Context Limit Hit
32:12 - Task 4 Testing Setup
32:30 - Gemini Task 4 Result
33:34 - Codex Task 4 Result
35:56 - Claude Task 4 Result
37:51 - Gemini Task 4 Second Look
39:20 - Task 4 Result Overview
40:13 - Summary of Results
40:48 - Usage Breakdown
44:10 - Closing Thoughts

AI Integration & Consulting: https://bijanbowen.com/
Join the Discord: https://discord.gg/hfaR2exy7S

In this video, we put three of the most advanced coding-focused models head-to-head: Gemini 3.1 Pro, GPT-5.3 Codex, and Claude Opus 4.6. Rather than relying on benchmarks alone, this test focuses on practical, real-world coding tasks using their respective CLIs. Each model is given a structured sequence of increasingly complex challenges, starting with codebase understanding and moving through control logic modification, bug fixing, feature expansion, multimodal-based fixes, and ultimately a full game overhaul scenario. The goal is to evaluate how well each system handles sustained, iterative development under realistic conditions.