Hey, so OpenAI just released a brand new model, or really a condensed, refined version of GPT-5. It's a new coding model called GPT-5 Codex, and it really caught my attention after kicking the tires for just a little while. Boy, I'm really sold. This thing's kind of awesome. I want to dive straight in, take
a brief look at what they announced, and then use it a little bit and
show you exactly what it's like to use, and maybe some of the small differences
that I've found so far. Something that OpenAI just released today is kind of a version of GPT-5 focused specifically on software engineering. It's intended specifically for agentic programming, the way they describe it. Codex just got faster, more reliable, and better at real-time collaboration. And really, I actually have seen that. Here you'll see they're saying GPT-5 Codex is a version of GPT-5 further optimized for agentic software engineering in Codex. All right, I usually will just push through these. You'll see that on SWE-bench, it gets a better score than GPT-5 High; this is GPT-5 Codex High versus GPT-5 High. So it's exciting that they improved the SWE-bench marks as well as some code refactoring tasks. This is a benchmark
that they have that it does better at. And that's actually a pretty useful perspective: can it refactor a large code base, which is usually something that's going to be done over a long run? It's one of these things you expect will take many, many callbacks, running tests, verifying things. So that's kind of
a neat one to see. And one of the things that's important here to call
out, I loved this: Codex adapts how much time it spends thinking dynamically based on the complexity of the task. This looks very much like GPT-5 in ChatGPT, where you ask one question and it comes back immediately, you ask a different question and it has to think about it. They're applying that down here inside of the coding model as well. So they're tuning how much kind of
reasoning time they're applying to different problems. This is a really neat idea, because sometimes I'll be on the medium model and give it a problem and think, I wish I had selected a high-thinking model before I kicked this off, because what I'm asking it to do is much bigger. That's really exciting
that they might actually be doing that kind of routing inside of the coding models
as well. It says Codex will feel snappier on small, well-defined requests while you're chatting, and will work for longer on complex tasks like big refactors. During testing, we've seen GPT-5 Codex work independently for more than seven hours at a time on a large, complex task, iterating on its implementation, fixing test failures, and ultimately delivering a successful implementation. All right. But what does that really mean as far as something like token usage? They do kind of represent that. They've looked at all of the OpenAI employee traffic against the model, and they say the bottom 10% of user turns, sorted by model-generated tokens, use 93% fewer tokens. So that
basically means on tasks that were fast and easy to turn around and complete, it could do far less reasoning and turn them out at far less cost, far fewer tokens. And then on the top 10%, it spends almost twice as many. They have a chart down here: a much smaller ask down at the 10% and a much larger task up at the 90%. Basically, a harder problem gets more thinking; it knows how to apply more thinking. But where we spend the vast majority of our time, asking for succinct, smaller changes or just queries, those kinds of things, it's going to be much, much more efficient. As mentioned, they also call out that they have tuned this specifically for
code reviews. And down here, you can see the number of incorrect comments that GPT-5 High had versus Codex High. So that's a great reduction. However, of course, these
are their own numbers. Just bear in mind. But at the same time, we've all
seen code review agents. And they kind of are saying, you know, you could do
this or you could do that or this would be a good idea. And we're
looking at them thinking, well, I could do it 100 different ways. I'm not sure
that that's necessarily helpful. I think that's what they're hearing and trying to address with
this model this way. Then there are the high-impact comments, so much more meaningful comments, and then how many comments per PR. This one's kind of exciting: it doesn't feel like it just has to comment every single time on every PR. That's kind of nice.
Less wordy. Enough of that. Let's get into the terminal. Let's take a look at
this thing running. If we run Codex in our terminal, you'll see it come up with an awesome new animation announcing GPT-5 Codex, which is just great. If you select the GPT-5 Codex model, then it will start with that. But you
can also just select model at any time and you'll see these new Codex models
here up at the top. And I'm inside of an application that I can just
ask it about. All right. And I will say that it comes up much faster.
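As an aside, if you want Codex to default to a particular model and reasoning effort instead of picking them each session, the CLI reads a config file. This is a minimal sketch, assuming the `~/.codex/config.toml` location and key names from the CLI's documentation at the time of writing; verify against the current docs, since these options may change:

```toml
# Assumed Codex CLI config at ~/.codex/config.toml.
# Key names reflect the documented options and may change between releases.
model = "gpt-5-codex"
model_reasoning_effort = "medium"
```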
It does that task much faster. This is one of the things that they are
talking about. Let's say, tell me how it uses Firebase. Okay. And this is how
it comes back. And I will say right away, this already feels quite different. The
way that it communicates its information back to me is a very different experience than
previous versions of GPT-5 or other models. You'll see that it has all of
these links in the middle of the message coming back. What I would say from
this is it's very conversational. This is coming back and giving me an actual
readout. This feels much less like reams of information and more a succinct document
that it's delivering to me. I have worked with it, like I said, for about
an hour and I've gotten quite a few of these back and was very surprised
to see it in the beginning. But in the end, the engineer in me really
appreciates many of the things that they're mentioning. All right, let's try the same thing here with the old GPT-5 medium. I get its full display of thinking: multiple thinking blocks, what it's searching for, kind of all of its pattern of execution. That absolutely was not seen with the new Codex model. Okay. And remember when I told
you that the other one didn't return reams of information? Here, we'll scroll back through
what this one had to say to us just from that one request. This is
everything. This is an enormous response to my simple request of how it uses Firebase. Now, admittedly, it does go through a lot of really great detail. GPT-5 has been a great editing model with code, but it returns it much more
in kind of this blog post format that makes it feel like it's trying to
tell me, create a report for me and tell me a whole story instead of
the first one, which felt much more like a succinct engineering path. Here you go.
Here are the real important aspects you need. I would imagine I could ask to
get to every gory detail if I needed to. So I would greatly prefer the
new Codex model that way. All right. But how is it to use, right? That's actually kind of the important part. I'm going to share my application, my numbers application that I've shared multiple times on this channel. And I'm going to show that when I run this application after a recent change, we're seeing missing Firestore index information.
So this is: we're using Firestore as the database. Firestore is a cloud database solution from Google, and usually you would like the cloud to do some of the lifting where possible, like filtering or sorting, before data comes down to the client, instead of having every client deal with all of the data.
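To make that concrete, here's a hypothetical sketch (all type and field names are my own, not from the project) of the kind of filter-and-sort work that a missing composite index pushes down onto every client; with the right index, Firestore would do the equivalent `where` + `orderBy` + `limit` on the server:

```typescript
// Hypothetical sketch: client-side fallback when no composite index exists.
// All names here are illustrative, not taken from the actual project.
type Video = { id: string; category: string; createdAt: number };

function filterVideosClientSide(all: Video[], category: string, limit: number): Video[] {
  // With a composite index, Firestore would do this server-side:
  // where("category", "==", category) + orderBy("createdAt", "desc") + limit(n).
  return all
    .filter((v) => v.category === category)       // filter locally
    .sort((a, b) => b.createdAt - a.createdAt)    // sort newest first
    .slice(0, limit);                             // truncate to the limit
}

const sample: Video[] = [
  { id: "a", category: "games", createdAt: 3 },
  { id: "b", category: "music", createdAt: 2 },
  { id: "c", category: "games", createdAt: 1 },
];
console.log(filterVideosClientSide(sample, "games", 2).map((v) => v.id)); // → ["a", "c"]
```

The point of the video's complaint is exactly this: every client downloads everything and does this work itself, which is what a server-side index avoids.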
Here, we're missing that; we don't have a solution on the cloud to filter the data the way we need. So here you go: you're going to have to do it client side. That's what this is saying. Obviously, we
don't want that. I want to see if I can bring the Codex CLI, or really the Codex model, to solving this problem. This is the Cursor editor. You could probably use Visual Studio Code just as well. It's basically the same
thing that we're solving here. I'll show you two ways to pull this up. The
way that I use it typically is I might go into terminal mode and just
pull up the terminal panel at the bottom here, bring it up there. There is
another way: we can go through the AI pane over here. I have added Codex as a plugin. You can come out here and get the Codex extension, bring it into the application, and you'll be able to use a panel like this or something similar. And this is it asking, do you want to use the GPT-5 Codex model? Great. Now we're into using the Codex models. You can see
them down here just like we did in the previous. So we'll leave this one
on medium as well. But first, I want to show you in terminal mode because
that's just where we just were. And maybe you can make the same sense of
it here. So I'll load Codex down here, open up this panel a little bit bigger, and also hide the side panel so it feels more like where we were. We'll make sure the model we're looking at at this point is the GPT-5 Codex medium model. Great. So that's what we're looking at. And
what I want to do is give it the error message that I was just
previously seeing. So we can say this is the missing Firestore index on the limited video query, the problem that we know. So I'm asking it, with no other context:
Can you go fix this problem for me? Okay. And what we're seeing here is
now it's going through the thinking part that we didn't see last time, right? So
this is a more complex problem that I'm giving it a random, can you go
find this problem and fix it? It's now looking through saying, okay, this is part
of the video system. You can see that it's doing references to all the different
items and how they're supposed to be used in the system. So this, I think,
is that example of it trying to apply more thinking where more thinking is needed.
Pretty cool to see, frankly. Okay. Excellent. And so here is the final result. It
added a composite index definition. It went into the system and added this definition
file of a new index that needs to be pushed up to Firestore itself. And
this is a little trick with using Firebase. I can either go log into the
console in the Firestore environment itself, or I can push it through a command line
system that they offer, the tooling that they have at the command line. And that
would be the Firebase action that's being referenced here. The reason I point out that
interesting detail is it's telling me in what I would consider a very succinct definition.
This has been a problem that I've had with GPT-5 as a coding model
for a while, is trying to figure out what it thinks I need to do
next and having to read an enormous document to figure out, what do you want
me to do? You did all this work. You got it to some point. Have
you tested everything? Have you not? So this, so far, has been my experience that
I get a much more succinct response at the end of it. Mileage may vary.
It's very early. So this may not hold. But right now, I've been very geeked
about this. This is one of those things that made me really happy that it
can do a bunch of work and then come back and say, here's your next
step: deploy the Firestore indexes against your Firebase project, and that will install the index.
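For reference, a Firestore composite index definition lives in a `firestore.indexes.json` file at the project root and looks roughly like this; the collection and field names below are illustrative, not the ones Codex actually generated for my project:

```json
{
  "indexes": [
    {
      "collectionGroup": "videos",
      "queryScope": "COLLECTION",
      "fields": [
        { "fieldPath": "category", "order": "ASCENDING" },
        { "fieldPath": "createdAt", "order": "DESCENDING" }
      ]
    }
  ]
}
```

Deploying it is then `firebase deploy --only firestore:indexes` from the project root, which is the command-line route I mentioned as an alternative to clicking through the Firebase console.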
What I'd like to do is say, can you do that? And I will tell you, Claude Code is the only editor I've had, the only CLI editor that I've seen yet, that could do this. So this will be exciting if it will open up a shell and kind of run this command on our behalf. Let's find out.
Yes, definitely. So it's definitely kicking it off, trying to run it. We'll see if
it's successful. I won't ding it for that. Excellent. Excellent. So it ran it, and
it's saying, yep, okay, I deployed the index. That looks pretty good. It went back
and forth and fixed something that it didn't quite get right the first time around.
All of this, I didn't have to touch anything. So that really is a success,
but there's only one way to really tell if they got it right. Let's take
a look. Nope. But all right, let's kill the server and start it again, maybe?
Nope. Even on restart, it doesn't quite work. Let's try one more thing. I'll give it back to it and say: didn't quite work, try again, and I'll report its success or failure. Stupid human. Okay. So, all right. All credit to, all hail, GPT-5 Codex. So what did it tell me at the end of the last message while we were sitting here? I know y'all are
probably yelling this at the screen. It says, okay, step one, wait a few minutes,
then rerun this to see how it worked. And of course, I reran it. It
all works perfectly fine. Not a surprise. Excellent. So it took care of it. Last
one. Let's take a look at what it's doing if it's updating a visual project,
just in case that's changed. Okay. If I'm honest, I'm not going to judge anybody by this. This is a project that I wrote, oh, well over a year ago at this point. So this would have been with maybe early Cursor or mid-level Cursor kind of stuff. It has no agentic coding against it other than Cursor and hand-built stuff. But let's take a look in here. Let's
say we search for a game, Silksong. This is just a sample application that I
built, playing with some ideas. If I search for Silksong, which I do believe to
be a very recent game that's very popular, the problem is that there are filters
defined here, Notability and Main Game. So if we say it can be any game
and we don't want a category that has to be assigned to it, then it
shows up. What I want to do is just light this button up with a
color when there's filters applied so that it might hint to you that the reason
you're not seeing something is that there's a filter. This is a terrible example; it's a very old application. But let's take a look and give it to GPT-5 Codex
and see if it can just push right through. Here we are in cursor. This
time we're going to use the side panel and we're going to use the Codex
side panel as we saw earlier. And we will use GPT-5 Codex Medium
and run it here locally. Great. So what we're going to say to this is
I'm going to give it a screenshot. OK. And I'm going to tell it when
there are filters applied that aren't just default, like when there is a main game
filter or any other filter that's applied, I kind of want to see this button
in our main accent color. So I want the button to basically light up so
that I can tell that filters are applied. If there's no filters applied, I'd like
to see it like you see it now, which appears to me to be normal.
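The logic I'm asking for boils down to a small check. Here's a minimal sketch, where the names (`Filters`, `DEFAULT_FILTERS`, and the two filter fields) are hypothetical placeholders of mine, not the app's actual state shape:

```typescript
// Hypothetical sketch of "light up the button when filters deviate from
// defaults". Filters, DEFAULT_FILTERS, and the field names are illustrative.
type Filters = { notability: string; mainGame: boolean };

const DEFAULT_FILTERS: Filters = { notability: "any", mainGame: true };

function filtersAreDefault(f: Filters, defaults: Filters = DEFAULT_FILTERS): boolean {
  // Compare each filter field against its default value.
  return f.notability === defaults.notability && f.mainGame === defaults.mainGame;
}

// The button picks the accent color only when the filters are non-default.
function filterButtonColor(f: Filters): string {
  return filtersAreDefault(f) ? "neutral" : "accent";
}

console.log(filterButtonColor({ notability: "any", mainGame: true }));  // → "neutral"
console.log(filterButtonColor({ notability: "any", mainGame: false })); // → "accent"
```

The one wrinkle the video runs into is exactly what this sketch exposes: whatever is baked into the defaults object is what counts as "normal", so a filter that ships enabled by default won't light the button up.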
Let's see how that goes. OK. There it goes. It did all of its work.
Filter buttons now light up whenever the filters deviate from the defaults. And I will
say it definitely works that way. So if we take Silksong, come back over
here, and the default is set up this way; it was just my definition. The default is set up to have Main Game on. What I'd need to go in and tell it is that Main Game is not a default filter. But when I go away from whatever the default is, the button lights up as expected, and it goes away if you go back to the default. This is fine. Like I said, for an old project with a different setup, it really worked quite well to puzzle through what was going on inside of the application. It did take quite a while to figure all of this out, but it did excellently. So I will say my experience with this thing has been great. OK,
so this was just a quick first look at Codex, and at what the GPT-5 Codex model that just released is. I really think it does exactly what we want: it gets us a smaller, more efficient model for the vast majority of the
types of changes that we're asking for and then can think very deeply when it
needs to. To me, that's great. I'm going to probably be on medium a lot
more frequently. I kind of wander into high more often than I want to, because every now and then I'm asking more sophisticated questions, and other times I'm absolutely not. So this to me is a really neat idea. I'll see how the
router works. I want to hear what your experience has been as well or will
be when you start using this. Let me know. Add some comments. Let the others
know what's going on down there. Thanks for coming along for the ride on this
one. And I'll see you in the next one.
OpenAI just dropped a brand-new spin on GPT-5, and it's aimed squarely at software engineers. Meet GPT-5 Codex, a refined coding model that feels faster, more agentic, and surprisingly collaborative in real time. In this video, I take a first look at GPT-5 Codex, explore what's new, and run it through real coding scenarios inside Cursor and the terminal. From Firestore indexing fixes to UI tweaks in old projects, I wanted to see if Codex could actually think deeper when needed and stay lightweight on smaller tasks.

Highlights from this first run:
🚀 Faster, snappier coding responses
🧠 Dynamic reasoning that scales with task complexity
🔍 Better code refactoring and reviews (with fewer "meh" comments)
⚡ Independently worked for 7+ hours on a tough refactor
💡 More efficient token usage (cheaper, too!)
🖥️ Actually solved a Firestore indexing issue end-to-end

If you've been curious about what "agentic programming" really looks like in action, this demo is for you. I'll show you what worked, what still needs polish, and why GPT-5 Codex might be the model I stick with day-to-day.

👉 What do you think: is Codex the real step forward for coding with AI? Drop your thoughts in the comments.

#OpenAI #GPT5 #Codex #AIProgramming #AgenticAI #SoftwareEngineering #Cursor #AIcoding #GPT5Codex

00:00 - Intro
00:28 - Announcement
04:13 - Using in Codex CLI
06:34 - Putting it to work
07:25 - Use shell to fix a problem
11:38 - Visual application test
12:39 - Cursor Codex panel
13:56 - Conclusion