Hey, so OpenAI just released a brand new model, or really a condensed, refined version of GPT-5. It's a new coding model called GPT-5 Codex, and it really caught my attention after kicking the tires for just a little while. Boy, I'm really sold. This thing's kind of awesome. I want to dive straight in, take
a brief look at what they announced, and then use it a little bit and
show you exactly what it's like to use, and maybe some of the small differences
that I've found so far. Something that OpenAI just released today is kind of a version of GPT-5 focused specifically on software engineering. It's intended specifically for agentic programming, the way they describe it. Codex just got faster, more reliable, and better at real-time collaboration. And really, I actually have seen that. Here you'll see they're saying GPT-5 Codex is a version of GPT-5 further optimized for agentic software engineering in Codex. All right, I usually will just push through these. You'll see that on SWE-bench, it gets a better score than GPT-5 High; this is GPT-5 Codex High versus GPT-5 High. So it's exciting that they improved the SWE-bench marks as well as some code refactoring tasks. This is a benchmark
that they have that it does better at. And that's actually a pretty useful perspective: can it refactor a large code base, which is usually something that's going to be done over a long run? It's one of these things you expect will take many, many callbacks, running tests, verifying things. So that's kind of
a neat one to see. And one of the things that's important here to call
out, I loved this: Codex adapts how much time it spends thinking dynamically based on the complexity of the task. This looks very much like GPT-5 in ChatGPT, where you ask one question and it comes back immediately, you ask a different question and it has to think about it. They're applying that down here inside of the coding model as well. So they're tuning how much kind of
reasoning time they're applying to different problems. This is a really neat idea, because sometimes I'll be on the medium model and give it a problem and think, I wish I had selected a high-thinking model before I kicked this off, because what I'm asking it to do is much bigger. That's really exciting
that they might actually be doing that kind of routing inside of the coding models
as well. It says Codex will feel snappier on small, well-defined requests while you're chatting, and will work for longer on complex tasks like big refactors. During testing, we've seen GPT-5 Codex work independently for more than seven hours at a time on a large, complex task, iterating on its implementation, fixing test failures, and ultimately delivering a successful implementation. All right. But what does that really mean as far as something like token usage? They do kind of represent that. They've looked at all of the OpenAI employee traffic against the model, and they say the bottom 10% of user turns, sorted by model-generated tokens, use 93% fewer tokens. So that
basically means on tasks that were fast and easy to turn around and complete, it could do far less reasoning and turn them out at far less cost, far fewer tokens. And then on the top 10%, it spends almost twice as many. They have a chart down here: a much smaller ask down at the 10% and a much larger task up at the 90%. Basically, a harder problem gets more thinking; it knows how to apply more thinking. But where we spend the vast majority of our time, asking for succinct, smaller changes or just queries, those kinds of things, it's going to be much, much more efficient. As mentioned, they also call out that they have tuned this specifically for
code reviews. And down here, you can see the number of incorrect comments that GPT-5 High had versus Codex High. So that's a great reduction. However, of course, these
are their own numbers. Just bear in mind. But at the same time, we've all
seen code review agents. And they kind of are saying, you know, you could do
this or you could do that or this would be a good idea. And we're
looking at them thinking, well, I could do it 100 different ways. I'm not sure
that that's necessarily helpful. I think that's what they're hearing and trying to address with
this model this way. Then there are the high-impact comments, so much more meaningful comments, and then how many comments per PR. This one's kind of exciting: it doesn't feel like it just has to comment every single time on every PR. That's kind of nice.
Less wordy. Enough of that. Let's get into the terminal. Let's take a look at
this thing running. If we run Codex in our terminal, you'll see it come up with an awesome new animation announcing GPT-5 Codex, which is just great. If you select the GPT-5 Codex model, then it will start with that. But you
can also just select model at any time and you'll see these new Codex models
here up at the top. And I'm inside of an application that I can just
ask it about. All right. And I will say that it comes up much faster.
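As an aside, if you want Codex to default to a particular model and reasoning effort instead of picking them each session, the CLI reads a config file. This is a minimal sketch, assuming the `~/.codex/config.toml` location and key names from the CLI's documentation at the time of writing; verify against the current docs, since these options may change:

```toml
# Assumed Codex CLI config at ~/.codex/config.toml.
# Key names reflect the documented options and may change between releases.
model = "gpt-5-codex"
model_reasoning_effort = "medium"
```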
It does that task much faster. This is one of the things that they are
talking about. Let's say, tell me how it uses Firebase. Okay. And this is how
it comes back. And I will say right away, this already feels quite different. The
way that it communicates its information back to me is a very different experience than
previous versions of GPT-5 or other models. You'll see that it has all of
these links in the middle of the message coming back. What I would say from
this is it's very conversational. This is coming back and giving me an actual
readout. This feels much less like reams of information and more a succinct document
that it's delivering to me. I have worked with it, like I said, for about
an hour and I've gotten quite a few of these back and was very surprised
to see it in the beginning. But in the end, the engineer in me really
appreciates many of the things that they're mentioning. All right, let's try the same thing here with the old GPT-5 medium. I get its full display of thinking: multiple thinking blocks, what it's searching for, kind of all of its pattern of execution. That absolutely was not seen with the new Codex model. Okay. And remember when I told
you that the other one didn't return reams of information? Here, we'll scroll back through
what this one had to say to us just from that one request. This is
everything. This is an enormous response to my simple request of how it uses Firebase. Now, admittedly, it does go through a lot of really great detail. GPT-5 has been a great editing model with code, but it returns it much more
in kind of this blog post format that makes it feel like it's trying to
tell me, create a report for me and tell me a whole story instead of
the first one, which felt much more like a succinct engineering path. Here you go.
Here are the real important aspects you need. I would imagine I could ask to
get to every gory detail if I needed to. So I would greatly prefer the
new Codex model that way. All right. But how is it to use, right? That's actually kind of the important part. I'm going to share my application, my numbers application that I've shared multiple times on this channel. And I'm going to show that when I run this application after a recent change, we're seeing missing Firestore index information.
So this is: we're using Firestore as the database. Firestore is a cloud database solution from Google, and usually you would like the cloud to do some of the lifting where possible, like filtering or sorting, before data comes down to the client, instead of having every client deal with all of the data.
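To make that concrete, here's a hypothetical sketch (all type and field names are my own, not from the project) of the kind of filter-and-sort work that a missing composite index pushes down onto every client; with the right index, Firestore would do the equivalent `where` + `orderBy` + `limit` on the server:

```typescript
// Hypothetical sketch: client-side fallback when no composite index exists.
// All names here are illustrative, not taken from the actual project.
type Video = { id: string; category: string; createdAt: number };

function filterVideosClientSide(all: Video[], category: string, limit: number): Video[] {
  // With a composite index, Firestore would do this server-side:
  // where("category", "==", category) + orderBy("createdAt", "desc") + limit(n).
  return all
    .filter((v) => v.category === category)       // filter locally
    .sort((a, b) => b.createdAt - a.createdAt)    // sort newest first
    .slice(0, limit);                             // truncate to the limit
}

const sample: Video[] = [
  { id: "a", category: "games", createdAt: 3 },
  { id: "b", category: "music", createdAt: 2 },
  { id: "c", category: "games", createdAt: 1 },
];
console.log(filterVideosClientSide(sample, "games", 2).map((v) => v.id)); // → ["a", "c"]
```

The point of the video's complaint is exactly this: every client downloads everything and does this work itself, which is what a server-side index avoids.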
Here, we're missing that; we don't have a solution on the cloud to filter the data the way we need. So here you go: you're going to have to do it client side. That's what this is saying. Obviously, we
don't want that. I want to see if I can bring the Codex CLI, or really the Codex model, to solving this problem. This is the Cursor editor. You could probably use Visual Studio Code just as well. It's basically the same
thing that we're solving here. I'll show you two ways to pull this up. The
way that I use it typically is I might go into terminal mode and just
pull up the terminal panel at the bottom here, bring it up there. There is
another way: we can go through the AI pane over here. I have added Codex as a plugin. You can come out here and get the Codex extension, bring it into the application, and you'll be able to use a panel like this or something similar. And this is it asking, do you want to use the GPT-5 Codex model? Great. Now we're into using the Codex models. You can see
them down here just like we did in the previous. So we'll leave this one
on medium as well. But first, I want to show you in terminal mode because
that's just where we just were. And maybe you can make the same sense of
it here. So I'll load Codex down here, open up this panel a little bit bigger, and also hide the side panel so it feels more like where we were. We'll make sure the model we're looking at at this point is the GPT-5 Codex medium model. Great. So that's what we're looking at. And
what I want to do is give it the error message that I was just
previously seeing. So we can say this is the missing Firestore index on the limited video query, the problem that we know. So I'm asking it, with no other context:
Can you go fix this problem for me? Okay. And what we're seeing here is
now it's going through the thinking part that we didn't see last time, right? So
this is a more complex problem that I'm giving it a random, can you go
find this problem and fix it? It's now looking through saying, okay, this is part
of the video system. You can see that it's doing references to all the different
items and how they're supposed to be used in the system. So this, I think,
is that example of it trying to apply more thinking where more thinking is needed.
Pretty cool to see, frankly. Okay. Excellent. And so here is the final result. It
added a composite index definition. It went into the system and added this definition
file of a new index that needs to be pushed up to Firestore itself. And
this is a little trick with using Firebase. I can either go log into the
console in the Firestore environment itself, or I can push it through a command line
system that they offer, the tooling that they have at the command line. And that
would be the Firebase action that's being referenced here. The reason I point out that
interesting detail is it's telling me in what I would consider a very succinct definition.
This has been a problem that I've had with GPT-5 as a coding model
for a while, is trying to figure out what it thinks I need to do
next and having to read an enormous document to figure out, what do you want
me to do? You did all this work. You got it to some point. Have
you tested everything? Have you not? So this, so far, has been my experience that
I get a much more succinct response at the end of it. Mileage may vary.
It's very early. So this may not hold. But right now, I've been very geeked
about this. This is one of those things that made me really happy that it
can do a bunch of work and then come back and say, here's your next
step: deploy the Firestore indexes against your Firebase project, and that will install the index.
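For reference, a Firestore composite index definition lives in a `firestore.indexes.json` file at the project root and looks roughly like this; the collection and field names below are illustrative, not the ones Codex actually generated for my project:

```json
{
  "indexes": [
    {
      "collectionGroup": "videos",
      "queryScope": "COLLECTION",
      "fields": [
        { "fieldPath": "category", "order": "ASCENDING" },
        { "fieldPath": "createdAt", "order": "DESCENDING" }
      ]
    }
  ]
}
```

Deploying it is then `firebase deploy --only firestore:indexes` from the project root, which is the command-line route I mentioned as an alternative to clicking through the Firebase console.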
What I'd like to do is say, can you do that? And I will tell you, Claude Code is the only editor I've had, the only CLI editor that I've seen yet, that could do this. So this will be exciting if it will open up a shell and kind of run this command on our behalf. Let's find out.
Yes, definitely. So it's definitely kicking it off, trying to run it. We'll see if
it's successful. I won't ding it for that. Excellent. Excellent. So it ran it, and
it's saying, yep, okay, I deployed the index. That looks pretty good. It went back
and forth and fixed something that it didn't quite get right the first time around.
All of this, I didn't have to touch anything. So that really is a success,
but there's only one way to really tell if they got it right. Let's take
a look. Nope. But all right, let's kill the server and start it again, maybe?
Nope. Even on restart, it doesn't quite work. Let's try one more thing. I'll give it back to it and say: didn't quite work, try again, and I'll report its success or failure. Stupid human. Okay. So, all right. All credit to, all hail, GPT-5 Codex. So what did it tell me at the end of the last message while we were sitting here? I know y'all are
probably yelling this at the screen. It says, okay, step one, wait a few minutes,
then rerun this to see how it worked. And of course, I reran it. It
all works perfectly fine. Not a surprise. Excellent. So it took care of it. Last
one. Let's take a look at what it's doing if it's updating a visual project,
just in case that's changed. Okay. If I'm honest, I'm not going to judge anybody by this. This is a project that I wrote, oh, well over a year ago at this point. So this would have been with maybe early Cursor or mid-level Cursor kind of stuff. It has no agentic coding against it other than Cursor and hand-built stuff. But let's take a look in here. Let's
say we search for a game, Silksong. This is just a sample application that I
built, playing with some ideas. If I search for Silksong, which I do believe to
be a very recent game that's very popular, the problem is that there are filters
defined here, Notability and Main Game. So if we say it can be any game
and we don't want a category that has to be assigned to it, then it
shows up. What I want to do is just light this button up with a
color when there's filters applied so that it might hint to you that the reason
you're not seeing something is that there's a filter. This is a terrible example; it's a very old application. But let's take a look and give it to GPT-5 Codex
and see if it can just push right through. Here we are in cursor. This
time we're going to use the side panel and we're going to use the Codex
side panel as we saw earlier. And we will use GPT-5 Codex Medium
and run it here locally. Great. So what we're going to say to this is
I'm going to give it a screenshot. OK. And I'm going to tell it when
there are filters applied that aren't just default, like when there is a main game
filter or any other filter that's applied, I kind of want to see this button
in our main accent color. So I want the button to basically light up so
that I can tell that filters are applied. If there's no filters applied, I'd like
to see it like you see it now, which appears to me to be normal.
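The logic I'm asking for boils down to a small check. Here's a minimal sketch, where the names (`Filters`, `DEFAULT_FILTERS`, and the two filter fields) are hypothetical placeholders of mine, not the app's actual state shape:

```typescript
// Hypothetical sketch of "light up the button when filters deviate from
// defaults". Filters, DEFAULT_FILTERS, and the field names are illustrative.
type Filters = { notability: string; mainGame: boolean };

const DEFAULT_FILTERS: Filters = { notability: "any", mainGame: true };

function filtersAreDefault(f: Filters, defaults: Filters = DEFAULT_FILTERS): boolean {
  // Compare each filter field against its default value.
  return f.notability === defaults.notability && f.mainGame === defaults.mainGame;
}

// The button picks the accent color only when the filters are non-default.
function filterButtonColor(f: Filters): string {
  return filtersAreDefault(f) ? "neutral" : "accent";
}

console.log(filterButtonColor({ notability: "any", mainGame: true }));  // → "neutral"
console.log(filterButtonColor({ notability: "any", mainGame: false })); // → "accent"
```

The one wrinkle the video runs into is exactly what this sketch exposes: whatever is baked into the defaults object is what counts as "normal", so a filter that ships enabled by default won't light the button up.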
Let's see how that goes. OK. There it goes. It did all of its work.
Filter buttons now light up whenever the filters deviate from the defaults. And I will
say it definitely works that way. So if we take Silksong, come back over
here, and the default is set up this way; it was just my definition. The default is set up to have Main Game on. What I'd need to go in and tell it is that Main Game is not a default filter. But when I go away from whatever the default is, the button lights up as expected, and it goes away if you go back to the default. This is fine. Like I said, for an old project with a different setup, it really worked quite well to puzzle through what was going on inside of the application. It did take quite a while to figure all of this out, but it did excellently. So I will say my experience with this thing has been great. OK,
so this was just a quick first look at Codex, and at what the GPT-5 Codex model that just released is. I really think it does exactly what we want: it gets us a smaller, more efficient model for the vast majority of the
types of changes that we're asking for and then can think very deeply when it
needs to. To me, that's great. I'm going to probably be on medium a lot
more frequently. I kind of wander into high more often than I want to, because every now and then I'm asking more sophisticated questions, and other times I'm absolutely not. So this to me is a really neat idea. I'll see how the
router works. I want to hear what your experience has been as well or will
be when you start using this. Let me know. Add some comments. Let the others
know what's going on down there. Thanks for coming along for the ride on this
one. And I'll see you in the next one.
OpenAI just dropped a brand-new spin on GPT-5, and it's aimed squarely at software engineers. Meet GPT-5 Codex, a refined coding model that feels faster, more agentic, and surprisingly collaborative in real time. In this video, I take a first look at GPT-5 Codex, explore what's new, and run it through real coding scenarios inside Cursor and the terminal. From Firestore indexing fixes to UI tweaks in old projects, I wanted to see if Codex could actually think deeper when needed and stay lightweight on smaller tasks.

Highlights from this first run:
🚀 Faster, snappier coding responses
🧠 Dynamic reasoning that scales with task complexity
🔍 Better code refactoring and reviews (with fewer "meh" comments)
⚡ Independently worked for 7+ hours on a tough refactor
💡 More efficient token usage (cheaper, too!)
🖥️ Actually solved a Firestore indexing issue end-to-end

If you've been curious about what "agentic programming" really looks like in action, this demo is for you. I'll show you what worked, what still needs polish, and why GPT-5 Codex might be the model I stick with day-to-day.

👉 What do you think: is Codex the real step forward for coding with AI? Drop your thoughts in the comments.

#OpenAI #GPT5 #Codex #AIProgramming #AgenticAI #SoftwareEngineering #Cursor #AIcoding #GPT5Codex

00:00 - Intro
00:28 - Announcement
04:13 - Using in Codex CLI
06:34 - Putting it to work
07:25 - Use shell to fix a problem
11:38 - Visual application test
12:39 - Cursor Codex panel
13:56 - Conclusion