Andrew Mayne: Hello, I'm Andrew Mayne, and this is the OpenAI Podcast. In this episode, we're going to speak
Andrew Mayne: with OpenAI co-founder and president, Greg Brockman, and Codex engineering lead, Thibault Sottiaux. And we're going to talk about agentic coding, GPT-5-Codex, and where things might be heading in 2030.
Greg Brockman: Just bet that the greater intelligence will pan out in the long run.
Thibault Sottiaux: It's just really optimized for what people are using GPT-5 within Codex for.
Greg Brockman: How do you make sure that AI is producing things that are actually correct?
Andrew Mayne: We're here to talk about Codex, which I've actually been using since I worked here, with the first version of it. And now you guys have the new version. I was playing with it all weekend long, and I've been very, very impressed. It's amazing how far this technology has come in a few years.
Andrew Mayne: I would love to find out the early story.
Andrew Mayne: Like, where did the idea of even using a language model to code come from?
Greg Brockman: Well, I mean, I remember back in the GPT-3 days seeing the very first signs of life: take a docstring, a Python definition of a function name, and then watch the model complete the code.
Greg Brockman: And as soon as you saw that, you knew this is going to work.
Greg Brockman: This is going to be big.
Greg Brockman: And I remember at some point we were talking about these aspirational goals of, imagine if you could have a language model that would write a thousand lines of coherent code, right?
Greg Brockman: That was like a big goal for us.
Greg Brockman: And the thing that's kind of wild is that that goal has come and gone.
Greg Brockman: And I think that we don't think twice about it, right?
Greg Brockman: I think that while you're developing this technology,
Greg Brockman: you really just see the holes, the flaws,
Greg Brockman: the things that don't work.
Greg Brockman: But every so often, it's good to step back and realize that things have actually come so far.
Thibault Sottiaux: It's incredible how used we get to things improving all the time, and how quickly they just become a daily driver. You use it every day, and then you reflect back to a month ago: this wasn't even possible.
Thibault Sottiaux: And this just continues to happen.
Thibault Sottiaux: I think that's quite fascinating, like how quickly humans adapt to new things.
Greg Brockman: Now, one of the struggles that we've always had is the question of whether to go deep in a domain.
Greg Brockman: Right.
Greg Brockman: Because we're really here for the G, right, for AGI, general intelligence.
Greg Brockman: And so to first order, our instinct is just push on making all the capabilities better at once.
Greg Brockman: Coding has always been the exception to that, right?
Greg Brockman: We really have a very different program that we use to focus on coding data, on code metrics, on trying to really understand how do our models perform on code.
Greg Brockman: And, you know, we've started to do that in other domains, too.
Greg Brockman: But for programming and coding, that's been a very exceptional focus for us.
Greg Brockman: And, you know, for GPT-4, we really produced a single model that was just a leap on all fronts.
Greg Brockman: But we had actually trained, you know, the Codex model. And I remember doing a Python-focused model. We were really, really trying to push the level of coding capability back in 2021 or so.
Greg Brockman: And I remember when we did the Codex demo, that was maybe the first demonstration of what we'd call vibe coding today.
Greg Brockman: I remember building this interface and having this realization that for just standard language model stuff, the interface, the harness is so simple.
Greg Brockman: You're just completing a thing and maybe there's a follow-up turn or something like that.
Greg Brockman: But that's it.
Greg Brockman: But for coding, this text actually comes to life, right? You need to execute it, it needs to be hooked up to tools, all these things. And so you realize that the harness is almost equally part of how you make this model usable as the intelligence. That's something we kind of knew from that moment. And it's been interesting to see, as we got to more capable
Greg Brockman: models this year and really started to focus on not just making the raw capability, like how do you
Greg Brockman: win at programming competitions, but how do you make it useful, right? Training in a diversity
Greg Brockman: of environments, really connecting to how people are going to use it, and then really building the
Greg Brockman: harness, which is something that Thibault and his team have really pushed hard on.
Andrew Mayne: Could you unpack a harness, what that means in simple terms?
Thibault Sottiaux: Yes, it's quite simple. You have the model, and the model is just capable of input-output.
Thibault Sottiaux: And what we call the harness is how do we integrate that with the rest of the infrastructure so that
Thibault Sottiaux: the model can actually act on its environment? So it's the set of tools, it's the way that it's
Thibault Sottiaux: looping. The agent loop, as we refer to it. And in essence,
Thibault Sottiaux: it's fairly simple. But when you start to integrate these pieces together and really
Thibault Sottiaux: train it end to end, you start to see like pretty magical behavior and an ability of the model to
Thibault Sottiaux: really act and create things on your behalf and be a true collaborator. So think about it a little
Thibault Sottiaux: bit as, you know, the harness being your body and the model being your brain.
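The body-and-brain description above can be sketched in a few lines of code. This is a hypothetical toy, not OpenAI's actual harness: `fake_model` is a stand-in for the model (pure input-output), and everything else, the tool and the loop, is the harness.

```python
# A minimal sketch of an agent "harness", assuming a toy stand-in model:
# the model only maps input -> output; the harness supplies the tools
# and the loop (the "body", in the analogy above).

def run_shell(cmd: str) -> str:
    """Toy 'tool': pretend to run a shell command and return its output."""
    fake_filesystem = {"ls": "main.py  tests/"}
    return fake_filesystem.get(cmd, f"command not found: {cmd}")

def fake_model(transcript: list) -> dict:
    """Stand-in for the model: reads the conversation so far and emits
    either a tool call or a final answer (purely illustrative logic)."""
    if not any(m["role"] == "tool" for m in transcript):
        return {"type": "tool_call", "tool": "shell", "args": "ls"}
    return {"type": "final", "text": "The repo contains main.py and tests/."}

def agent_loop(task: str, max_turns: int = 8) -> str:
    """The agent loop: ask the model, execute its tool calls against the
    environment, feed results back, repeat until it produces an answer."""
    transcript = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        action = fake_model(transcript)
        if action["type"] == "final":
            return action["text"]
        observation = run_shell(action["args"])  # the harness acts for the model
        transcript.append({"role": "tool", "content": observation})
    return "ran out of turns"

print(agent_loop("What files are in this repo?"))
```

A real harness differs mainly in scale: the model is a network call, the tools are real (shell, file edits), and context management is careful, but the loop has this shape.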
Andrew Mayne: Okay. It's interesting to see how far it's come from the GPT-3 days, where you literally had to write commented code and say, this function does this, in Python, put your hash sign in front of it, whatever.
Andrew Mayne: And it's just interesting to see how the models have now become just naturally, intuitively good at coding.
Andrew Mayne: And you mentioned trying to decide between a general-purpose model and going deep on code. Was it just outside demand, people telling you one of these models is better at code? Or was this coming internally, because you guys wanted to use it more?
Greg Brockman: Both, yeah, absolutely both. And I remember, I think 2022 is when we worked with GitHub to produce GitHub Copilot. And the thing that was very interesting there was that, for the first time, you really felt what it's like to have an AI in the middle of your coding workflow, and how it can accelerate you. And I remember that there were a lot of questions around the exact right interface. Do you want ghost text, so it just does a completion? Do you want a little dropdown with a bunch of different possibilities? But one thing that was very clear was that latency was a product feature. The constraint for something like autocomplete is 1500 milliseconds, right? That's the time that you have to produce a completion. Anything slower than that could be incredibly brilliant, but no one wants to sit around waiting for it. And so the mandate that we had, the clear signal we had from users and from the product managers and all the people thinking about the product side of it, was: get the smartest model you can, subject to the latency constraint. And then you have something like GPT-4, which is much, much smarter, but it's not going to hit your latency budget. What do you do? Is it a useless model? Absolutely not. The thing you have to do is change the harness, change the interface. And I think that's a really important theme: you need to co-evolve the interfaces and the way that you use the model around its affordances. And so super fast, smart models are going to be great, but the incredibly smart but slower models are also worth it. And I think that we've always had a thesis that the returns on that intelligence are worth it. And it's never obvious in the moment, because you're just like, well, it's just going to be too slow, why would anyone want to use it? But I think that our approach has very much been to just bet that the greater intelligence will pan out in the long run.
Andrew Mayne: It was hard for me to wrap my head around where that was all headed back when I was working on GitHub Copilot, because at that point we were used to, like you said, the completion: ask it to do a thing, it completes the thing. And I think I didn't really understand how much more value you would get out of building a harness, adding all these capabilities there. It just seemed like all you need is the model. But now you realize the tooling, everything else, can make such a big difference. And you brought up the idea of
Andrew Mayne: modalities. And now we have CLI, Codex CLI. So I can go in the command line, I can do this. There's
Andrew Mayne: a plugin for VS Code, so I can go use this there. And then also I can deploy stuff to the web and do
Andrew Mayne: that. And I don't think I fully kind of comprehend the value of that. And so like, how is this
Andrew Mayne: something you're using? How are you kind of deploying these things yourself? Like, where are
Andrew Mayne: you finding the most, you know, utility out of it?
Thibault Sottiaux: I think, just to go back a little bit, the first signs that we saw were that we had a lot of developers at the company and outside of the company, our users, using ChatGPT to help them debug very complex problems. And one thing that we clearly saw is that people were trying to get more and more context into ChatGPT. You're trying to get bits of your code and stack traces and things, and then you paste that and present it to a very smart model to get some help.
Thibault Sottiaux: And interactions were starting to get more and more complex up to some point where we
Thibault Sottiaux: realized like, hey, maybe instead of the user driving this thing, maybe let the model actually
Thibault Sottiaux: drive the interaction and find its own context and then find its way and be able to debug
Thibault Sottiaux: this hard problem by itself so that you can just sit back and watch the model do the work.
Thibault Sottiaux: So it's sort of reversing that interaction that led to this, I think, thinking a lot more about the harness and giving the model the ability to act.
Greg Brockman: And we iterated on form factors.
Greg Brockman: I mean, I remember at the beginning of the year, we had a couple of different approaches.
Greg Brockman: We had sort of the async agentic harness, but we also had the local experience and a couple of different implementations of it.
Thibault Sottiaux: We actually started to play a little bit with this idea of running it in the terminal.
Thibault Sottiaux: And then we felt that was not AGI-pilled enough.
Thibault Sottiaux: We needed the ability to run this at scale and remotely and just close the laptop and have the agent just continue to do its work.
Thibault Sottiaux: And then you can maybe follow it on your phone and interact with it there.
Thibault Sottiaux: That seemed very cool.
Thibault Sottiaux: So we pushed on that.
Thibault Sottiaux: But we actually had a prototype of it fully working in a terminal.
Thibault Sottiaux: And people were using that productively at OpenAI.
Thibault Sottiaux: We decided to not launch this as a product.
Thibault Sottiaux: It didn't feel like polished enough.
Thibault Sottiaux: It was called 10X because we felt like it was giving us this 10X productivity boost.
Thibault Sottiaux: But then we decided to just experiment with different form factors and really go all in with the async form factor initially.
Thibault Sottiaux: And now we've kind of gone back a little bit on that and re-evolved, and said, hey, actually, we can bring this agent back to your terminal.
Thibault Sottiaux: We can bring it in your IDE.
Thibault Sottiaux: But the thing that we're really trying to get right is like this entity, this collaborator that's working with you and then bringing that to you in the tools that you're already using as a developer.
Greg Brockman: And there are other shots on goal as well.
Greg Brockman: So we had a version where there was a remote daemon that would connect to a local agent.
Greg Brockman: And so you kind of could get both at once.
Greg Brockman: And I think that part of the evolution has been that there's almost this matrix of different ways you could try to deploy a tool.
Greg Brockman: There's this async.
Greg Brockman: It has its own computer off in the cloud.
Greg Brockman: There's the local that it's running synchronously there.
Greg Brockman: You can blend between these.
Greg Brockman: There's been a question for us of how much do we focus on trying to build something that is externalizable, that is useful in the diversity of environments that people have out there versus really focus on our own environment and try to make it so that things work really well for our internal engineers.
Greg Brockman: And one of the challenges has been we want to kind of do all of this.
Greg Brockman: We ultimately want tools that are useful to everyone.
Greg Brockman: But if you can't even make it useful for yourself, how are you going to make it extremely useful for everyone else?
Greg Brockman: And so part of the challenge for us has been really figuring out where do we focus and how do we achieve the sort of biggest bang for the buck in terms of our engineering efforts?
Greg Brockman: And, you know, for me, one of the things that's been an overarching focus has been we know that coding and building very capable agents is one of the most important things that we can do this year.
Greg Brockman: At the beginning of the year, we set a company goal of an agentic software engineer by the end of the year.
Greg Brockman: And figuring out exactly what that means and how to substantiate that and how to bring together all the opportunity and all the kind of compute that we have to bear on this problem.
Greg Brockman: Like that has been a great undertaking for many, many people at OpenAI.
Andrew Mayne: So you mentioned that you had the tool 10X and that was an internal tool.
Andrew Mayne: And that seemed to be something at some point you said, oh, this is really useful to other people.
Andrew Mayne: It's got to be hard to decide when to do that and when not to.
Andrew Mayne: and how much to sort of prioritize that.
Andrew Mayne: We've seen Claude Code become extremely powerful,
Andrew Mayne: which I imagine is probably a similar story
Andrew Mayne: with something that was used internally
Andrew Mayne: and then became something deployed.
Andrew Mayne: When you start to think about next steps of,
Andrew Mayne: where do you decide to take it next?
Andrew Mayne: Where do you decide to put the emphasis?
Andrew Mayne: You mentioned before, I can now run things in the cloud,
Andrew Mayne: run these web, do these kind of agentic-like tasks
Andrew Mayne: where I walk away.
Andrew Mayne: And my problem is just, it's such a new modality.
Andrew Mayne: It's really, really hard for me to think about.
Andrew Mayne: But sometimes these things have to sit around for a while and people sort of discover them independently.
Andrew Mayne: And have you found that internally that somebody says, oh, now I get it?
Greg Brockman: I'd say absolutely.
Greg Brockman: Right.
Greg Brockman: And I think that, you know, my perspective is that we kind of know the shape of the future, right, of the long term.
Greg Brockman: It is very clear that you're going to want an AI that has its own computer that is able to run, you know, delegate to a fleet of agents and be able to solve multiple tasks in parallel.
Greg Brockman: You should wake up in the morning, you're sipping your coffee, you know, answering questions for your agent, like providing some review, being like, oh, no, this wasn't quite what I meant.
Greg Brockman: This workflow clearly needs to happen.
Greg Brockman: But the models aren't quite smart enough for this to be the way that you interact with them.
Greg Brockman: And so having an agent that is really there in your terminal, in your editor to help you with the way that you do your work that looks very similar to the way you would have done it a year ago.
Greg Brockman: That's also the present.
Greg Brockman: And so I think that the way that we've seen it is almost we're blurring together. Here's what the future looks like, but we also can't abandon the present and thinking about how do you bring AI into code review and how do you make it so that it appears proactively and does work for you that's useful.
Greg Brockman: And then you have a whole new challenge as well: if you have a lot more PRs, how do you actually sort through them to find the ones you actually want to merge?
Greg Brockman: And so I think we've kind of seen all of this opportunity space and we've seen people start to change how they develop within OpenAI, how they even structure their code bases.
Thibault Sottiaux: Yeah, I think there are two things that really combine to explain where we're at today. One is that infrastructure is hard.
Thibault Sottiaux: And we would love for, you know, all of everyone's code and like tasks and packages to be like perfectly containerizable.
Thibault Sottiaux: And so we can run them at scale.
Thibault Sottiaux: That's not the case.
Thibault Sottiaux: Like, people have very thorough and complex setups that probably only run on their laptop.
Thibault Sottiaux: And we want to be able to leverage that and meet, you know, people where they are so that, you know, they don't have to configure things specifically for Codex.
Thibault Sottiaux: That gives you this very easy entry point into experiencing, you know, what a very powerful coding agent can do for you.
Thibault Sottiaux: And at the same time, it lets us experiment with, you know, what the right interface is.
Thibault Sottiaux: Six months ago, we weren't playing with these kinds of tools.
Thibault Sottiaux: And this is all very new and evolving fast.
Thibault Sottiaux: And we have to continue to iterate here and innovate on what the right interface is and what the right way to collaborate with these agents is. And we don't feel like we have really nailed that yet. That's going to continue to evolve,
Thibault Sottiaux: but bringing it to like a zero setup,
Thibault Sottiaux: extremely easy to use out of the box,
Thibault Sottiaux: you know, allows a lot more people to benefit from it
Thibault Sottiaux: and like play with it.
Thibault Sottiaux: And for us to get the feedback
Thibault Sottiaux: so that we can continue to innovate,
Thibault Sottiaux: that's very important.
Greg Brockman: I remember at the beginning of the year
Greg Brockman: talking to one of our engineers,
Greg Brockman: who I think is really fantastic.
Greg Brockman: And he was saying that with ChatGPT, we had this integration where it could automatically see the context in his terminal. And he's like, it's transformative, because he doesn't have to copy-paste errors. He can just instantly be like, hey, what's the bug? And it would just tell him, and it was great. Right? And you realize that it was an integration that we built that was so transformative.
Greg Brockman: It wasn't about a smarter model.
Greg Brockman: And I think that one thing that's very easy to get confused by is to really focus on only one of these dimensions and be like, which one matters?
Greg Brockman: Because the answer is they kind of both matter.
Greg Brockman: And the way I've always thought about this, I remember when we were originally releasing the API back in 2020,
Greg Brockman: is there's two dimensions to what makes an AI desirable.
Greg Brockman: There's intelligence, which you can think of as one axis.
Greg Brockman: And then there's convenience, which you can think of as latency.
Greg Brockman: You could think of as cost.
Greg Brockman: You could think of as the integrations available to it.
Greg Brockman: And there's some acceptance region, right?
Greg Brockman: Where it's like, if the model's incredibly smart but it takes you a month to run it or something, you still might use it, right?
Greg Brockman: If what's going to output is such a valuable piece of code
Greg Brockman: or cure for a certain disease or something like that,
Greg Brockman: okay, fine, like it's worthwhile.
Greg Brockman: If the model is incredibly not that intelligent, not that capable, then all you want to do is autocomplete.
Greg Brockman: So it has to be incredibly convenient, zero cognitive tax for you to think about what it's suggesting, that kind of thing.
Greg Brockman: And where we are is, of course, somewhere on the spectrum now.
Greg Brockman: We now have smarter models that are reasonably less convenient than autocomplete, but still more convenient than you have to sit around and wait for a month for the answer to appear.
Greg Brockman: And so I think that a lot of our challenge is figuring out when do you invest in pulling that convenience to the left?
Greg Brockman: When do you invest in pushing the intelligence up?
Greg Brockman: And it's a massive design space.
Greg Brockman: It's what makes it fun.
Andrew Mayne: Yeah.
Andrew Mayne: I don't know if you remember, but I made an app that was featured in the launch back in 2020, AI Channels.
Andrew Mayne: Uh-huh.
Greg Brockman: Of course.
Andrew Mayne: And that was, yeah, the challenge: the GPT-3 app was very capable, but I had to write these 600-word prompts to get it to do stuff. And because of the six cents per thousand tokens and the latency, I was like, I don't think this is the world for this right now.
Andrew Mayne: Yes.
Andrew Mayne: And then with GPT-3.5 and GPT-4, all of a sudden you see all those capabilities. And it was hard for me to say why, but then you see all the things come together at once.
Andrew Mayne: And you mentioned, you know, the idea of just having, you know, the model be able to see the context inside of the, you know, where you're working.
Andrew Mayne: And I remember when I was copy pasting using ChatGPT into my workspace and it reminded me going into a grocery store and refusing to get a cart and just carrying everything to the checkout.
Andrew Mayne: I'm like, this is terribly inefficient.
Andrew Mayne: Once you put things on wheels, it works really well.
Andrew Mayne: And I think we're seeing all kinds of those unlocks now.
Andrew Mayne: Now the problem I deal with is when I sit down to work on something is, do I go into CLI?
Andrew Mayne: Do I go use the VS Code plugin?
Andrew Mayne: Do I go into Cursor?
Andrew Mayne: Do I use some other tool?
Andrew Mayne: And how do you guys figure this out?
Thibault Sottiaux: Right now, we're still at the experimentation phase where we're trying different ways for you to interact with the agent and bring it where you're already productive.
Thibault Sottiaux: So, for example, Codex is now in GitHub.
Thibault Sottiaux: You can mention Codex and it will do work for you.
Thibault Sottiaux: If you do @codex, fix this bug, or move the tests over here, it will go and run off and do it with its own little laptop in our data centers.
Thibault Sottiaux: And you don't have to think about it.
Thibault Sottiaux: but if you're working with files in a folder,
Thibault Sottiaux: you know, then you have that decision that
Thibault Sottiaux: are you going to do it in your IDE?
Thibault Sottiaux: Are you going to do it in terminal?
Thibault Sottiaux: What we're seeing is users are developing,
Thibault Sottiaux: like power users are developing
Thibault Sottiaux: very complex workflows with the terminal more.
Thibault Sottiaux: And then when you're actually working
Thibault Sottiaux: on a file or a project,
Thibault Sottiaux: you prefer to do it in the IDE.
Thibault Sottiaux: It's a bit more of a polished interface.
Thibault Sottiaux: You can undo things.
Thibault Sottiaux: You can see the edits.
Thibault Sottiaux: You know, it's not like just scrolling by you.
Thibault Sottiaux: And then the terminal is just an amazing,
Thibault Sottiaux: also vibe coding tool where, you know,
Thibault Sottiaux: if you don't really care that much
Thibault Sottiaux: about the code that's being produced,
Thibault Sottiaux: you know, you can just generate a little app.
Thibault Sottiaux: It's much more about that interaction.
Thibault Sottiaux: It elevates the interaction more
Thibault Sottiaux: instead of focusing on the code.
Thibault Sottiaux: So it's more focused on the outcome.
Thibault Sottiaux: And it's just sort of like depends
Thibault Sottiaux: on what you want to do,
Thibault Sottiaux: but it's still very much
Thibault Sottiaux: an experimentation phase right now.
Thibault Sottiaux: And we're trying different things out.
Thibault Sottiaux: And, you know, it's going to continue
Thibault Sottiaux: like that, I think.
Greg Brockman: Yeah, I really agree with that.
Greg Brockman: And I also think that a lot of our direction
Greg Brockman: will be more integration across these things.
Greg Brockman: I guess people are capable of using multiple tools, right?
Greg Brockman: You already have your terminal, your browser,
Greg Brockman: your GitHub web interface,
Greg Brockman: your repo on your local machine.
Greg Brockman: Each of these is something people have kind of learned
Greg Brockman: when it's appropriate to reach for what tool.
Greg Brockman: And I think that because we're in this experimentation phase,
Greg Brockman: that these things can feel very disparate
Greg Brockman: and very different.
Greg Brockman: And like, you have to kind of learn a new set of skills
Greg Brockman: and the affordances of the relevant tool.
Greg Brockman: And I think that, as we're iterating, a lot of what's on us is to really think about how these fit together. And so you can start to see it, right, with the Codex IDE extension being able to run remote Codex tasks.
Greg Brockman: And I think that ultimately our vision is that there should be an AI that has access to its own computer, its own clusters, but is also able to look over your shoulder.
Greg Brockman: Right. They can also come and help you locally. And these shouldn't be distinct things.
Thibault Sottiaux: Right. And it's like this one coding entity that is there to help you and collaborate with you.
Thibault Sottiaux: Like when I collaborate with Greg, you know, I don't complain that sometimes you're on Slack.
Thibault Sottiaux: Sometimes I talk to you in person.
Greg Brockman: Sometimes you complain.
Thibault Sottiaux: Sometimes we interact like through a GitHub review.
Thibault Sottiaux: Like this seems like very natural when you interact with other humans and collaborators.
Thibault Sottiaux: And this is also, you know, how we're thinking about Codex: as an agentic entity that is really meant to just supercharge you when you're trying to achieve things.
Andrew Mayne: So let's talk about some of the ways of using it, like agents.md.
Andrew Mayne: Do you want to explain that?
Thibault Sottiaux: Yeah, agents.md is a set of instructions that you can give to Codex that lives alongside your code.
Thibault Sottiaux: So that Codex has a little bit more context about how to best navigate the code and accomplish the tasks.
Thibault Sottiaux: There are two main things that we find useful to put in agents.md. One is a compression thing: it's a little bit more efficient for the agent to just read agents.md instead of exploring the entire code base.
Thibault Sottiaux: And then preferences that are actually not clear
Thibault Sottiaux: in the code base itself,
Thibault Sottiaux: where you would be like, you know,
Thibault Sottiaux: actually tests should be over here
Thibault Sottiaux: or, you know, I like things to be done
Thibault Sottiaux: in this particular fashion.
Thibault Sottiaux: And those two things, you know, preferences
Thibault Sottiaux: and then explaining to the agent
Thibault Sottiaux: how to navigate the code base effectively
Thibault Sottiaux: are very useful things to have in agents.md.
Thibault Sottiaux: Yep.
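As a concrete illustration of the two categories described above, navigation shortcuts and preferences, a hypothetical agents.md might look like this. The paths, commands, and conventions here are invented for the example, not taken from any real repository:

```markdown
# agents.md

## Repository map (saves the agent from exploring everything)
- `src/` — application code; the entry point is `src/main.py`
- `tests/` — pytest suite; run with `pytest -q`
- `scripts/` — one-off maintenance tools; usually not relevant to tasks

## Preferences (things the code alone won't tell you)
- Put new tests next to the module they cover, not in a shared test file.
- Prefer small, focused commits with imperative-mood messages.
- Don't upgrade pinned dependencies unless the task requires it.
```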
Greg Brockman: And I think that there's something deeply fundamental here
Greg Brockman: of how do you communicate to an agent
Greg Brockman: that has no context what you want,
Greg Brockman: what your preferences are,
Greg Brockman: and to try to save it a little bit of the kind of spin up
Greg Brockman: a human would require, right?
Greg Brockman: We do this for humans, right?
Greg Brockman: We write README.md files.
Greg Brockman: And this is just a convention for a name of a file
Greg Brockman: that an agent should go look at.
Greg Brockman: But there's also something that's a little point in time, right?
Greg Brockman: That the agents right now don't have great memory, right?
Greg Brockman: It's like, if you're running your agent for the 10th time, has it really benefited from the nine times that it went and solved a hard problem for you?
Greg Brockman: And so I think that we have real research to do to think about how do you have memory?
Greg Brockman: How do you have an agent that really just goes and explores your code base and really deeply understands it and then is able to leverage that knowledge?
Greg Brockman: And so this is one of the examples, and there are many, where we see great fruit on the horizon for further research progress.
Andrew Mayne: It's a very competitive landscape now.
Andrew Mayne: There was a point where, you know, OpenAI kind of came out of nowhere for a lot of people.
Andrew Mayne: And all of a sudden there was GPT-3, then there was GPT-4.
Andrew Mayne: And then I think Anthropic's building great models.
Andrew Mayne: And Gemini, you know, from Google has gotten really good.
Andrew Mayne: How do you guys see the landscape?
Andrew Mayne: How do you see your placement there?
Greg Brockman: I mean, I think that there's a lot of progress to be had.
Greg Brockman: I focus a little less on the competition and a little more on the potential, right?
Greg Brockman: Because we started OpenAI in 2015 thinking that AGI was going to be possible, maybe sooner than people think.
Greg Brockman: And we just want to be a positive force in how it plays out, right?
Greg Brockman: And really thinking about what does that mean, trying to connect that to practical execution has been a lot of the task.
Greg Brockman: And so as we started to figure out how to build capable models that are actually useful, right, that can actually help people, actually bringing that to people is this really critical thing.
Greg Brockman: And you can look at choices that we've made along the way, for example, releasing ChatGPT and making ChatGPT free tier available widely.
Greg Brockman: That's something that we do because of our mission, because we really think about we want AI to be available and accessible and benefit everyone.
Greg Brockman: And so in my view, the most important thing is to continue on that exponential progress and really think about how to bring it to people in a positive and useful way.
Greg Brockman: So when I look at where we're at right now: there's the GPT-4 class of pre-trained models.
Greg Brockman: There's reinforcement learning on top of it to make it just much more reliable and smart.
Greg Brockman: It's like you think about if you've just sort of read the internet, you've just observed a bunch of sort of human thought.
Greg Brockman: And you're trying to write some code for the first time, you're probably going to have a bad time of it.
Greg Brockman: But if you've had the ability to actually try to solve some hard code problems, you have a Python interpreter, you have access to the kinds of tools that humans do, then you're going to be able to become much more robust and refined.
Greg Brockman: So we now have these pieces working together in concert, but we got to keep pushing them to the next level.
Greg Brockman: It's very clear that things like being able to refactor massive code bases, like no one's cracked that just yet.
Greg Brockman: There's no fundamental reason we can't.
Greg Brockman: And I think the moment you get that, I think refactoring code is one of the killer use cases for enterprise, right?
Greg Brockman: It's, you know, if you could bring down the cost of code migrations by, you know, 2x, I think you'll end up with 10x more of them happening.
Greg Brockman: Think about the number of systems that are stuck in COBOL.
Greg Brockman: And there's no COBOL programmers being trained, right?
Greg Brockman: It's just like it's strictly like, you know, building liability for the world to have this dependency.
Greg Brockman: Like the only way through is by building systems that can actually tackle that.
Greg Brockman: So I just think it's a massive open space.
Greg Brockman: The exponential continues and we need to stay on that.
Andrew Mayne: My favorite thing today that happened was there was a tweet from OpenAI, which was showing people how to use the CLI to switch from the completions API to the responses API.
Thibault Sottiaux: That's a great use.
Thibault Sottiaux: I expect to see more of that.
Thibault Sottiaux: You know, where you have special instructions given to Codex in order to go do like refactorings reliably.
Thibault Sottiaux: And then you just set it off and it does it for you.
Thibault Sottiaux: That's like a wonderful thing.
Thibault Sottiaux: Migrations are some of the worst things.
Thibault Sottiaux: Nobody wants to do migrations.
Thibault Sottiaux: Nobody wants to change from one library to the other
Thibault Sottiaux: and then make sure that everything still works.
Thibault Sottiaux: If we can automate most of that,
Thibault Sottiaux: that's going to be a very beautiful contribution.
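The completions-to-responses switch mentioned above is a good example of why migrations suit agents: most of the work is a mechanical payload transform. As a rough illustrative sketch (simplified, not OpenAI's official migration guide; real migrations also cover tools, streaming, and renamed parameters), the core rename looks like:

```python
def chat_completions_to_responses(payload: dict) -> dict:
    """Sketch of the mechanical core of a Chat Completions -> Responses
    migration: the `messages` list becomes `input`, and fields with the
    same meaning in both APIs are carried over unchanged."""
    migrated = {"model": payload["model"], "input": payload["messages"]}
    for key in ("temperature", "top_p", "metadata"):
        if key in payload:
            migrated[key] = payload[key]
    return migrated

# A request body in the old Chat Completions shape...
old_request = {
    "model": "gpt-5",
    "messages": [{"role": "user", "content": "Say hello"}],
    "temperature": 0.2,
}
# ...becomes a Responses-shaped body.
new_request = chat_completions_to_responses(old_request)
```

At the call site, `client.chat.completions.create(...)` becomes `client.responses.create(...)`; sweeping a code base for every such call and payload is exactly the kind of repetitive, verifiable change you can hand off to an agent.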
Greg Brockman: I think there's a lot of other ground as well.
Greg Brockman: I think that security patching is a good example of something
Greg Brockman: that I think will become very important soon,
Greg Brockman: and that's something we're being very thoughtful about.
Greg Brockman: I think that being able to actually have AIs that produce new tools, right? You think about how important the Unix set of standard tools has been and AIs, they're actually able to build their own tools that are useful for you, are useful for themselves.
Greg Brockman: You can actually build up a ladder of complexity there or utility there to be able to just like continue to improve this flywheel of efficiency.
Greg Brockman: AIs that are not just writing code, but able to execute their own code, administer services, do SRE work and things like that.
Greg Brockman: I think all of that is on the horizon.
Greg Brockman: It's like starting to happen, but it's not really happening yet in the way that we would like to see.
Thibault Sottiaux: One big one that we cracked internally at OpenAI, and then decided to release, is code review. We started to notice that, with the increased amount of code needing to be reviewed, the big bottleneck for us was simply the amount of review that people had to do on their teams.
Thibault Sottiaux: And so we decided to really focus on a very high-signal Codex mode where it's able to review a PR and really think deeply about the contract and the intention that you were meaning to implement, and then look at the code and validate whether that intention is matched and found in that code.
Thibault Sottiaux: And it's able to go layers deep, look at all the dependencies, think about the contract and really raise things that some of our best employees, some of our best reviewers wouldn't have been able to find unless they were spending hours really deeply thinking about that PR.
Thibault Sottiaux: And we released this internally first at OpenAI.
Thibault Sottiaux: It was quite successful and people were upset actually when it broke because they felt like they were losing that safety net.
Thibault Sottiaux: And it accelerated teams and including the Codex team tremendously.
Thibault Sottiaux: The night before we released the IDE extension, one of the top engineers on my team was like cranking out 25 PRs.
Thibault Sottiaux: And we were finding quite a few bugs automatically.
Thibault Sottiaux: Codex was finding quite a few bugs.
Thibault Sottiaux: And, you know, we were able to put out an IDE extension that was almost bug free the next day.
Thibault Sottiaux: So the velocity there is incredible.
Greg Brockman: And it's very interesting that for the code review tool in particular, people were very nervous about having this enabled because I think our previous experience with every auto code review experiment that we've tried is that it's just noise.
Greg Brockman: Right.
Greg Brockman: You just get an email from some bot and you're like, oh, another one of those things.
Greg Brockman: You ignore it.
Greg Brockman: And I think we've had kind of the opposite finding from where we are now.
Greg Brockman: And it really shows you when the capability is below threshold, it just feels like this thing is like totally net negative.
Greg Brockman: I don't want to hear about it.
Greg Brockman: I don't want to see it.
Greg Brockman: Once you kind of crack above some threshold of utility, suddenly people want it, right, and get very upset if it gets taken away.
Greg Brockman: And I think also our observation is if something kind of works in AI right now, one year from now, it'll be incredibly reliable, incredibly mission critical.
Greg Brockman: And I think that that's where we're going with code review.
Thibault Sottiaux: Part of what's interesting with code review as well is bringing humans along and really having this be a collaborator, including in review.
Thibault Sottiaux: And one thing we thought a lot about is: how can we raise those findings so that you are actually excited to read a finding, and you might even learn something, including when it's wrong?
Thibault Sottiaux: Like, you know, you can actually understand its reasoning.
Thibault Sottiaux: Most of the time, like actually more than 90% of the time it's right.
Thibault Sottiaux: And you often learn something as the person who authored the code or someone who is helping review the code.
Greg Brockman: Yeah, just circling back to what we were saying earlier about the rate of progress and sometimes stepping back and thinking about how things were earlier.
Greg Brockman: I remember for GPT-3 and for GPT-4 really focusing on the doubling down problem.
Greg Brockman: Do you remember? If the AI said something wrong and you pointed out the mistake...
Andrew Mayne: Oh, it would argue with you.
Greg Brockman: Oh, yeah.
Greg Brockman: It would try to convince you that it was right.
Greg Brockman: We're so far past that being the core problem.
Greg Brockman: I'm sure it happens in some obscure edge cases, just like it does for humans.
Greg Brockman: But it's really amazing to see that we're at a level where even when it's not quite zeroed in on the right thing, it's highlighting stuff that matters.
Greg Brockman: It has pretty reasonable thoughts.
Greg Brockman: And I always walk away from these code reviews thinking like, huh, OK, yeah, that's a good point.
Greg Brockman: I should be thinking about that.
Andrew Mayne: We're now just past the launch of GPT-5.
Andrew Mayne: And as of the recording of this podcast, we now have GPT-5 Codex.
Thibault Sottiaux: Which we're tremendously excited about.
Thibault Sottiaux: Very excited.
Andrew Mayne: Why should I be excited about this, gentlemen?
Andrew Mayne: Sell me on this.
Thibault Sottiaux: GPT-5 Codex is a version of GPT-5 that we have optimized for Codex.
Thibault Sottiaux: And we talked about the harness.
Thibault Sottiaux: And so it's optimized for the harness.
Thibault Sottiaux: We really consider it to be like one agent where you couple the model very closely to the set of tools.
Thibault Sottiaux: And it's able to be even more reliable.
Thibault Sottiaux: One of the things that this model exhibits is an ability to go on for much longer and to really have that grit that you need on these complex refactoring tasks.
Thibault Sottiaux: But at the same time, for simple tasks, it actually comes back at you way faster and is able to reply without much thinking.
Thibault Sottiaux: And so it's like this great collaborator where you can ask questions about your code, find the piece of code that you need to change or better understand, and plan.
Thibault Sottiaux: But at the same time, once you let it go onto something, it will work for a very, very long period of time.
Thibault Sottiaux: We've seen it work internally up to seven hours for very complex refactorings.
Thibault Sottiaux: We haven't seen other models do that before.
Thibault Sottiaux: And we also have really worked tremendously on code quality.
Thibault Sottiaux: And it's just really optimized for what people are using GPT-5 within Codex for.
Andrew Mayne: So when you talk about working longer and you say it worked up to seven hours, you're not just talking about it keeps putting things back into context,
Andrew Mayne: that it's actually making decisions,
Andrew Mayne: deciding what's important and moving forward?
Thibault Sottiaux: Yes.
Thibault Sottiaux: So imagine like a really tricky refactoring.
Andrew Mayne: Right.
Thibault Sottiaux: We've all had to deal with those
Thibault Sottiaux: where you've decided that your code base is unmaintainable.
Thibault Sottiaux: You need to make a couple of changes
Thibault Sottiaux: in order to move forward.
Thibault Sottiaux: So you make a plan and then you let the model go.
Thibault Sottiaux: You let Codex, GPT-5 Codex go at it.
Thibault Sottiaux: And it will just like work its way through all of the issues,
Thibault Sottiaux: get the test to run, get the test to pass,
Thibault Sottiaux: and just completely finish the refactoring.
Thibault Sottiaux: This is one of the things that we've seen it do for seven hours.
Greg Brockman: Yeah, the thing that I find so remarkable is that the core intelligence of these
Greg Brockman: models is clearly just stunning, right? I think that even three, six months ago, our
Greg Brockman: models were better than I am at navigating our internal code base, right, to find a specific piece
Greg Brockman: of functionality. And that requires some really sophisticated...
Andrew Mayne: Are you going to have to let yourself go? Are you like, "Greg, I'm sorry..."
Greg Brockman: Because that's the thing: I get to do more. Is what I want to spend my time doing, what I want people to know me for, being able to find functionality
Greg Brockman: in a code base? Absolutely not. Right? That's not how I define my value as an engineer, or what I want
Greg Brockman: to spend my time on as an engineer. And now I think that, to me, is the core of it, right?
Greg Brockman: That there's this amazing intelligence, and that it can, first of all, suck away all the kind
Greg Brockman: of mundane, boring parts. And certainly there are some fun parts too, right? Like,
Greg Brockman: you know, for really thinking about the architecture of things, it's a great partner.
Greg Brockman: But I get to choose how I spend my time. Right. And I get to think about how many of these agents you want running on what tasks, how do I break down things? And so I view it as increasing the opportunity surface for programmers. And, you know, I'm an Emacs user through and through. I, you know, I started using, you know, VS Code and Cursor and Windsurf and these things, partly to just just try things out, but partly because I like the diversity of different tools, but it's really hard to get me out of my terminal.
Greg Brockman: And so, you know, I have found that we're now above the threshold where, when I'm doing some refactor,
Greg Brockman: I'm like, why am I typing this thing? Right? You're trying to remember exactly the syntax for a specific thing, or sort of doing these very mechanical things.
Greg Brockman: And I'm like, I just want to have an intern go do the thing. And I have that now in my terminal.
Greg Brockman: And I think it's really amazing that we're at the point that you have this core intelligence and that you get to pick and choose when and how to use it.
Andrew Mayne: Please add a whisper to the extension, too, because now I just love to talk to the model and tell it to do things.
Greg Brockman: Yeah.
Greg Brockman: Yeah, you should be able to video chat with your model.
Greg Brockman: I think we're heading towards a real collaborator, a real coworker.
Andrew Mayne: Well, yeah, let's talk about the future.
Andrew Mayne: Where do you see this headed?
Andrew Mayne: Where do you see what's exciting about the agentic future?
Andrew Mayne: How are we going to be using these systems?
Thibault Sottiaux: We have strong conviction that where this is headed is large populations of agents somewhere in the cloud that we, as humanity, as people, teams, organizations, supervise and steer in order to produce great economic value.
Thibault Sottiaux: So if we go a couple of years out, this is what it's going to look like.
Thibault Sottiaux: It's millions of agents working in our data centers and in companies' data centers in order to do useful work.
Thibault Sottiaux: Now the question is, how do we get there gradually?
Thibault Sottiaux: And how do we get to experiment on the right form factor and the right interaction patterns here?
Thibault Sottiaux: One of the things that is incredibly important to solve is the safety, security, alignment of all of this.
Thibault Sottiaux: So that agents can perform useful work, but in a safe way.
Thibault Sottiaux: And you get to always stay in control as the operator, as a human.
Thibault Sottiaux: And this is why, for Codex CLI, by default, the agent operates in a sandbox and isn't able to edit files arbitrarily on your computer.
Thibault Sottiaux: And we're going to be continuing to invest a lot in making basically the environment safe, invest in understanding when humans need to steer, when humans need to approve certain actions, giving more and more permissions so that your agent has its own set of permissions that you allow it to use and maybe escalate permissions when you allow it to do exceptionally more risky things.
Thibault Sottiaux: And so figuring out this entire system and then making it multi-agent and steerable by individuals, teams, organizations, and aligning that with the whole intent of organizations, this is where it's headed for me.
Thibault Sottiaux: It's a bit nebulous, but it's also very exciting, I think.
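The sandbox-by-default, escalate-on-request model described above can be pictured as a policy check sitting in front of every action the agent wants to take. The sketch below is hypothetical; the names and rules are invented for illustration and are not the actual Codex CLI internals:

```python
from dataclasses import dataclass, field

@dataclass
class SandboxPolicy:
    """Hypothetical agent sandbox policy: reads and writes inside the
    workspace are allowed, while network access and anything outside
    the workspace require explicit human approval (escalation)."""
    workspace: str
    allow_network: bool = False
    approved: set = field(default_factory=set)

    def needs_approval(self, action: str, target: str) -> bool:
        if (action, target) in self.approved:
            return False  # the human already escalated this exact action
        if action in ("read", "write") and target.startswith(self.workspace):
            return False  # confined to the workspace: no prompt needed
        if action == "network":
            return not self.allow_network
        return True  # everything else stays behind an approval prompt

policy = SandboxPolicy(workspace="/repo")
```

The interesting design question, as Thibault says, is which escalations to surface and when, so the human stays in control without having to approve every keystroke.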
Greg Brockman: Yeah, I think it's exactly right.
Greg Brockman: I mean, I think at a zoomed in level, there's a bunch of technical problems that need to be solved.
Greg Brockman: Like Thibault is kind of getting at: scalable oversight.
Greg Brockman: Right. How do you as a human manage agents that are out there writing lots of code?
Greg Brockman: Right. You probably don't want to read every line of code.
Greg Brockman: Probably right now, most people do not read all the code that comes out of these systems.
Greg Brockman: But how do you maintain trust?
Greg Brockman: Right. How do you make sure that that AI is producing things that are actually correct?
Greg Brockman: And I think that there are technical approaches.
Greg Brockman: And we've been thinking about these kinds of things since probably 2017, the first time we published some strategies for how you can have
Greg Brockman: humans or weaker AIs start to supervise even stronger AIs, and kind of bootstrap your way to
Greg Brockman: making sure that as they're doing very capable, important tasks, that we can maintain trust and
Greg Brockman: oversight and really be in the driver's seat. So that's a very important problem. And it really is
Greg Brockman: exemplified in a very practical way through thinking about more and more capable coding agents.
Greg Brockman: But I think there's also other dimensions that are very easy to miss because I think at each level
Greg Brockman: of AI capability, people kind of overfit to what they see and think, oh, this is AI. This is what
Greg Brockman: AI is going to be. But the thing we haven't quite seen yet is AI is solving really hard novel
Greg Brockman: problems. Right now you think of it as, okay, I need to do my refactor. You at least have a shape
Greg Brockman: of what that thing would be. It'll do a lot of the work for you, save a lot of time. But what
Greg Brockman: about solving problems that are fundamentally unsolvable through any other means? And I think
Greg Brockman: of this not necessarily just in the coding domain, but think of it in medicine, producing
Greg Brockman: new drugs.
Greg Brockman: Think of it in material science, producing new materials that have novel properties.
Greg Brockman: And I think that there's a lot of new capability coming down the pike that is going to unlock
Greg Brockman: these kinds of applications.
Greg Brockman: And so for me, one big milestone is the first time that you have an artifact produced by
Greg Brockman: an AI that is extremely valuable and interesting unto itself.
Greg Brockman: Not because it was produced by an AI, not because it was cheaper to produce, but because it's simply like a breakthrough.
Greg Brockman: It's simply something that is just novel. And it doesn't even necessarily have to be autonomously created by the AI; it can be created in partnership with humans, but with the AI as a critical dependency.
Greg Brockman: And so I think we're starting to see signs of life on this kind of thing.
Greg Brockman: We're seeing it in life sciences, where human experimenters ask o3 for five ideas of experimental protocols to run.
Greg Brockman: They try out the five of them.
Greg Brockman: Four of them don't work.
Greg Brockman: One of them does.
Greg Brockman: And the kind of feedback that we've been getting, and this was back in the o3 days, is that the results are at the level of what you'd expect from a third- or fourth-year PhD student, which is crazy.
Greg Brockman: Yeah.
Greg Brockman: Crazy.
Greg Brockman: And that was o3.
Greg Brockman: Right?
Greg Brockman: GPT-5 and GPT-5 Pro.
Greg Brockman: We're seeing totally different results there.
Greg Brockman: There we're seeing research scientists saying, okay, yeah, this is doing real novel stuff.
Greg Brockman: And sometimes it's, again, it's not just on its own solving these grand theories, but it's together in partnership being able to just stretch far beyond where a human unassisted could go.
Greg Brockman: And that to me is like one of the critical things that we need to continue to push on and get right.
Andrew Mayne: One of the challenges I have when talking to people kind of about the future, and I want to hear you guys talk about this, is that people tend to imagine the future as kind of the present, but with like shiny clothes and robots.
Andrew Mayne: And they think about like, well, then what happens when robots do all the code and all that?
Andrew Mayne: And you brought up a fact that like the things you like to do and the things you don't care to do.
Andrew Mayne: Where are we in 2030?
Andrew Mayne: What does it look like?
Andrew Mayne: It was five years ago, GPT-3.
Andrew Mayne: Now, five years from now.
Andrew Mayne: 2030.
Thibault Sottiaux: We didn't even have these six months ago.
Thibault Sottiaux: So it's hard to picture exactly what this is going to look like five years from now.
Thibault Sottiaux: But one thing that is...
Andrew Mayne: I'm going to pop out of the bushes five years from now with this podcast and be like, you said this.
Greg Brockman: Well, your agent will do it for you.
Andrew Mayne: Yeah, it's going to.
Thibault Sottiaux: So one thing that's important is that the pieces of code that are critical infrastructure, underpinning society, we need to continue to understand and have the tools to understand.
Thibault Sottiaux: And this is also why we were thinking about code review: it should help you understand that code and be this teammate that helps you dive into code written by someone else, potentially written with the help of AI.
Greg Brockman: And I would actually argue that we already have a problem of there's lots of code out there that is not necessarily secure.
Greg Brockman: Right.
Greg Brockman: This happens all the time.
Greg Brockman: I remember Heartbleed, back, I guess, almost 12 years ago or something.
Greg Brockman: A critical vulnerability in a key piece of software used across the Internet.
Greg Brockman: And you realize that that's not singular, right, that there's lots of vulnerabilities out there that no one has found.
Andrew Mayne: All these packages and stuff from NPM and all these packages that are just sitting there that people put exploits into.
Greg Brockman: And the way that it's always worked is that there's a cat and mouse game between attackers getting more sophisticated, defenders getting better.
Greg Brockman: And I think that with AI, you wonder which side it will advantage the most.
Greg Brockman: Maybe it'll just sort of accelerate this cat and mouse.
Greg Brockman: But I think that there's some hope that actually you can unlock fundamental new capabilities through AI,
Greg Brockman: for example formal verification, which is sort of an end game for defense.
Greg Brockman: And I think that that to me is very exciting is thinking about not just how do you continue this,
Greg Brockman: like, you know, sort of never ending rat race, but how do you actually end up with increased
Greg Brockman: stability, increased understandability? And I think that there's other opportunities like that
Greg Brockman: for us to really understand our systems in a way that right now it's almost, you know, we're sort
Greg Brockman: at the edge of human understanding of the traditional software systems that have been built.
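Formal verification, the "end game for defense" mentioned here, means machine-checked proofs that a property holds for every possible input, not just the inputs a test suite happens to exercise. A toy flavor in Lean 4 (a deliberately trivial property; verifying real systems code is far more involved):

```lean
-- A machine-checked guarantee: reversing a list never changes its
-- length, proved for all lists rather than sampled by tests.
theorem reverse_preserves_length (xs : List Nat) :
    xs.reverse.length = xs.length := by
  simp
```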
Thibault Sottiaux: One of the reasons we built Codex is to improve the infrastructure and the code out there in the
Thibault Sottiaux: world, not necessarily to increase the amount of code in the world. And so this is a very important
Thibault Sottiaux: point where it's also helping find bugs, helping refactor, helping find more elegant, more
Thibault Sottiaux: performant implementations that achieve the same thing or actually are more general,
Thibault Sottiaux: but not necessarily ending up with like 100 million lines of code that you don't understand.
Thibault Sottiaux: One thing that I'm really excited about is how Codex can help teams, individuals,
Thibault Sottiaux: just write better code, be better software engineers,
Thibault Sottiaux: and end up with simpler systems that are actually doing more things for us.
Greg Brockman: I think part of the 2030 outlook is we will be in a world of material abundance.
Greg Brockman: I think that AI is going to make it much easier than you could almost imagine to create anything you want.
Greg Brockman: And that will probably be true in the physical world in addition to the digital world in ways that are hard to predict.
Greg Brockman: But I think it will be a world of absolute compute scarcity.
Greg Brockman: And we've seen a little bit of what this is like within OpenAI.
Greg Brockman: The way that different research projects fight over compute or that the success of the research program is determined by the compute allocation is something that is hard to overstate.
Greg Brockman: And I think that we're going to be in a world where your ability to produce and create whatever you imagine will be limited, partly by your imagination, but partly by the compute power behind it.
Greg Brockman: And so one thing we think about a lot is how do we increase the supply of compute in the world?
Greg Brockman: We want to increase the intelligence, but also the availability of that intelligence.
Greg Brockman: And fundamentally, it is a physical infrastructure problem, not just a software problem.
Thibault Sottiaux: You know, with GPT-5, I think one thing that's quite amazing is we're able to give it as part of the free tier, the Plus plan, the Pro plan.
Thibault Sottiaux: You can use Codex with your Plus plan.
Thibault Sottiaux: You get GPT-5, the same version that everyone else gets.
Thibault Sottiaux: And it's like this incredible intelligence.
Thibault Sottiaux: But the model is also incredibly cost effective in that way.
Andrew Mayne: I think that was one of the things that really stood out for me was I thought the model was much more capable, but it came out at the same price point or some ways cheaper than the previous model.
Andrew Mayne: And that was something like, wow, that pattern is great.
Greg Brockman: I think the degree to which we are improving the intelligence and cutting prices is something that is very easy to miss, to take for granted.
Greg Brockman: But it's actually crazy, right?
Greg Brockman: I think we did like an 80% price cut on o3 or something like that.
Greg Brockman: And to your point, it was like $0.06 per thousand tokens back in the GPT-3 days for that level of intelligence.
Andrew Mayne: Yeah.
Andrew Mayne: There was an article that came out.
Andrew Mayne: One of the newspapers was complaining that, well, these reasoning models have made it more expensive.
Andrew Mayne: But they didn't compare reasoning models to reasoning models over the last six to seven months, and how much more efficient they've become.
Greg Brockman: Yep. And that will just continue.
Greg Brockman: You know, on the compute scarcity point, one thing that I find very suggestive is
Greg Brockman: thinking about, you know, right now people talk about building big fleets of a
Greg Brockman: million GPUs, millions of GPUs, that level. But if we reach a point, which is probably
Greg Brockman: not that far in the future, where you're going to want agents running on your behalf constantly,
Greg Brockman: right? It's reasonable for every person to want a dedicated GPU just for them, running their
Greg Brockman: agent. And so now you're talking almost 10 billion GPUs that we need. We're orders of magnitude off
Greg Brockman: of that. And so I think that part of our job is to figure out how to supply that compute,
Greg Brockman: how to make it exist in the world, but how to make the most out of the very limited compute that
Greg Brockman: exists right now. And, you know, that's an efficiency problem. It's also an
Greg Brockman: increase-the-intelligence problem. But yeah, I think it's very clear that bringing this to fruition is going to
Greg Brockman: be just like a lot of work and a lot of building.
Thibault Sottiaux: So one of the interesting things about agents, and their relationship to GPUs as they act, is that it is very beneficial to have a GPU close to
Thibault Sottiaux: you. Because, you know, when it's acting and doing 200 tool calls over the span of a couple of
Thibault Sottiaux: minutes, it's doing this back and forth between the GPU and your laptop: executing
Thibault Sottiaux: those tool calls, getting that context back, and then continuing to reflect.
Thibault Sottiaux: And so bringing GPUs close to people is a great contribution there as well.
Thibault Sottiaux: And it really helps, because it tremendously reduces the latency of the entire interaction
Thibault Sottiaux: and the entire rollout.
Andrew Mayne: Gentlemen, we get the question that comes up periodically about the future, about labor,
Andrew Mayne: about all of this.
Andrew Mayne: Number one: learn to code, or not learn to code?
Thibault Sottiaux: I think it's a wonderful time to learn to code.
Greg Brockman: I think, yeah, I agree. Definitely learn to code, but learn to use AI. That to me is the most important thing.
Thibault Sottiaux: There's something tremendously enjoyable about using Codex to learn about a new programming language.
Thibault Sottiaux: A lot of people on my team were new to Rust and we decided to build a core harness in Rust.
Thibault Sottiaux: And it's been really great seeing how quickly they can pick up a new language just by using Codex, asking questions, exploring a code base that they don't know.
Thibault Sottiaux: and still achieving great results.
Thibault Sottiaux: Obviously, we also have very experienced Rust engineers
Thibault Sottiaux: who continue to mentor and make sure that we have a high bar.
Thibault Sottiaux: But it's just a really fun time to learn to code.
Greg Brockman: I remember the way that I learned to program
Greg Brockman: was through W3Schools tutorials: PHP, JavaScript, HTML, CSS.
Greg Brockman: And I remember when I was building some of my first applications
Greg Brockman: and I was trying to figure out how to,
Greg Brockman: I didn't even know the word for it, serialize data, right?
Greg Brockman: And I came up with some sort of structure
Greg Brockman: that had some special sequence of characters
Greg Brockman: that was serving as a delimiter.
Greg Brockman: And what would happen if you actually had
Greg Brockman: that sequence of characters in your data?
Greg Brockman: Like, let's not talk about that.
Greg Brockman: So that's why I had to have a very special sequence.
Greg Brockman: And this is the kind of thing
Greg Brockman: where you're not gonna have a tutorial
Greg Brockman: that will flag this kind of issue for you.
Greg Brockman: But will Codex in its code review be like,
Greg Brockman: hey, there's JSON serialization,
Greg Brockman: just use this library?
Greg Brockman: Absolutely.
Greg Brockman: And so I think that the potential to accelerate,
Greg Brockman: to make it so much easier to code
Greg Brockman: so you don't have to reinvent all these wheels,
Greg Brockman: and that it can ask the question for you,
Greg Brockman: or answer the question for you,
Greg Brockman: that you didn't even know you needed to ask,
Greg Brockman: like, that to me is why I think it's
Greg Brockman: a better time than ever to build.
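Greg's delimiter story is a classic failure mode: any ad-hoc "special character" scheme breaks the moment that character appears in the data, which is exactly what the escaping in a standard format like JSON solves. A minimal illustration:

```python
import json

def naive_serialize(fields: list[str]) -> str:
    # Ad-hoc scheme: join fields with a "special" delimiter character.
    return "|".join(fields)

def naive_deserialize(data: str) -> list[str]:
    return data.split("|")

fields = ["hello|world", "second field"]

# The round trip silently corrupts the data as soon as a field
# contains the delimiter itself: 2 fields come back as 3.
assert naive_deserialize(naive_serialize(fields)) == ["hello", "world", "second field"]

# JSON escapes the payload, so the round trip is lossless.
assert json.loads(json.dumps(fields)) == fields
```

Which is why "there's JSON serialization, just use this library" is exactly the review comment you want.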
Andrew Mayne: I've learned a lot just by looking at how it solves a problem.
Andrew Mayne: Found new libraries, found new methods and stuff.
Andrew Mayne: That's often, I like to sometimes give it like a crazy task.
Andrew Mayne: Like how would you create your own language model
Andrew Mayne: with only a thousand lines of code
Andrew Mayne: and what would you try to do?
Andrew Mayne: And sometimes it might fail,
Andrew Mayne: but then you can look at the direction it tried to do it
Andrew Mayne: and you go, oh, I didn't even know that was a thing.
Thibault Sottiaux: One of the things as well is that the people who are most successful coding with AI have also really studied the fundamentals of software engineering and put the right framework in place, right?
Thibault Sottiaux: They've thought about architecture, about how to structure their code base, and then they're getting help from AI, but still following that general blueprint.
Thibault Sottiaux: And that really accelerates you and allows you to go much further than you would be able to go if you didn't understand the code that's being written.
Andrew Mayne: Since you've launched this, since you've made this available to GPT-5, since you've been able to deploy things with Codex, what have you seen as usage rates?
Thibault Sottiaux: Yeah, usage has been exploding.
Thibault Sottiaux: So we've seen more than 10x growth in usage across users, and the users who were already using it are using it much more as well.
Thibault Sottiaux: So we're seeing more sophisticated usage and people are using it for longer periods of time as well.
Thibault Sottiaux: It's now included in the Plus and Pro plans with generous limits, and that's contributed a lot to its success.
Greg Brockman: Yeah, I think that the vibes, I think, also have really started to shift as people, I think, are starting to realize how you need to use GPT-5, right?
Greg Brockman: I think it's a little bit of a different flavor.
Greg Brockman: I think that we have our own spin on the right harnesses and tools and the ecosystem of how these things fit together.
Greg Brockman: And I think that once it clicks for people, then they just go so fast.
Andrew Mayne: Gentlemen, thank you so much for joining us here and talking about this. Any last thoughts?
Greg Brockman: Thank you for having us. Yeah, we're really excited about everything that comes next. I think we have so much to build. Progress continues on the exponential. And I think really bringing these tools to be usable and useful by everyone is core to our mission.
Thibault Sottiaux: Yeah, thanks for having us. I'm also super excited. Now that we have Codex and it keeps improving, we're also getting accelerated and building better Codex every day. And personally, I think I spend more time talking to Codex now than most people. And it's really how I feel the AGI, and I hope more people will be able to benefit from it.
What happens when AI becomes a true coding collaborator? OpenAI co-founder Greg Brockman and Codex engineering lead Thibault Sottiaux talk about the evolution of Codex—from the first glimpses of AI writing code, to today’s GPT-5 Codex agents that can work for hours on complex refactorings. They discuss building “harnesses,” the rise of agentic coding, code review breakthroughs, and how AI may transform software development in the years ahead.

Chapters
1:15 – The first sparks of AI coding with GPT-3
4:00 – Why coding became OpenAI’s deepest focus area
7:20 – What a “harness” is and why it matters for agents
11:45 – Lessons from GitHub Copilot and latency tradeoffs
16:10 – Experimenting with terminals, IDEs, and async agents
22:00 – Internal tools like 10x and Codex code review
27:45 – Why GPT-5 Codex can run for hours on complex tasks
33:15 – The rise of refactoring and enterprise use cases
38:50 – The future of agentic software engineers
45:00 – Safety, oversight, and aligning agents with human intent
51:30 – What coding (and compute) may look like in 2030
57:40 – Advice: why it’s still a great time to learn to code