Loading video player...
Code review of two comment mark.
>> Hey everyone, I'm Roma
>> and I'm Maya.
>> Codex needs to do two things really well
to be an effective coding teammate.
First, it needs to work with all your
tools and second, it also needs to plug
into all of your team workflows. Code
review is one of the most important
workflows for any engineering team and
we want to help there as well. With GPT5
and now GP5 codeex, we train these
models specifically to find bugs and
investigate some issues. Maya, why don't
you tell us more?
>> Sure. I work on alignment team within
OpenAI research and code review is a
very important research project for us.
As the AI capabilities grow and we have
more and more powerful coding agents, we
are producing a lot of code and human
verification is becoming the bottleneck.
So we need to make sure that we're also
training powerful models to help humans
in verification and that our
verification abilities are scaling as
fast as AI capabilities. That's why we
trained code review models. And we could
have done it just like as a research
exercise internally, but it's actually
way more useful to have it interact with
our OpenAI code base and also release it
externally.
>> Well, very much align with our
philosophy at OpenAI of like iterative
deployment and making sure the
technology and contact with reality. So,
I'd love to enable code review. Can you
uh show us how to do it?
>> Absolutely. So the first thing you will
have to do is to go in your codex web
setting and just enable it.
>> So that's all it takes. And now going
forward to understand like any PR
submitted to that repo will be
automatically reviewed by codeex.
>> Exactly. So for example here I quickly
coded up some simple extension. I
created a pull request and it's marked
as ready for review. The code review
agents will pick it up and start
reviewing it.
>> In the meantime, there's a little bit of
an emoji with the eyes over here.
>> Exactly. Um, why don't you uh show us
one where Codeex has already completed
the review.
>> So, here's another one. It looks a bit
different because here I triggered the
review already when the PR was in a
draft stage and I didn't yet wanted real
humans to look at it.
>> So, that's an interesting technique. So,
you manually add codeex with the message
and it sounds like in this case you also
wanted to have specific instructions,
right?
>> Yeah. So you can always comment at
codeex review this PR and you can add
some extra information for example
something that would help the agent
understand the PR or you can instruct it
to focus on some specific area here what
I love about Codex code review and I
want people to understand like it's not
just static analysis right like the
model has access to tools it's able to
check its own work to run tests and
commands can you show us what happened
behind the scene for this PR
>> yeah so one important thing to note is
the model doesn't only look at the diff.
It has access to the whole repository.
It can track down dependencies. It can
try to understand it in a broader
codebase which is very important when
you're working on a very complex code
where you're not the only contributor
and you don't understand it in its
entirety.
>> And here it sounds like you can also
review all of the logs that led to that
particular review, right?
>> Yeah. So here the model was not only
browsing through the codebase but it
actually decided to form some hypothesis
and write some Python code to test the
hypothesis and check whether uh it's
actually correct and show it on a couple
of examples.
>> That's really cool. And I'm curious like
how was this model trained? Like did we
do anything specific to like make it so
good at like catching bugs? Yeah, so it
is part of the codeex training, but we
added some specific tasks in which we
wanted to really prioritize catching
bugs that actually matter and people
would be willing to fix in real life. We
also made sure that we aimed for very
high precision rate. When we evaluated
it against the previous generation of
the models, we have shown that it has
much lower incorrect comments rate. But
to be fair, the most important
evaluation is just people using it in
practice. and it finding the real bugs
and simultaneously not being annoying by
outputting too much comments.
>> Yeah. So, you showed us like a couple
examples with GPTOSS. Now, I'd love to
touch more on like how we use it
internally at OpenAI.
>> Yeah, we've already used it for a while
and we have gotten a lot of examples of
it really saving us from having critical
issues. So it saved us from having some
important training run bugs that would
potentially delay some important model
releases or some configurations that
would not be normally visible just from
the div alone. It also allows us to more
confidently contribute to the code bases
that we're not fully familiar with. So
for example here we have Alex our
codeexpm contributing towards VS code
extension and implementing a change code
review model decided that it's not the
correct way to implement it.
>> Right. It sounds like it caught a bug in
terms of like removing a react prop here
with the CSS.
>> Yeah, it was not really possible to spot
this bug without understanding of the
broader
>> code here. But then I love Alex's
followup as well at Codex. It's fair
enough. Fix it up because you can keep
on going with this conversation when and
ask Codex to take it over.
>> Yeah. After a code review bot produces a
finding, you can always ask Codeex to
trigger a new task and actually fix it,
which is a very nice workflow to do.
>> Yeah, that's really amazing. And really
the vision of having Codeex at any point
and any surface as your teammate that's
able to kind of do work for you. And
what I love about like codeex code
review here in this case is that you may
have like amazing engineers on your team
but they may not have like many hours to
spend to kind of not just look at a diff
but like run every possible scenario in
their head. And here you have codeex
that's able to do that in its own time.
And one thing I wanted to touch on as
well is that we now have like agents.mmd
right this like open format for coding
agents including codeex um to kind of
like follow some specific instructions.
Uh have you like used agent MD like in
your case to kind of give codeex code
review some extra instructions?
>> We actually train the model specifically
to be steerable either with uh with
custom user instructions or look for
agents MD within the codebase. Uh so if
you add some custom code review
guidelines to your package that would
help model to understand the code ways
but also potential requirements about
what it should pay special attention to
in code review or maybe what kind of
problems to ignore and don't bother your
developers about.
>> That's awesome.
>> Uh you can even like ask some custom
special instructions about the style it
responds to. So, for example, since I
needed a little bit more validation, I
wanted Codeex to tell me every time I
make a bug that I'm still an amazing
programmer.
>> That's awesome. And you are
uh what we've been looking at here so
far has been Codex reviewing code in the
cloud, but just recently we also
introduced review in the Codeex CLI. So,
you can have a review in the terminal as
well.
>> So, very often you want to review the
code already before it makes it into the
GitHub. For that purpose, CLA is perfect
and it also can execute more things
locally on your computer. So you could
just slash review in order to tell the
model to review your current changes.
And we're going to improve on this
feature a lot over the next weeks.
>> So it's basically like making sure all
of your changes that you're introducing
locally don't have like bugs uh before
you submit them to to a PR.
>> Exactly.
>> Amazing. Well, there you have it. Codeex
code review. So with Codeex, you now
have an amazing coding agent to pair
with locally in the terminal or in your
code editor. An awesome teammate to send
tasks to in the cloud at any time. You
also have this ability for Codeex to
review your changes locally or your PRs
on GitHub before your co-workers even
get to them. It's going to help you
catch bugs before they hit production.
And we can't wait to see how that's
going to accelerate your team and
ultimately help you ship better and
safer products. Thanks for watching.
Back to work.
>> Indeed.
Maja Trębacz and Romain Huet show you how to set up Codex to automatically review new pull requests in GitHub and in the Codex CLI, and discuss the research approach we used to train our models to generate quality code reviews. Timestamps: 00:00 Intro 01:23 Automatic code review in GitHub 02:34 Codex code review vs static analysis 03:31 Training models for high-signal code reviews 04:18 How OpenAI uses Codex code review 05:57 Customizing code review with AGENTS.md 06:58 Code reviews in the Codex CLI Try it yourself: Code review: http://chatgpt.com/codex/code-review Learn more: Codex: openai.com/codex Developer docs: https://developers.openai.com/codex/cloud/code-review