We are now at this point of the model
release cycle. That's right. Anthropic
just released Opus 4.5. And I've jumped
straight to what you want to hear. Yes,
it is the best on the coding benchmarks.
But even better, the price is now three
times cheaper. It's only going to be $5
for a million input tokens now and $25
for a million output. That is going to
be the big headline of this release.
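To make those prices concrete, here is a minimal sketch of the arithmetic. The $5 and $25 per million token figures come from the video. The earlier $15 and $75 prices are an assumption inferred from the "three times cheaper" claim, and the token counts are a made-up workload, not measured data.

```python
# Prices in USD per million tokens, as quoted in the video.
OPUS_45_INPUT_PER_MTOK = 5.00
OPUS_45_OUTPUT_PER_MTOK = 25.00

# Assumption: previous Opus pricing, inferred from "three times cheaper".
OLD_OPUS_INPUT_PER_MTOK = 15.00
OLD_OPUS_OUTPUT_PER_MTOK = 75.00


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Return the cost in USD for the given token counts and per-million prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 200k input tokens and 50k output tokens.
new_cost = request_cost(200_000, 50_000,
                        OPUS_45_INPUT_PER_MTOK, OPUS_45_OUTPUT_PER_MTOK)
old_cost = request_cost(200_000, 50_000,
                        OLD_OPUS_INPUT_PER_MTOK, OLD_OPUS_OUTPUT_PER_MTOK)

print(f"Opus 4.5:       ${new_cost:.2f}")
print(f"Previous price: ${old_cost:.2f}")
```

Because both prices drop by the same factor, the ratio stays at three regardless of how the workload splits between input and output tokens.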
Opus now might actually become the model
that you're going to use most of the
time instead of saving it for just the
important tasks. The first thing that I
actually tried out on Opus 4.5 here is
the same test that I gave to Gemini 3
Pro when it released, and that's
generating a Minecraft clone from a
single prompt. You can see what Opus 4.5
has come up with here. And I've got to
say, using this, this is the best result
that I've ever gotten back from a model
on this single prompt test. You can see
we're able to move around. Everything is
pretty smooth. The FPS is really nice.
We can break blocks here, and we can
place them and even select our blocks
down in the block selector and fly
around the map if we want to as well.
This feels fully playable based on the
prompt that I gave it. I even managed to
write out subscribe, something you
should definitely do. And I just noticed
it has a day and night cycle as well.
For comparison, this is what Gemini 3
Pro gave me last week when I was testing
it out. You can see we do have a world
that is procedurally generating. I would
say it does look okay for a single
prompt experiment, but we aren't able to
break blocks. You can see the movement
is a little bit chaotic, and we can't
place them either. So, to me, Opus 4.5
is an absolutely massive winner on this
single prompt Minecraft test that I like
to run on these models. And overall,
I've just gotten really good results
with it. You can see I asked it in a
single prompt here to build me a Lego
builder website that utilizes Three.js to
allow the user to build various Lego
pieces. And the result I got back was a
completely working Lego builder. You can
see we can pan around here. We can stack
pieces on top of each other. We can
change the color. We can switch this
into remove mode. And we can even choose
different Lego pieces. I'm actually just
amazed that we're at the point where
models can generate this so easily.
Obviously, those prompts were pretty
simple ones, but you'll see in the
benchmarks later, this model can
definitely code. But first, what I think
is more interesting about Opus 4.5 is it
uses dramatically fewer tokens than its
predecessors to reach similar or better
outcomes. So when you combine this with
the new effort parameter that the model
has, you can actually choose between
minimizing your time and spend or
maximizing its capability. In Anthropic's
testing, they actually found that Opus
4.5 with a medium effort matches Sonnet
4.5's best SWE-bench Verified score, but with
76% fewer output tokens. And even when
you have it on its highest effort level,
it actually beats Sonnet 4.5 and uses 48%
fewer tokens doing so. When you combine
that with the fact that it's now three
times cheaper, I think this model is
going to become a daily driver for a lot
of people. If we take a look at the
benchmark results now, you can see it's
the best at software engineering with
GPT-5.1-Codex-Max coming in second. And
something quite funny is they actually
tested Opus against their own take-home
exam that they give to prospective
employees. And Opus 4.5 scored higher
than any human candidate has ever scored
on this test. Is it over for us? Well,
if we take a look at the ARC-AGI
benchmark, it actually scores second
best on this and it only loses out
slightly to the Deep Think mode of
Gemini 3 Pro, but you can see it is a
massive leap since Opus 4.1. Taking a
look at the rest of the benchmark suite,
it wins on most of these. It only loses
out on graduate reasoning, visual
reasoning, and multilingual Q&A. But as
you can see, Anthropic really does focus
on that coding part. So, if you're not
using it for coding, it's still going to
be incredibly good, just maybe 1% behind
the others. Another funny benchmark
that I actually noticed it was losing on
is the vending machine benchmark. Gemini
3 Pro can apparently still make you just
a little bit more money when it runs a
vending machine. Overall though, I think
the lower price of this model is going
to be incredibly compelling, especially
for organizations, and it just shows
that Opus is maintaining that reputation
of being incredible at coding. Let me
know what you think in the comments down
below. Are you going to switch over to
this model now that it's a little bit
cheaper? While you're there, subscribe.
And as always, see you in the next one.
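As a rough illustration of how the token savings and the price interact, the sketch below applies the 76% and 48% output-token reductions quoted in the video to a hypothetical baseline. The $25 output price is from the video. The $15 per million output price for Sonnet 4.5 is an assumption that the video does not state, and the reductions were reported for one benchmark, so they may not carry over to other workloads.

```python
OPUS_OUTPUT_PER_MTOK = 25.00    # USD per million output tokens, from the video
SONNET_OUTPUT_PER_MTOK = 15.00  # assumption: not stated in the video

# Output-token reductions relative to Sonnet 4.5, as quoted in the video.
REDUCTION_BY_EFFORT = {"medium": 0.76, "high": 0.48}


def opus_output_tokens(baseline_tokens: int, effort: str) -> int:
    """Estimate Opus 4.5 output tokens from a Sonnet 4.5 baseline."""
    return round(baseline_tokens * (1 - REDUCTION_BY_EFFORT[effort]))


def output_cost(tokens: int, price_per_mtok: float) -> float:
    """Return the output cost in USD for a token count and per-million price."""
    return tokens * price_per_mtok / 1_000_000


# Hypothetical baseline: Sonnet 4.5 emits one million output tokens.
baseline = 1_000_000
print(f"Sonnet 4.5 baseline: ${output_cost(baseline, SONNET_OUTPUT_PER_MTOK):.2f}")
for effort in REDUCTION_BY_EFFORT:
    tokens = opus_output_tokens(baseline, effort)
    cost = output_cost(tokens, OPUS_OUTPUT_PER_MTOK)
    print(f"Opus 4.5 ({effort}): {tokens:,} tokens, ${cost:.2f}")
```

Under these assumptions, the higher per-token price can be offset by the smaller number of output tokens, which is the trade-off the effort parameter is described as controlling.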
Anthropic just dropped Opus 4.5. It’s cheaper, smarter, and an absolute beast at coding. Here’s everything new, the benchmarks, and whether it’s finally time to switch. Let me know what you think!

🔗 Relevant Links
Opus 4.5: https://www.anthropic.com/news/claude-opus-4-5

❤️ More about us
Radically better observability stack: https://betterstack.com/
Written tutorials: https://betterstack.com/community/
Example projects: https://github.com/BetterStackHQ

📱 Socials
Twitter: https://twitter.com/betterstackhq
Instagram: https://www.instagram.com/betterstackhq/
TikTok: https://www.tiktok.com/@betterstack
LinkedIn: https://www.linkedin.com/company/betterstack

📌 Chapters:
0:00 Intro
0:12 Pricing
0:25 Coding
1:49 Token Efficiency Improvements
2:28 Benchmarks
3:23 Thoughts