This is Agent Browser, an open-source
headless browser CLI built in a weekend
by a single Vercel employee that lets your
agent do anything in the browser from
dragging and dropping to uploading an
image and even toggling offline mode.
But why would anyone use this over
something like Browser Use, which has
way more features? And is Vercel getting
into the agent browser space? Hit
subscribe and let's get into it.
2026 is the year of AI agents writing,
reviewing, and testing all of your code.
No more tab completions. In fact,
developers are even moving away from
IDEs entirely in favor of doing
everything in the terminal since all
we're really doing now is reviewing
code. And to help with this movement, we
need the agents to actually interact
with and test the code they've written.
Because the last thing you want to do as
a developer is to open up the browser to
test each feature an army of agents have
written one by one because that's just
tedious. This is where Vercel's new Agent
Browser comes in handy. It was written by Chris
Tate in both Rust and TypeScript; I'll
explain why later. This tool makes it so
easy for an agent to interact with the
browser using CLI commands that do a
bunch of things like creating an
accessibility snapshot that provides an
accessibility tree and references to
elements of a page, and reference-based
actions that take the references from
the tree and apply relevant actions to
them. There are also semantic locators, if
you don't want to use references, that
allow you to find an element based on
its ARIA role, its text content, its label,
and so much more. In fact, let's go
through a quick demo of how it works.
Now, here is a little login page with an
email and password. And it's built with
shadcn, React, and Vite, not because of Vercel
or anything. It just happened to be
built that way. Now, there's one problem
with this whole page and it's that right
now I'm blinding my users because it's
in light mode. So I want there to be a
dark mode which I've actually gone ahead
and asked the agent to do but as you can
see, it hasn't done it correctly. I mean,
okay, this text changes, but nothing else.
So let's go ahead and get the agent to
fix this using Agent Browser. So right
now I'm using OpenCode with the GLM 4.7
model, but of course Agent Browser can
work with any agent and any model. I've
gone ahead and told it that dark mode is
broken and that it should test it with
agent-browser on the specific port.
What's important is this part of the
prompt: to run agent-browser --help
to see the available commands, because
there are no slash commands, no skills.
I've just installed Agent Browser
globally with npm. I'm going to hit
enter and then it checks the available
commands, uses the agent-browser
snapshot functionality to create a
snapshot of the page, which shows the
document, headings, paragraphs, text, and
even images. It's then clicked on the
relevant element and taken a screenshot
to see if dark mode is working. And this
is the screenshot if you're curious.
From here, it's gone ahead and fixed the
issue before taking another screenshot
of the fixed dark mode. And it's finally
finished the task, which we can test by
clicking up here. And we have a page
with perfect dark mode. Let's try
another test. Actually, while this was
running, I had another agent in the
background fix another issue. You may
have noticed that if I press the login
button, it will take me straight here
without any validation, which of course
isn't good. So I went ahead and asked it
to fix the issue with the validation in
this project and it did something
actually really interesting. It first
checks the available commands from Agent
Browser and then, if we scroll down, it
fixes the issue and even makes a bash
script, which is over here, to test that
it works. So it echoes the first test,
adds an empty input, clicks the login
button and then expects these errors.
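A test along those lines could be sketched roughly as follows. This is not the script the agent wrote: the URL, the refs `@e1` and `@e2`, and the error text "Email is required" are hypothetical, and it assumes the `open`, `fill`, `click`, and `snapshot` commands accept arguments in this order. The runner is passed in so the sketch can be exercised without a browser.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def real_runner(argv: Sequence[str]) -> str:
    """Run one CLI command and return its stdout (needs agent-browser installed)."""
    return subprocess.run(list(argv), capture_output=True, text=True, check=True).stdout


def empty_form_steps(url: str, email_ref: str, button_ref: str) -> list[list[str]]:
    """Commands for the first test: leave the input empty, click login, re-read the page."""
    return [
        ["agent-browser", "open", url],
        ["agent-browser", "fill", email_ref, ""],   # assumes fill accepts an empty string
        ["agent-browser", "click", button_ref],
        ["agent-browser", "snapshot"],
    ]


def run_test(steps: list[list[str]], expected: list[str], run: Runner) -> bool:
    """Run every step and check that the final output mentions each expected error."""
    output = ""
    for argv in steps:
        output = run(argv)
    return all(text in output for text in expected)


def fake_runner(argv: Sequence[str]) -> str:
    """Stand-in for the CLI so the example runs anywhere; the output is invented."""
    if argv[1] == "snapshot":
        return '- textbox "Email" [ref=e1]\n- text "Email is required"'
    return ""


steps = empty_form_steps("http://localhost:3000", "@e1", "@e2")
passed = run_test(steps, ["Email is required"], fake_runner)
print("PASS" if passed else "FAIL")
```

Swapping `fake_runner` for `real_runner` would run the same steps against a live page, provided the refs match what the snapshot actually reports.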
It's made a few tests here, but it's
actually made an even better bash test
down below, which we can see over here
that makes use of agent-browser eval to
run some JavaScript code. So now we can
see if I press the login button, we get
some validation. This looks like an
email, but it's actually a placeholder.
If I give it an email over here, just a
made up one, and hit login, it says
enter a valid email, which I can do like
this. And then I can enter a password
before it takes me to this dashboard.
So, basically, agents have addressed two
issues with this app and tested it
themselves to validate it works using
the Agent Browser CLI. I would say,
if I could do it again, I'd go with a
model that has multimodal support so it
can actually read the screenshots it
takes instead of using GLM 4.7. But now,
let's get into some architecture. How
does agent browser actually do all of
these things? Well, after an agent runs
a command like agent-browser click @e2,
this gets sent to a Rust binary that
parses that command and converts it to
JSON. The benefit of it being Rust is
that it's fast, resource efficient, and
stays running after it's spawned. This
JSON then gets sent to a Node daemon
through a Unix socket, and this daemon
manages the Chromium browser. What's
cool about this is that a daemon is run
for each session, meaning multiple
browsers can be controlled. Once the
daemon validates the command, it launches
the browser, which is headless by default,
and executes the action using Playwright.
And once it's completed, it sends the
output also in JSON back to the agent.
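That round trip can be sketched in a few lines. To be clear, this is a toy model and not the real implementation: the actual parser is a Rust binary and the daemon is a Node process, while this sketch uses Python throughout, a `socket.socketpair()` in place of the Unix socket, and message fields (`action`, `args`, `ok`) that are made up for illustration. No browser is launched.

```python
import json
import socket


def parse_command(argv: list[str]) -> dict:
    """Parser side: turn CLI words such as ["click", "@e2"] into a JSON-ready message."""
    if not argv:
        raise ValueError("no command given")
    action, *args = argv
    return {"action": action, "args": args}


def handle(message: dict) -> dict:
    """Daemon side: validate the message and report a result.
    A real daemon would drive the browser at this point."""
    known = {"open", "click", "fill", "snapshot"}
    if message.get("action") not in known:
        return {"ok": False, "error": f"unknown action: {message.get('action')}"}
    return {"ok": True, "action": message["action"], "args": message["args"]}


def round_trip(argv: list[str]) -> dict:
    """Send one command across a connected socket pair and read the JSON reply."""
    cli_end, daemon_end = socket.socketpair()
    with cli_end, daemon_end:
        # CLI -> daemon: one JSON message per line.
        cli_end.sendall(json.dumps(parse_command(argv)).encode() + b"\n")
        with daemon_end.makefile() as reader:
            request = json.loads(reader.readline())
        # Daemon -> CLI: the result, also as JSON.
        daemon_end.sendall(json.dumps(handle(request)).encode() + b"\n")
        with cli_end.makefile() as reader:
            return json.loads(reader.readline())


print(round_trip(["click", "@e2"]))
print(round_trip(["dance"]))
```

Keeping the wire format as plain JSON in both directions is what lets the agent treat every command the same way: run it, read structured output, decide what to do next.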
From the CLI, the agent can then
do whatever it wants. It can send
even more commands to the Rust parser, or
it can end the whole loop. All of this
is super impressive for a weekend
project. But how does it compare to
Browser Use or even using the Playwright
MCP server? Well, Browser Use can be
used with or without an external agent
because it can run the full agent
reasoning loop: plan, act,
observe, and replan all on its own
without the use of anything else. It
also has Python and TypeScript SDKs for
fine-grained control. It has a skills
marketplace, an MCP server, and uses
Better Stack to track the API status,
which is very cool. Agent Browser, on the
other hand, is much simpler and can only
be used with an external agent. So you,
the developer, have to provide an agent
like Cursor, Claude Code, or OpenCode, and
the agent only interacts with Agent
Browser through CLI commands. When it
comes to comparing Agent Browser with
the Playwright MCP server, right now
Agent Browser only supports Chromium
browsers. So no Firefox or Safari, which
I believe the Playwright MCP server does
support. In fact, it supports everything
that Playwright can do, but with an MCP
server that is perfect for agents. The
only kind of downside is if you already
have loads and loads of MCP tools, then
adding a bunch more could confuse the
agent, since it has more tools available
to choose from autonomously, and it
might not choose the right one the first
time. Basically, choosing between
Browser Use, the Playwright MCP server,
and Agent Browser really depends on your
use case and what you want from your
agent. For me personally, I like the
simplicity and ease of installation of
Agent Browser. And I primarily use
Chromium browsers, so I'm not too fussed
about not having Firefox or Safari
access. So, I'll use it for now and see
how it goes. Also, while I've got your
attention, we are so close to hitting
that 100K subscriber mark. So, if you
haven't already, go ahead and hit the
subscribe button.
Agent Browser is a headless browser automation CLI that Vercel developed for AI agents (like Claude Code), created by Chris Tate in just a single weekend. This powerful tool combines a fast Rust CLI with Node.js to give AI agents complete control over web browsers through simple commands like open, click, fill, and snapshot, all with a unique ref-based system that makes element selection deterministic and AI-friendly. With support for multiple browser engines, session management, and seamless integration into existing AI workflows, Agent Browser provides a robust solution for web automation that works perfectly with tools like Claude Code and other AI agents.

Relevant Links
Tweet from Chris Tate - https://x.com/ctatedev/status/2010400005887082907
Agent browser GH - https://github.com/vercel-labs/agent-browser

More about us
Radically better observability stack: https://betterstack.com/
Written tutorials: https://betterstack.com/community/
Example projects: https://github.com/BetterStackHQ

Socials
Twitter: https://twitter.com/betterstackhq
Instagram: https://www.instagram.com/betterstackhq/
TikTok: https://www.tiktok.com/@betterstack
LinkedIn: https://www.linkedin.com/company/betterstack

Chapters:
0:00 Intro
0:22 Agents are taking over in 2026
0:52 Introducing agent-browser by Vercel
1:30 Agent browser demo 1 on React + Vite proj
3:00 Agent browser fixes form validation
4:17 How agent browser works
5:06 Agent browser vs Browser Use vs Playwright MCP
6:30 My thoughts on agent-browser