Claude Code Can Now Control Your Browser (Thanks to Vercel) | DailyDevLists


Full Transcript

1,370 words • EN

This is agent-browser, an open-source

headless browser CLI built in a weekend

by a single Vercel employee that lets your

agent do anything in the browser from

dragging and dropping to uploading an

image and even toggling offline mode.

But why would anyone use this over

something like Browser Use, which has

way more features? And is Vercel getting

into the agent browser space? Hit

subscribe and let's get into it.

2026 is the year of AI agents writing,

reviewing, and testing all of your code.

No more tab completions. In fact,

developers are even moving away from

IDEs entirely in favor of doing

everything in the terminal since all

we're really doing now is reviewing

code. And to help with this movement, we

need the agents to actually interact

with and test the code they've written.

Because the last thing you want to do as

a developer is to open up the browser to

test each feature an army of agents have

written one by one because that's just

tedious. This is where Vercel's new agent

browser comes in handy. Written by Chris

Tate in both Rust and TypeScript (I'll

explain why later), this tool makes it so

easy for an agent to interact with the

browser using CLI commands that do a

bunch of things like creating an

accessibility snapshot that provides an

accessibility tree and references to

elements of a page, and reference-based

actions that take the references from

the tree and apply relevant actions to

them. There are also semantic locators, if

you don't want to use references, that

allow you to find an element based on

its ARIA role, its text content, its label,

and so much more. In fact, let's go

through a quick demo of how it works.
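To make the reference idea concrete, here is a minimal Python sketch of how an agent might turn a snapshot into a reference-based action. The snapshot line format (`- role "name" [ref=eN]`) is an assumption for illustration, and `parse_refs`, `find_ref`, and `click_command` are hypothetical helper names. Only the `click` command and the `@e2`-style reference come from the video.

```python
import re

# Assumed snapshot layout: one element per line, e.g.  - button "Login" [ref=e4]
SAMPLE_SNAPSHOT = """\
- heading "Sign in" [ref=e1]
- textbox "Email" [ref=e2]
- textbox "Password" [ref=e3]
- button "Login" [ref=e4]
"""

LINE_RE = re.compile(
    r'^\s*-\s+(?P<role>\w+)\s+"(?P<name>[^"]*)"\s+\[ref=(?P<ref>e\d+)\]'
)


def parse_refs(snapshot: str) -> dict[str, tuple[str, str]]:
    """Map each ref (e.g. 'e4') to its (role, name) pair."""
    refs = {}
    for line in snapshot.splitlines():
        match = LINE_RE.match(line)
        if match:
            refs[match["ref"]] = (match["role"], match["name"])
    return refs


def find_ref(refs: dict[str, tuple[str, str]], role: str, name: str):
    """Return the ref of the first element with this role and name, or None."""
    for ref, (found_role, found_name) in refs.items():
        if found_role == role and found_name == name:
            return ref
    return None


def click_command(ref: str) -> list[str]:
    """Build the argv for a reference-based click, e.g. agent-browser click @e4."""
    return ["agent-browser", "click", f"@{ref}"]


refs = parse_refs(SAMPLE_SNAPSHOT)
login_ref = find_ref(refs, "button", "Login")
print(click_command(login_ref))
```

The point of the design is that the agent never has to guess a CSS selector: it reads the tree, picks a ref, and acts on exactly that element.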

Now, here is a little login page with an

email and password. And it's built with

shadcn, React, and Vite, not because of Vercel

or anything. It just happened to be

built that way. Now, there's one problem

with this whole page and it's that right

now I'm blinding my users because it's

in light mode. So I want there to be a

dark mode which I've actually gone ahead

and asked the agent to do but as you can

see it hasn't done it correctly. I mean

okay this text changes but nothing else.

So let's go ahead and get the agent to

fix this using agent browser. So right

now I'm using OpenCode with the GLM 4.7

model but of course agent browser can

work with any agent and any model. I've

gone ahead and told it that dark mode is

broken and it should test it with

agent-browser on the specific port.

What's important is this part of the

prompt, to run agent-browser --help

to see the available commands because

there's no slash commands, no skills.

I've just installed agent browser

globally with npm. I'm going to hit

enter and then it checks the available

commands, uses the agent browser

snapshot functionality to create a

snapshot of the page which shows the

document, headings, paragraphs, text, and

even images. It's then clicked on the

relevant element and taken a screenshot

to see if dark mode is working. And this

is the screenshot if you're curious.
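The sequence the agent followed in this demo can be sketched as a short command plan. This is a rough Python illustration under stated assumptions: the URL and port, the ref `e5`, the screenshot path, and the helper names `dark_mode_check_plan` and `run_plan` are all hypothetical, and the exact argument order of `open` and `screenshot` is assumed rather than confirmed by the video. By default nothing is executed; the function only returns the shell lines.

```python
import shlex
import shutil
import subprocess

CLI = "agent-browser"


def dark_mode_check_plan(url: str, toggle_ref: str, shot_path: str) -> list[list[str]]:
    """Commands in the order described: discover, open, snapshot, click, screenshot."""
    return [
        [CLI, "--help"],                  # learn the available commands
        [CLI, "open", url],               # load the page under test
        [CLI, "snapshot"],                # get the accessibility tree with refs
        [CLI, "click", f"@{toggle_ref}"],  # press the dark mode toggle by ref
        [CLI, "screenshot", shot_path],   # capture the result for inspection
    ]


def run_plan(plan: list[list[str]], dry_run: bool = True) -> list[str]:
    """Return the shell lines; run them only if dry_run is False and the CLI exists."""
    lines = [shlex.join(cmd) for cmd in plan]
    if not dry_run and shutil.which(CLI):
        for cmd in plan:
            subprocess.run(cmd, check=True)
    return lines


plan = dark_mode_check_plan("http://localhost:3000", "e5", "dark-mode.png")
for line in run_plan(plan):
    print(line)
```

In practice the agent decides each next step from the previous command's output, so a fixed plan like this is only a summary of what happened, not how the agent reasons.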

From here, it's gone ahead and fixed the

issue before taking another screenshot

of the fixed dark mode. And it's finally

finished the task, which we can test by

clicking up here. And we have a page

with perfect dark mode. Let's try

another test. Actually, while this was

running, I had another agent in the

background fix another issue. You may

have noticed that if I press the login

button, it will take me straight here

without any validation, which of course

isn't good. So I went ahead and asked it

to fix the issue with the validation in

this project and it did something

actually really interesting. It first

checks the available commands from agent

browser and then if we scroll down it

fixes the issue and even makes a bash

script, which is over here, to test that

it works. So it echoes the first test,

adds an empty input, clicks the login

button and then expects these errors.

It's made a few tests here, but it's

actually made an even better bash test

down below, which we can see over here

that makes use of agent browser eval to

run some JavaScript code. So now we can

see if I press the login button, we get

some validation. This looks like an

email, but it's actually a placeholder.

If I give it an email over here, just a

made up one, and hit login, it says

enter a valid email, which I can do like

this. And then I can enter a password

before it takes me to this dashboard.
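The kind of check the agent scripted for itself can be sketched in Python as follows. This is not the bash script from the video: the refs, the `[role=alert]` selector, the `fill` argument order, and the helper names are assumptions for illustration. The command runner is injected so the logic can be exercised with a stand-in instead of a real browser.

```python
from typing import Callable

Runner = Callable[[list[str]], str]  # takes argv, returns the CLI's stdout

# JavaScript passed to eval; the selector is a hypothetical example.
READ_ERRORS_JS = (
    "Array.from(document.querySelectorAll('[role=alert]'))"
    ".map(el => el.textContent.trim()).join('|')"
)


def check_empty_login_shows_errors(
    run: Runner, email_ref: str, login_ref: str, expected: list[str]
) -> bool:
    """Submit an empty email, then confirm every expected error text is on the page."""
    run(["agent-browser", "fill", f"@{email_ref}", ""])
    run(["agent-browser", "click", f"@{login_ref}"])
    output = run(["agent-browser", "eval", READ_ERRORS_JS])
    return all(message in output for message in expected)


calls: list[list[str]] = []


def fake_runner(argv: list[str]) -> str:
    """Stand-in for the real CLI: pretends the page shows one validation error."""
    calls.append(argv)
    return "Enter a valid email" if argv[1] == "eval" else ""


print(check_empty_login_shows_errors(fake_runner, "e2", "e4", ["Enter a valid email"]))
```

Swapping `fake_runner` for a function that calls `subprocess.run` would point the same check at a live page, assuming the CLI is installed and the refs match the current snapshot.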

So, basically, agents have addressed two

issues with this app and tested it

themselves to validate it works using

the agent browser plug-in. I would say

if I could do it again, I'd go with a

model that has multimodal support so it

can actually read the screenshots it

takes instead of using GLM 4.7. But now,

let's get into some architecture. How

does agent browser actually do all of

these things? Well, after an agent runs

a command like agent-browser click

@e2, this gets sent to a Rust binary that

parses that command and converts it to

JSON. The benefit of it being Rust is

that it's fast, resource efficient, and

stays running after it's spawned. This

JSON then gets sent to a Node daemon

through a Unix socket, and this daemon

manages the Chromium browser. What's

cool about this is that a daemon is run

for each session, meaning multiple

browsers can be controlled. Once the

daemon validates the output, it launches

the browser, which is headless by default,

and executes the action using Playwright.

And once it's completed, it sends the

output also in JSON back to the agent.
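The round trip described here (parse the command, send JSON over a Unix socket, get JSON back) can be imitated in a few lines of Python. This is a toy model, not the project's code: the JSON field names, the newline-delimited framing, and the functions `parse_command`, `toy_daemon`, and `send_command` are all assumptions, and the "daemon" only acknowledges the action instead of driving a browser. It uses `socket.socketpair()`, which gives a connected pair of Unix-domain sockets on POSIX systems.

```python
import json
import socket
import threading


def parse_command(argv: list[str]) -> dict:
    """Turn e.g. ['click', '@e2'] into a JSON-serialisable message (made-up fields)."""
    action, *args = argv
    return {"action": action, "args": args}


def toy_daemon(conn: socket.socket) -> None:
    """Read one newline-terminated JSON command and reply with a JSON result."""
    with conn, conn.makefile("rw", encoding="utf-8") as stream:
        request = json.loads(stream.readline())
        # A real daemon would drive the browser here; this one only checks the name.
        ok = request["action"] in {"open", "click", "fill", "snapshot"}
        stream.write(json.dumps({"ok": ok, "action": request["action"]}) + "\n")
        stream.flush()


def send_command(argv: list[str]) -> dict:
    """Play the CLI's role: serialise the command, send it, and return the reply."""
    cli_end, daemon_end = socket.socketpair()
    worker = threading.Thread(target=toy_daemon, args=(daemon_end,))
    worker.start()
    with cli_end, cli_end.makefile("rw", encoding="utf-8") as stream:
        stream.write(json.dumps(parse_command(argv)) + "\n")
        stream.flush()
        reply = json.loads(stream.readline())
    worker.join()
    return reply


print(send_command(["click", "@e2"]))
```

Keeping the browser behind a long-lived daemon means each CLI call only pays for a small socket round trip rather than a fresh browser launch, which is presumably why the two-process split exists.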

The CLI hands that result to the agent, which can

do whatever it wants from there: it can send

even more commands to the Rust parser or

it can end the whole loop. All of this

is super impressive for a weekend's

project. But how does it compare to

Browser Use or even using the Playwright

MCP server? Well, Browser Use can be

used with or without an external agent

because it can run the full agent

reasoning loop. So, plan, action,

observe, and replan all on its own

without the use of anything else. It

also has a Python and TypeScript SDK for

fine-grained control. It has a skills

marketplace, an MCP server, and uses

Better Stack to track the API status,

which is very cool. Agent browser, on the

other hand is much simpler and can only

be used with an external agent. So you

the developer have to provide an agent

like Cursor, Claude Code, or OpenCode, and

the agent only interacts with agent

browser through CLI commands. When it

comes to comparing agent browser with

the Playwright MCP server, right now

agent browser only supports Chromium

browsers. So no Firefox or Safari, which

I believe the Playwright MCP server does

support. In fact, it supports everything

that Playwright can do, but with an MCP

server that is perfect for agents. The

only kind of downside is if you already

have loads and loads of MCP tools, then

adding a bunch more could confuse the

agent since it has more tools available

to choose from autonomously, and it

might not choose the right one the first

time. Basically, choosing between

Browser Use, the Playwright MCP server,

and agent browser really depends on your

use case and what you want from your

agent. For me personally, I like the

simplicity and ease of installation from

agent browser. And I primarily use

Chromium browsers, so I'm not too fussed

about not having Firefox or Safari

access. So, I'll use it for now and see

how it goes. Also, while I've got your

attention, we are so close to hitting

that 100K subscriber mark. So, if you

haven't already, go ahead and hit the

subscribe button.

Claude Code Can Now Control Your Browser (Thanks to Vercel)

Better Stack

48 days ago

6:52

Claude & Anthropic Ecosystem

Rank #2

Description

Agent Browser is a headless browser automation CLI that Vercel developed for AI agents (like Claude Code), created by Chris Tate in just a single weekend. This powerful tool combines a fast Rust CLI with Node.js to give AI agents complete control over web browsers through simple commands like open, click, fill, and snapshot, all with a unique ref-based system that makes element selection deterministic and AI-friendly. With support for multiple browser engines, session management, and seamless integration into existing AI workflows, Agent Browser provides a robust solution for web automation that works perfectly with tools like Claude Code and other AI agents.

🔗 Relevant Links
Tweet from Chris Tate - https://x.com/ctatedev/status/2010400005887082907
Agent browser GH - https://github.com/vercel-labs/agent-browser

❤️ More about us
Radically better observability stack: https://betterstack.com/
Written tutorials: https://betterstack.com/community/
Example projects: https://github.com/BetterStackHQ

📱 Socials
Twitter: https://twitter.com/betterstackhq
Instagram: https://www.instagram.com/betterstackhq/
TikTok: https://www.tiktok.com/@betterstack
LinkedIn: https://www.linkedin.com/company/betterstack

📌 Chapters:
0:00 Intro
0:22 Agents are taking over in 2026
0:52 Introducing agent-browser by Vercel
1:30 Agent browser demo 1 on React + Vite proj
3:00 Agent browser fixes form validation
4:17 How agent browser works
5:06 Agent browser vs Browser Use vs Playwright MCP
6:30 My thoughts on agent-browser

Video Details

Category

Claude & Anthropic Ecosystem

Featured Date

January 13, 2026

Quality Rank

#2

AI Recommended