In this video, I'm going to show you how
you can leverage Claude Code within
Chrome, which allows new capabilities that
previously weren't possible. Your
browser is already logged into
everything. Think Gmail, Google Docs,
Sheets, YouTube. This could be Gemini.
This could be any different work
applications that you're using. All of
these are already authenticated. These
sessions exist. The cookies are already
there. And Claude can now use all of
that. Now, what's really exciting with
what they just announced is you can have
parallel actions across different tabs,
cross data transfer, and complex
multi-step workflows that don't require
tools like Playwright, Selenium, or
Puppeteer. And one of the novel and
interesting things with this is you can
actually even leverage other AI web apps
directly from Claude Code. And I'll show
you exactly how to do that. And the
benefit of this is everything that
you're already logged into within your
browser, Claude Code is now going to be
able to access this. I do want to touch
on how this is a little bit different
than some of the other automation tools
that are out there. Think things like
Selenium, Playwright, Puppeteer, all of
these browser tools that are out there. They
have isolated browser contexts. They're
good for testing applications. But one
of the big issues with all of them is that
when one spins up that Chromium instance,
there are no cookies or sessions, at
least not in a way that's easy to set
up. You have to authenticate every time.
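A toy sketch of that difference, assuming nothing about any real tool's API: plain dictionaries stand in for cookie stores, and the domain names and cookie values are made up for illustration.

```python
# Toy model only: plain dicts stand in for browser cookie stores.
# Nothing here calls Playwright, Selenium, Puppeteer, or Chrome.

CookieStore = dict[str, dict[str, str]]


def needs_login(cookie_store: CookieStore, domain: str) -> bool:
    """Return True when the store holds no session cookie for the domain."""
    return "session" not in cookie_store.get(domain, {})


# Your everyday browser profile: sessions already exist (values are invented).
everyday_profile: CookieStore = {
    "mail.example.com": {"session": "abc123"},
    "docs.example.com": {"session": "def456"},
}

# A freshly launched, isolated automation context starts with nothing.
isolated_context: CookieStore = {}

for domain in everyday_profile:
    print(
        domain,
        "| everyday profile needs login:", needs_login(everyday_profile, domain),
        "| isolated context needs login:", needs_login(isolated_context, domain),
    )
```

In this model every domain needs a fresh login in the isolated context, while the everyday profile needs none.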
Not to mention, there's going to be a
fresh install with the different
projects that you leverage it in. Now,
Claude in Chrome leverages your actual
browser. It can use all your existing
sessions and it's already logged in and
works just like you do. And the other
benefit of this is you don't actually
need different API keys to leverage
different services. So what do I mean by
that? So basically if you think of all
the services that you might use
throughout the day, think Notion, maybe
it's Airtable, Figma, whatever your job
is, there might be internal tools that
you're leveraging or just other AI tools
that are out there. You don't actually
have to get the API key to wire this up
to get all of this working. Now in terms
of how you can use this, you can say
things like use Gemini to generate an
image with Nano Banana and then go and
post that to Slack. What it will do is
it will actually go through those
different services. It can leverage the
commands to copy, it can download, it
can use a browser just like you would.
It is now leveraging Claude Code to
extend that harness's capabilities to be
much more general and not just
necessarily focused on code. Now, what's
within Claude Code? There are a lot
of core functions that are very similar
to something like Playwright, Selenium,
or Puppeteer. It allows you to navigate,
read pages, take screenshots, click, and record
videos that you can use within
documentation or debugging different
things. Now, additionally, what you can
do with this is you can actually
leverage JavaScript, which makes this
that much more powerful. Now, in terms
of how to set this up, it's pretty
straightforward. All that you have to do
is get the Claude in Chrome extension.
I'll put the links to all of this within
the description of the video. And this
web page also has a number of really
good examples in terms of how you can
use this. Now, another interesting thing
with Claude in Chrome is you effectively
have a sidebar like you see in these
images here. So, you're going to be able
to also use this directly within Chrome
if you'd like. Now, the way you
leverage this within Claude Code is
through an MCP server. So, it's going to
translate all of those requests into the
different browser actions that you're
asking for. Additionally, what you can do is
say in that Gemini example of going and
getting an image, you can go and save
those images out locally. And what Claude
Code will do is it will actually
orchestrate the browser and if need be,
it can leverage your file system. Say
you're trying to create a document and
you want to aggregate some information
or go and reach for some assets. You can
go and do all of that. Now, in order to
get set up, you do need Google Chrome.
You do need the Claude in Chrome
extension. You do need the Claude Code
CLI. And then you also need a paid Claude
plan to get started with this. Now,
there are two ways to use this. You can
use this directly within your browser
within the side panel. You can chat
along with any page. You can watch all
of the actions in real time. This is
similar to the Comet browser from
Perplexity and the Atlas browser from
ChatGPT. It also has a number of
built-in shortcuts within here. Now, the
one thing that I do want to mention
regardless of whether you're using this
within the side panel or within Claude
Code is that you do have to understand some
of the risks, because this is a new
environment and it's going to be within
an environment where it's authenticated
to all of your different services. You
do have to be extra careful in terms of
where it's navigating and what it is
actually doing because as they say front
and center on the website, malicious
actors can hide instructions in
websites, emails, and documents to trick
the AI into taking harmful actions
without your knowledge. One of the
things that they did to help mitigate
this is that it will actually ask for your
approval either on every action or, if
you want to auto-approve on a domain,
you have to actually approve each
domain. And I think this is a pretty
good safeguard because you don't want to
be reading a malicious blog post, for
instance, that has some hidden
prompt injection within it, which then
tries to take you to another site and
take your information, or something to
that effect, or run some JavaScript or
what have you. You do want to be mindful
of the different sites that the agent is
visiting and what it's doing. Now, in
terms of how you can leverage this, you
can have shortcuts or skills that will
trigger different workflows for what you
want the agent to do within Claude in
Chrome. Now, if you think about that,
think of all the tasks that you do
day-to-day within the web browser.
Imagine all of a sudden actually having
some of those actions automated. This is
the type of system that allows you to do
that. Say for instance, if you want to
summarize things, take screenshots, or
research particular topics, you can do
that through a handful of ways within
Claude Code, whether it's setting up
slash commands or setting up skills that
will be triggered when you ask for that
particular action if it's something
reusable. Now for a quick demonstration.
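Before the demo, here is a minimal sketch of the reusable slash command idea from a moment ago. It assumes Claude Code's project-level convention of Markdown files under `.claude/commands/`, where the file name becomes the command name and `$ARGUMENTS` is replaced by whatever follows the command. None of that is shown in the video, and the command name `summarize-page` and its prompt text are hypothetical.

```python
import tempfile
from pathlib import Path


def write_slash_command(project_root: Path, name: str, prompt: str) -> Path:
    """Write a prompt file to <project_root>/.claude/commands/<name>.md."""
    commands_dir = project_root / ".claude" / "commands"
    commands_dir.mkdir(parents=True, exist_ok=True)
    command_file = commands_dir / f"{name}.md"
    command_file.write_text(prompt, encoding="utf-8")
    return command_file


# Hypothetical prompt text; adjust it to the workflow you want to reuse.
SUMMARIZE_PROMPT = """\
Using the browser tools, open the page at $ARGUMENTS,
read its text, and write a five-bullet summary.
Ask before navigating to any other domain.
"""

# Demo in a throwaway directory so running this sketch has no side effects.
with tempfile.TemporaryDirectory() as demo_root:
    created = write_slash_command(Path(demo_root), "summarize-page", SUMMARIZE_PROMPT)
    print(created.relative_to(demo_root))
    print(created.read_text(encoding="utf-8"))
```

Pointing `project_root` at a real project instead of the temporary directory would, under the assumption above, make the command available as `/summarize-page <url>`.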
So within Claude Code, once you have the
MCP server set up and you have the
Claude in Chrome extension all installed,
if you run /mcp and you go within the
tools for Claude in Chrome, you can see all
of the different actions that it can
take. It can navigate. It can resize
windows. It can create little videos
like I mentioned. It can upload images.
It can get page text. It can get the
context of the different tabs. That's
one thing to know with it is it actually
can go through different tabs or if you
want to parallelize different things, you
can do that. Additionally, you can read
console logs as well as network
requests. Now, just to demonstrate this,
I'm going to say let's go to the Gemini
web app and I want to put within the
prompt box that I want to generate an
image that says hello world and then
save that locally within this directory.
What it will do is it will read the
current tabs that it has within the
group. Now, what is a group? When you're
using Claude in Chrome, you'll notice
that it does have this little grouping
mechanism that you will see in all the
different windows where it's active.
You'll also notice that it will say
Claude has started debugging the
browser. And that's how it controls the
browser through the Chrome extension.
Now, what you'll see within here is, we
see "Gemini is loaded. Let me click on
the prompt box and enter a request to
generate an image with hello world text."
And the cool thing with this is if we
look at the initial request, it tried to
type "generate an image." It realized from
the screenshot that it took that the
text didn't appear within the box, and then
from there it went and tried a different
approach. So instead of clicking by position, it
actually went and got the ref for
that DOM element and clicked that,
and then it went ahead and generated
our image. Now that we have that, it's
opened up our image for us. We can see
that Gemini has generated the image with
hello world text on the chalkboard. Now
I need to save it. Let me click on the
image to see the download options and
then from there we can see the image in
a full view at the top corner. Let me
click to download the image. And then we
can see that it actually went ahead and
it downloaded the full-size image. Now,
here we can see it's now asking, can I
download this image from Gemini? It will
be saved to your downloads folder as a
PNG. And then I can move it to the
current directory. I'll go ahead and
I'll say yes. Then from there, we can
see the image was downloaded. Now, let
me move it to your current directory with
a cleaner file name. And then here we
go. The generated image of Hello World
is now saved within the directory that
I'm currently in. It has the features of
a chalkboard. Now if I go and I click
that path, I can now see that I have
this image locally. This is just a
really quick example just to show you
what is now possible. Now another
benefit of this is you can actually use
this to debug different web applications
since it can read your console logs as
well as your network requests as well as
execute JavaScript and actually record
different demos. You can use this for
reports, documentation or when you're
actually building something with Claude
Code or testing a feature, you can go
and leverage it. You can say something
like debug my app, check for console
errors, inspect API responses, so on and
so forth. Now, the possibilities, as you
might imagine, they're quite endless.
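To make the debugging use case concrete, here is a small sketch of the kind of triage you are asking for with a prompt like "check for console errors, inspect API responses." The `ConsoleMessage` and `NetworkRequest` shapes are hypothetical stand-ins, not the extension's actual output format, and the sample data is invented.

```python
from dataclasses import dataclass


@dataclass
class ConsoleMessage:
    level: str  # e.g. "log", "warning", "error"
    text: str


@dataclass
class NetworkRequest:
    method: str
    url: str
    status: int  # HTTP status code of the response


def find_problems(
    console: list[ConsoleMessage], network: list[NetworkRequest]
) -> list[str]:
    """Collect console errors and failed (4xx/5xx) responses into one report."""
    problems = [f"console error: {m.text}" for m in console if m.level == "error"]
    problems += [
        f"{r.method} {r.url} returned HTTP {r.status}"
        for r in network
        if r.status >= 400
    ]
    return problems


# Invented sample data for illustration.
console = [
    ConsoleMessage("log", "app started"),
    ConsoleMessage("error", "TypeError: user is undefined"),
]
network = [
    NetworkRequest("GET", "https://app.example.com/api/user", 500),
    NetworkRequest("GET", "https://app.example.com/api/health", 200),
]

for line in find_problems(console, network):
    print(line)
```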
Think form filling, dashboard
extraction, social media management,
research, testing, personal use cases,
just things that you do day-to-day. Now,
with this tool, you can largely
automate a lot of these different
processes. All in all, just think about
all of the different things that you do
within the browser day-to-day. All of
them are different depending on who we
are, whether it's within our job, the
personal things that we do within the
browser, all of the repetitive clicks,
copy and pasting things from one tab to
another. What if we could actually have
Claude do all of that for us? Now, while
it might not be able to do all of it, it
could potentially be able to do a large
portion of it. Otherwise, that's pretty
much it for this video. I'll put all the
links within the description of the
video, but otherwise, if you found this
video useful, please like, comment,
share, and subscribe. Otherwise, until
the next
In this video, learn to utilize Claude Code within Chrome to access new automation capabilities that were previously impossible. Leveraging your existing browser sessions, Claude Code enhances cross-tab actions, data transfer, and multi-step workflows without additional tools like Selenium or Playwright. Discover how to integrate and automate tasks across various web applications, use AI web apps directly, and manage workflows securely. Follow along for a detailed setup guide, a demonstration, and insights on the potential risks and benefits of this innovative tool.

Use Claude Code with Chrome: https://code.claude.com/docs/en/chrome

00:00 Introduction to Leveraging Claude Code in Chrome
00:21 Understanding Browser Sessions and Capabilities
00:47 Comparison with Other Automation Tools
01:27 Using Claude Code for Various Services
02:24 Setting Up Claude Code in Chrome
05:01 Demonstration of Claude Code in Action
07:20 Debugging and Advanced Use Cases
08:18 Conclusion and Final Thoughts