So recently Anthropic released this article called "Effective Harnesses for Long-Running Agents," and it basically walks you through how you can take Claude Code and have it run for a very long time and build out an entire application. The main shortcoming they identify in the article is that long-running agents that continuously iterate on your project end up causing a lot of issues. They break existing functionality as the project gets larger, they lose context about what's already been implemented and what they need to implement next, and often it's just very problematic.
And so they put a lot of time into trying to understand a better way to make a long-running agent build out a full-stack application from a high-level prompt. For example: build me an application where I can use an AI LLM to generate images. A prompt as simple as that, and then it builds out an entire application with hundreds of features. Now, reading this, I was kind of skeptical, because I have been coding with LLMs for a while and I know the shortcomings as well. It's not perfect; you do have to babysit a lot. But they do have a repo, so I decided to pull it in, clone it, get Python set up (they walk you through that process), and then I ran it for 24 hours on my own computer just to verify: is this article actually onto something, or is it a bunch of baloney? So let me demo what this managed to build out in 24 hours.
Now, I basically started with a single prompt: I want a dashboard for modifying a bunch of images and using AI to generate stuff. I can't remember exactly what my prompt was, but that's basically it. So, let me log in
and kind of demo some of the key
features. It would take too long to demo everything because there are a lot of features, but here is the overview, and
we can go to the canvas to start
generating new images, right? So, I'm
going to go over here and I'm going to
say a giant dog walking through a city
and we can select whatever aspect ratio we want; maybe I'll make it vertical. You can choose the model; I'll do Flux Pro because it's pretty nice. Then you can go over here and change how many steps to use. For these modifiers, I'll say cinematic.
I'll say depth of field. How about that?
And we'll just go ahead and generate
this image and see what happens. And
again, you can see down here, here are
all the other images that I've been
generating. And you can click on them.
You can go to a preview. You can zoom
in. You can go over here. You can
actually like create a variation of it.
You can remix it with a new prompt. You
can upscale. You can download it, which
brings it to your computer. Let me
create a variation of this blue chicken
as well. And you'll see that start
getting kicked off in my queue. Let's go back to the giant dog walking through the city. Okay. So, with
this, what I can do is go back, go to image-to-video, and drag this in. I'm
going to make it short so it doesn't cost me a bunch of money: a dog walking towards the camera, and I'm going to use Wan 2.5 for the model. We'll keep it vertical as well, because I think that's what the original was. It looks like I failed to do that, but I'll show you that I took this chicken and did image-to-video with it.
Okay. And then if this worked, we would
have had like an animated video of this
dog walking. I don't know why it's
broken. Again, when an agent runs for 24 hours on your computer, it's still going to have bugs, but it got a lot of
stuff done. I can go to my gallery. I
can view all my images. I can go in. I
can filter through them. I can click
into them like I showed you. I can go to
a collection and I can add various
things to my collection. So, let's just
add that wolf or that dog back to this
collection, which gets added down here.
I can set it as my cover. I'll go back
to collections. It shows up. I can go to
my projects. I can add images to this. I
can then share this project with other
people. So if people want to download these images, I can also share the individual images themselves: I'll create a share link, and later I can go back and delete that share link so people can't download it anymore. There's also the ability to mark things as trash, which gets cleaned up periodically after 30 days, or I can just empty the trash bin manually. There's also this batch tool.
I haven't dived into this yet; I don't even know what its purpose is, but the agent just figured out all these features and started implementing them. I can view these different models and some of the images they created. I can click "use this model" and that'll take me... well, that's a bug; it should take me to the canvas page. We've got a settings page with all these different user preferences, and we have a credits page with usage, where I can go and buy more credits. So
that's a quick overview of some of the
features. Again, it's not 100% perfect.
There are bugs, and you do have to comb through and fix them manually. But the most
amazing thing about this is I literally
kicked this thing off and then I went to
bed and came back and it had like 30
more features added. Then, because it sometimes crashes, I kicked it off again and went out shopping for a couple of hours; I came back and checked it and it had like 10 more features added. So I'm going to
show you how I got this little repo set
up locally to build out basically this.
And I do have a deeper dive coming on my course, which, by the way, is now live. If you want to check it out at agenticjumpstart.com, I have a bunch of videos talking about prompt engineering, context engineering, and agentic coding, with almost 11 hours of content right now. I'm still adding more as AI progresses; I'm going to keep adding videos and modules. So I think it's very high value if you want to learn more about Opus 4.5, GPT-5.1 Codex, Cursor, etc. Here are
some of the actual videos I have: spec-driven development, MCPs, the agentic mindset, and then I walk you through how to set up Cursor and how to set up Claude Code. The most important part, in my opinion, is that I show you my actual workflow for building out a full-stack web application, using TanStack Start with shadcn, Drizzle for the ORM, and Postgres, and I build out an entire application using the exact workflow I use when I do agentic coding. And I think that's the most
valuable part. And then I also just
throw in some bonus videos as well. And
then I have this autonomous coding section that I'm going to start building out, because I do think autonomous coding is going to be much more powerful soon. So yeah, go check out my course at agenticjumpstart.com. We've got a free community with a lot of people, and I do plan to keep building it out. So let's
check out this repo. I went and cloned it, and if I go to my code over here, you can see I have it cloned already. Once you clone it, there is an autonomous-coding folder you can go into, and you can load up the README if you want; it walks you through how to get this set up. You have to have Anthropic's Claude Code CLI installed, the requirements file installed, and Python and pip set up. Once you have all that, you can export an Anthropic API key. There's also a way to set it up with your Claude Code OAuth token, which is what I actually did in this project, so I can walk you through how I did that.
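To make the two auth options concrete, here's a minimal sketch of what gets exported. The env var names are the ones I used; the token values are placeholders, and I believe the OAuth token comes from running claude setup-token, but double-check the README:

```python
# Minimal sketch of the two auth options; set one of these, not both.
import os

# Option 1: a raw Anthropic API key, billed per token.
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."  # placeholder

# Option 2: a Claude Code OAuth token, so usage comes out of your
# Claude subscription instead (I believe `claude setup-token` generates one).
os.environ["CLAUDE_CODE_OAUTH_TOKEN"] = "sk-ant-oat-..."  # placeholder
```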
So once you have it all set up, you can run a command to kick it off. But before you do that, you can go into the prompts folder, where there's an app spec you can modify. This is where I defined the app I wanted
built out. I said: an AI image design studio, a creative AI image generation platform. If you scroll down, you can see it lists out a high-level design and the goals of what we're using. In this case it says Next.js 16, Tailwind CSS, TanStack Query, forms with React Hook Form, shadcn for the component library; the backend is Postgres with Drizzle ORM, fal.ai for the actual image generation and all the models, S3-compatible file storage, Better Auth for authentication, and deployment with Docker Compose. Basically, it gives a good guideline of everything the LLM is going to need; Claude Code reads this every single time to verify it's on the right path to build out the project, kind of like what I showed over here. And
then you have some core features that document the landing page, authentication, dashboard, image generation, and so on. Look how long this file is; this is a huge file. Now, I didn't type this out
by hand. If you clone this repo, they
have an app spec that kind of describes
how to build a Claude Code clone. All I did was go to Gemini 3, which is a really good model for longer writing and documentation, and say, "Hey, refactor this entire thing to be an AI image generation studio application." That's literally all I did. I clicked submit, and it refactored the entire app spec, all 10,000 lines of it, to be geared towards an AI image generation studio application. And this is the core
thing that it builds everything off of.
So keep that in mind. I'm going to show
you now actually how you can run this.
So let's go down to one of these
terminals. I'm going to go ahead and go
to this one. The way I have this set up on my machine is with a venv. If you're not familiar with Python, that's basically a virtual environment. So you can go into the autonomous-coding folder and just say source venv/bin/activate (you might have to create one first; if you know Python, it's pretty easy). Now my shell is set to that venv. I also had to export the Claude Code OAuth token, and I'll show you where in the codebase I had to make changes to get that working. With those two things set up, all you have to do is run the autonomous agent. I'm going to give it a max iterations of one, and a project directory of awesome-image-gen.
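For reference, the kickoff looks roughly like this. The script and flag names are from memory of my run, so treat them as assumptions and check the repo's README:

```python
# Rough sketch of kicking off a single harness iteration (names assumed).
import subprocess

subprocess.run(
    [
        "python", "autonomous_agent_demo.py",  # harness entry point (assumed name)
        "--max-iterations", "1",               # run one agent session, then stop
        "--project-dir", "awesome-image-gen",  # where the generated app will live
    ],
    check=True,  # fail loudly if the harness exits with an error
)
```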
Okay, so when you kick this thing off, it runs the Python script, which starts with the initializer, the first agent. They talk about this in the documentation, and if you go back to the blog post, they cover the initializer agent there too. And that is
basically this prompt here. It tells the agent to read through the app spec and create a giant JSON file of really simple features, each with a couple of steps for verifying that the feature works and a boolean to track whether it's been implemented. And down here it says: read through the app spec and create a minimum of 200 features.
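Each entry in that JSON ends up looking something like the sketch below. The field names are my approximation, not the repo's exact schema:

```python
# One illustrative feature entry (schema approximated, not copied from the repo).
feature = {
    "id": 42,
    "priority": 1,  # assuming 1 = highest priority
    "description": "User can download a generated image from the preview modal",
    "steps": [
        "Log in and open any generated image in the preview modal",
        "Click the download button",
        "Confirm the browser downloads an image file",
    ],
    "passes": False,  # the boolean the agent flips once verification succeeds
}
```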
So all the features you saw in my AI image design studio: I didn't have to write them out. It did it for me. It figured out all the best features, and I just came back, checked it, and refactored the UI a little to make it look nicer for this demo. But that's basically the output you get. Now
you can read through this and modify it. That's the cool thing: you can modify the initializer prompt and make it as custom as you want, though overall I probably wouldn't change this one. When it's done, we can go to the generations folder, and you can see we have an awesome-image-gen folder being created. Soon this thing will be done; the initializer script takes around 5 minutes to finish because it has to generate a giant JSON file with a ton of information. When it's done, I can show you. It's then going to start going through all of the features
one by one. So let me go to the feature list for the image design studio; this is something I've already been working on. You can see it goes through and starts setting passes to true for tons and tons of features. Let me show you how many we have done now: 129 are actually passing.
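If you want to check progress yourself, counting those booleans takes a few lines of Python. This sketch assumes the harness writes a flat JSON list to a feature_list.json inside the generation folder, which matches what I saw on my machine:

```python
# Count how many features have been implemented and verified so far.
import json

with open("generations/awesome-image-gen/feature_list.json") as f:
    features = json.load(f)  # assumed: a flat list of feature dicts

done = sum(1 for feat in features if feat.get("passes"))
print(f"{done}/{len(features)} features passing")
```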
So the harness then picks the next feature, trying to find the one with the highest priority. When it finds the highest-priority feature, it implements it using Claude Code and then tries to verify it with Puppeteer. This is the
interesting part, and I think the key takeaway of these autonomous agents that run forever: you have to have a way for the agent to verify what it's doing. The way this Python script works is it loads up a Puppeteer agent that goes to your actual web application in a browser, where it can take screenshots and click around. So it runs through these steps: every time it implements a new feature, it goes back and tries to verify that the feature it just added is correct. And until it can pass those steps, it will not continue on to the next feature. When it's first starting off, you'll see it load up Puppeteer, click through the steps, mark the feature as done, and go to the next one. Now,
as the project gets larger and you get to 50 or 100 features, it does start to slow down a lot, because it first picks a couple of existing features at random, tests them, and verifies they still work; then it adds your new feature, runs the test for that new feature, and moves on to the next one.
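Put together, each iteration's control flow looks roughly like the sketch below. This is my paraphrase of what the harness does, with agent and browser as stand-ins for the Claude Code and Puppeteer pieces; it is not the repo's actual code:

```python
import random

def run_iteration(features, agent, browser):
    """One harness iteration, paraphrased; `agent` and `browser` are stand-ins."""
    # Regression-check a couple of already-passing features at random.
    passing = [f for f in features if f["passes"]]
    for feat in random.sample(passing, k=min(2, len(passing))):
        if not browser.verify(feat["steps"]):
            agent.fix(feat)  # regression found: fix before moving on

    # Pick the highest-priority feature that hasn't passed yet.
    todo = [f for f in features if not f["passes"]]
    if not todo:
        return
    feat = min(todo, key=lambda f: f["priority"])  # assuming 1 = highest

    agent.implement(feat)  # Claude Code writes the code
    while not browser.verify(feat["steps"]):
        agent.fix(feat)  # keep iterating until the browser check passes

    feat["passes"] = True  # only now does it move to the next feature
```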
And as you can tell, loading up Puppeteer, taking a screenshot, and sending that screenshot to Claude Code is a very slow process, and when a test finds a bug, it then has to go try to fix it and spends a bunch of time doing that. But overall, this is the kind of output you're going to get. And I just basically let this
code all night when I was sleeping. I
let it code when I went out shopping and
I didn't have to do much. And I could see the power of hooking this up to existing projects: have a list of features you want added, let it cook, and come back to a fully working prototype that you can then pass off to a real, experienced developer using Claude Code to start polishing it all, fixing little bugs here and there, and taking it from a version-zero iteration all the way to production.
Now, in order to get this running, I did have to modify the codebase a little to support this new token, because I didn't want to use an Anthropic API key; that costs a lot of money. I have my $100-a-month Claude Code subscription and I wanted to use that. So in this file I added a has-OAuth-token flag and checked for both options, and in the next file I have a check that verifies at least one of them is set. I think those are the two main things I changed.
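The change amounted to something like this; a minimal sketch of my edit, not the repo's exact code:

```python
# Accept either auth mechanism before starting the agent (roughly my edit).
import os

has_api_key = bool(os.environ.get("ANTHROPIC_API_KEY"))
has_oauth_token = bool(os.environ.get("CLAUDE_CODE_OAUTH_TOKEN"))

if not (has_api_key or has_oauth_token):
    raise SystemExit(
        "Set ANTHROPIC_API_KEY or CLAUDE_CODE_OAUTH_TOKEN before running the agent."
    )
```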
There might be some other places, but honestly, if you can't get it working: it was funny, I was working with someone on Discord trying to get this running. He just dropped it into the Warp terminal, I think, and said, "Hey, can you get this working with my Claude Code OAuth token?" And in one shot it went through the code, fixed it, and we were able to get it running. So just remember, you have these tools to get you past any bugs you run into. But overall, I mean, this thing is
pretty cool. Okay, so it looks like it
just wrote my features list. So I can go
ahead and open that one up and you'll
see we have a giant feature list inside that new directory, awesome-image-gen, and everything is set to passes: false right now. Eventually it's going to start going through and building out this application from the ground up. The initial couple of features will take a while, because it has to set up a Next.js application project, bring in Drizzle, and do all the npm installations and whatnot.
But at some point, after a couple of hours, you'll start seeing a fully working application, and it honestly just works. You don't even have to do any babysitting; you just let it cook. You come back and you'll see all these
cool things that it added in. Now, the very last step, once it gets past this (I don't know if I'll wait around for it to finish), is that there's another prompt it uses for actually coding it all out. If I go back to the repo over here and go to the prompts folder, there's one called the coding prompt.
Now, the coding prompt basically forces the Claude Code instance to really understand what it's doing. The first step is "get your bearings (mandatory)": it runs git diffs and git logs, reads the app spec and your feature list, figures out which features are implemented and which are not, and adds all of that to its context window. Then it starts your project if it's not already running, and it tests one or two random features: it just runs the tests, stepping through with Puppeteer to make sure they still work, and if everything's good, it goes and starts implementing your new feature. Actually, step four is where it finds the highest-priority feature.
Okay, so usually when you're building out an application, you do have a priority chain: what should come first, what should come second, which one depends on something else, and it figures that out in step four. In step five, it actually implements the feature and then verifies it works with those steps: verify with browser automation; you must verify the feature through the actual UI. It has some guidelines that help it test properly, and then it updates your feature_list.json, commits your progress, updates your notes, and ends your session so that the next session starts with a completely clean context window. It can start fresh.
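That last step is the design choice that makes this work: every session is a brand-new Claude Code invocation with an empty context window. Here's a sketch of what that amounts to, using Claude Code's non-interactive print mode; the repo's exact invocation may differ:

```python
# Spawn one fresh Claude Code session per feature, so no stale context
# carries over between sessions (sketch; the repo's invocation may differ).
import subprocess

def run_session(coding_prompt: str, project_dir: str) -> None:
    subprocess.run(
        ["claude", "-p", coding_prompt],  # -p: run one prompt non-interactively
        cwd=project_dir,
        check=True,
    )
```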
Now, this should show you that I'm not BSing you. I have the image design studio here with 134 commits and a ton of code already. I do have some commits from me where I cleaned up the UI and made it look a little nicer, but if I go back to my first commits, you'll see literally all 130 of these coming from the agent, just adding feature after feature: task 49, task 3840, task 3942. It just runs through that app spec file and your feature list and starts adding stuff. And here's the initial one where I basically set up the feature list. So
I'm kind of blown away by this. I mean, I've always thought that agentic coding was the future, but fully autonomous agents? I never thought that was going to be realistic, because when the Devin demo came out, I thought, "Ah, this is stupid." But now I think these LLMs are so good that we're going to see a lot more projects being 100% created by autonomous agents running non-stop. Or even agents that run non-stop to check for security vulnerabilities, performance issues, and documentation fixes. You could kick off many of these things, constantly running and checking your codebase, and I think that's going to be the future, and I think you guys want to be part of it. So if you do want to be part of that future, go check out agenticjumpstart.com, my course, where I walk you through all the stuff I've learned along the way
with agentic coding. More specifically, I have a new section I'm working on called autonomous coding, where I'm going to do an in-depth overview of this repo, and I'm starting to build out some other tools locally so that I can have autonomous agents running on multiple projects on my computer at the same time. But other than that, hope you guys enjoy this video. Go check it out. Do not sleep on this. I think this is a really amazing thing.
Buy the course now: https://agenticjumpstart.com
Join the Agentic Jumpstart community: https://discord.gg/JUDWZDN3VT
Article: https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents
Anthropic's repo: https://github.com/anthropics/claude-quickstarts/tree/main/autonomous-coding
---------
Have a video suggestion? Post it here: https://suggestions.webdevcody.com/
My Game: https://survivethenightgame.com/
My Courses:
https://agenticjumpstart.com
https://beginner-react-challenges.webdevcody.com
Useful Links:
Discord: https://discord.gg/N2uEyp7Rfu
Newsletter: https://newsletter.webdevcody.com/
GitHub: https://github.com/webdevcody
Twitch: https://www.twitch.tv/webdevcody
Website: https://webdevcody.com