Access our AI Builder course & join hundreds of serious AI builders in our community: https://www.theaiautomators.com/?utm_source=youtube&utm_medium=video&utm_campaign=tutorial&utm_content=anthropic-harness-v2

Anthropic Article: https://www.anthropic.com/engineering/harness-design-long-running-apps

Anthropic's engineering team just published a deep dive on harness design for long-running agents. And buried in the technical details are some honest admissions and crucial insights that apply to anyone building multi-step AI systems, not just coding agents.

The core problem: when you ask an AI agent to evaluate its own work, it approves it. Confidently. Almost every time. Even when the output is mediocre.

Their solution borrows from the GAN architecture: separate the agent doing the work from the agent judging it. One generates, one evaluates, and the tension between them drives quality upward.

They demonstrated this with a 2D retro game maker (6 hours, fully autonomous) and a Digital Audio Workstation (nearly 4 hours on Opus 4.6), both built without human intervention.

In this video, we break down the two failure modes they identified (context anxiety and poor self-evaluation), the 3-agent architecture they built to solve them (planner, generator, evaluator), and how you can apply these principles to your own systems, whether that's contract review, research pipelines, content generation, or data analysis.
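
If you want a feel for the planner / generator / evaluator split before watching, here is a rough Python sketch of the loop. It is our own simplified illustration, not Anthropic's code: `run_harness`, `Verdict`, `max_rounds`, and the toy agents are all hypothetical names, and the three agents are plain functions standing in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    """What the evaluator hands back: a pass/fail flag plus notes for the generator."""
    approved: bool
    feedback: str


def run_harness(
    task: str,
    planner: Callable[[str], List[str]],
    generator: Callable[[str, str], str],
    evaluator: Callable[[str, str], Verdict],
    max_rounds: int = 3,  # hypothetical cap so a stubborn evaluator cannot loop forever
) -> List[str]:
    """Plan once, then run generate -> evaluate on each step until approved or out of rounds."""
    results = []
    for step in planner(task):
        feedback = ""
        draft = ""
        for _ in range(max_rounds):
            draft = generator(step, feedback)
            # The judge is a separate agent: the generator never grades its own work.
            verdict = evaluator(step, draft)
            if verdict.approved:
                break
            feedback = verdict.feedback
        results.append(draft)  # keeps the last draft even if it was never approved
    return results


# Toy stand-ins for real agents (assumptions for illustration only).
def toy_planner(task: str) -> List[str]:
    return [f"{task}: part {i}" for i in (1, 2)]


def toy_generator(step: str, feedback: str) -> str:
    return f"{step} [revised]" if feedback else f"{step} [draft]"


def toy_evaluator(step: str, draft: str) -> Verdict:
    if "[revised]" in draft:
        return Verdict(True, "")
    return Verdict(False, "needs revision")


outputs = run_harness("demo", toy_planner, toy_generator, toy_evaluator)
print(outputs)
```

In a real harness each function would wrap its own model call with its own prompt and context, which is the point of the separation: the evaluator has no stake in the draft it is judging.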

Links:
AI Builder Series Episode 6 (Harness Engineering): https://www.youtube.com/watch?v=I2K81s0OQto
GitHub Repo (PRDs): https://github.com/theaiautomators/claude-code-agentic-rag-series
Anthropic Article from November 2025: https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents
Opus 1 million Context Window: https://www.anthropic.com/news/claude-opus-4-6
Stripe Minions: https://stripe.dev/blog/minions-stripes-one-shot-end-to-end-coding-agents-part-2

This connects directly to our AI Builder series where we're building specialized harnesses into a full Python and React app using Claude Code.

#AI #HarnessEngineering #AIAgents #Anthropic #LongRunningAgents #AdversarialEvaluation #ClaudeCode #AgenticRAG #LLM #AIBuilder #ContextEngineering #MultiAgentSystems