As artificial intelligence models grow more proficient at writing and analyzing code, their ability to interact with public blockchains offers significant security benefits and poses severe financial risks. To measure these emerging capabilities, researchers have introduced EVMbench, an evaluation framework that assesses how well frontier AI agents can detect, patch, and exploit vulnerabilities in Ethereum smart contracts. The benchmark operates in three modes, which require agents to audit codebases for hidden flaws, modify vulnerable code while preserving its intended functionality, and execute end-to-end attacks against a simulated live blockchain environment. Recent EVMbench evaluations show that advanced models can already discover and successfully execute complex exploits, which underscores the need to monitor AI development continuously in order to safeguard the large financial resources that decentralized infrastructure currently manages.

Paper: https://cdn.openai.com/evmbench/evmbench.pdf
Code: https://github.com/openai/frontier-evals