
Watchtower

Watchtower tracks changes to corporate and government AI safety policies, both announced and unannounced. Click any entry for details.


Date:

Feb 24, 2026

Anthropic

Change

Major

Anthropic announced Responsible Scaling Policy v3.0, which replaces the if-then commitment structure that defined previous versions. Under v2.2, if a model crossed a defined capability threshold, specific safety mitigations became mandatory; in extreme cases, Anthropic committed to deleting model weights if a model was deemed too dangerous. In v3.0, Anthropic says it cannot commit to following that approach unilaterally if competitors are not doing the same. Chief Science Officer Jared Kaplan told TIME, “We didn't really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments … if competitors are blazing ahead.”

Holden Karnofsky, who helped shape the new version, acknowledged that the old if-then structure created “distortive pressures” on the company’s risk assessments. He explained that if a model crossed one of the higher capability thresholds, such as AI R&D-5 or CBRN-4, it would trigger a pause or slowdown that could be extremely damaging to Anthropic. This created, in Karnofsky’s words, “an enormous amount of pressure to declare our systems lack relevant capabilities.” He added, “I don't think we have actually made unreasonable calls, but I have felt the pressure and wish we weren't in that world.”

RSP v3.0 replaces hard commitments with publicly declared goals, with Anthropic grading its own progress toward them. Combined with the December 2025 decision to submit the lighter Frontier Compliance Framework for compliance with laws like SB 53 (see Dec 22, 2025), this means Anthropic’s legally enforceable framework contains few hard commitments, and its voluntary framework now contains none either.