Watchtower

Watchtower tracks changes to corporate and government AI safety policies, both announced and unannounced. Click any entry for details.

Date:

May 14, 2025
Anthropic

Violation

Moderate

Unannounced

Anthropic's original 2023 RSP committed to "define ASL-4 evaluations before we first train ASL-3 models," including both capability thresholds and "warning sign evaluations." Anthropic removed this commitment in RSP version 2.1; the removal was noted in the PDF but not in the list of changes Anthropic shared on the web.

When Claude Opus 4 was deployed under ASL-3, Anthropic stated that the capability thresholds for ASL-4 were now defined. However, the RSP still lacks the warning sign evaluations originally promised. Additionally, the AI R&D-5 threshold triggers the ASL-4 security standard but says nothing about ASL-4 deployment standards, a notable gap given that models with advanced AI R&D capabilities would be highly tempting to deploy internally, and internal deployments fall within the scope of deployment standards under Anthropic's RSP.

Nine days before announcing Claude Opus 4 as an ASL-3 model, Anthropic revised its RSP to lower the ASL-3 security requirements. The change removed the requirement to be robust against attempts to steal model weights by employees with access to "systems that process model weights," a group that could include a large fraction of technical staff. Anthropic justified the change by arguing that model theft is not central to CBRN-3 or AI R&D-4 risks. Critics counter that the timing suggests possible prior noncompliance, and that security remains critical as models approach ASL-4 thresholds.