
Watchtower
Watchtower tracks changes to corporate and government AI safety policies, both announced and unannounced.
Date:
Anthropic
Change
Moderate
Anthropic updated its Responsible Scaling Policy (RSP) from v3.2 to v3.3 two days before the release of Opus 4.8. In Section 1, “Our Recommendations for Industry-Wide Safety,” the update changes the “Capability or usage threshold” description for “novel chemical/biological weapons production,” shifting the framing from:
AI that can significantly help moderately resourced, expert-backed teams
to:
AI that can substitute for the small number of world-leading specialists who are currently the primary bottleneck.

The mitigations associated with this threshold remain unchanged. Notably, the RSP changelog calls the change a “revision,” while the Opus 4.8 system card frames it as a “clarification of the intent of our earlier threshold.” The latter is a generous characterization, given that the new threshold sits at a meaningfully higher capability level than the previous one and is therefore harder to trigger.
There were also changes to Section 3.1, Risk Reports: Scope and Timing. The criteria for including internally deployed models in a Risk Report are now structured around two explicit conditions: (1) the model could pose risks related to high-stakes misalignment or automated R&D, and (2) those risks significantly exceed those of models covered by a prior Risk Report. Additionally, v3.3 names and refines “off-cycle updates,” analyses that Anthropic publishes between Risk Reports when a model’s risk profile changes significantly.
The v3.3 update also makes minor terminology changes, notably renaming the third capability-threshold row from “High-stakes sabotage opportunities” to “Misaligned AI systems in high-stakes settings.” The threshold text itself is unchanged.
