Anthropic

Released v2.1 of their Responsible Scaling Policy. Key changes include:

  • Added and clarified new capability thresholds and safeguards for CBRN and AI R&D risks
  • Removed commitment to “define ASL-N+1 evaluations by the time we develop ASL-N models”
  • Confusingly, the AI R&D threshold that was implicitly the next capability level, ASL-3, is now labeled “4.” This threshold is still associated with ASL-3 security mitigations.

The full diff highlighting the changes between the old and new policy can be found below: