
Watchtower
Watchtower tracks changes to corporate and government AI safety policies, both announced and unannounced. Click any entry for details.
< Back
Date:
Change
Moderate
Google DeepMind updated its Frontier Safety Framework from version 3.0 to 3.1, in a change announced on its website.
FSF v3.1 introduces Tracked Capability Levels (TCLs), which are intended to capture risks that may appear at a lower level of capabilities than the FSF’s “Critical Capability Levels” (CCLs). As with CCLs, models reaching TCLs will be subject to residual risk assessments (assessments of model risks after safeguards are in place) and will only be deployed externally once these risks are deemed acceptable. (For ML R&D and misalignment risks, this also applies to “high-risk” internal deployments.) However, TCLs (unlike CCLs) don’t require a safety case (a formal argument that risks have been reduced to an acceptable level).
In FSF v3.0, risks from misalignment were considered in an "exploratory approach” section that defined “illustrative” risk levels and mitigations but didn’t commit to any risk management process. In FSF v3.1, misalignment risks are described alongside ML R&D risks as an area triggering risk assessments and mitigations. A misalignment TCL is defined when models possess sufficient understanding of their deployment context and ability to circumvent oversight such that “absent additional mitigations, we cannot rule out the model significantly undermining human control.” It’s notable that this risk only rises to the level of a TCL, and not a CCL (and therefore doesn’t require a safety case).
FSF v3.1 also expands on Google DeepMind’s risk assessment and management process (section 1.3). However, there are some meaningful changes: critical capability assessments (assessing how close models are to T/CCLs) are not required for certain external deployments deemed “low risk” (e.g. for a small number of trusted testers). With regards to risk mitigations, v3.1 also adds that “the specific mitigations we implement may be determined when a T/CCL is reached, informed by the threat landscape at that time,” suggesting that the specific mitigations tied to T/CCLs are not locked in and could change over time.
FSF v3.1 also adds a brief section on “Governance and Accountability,” saying “We have in place a well-established and comprehensive internal governance structure designed to ensure the robust implementation of the processes outlined in this Frontier Safety Framework.” It offers no specific details on which personnel or bodies are responsible for this. Google DeepMind’s website identifies the Responsibility and Safety Council and the AGI Safety Council as bodies responsible for responsible AI development, though they aren’t named in the FSF.
A diff of the changes can be found below:
