Watchtower
Watchtower tracks changes to corporate and government AI safety policies, both announced and unannounced.
Moderate
In their blog post, Google describes v3 of their Frontier Safety Framework as a strengthening of the policy.
And in some ways, it is: they define a new harmful manipulation risk category, and they even soften v2's caveat that they would only keep their commitments if every other company did so as well.
But it's weakened in other ways.
Critical capability levels, which previously focused on capabilities (e.g. "can be used to cause a mass casualty event"), now seem to rely on anticipated outcomes (e.g. "resulting in additional expected harm at severe scale").
Similarly, for ML R&D, models that "can" accelerate AI development no longer require RAND SL 3 security; only models that have already been used for this purpose count. But this is a strange ordering -- shouldn't the safeguards precede the deployment (and even the training) of such a model?
Additionally, as pointed out by Zach Stein-Perlman, the CCLs for misalignment, which used to be a concrete (albeit initial) approach, are now described as "exploratory" and "illustrative."
Remember that in 2024 Google promised to *define* specific risk thresholds, not explore illustrative examples.
On the whole, it's good that Google is continuing to update its risk management policies, and the company seems to treat the issue much more seriously than some of its competitors.
A diff of the changes can be found below: