Watchtower

Watchtower tracks changes to corporate and government AI safety policies, both announced and unannounced.

Date:

OpenAI

Change

Major

OpenAI published a separate Frontier Governance Framework (FGF) to satisfy California’s Transparency in Frontier AI Act and the European Union’s General-Purpose AI Code of Practice (EU CoP). Until now, OpenAI relied on its Preparedness Framework (PF), which ties safeguards to capability thresholds and commits OpenAI to halting development of any model that reaches the Critical threshold until safeguards meeting a “Critical standard” are specified. The PF remains OpenAI’s broader framework for tracking frontier AI capabilities and deciding which models are safe to develop and deploy, but it is voluntary.

The FGF differs from the PF in a few ways. It defines four risk categories (cyber offense, CBRN, loss of control, and harmful manipulation), with capability tiers for the first three. Unlike the PF, which ties required safeguards to each capability threshold, the FGF attaches no specific required mitigations to its tiers; it states that mitigations are applied “as appropriate” and that a model may be deployed where OpenAI determines residual risk falls “within acceptable levels.”

Harmful manipulation is a new risk category relative to the PF, which excluded persuasion-type risks because they did not meet its definition of “severe harm.” Its inclusion in the FGF follows the EU CoP, which includes harmful manipulation as one of its specified systemic risks. The FGF defines no capability tiers for harmful manipulation, describing the area as “exploratory” and stating that these risks “may be best addressed through system level mitigations, such as post-deployment monitoring, rather than model evaluations before deployment.”

Additionally, the PF treats AI self-improvement as one of its three tracked categories, whereas the FGF has no standalone self-improvement category and instead names self-improvement as one pathway within loss of control. This mirrors the EU CoP, whose definition of loss of control encompasses self-improvement.

OpenAI isn’t the first frontier lab to have two separate policies. Anthropic created its Frontier Compliance Framework for regulatory compliance back in December.