Watchtower

Watchtower tracks changes to corporate and government AI safety policies, both announced and unannounced.

Date: Feb 5, 2026

Company: OpenAI
Type: Violation
Severity: Moderate
Status: Unannounced

OpenAI's GPT-5.3-Codex crosses the "high" cyber risk threshold in the company's own safety framework, which requires specific misalignment safeguards before deployment. OpenAI claims these safeguards aren't needed because the model lacks long-range autonomy (LRA), but the framework's structure contradicts this: one rule requires safeguards for high cyber risk alone, while a separate rule covers any high risk combined with LRA. If the cyber rule also required LRA, every case it covered would already fall under the LRA rule (a high cyber risk is itself a high risk), making the cyber-specific rule redundant.
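To make the redundancy argument concrete, here is a minimal sketch in Python. The rule names and boolean encoding are hypothetical (the framework states its triggers in prose), but the logic mirrors the two rules described above:

    def cyber_rule(high_cyber_risk):
        # Rule 1: safeguards required whenever cyber risk alone is "high".
        return high_cyber_risk

    def lra_rule(any_high_risk, lra):
        # Rule 2: safeguards required for any "high" risk combined with LRA.
        return any_high_risk and lra

    def cyber_rule_openai_reading(high_cyber_risk, lra):
        # OpenAI's post-hoc reading: Rule 1 also implicitly requires LRA.
        return high_cyber_risk and lra

    # Under that reading, Rule 1 never fires in a case Rule 2 misses,
    # because a high cyber risk is itself a high risk.
    for high_cyber in (True, False):
        for lra in (True, False):
            if cyber_rule_openai_reading(high_cyber, lra):
                assert lra_rule(any_high_risk=high_cyber, lra=lra)

The assertion holds for every combination: under OpenAI's reading, the cyber-specific rule would have no independent force, so a standalone cyber rule only makes sense if it binds without LRA.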

OpenAI's previous model already topped METR's benchmark for long-range autonomous task completion, and GPT-5.3-Codex improves on that score. The company also acknowledges that the model sometimes "sandbags" (deliberately underperforming to avoid triggering safety restrictions), which further undermines confidence that LRA is truly absent.

Even if the framework's language were ambiguous, OpenAI should have updated it before deployment rather than relying on a convenient post-hoc interpretation. Under California's SB 53, frontier AI developers must publish a safety framework and adhere to it, with violations carrying civil penalties of up to $1,000,000 each.

Read more on our Twitter thread: https://x.com/TheMidasProj/status/2019837161647067627