The Midas Project

Watchtower

Watchtower tracks changes to corporate and government AI safety policies, both announced and unannounced.

Date: Feb 5, 2026

Anthropic

Violation

Moderate

Anthropic’s evaluation of Opus 4.6, conducted under its voluntary Responsible Scaling Policy (v2.2), found that its qualitative tests for “AI R&D-4” (the ability of a model to “fully automate the work of an entry-level remote-only researcher”) were “saturated,” meaning the benchmarks were too easy to give a clear safety signal.

Rather than developing more rigorous quantitative evaluations, Anthropic conducted an internal survey of 16 employees. As OpenAI researcher Noam Brown pointed out on Twitter, asking your own employees whether your product needs additional safety measures before release is a questionable substitute for rigorous evaluation.

The way Anthropic conducted the survey also raises further concerns. Five of the 16 survey respondents initially indicated that stronger safeguards might be needed. Anthropic followed up with those five employees, asking them to “clarify their views.” The system card released with Opus 4.6 does not mention any follow-up with the other eleven respondents, the ones whose answers already pointed to the outcome Anthropic wanted.

Following up only with the people who gave an inconvenient answer, and asking them to clarify their views, systematically biases the results in one direction: any revised answer can only move toward the outcome the other respondents already supported. Whether or not Anthropic intended to bias the outcome, the process it describes is flawed. At a minimum, a survey used as a substitute for inadequate quantitative evaluations should have included external experts.
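The one-directional effect of selective follow-up can be illustrated with a small expected-value sketch. The 5 and 11 respondent counts come from the survey as described above. Everything else is an assumption made purely for illustration: the function name `expected_yes_after_followup` and the 20% chance that a respondent revises their answer when asked to clarify are hypothetical and do not come from the system card.

```python
def expected_yes_after_followup(n_yes, n_no, flip_prob, follow_up_no):
    """Expected number of 'stronger safeguards needed' answers after follow-up.

    flip_prob is a HYPOTHETICAL chance that any respondent revises their
    answer when asked to clarify; it is not taken from the system card.
    """
    # Initial "yes" respondents are always followed up with; some revise to "no".
    yes_kept = n_yes * (1 - flip_prob)
    # Initial "no" respondents can only revise to "yes" if they are asked at all.
    yes_gained = n_no * flip_prob if follow_up_no else 0.0
    return yes_kept + yes_gained


# 5 initial "yes" and 11 initial "no" answers, as reported for the survey.
one_sided = expected_yes_after_followup(5, 11, flip_prob=0.2, follow_up_no=False)
two_sided = expected_yes_after_followup(5, 11, flip_prob=0.2, follow_up_no=True)
print(f"one-sided follow-up: {one_sided:.1f} expected 'yes' answers")
print(f"two-sided follow-up: {two_sided:.1f} expected 'yes' answers")
```

Under the assumed 20% revision rate, one-sided follow-up leaves about 4.0 expected "yes" answers, while following up with all 16 respondents yields about 6.2. The specific numbers depend entirely on the assumed rate. The point is only that, with one-sided follow-up, the count of "yes" answers can fall but never rise.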

Anthropic ultimately implemented stronger protections for Opus 4.6 as a precautionary measure. This evaluation was conducted under the RSP, which remains voluntary. Anthropic’s legally enforceable Frontier Compliance Framework does not include the RSP’s structure for binding specific safeguards to specific capability thresholds, so these issues are not subject to regulatory enforcement under California’s SB 53.
