Magic has finally released a risk evaluation policy. How does it measure up?

Big news: the AI coding startup Magic has released a new risk evaluation policy this week. Referred to as their “AGI Readiness Policy” and developed in collaboration with the nonprofit METR, this announcement follows in the footsteps of Responsible Scaling Policies (RSPs) released by companies like Anthropic, OpenAI, and Google DeepMind. So how does it stack up?

It’s less of a policy, and more of a preview.

The first thing we noticed is that it’s not a full risk evaluation policy — at least not yet. Magic’s policy is effectively a precommitment to implement a full RSP once its models are sufficiently capable.

In some ways, they are in good company. Virtually every risk evaluation policy released thus far has explicitly been a work in progress, with details left to be filled in at a future date.

There’s a good reason for this — we don’t know exactly what the risk landscape for frontier AI systems will be in the coming years, nor what the best ways to evaluate and mitigate those risks are.

However, Magic’s policy is the least detailed we’ve seen so far. This decision was likely made because they are further behind the “AI frontier” — that is, the cutting edge of large-scale, highly capable AI models — than any other company that has released a policy of this sort.

They’ve chosen two thresholds that will indicate that they are at the frontier, and trigger the need to implement a full RSP, akin to what we’ve seen from other companies:

  1. The first is achieving 50% on LiveCodeBench, an evaluation that measures the performance of AI models in writing, executing, debugging, and understanding code.
  2. The second is hitting “privately specified thresholds” on a set of “private benchmarks” used internally within the company. If this criterion sounds unusually opaque… that’s because it is. While there are good reasons to keep some safety practices private for security purposes, it’s not clear why Magic can’t be more open about the full set of evaluations that would trigger the release of an RSP.

So, how does the rest of the pre-commitment hold up? We’ve analyzed it and found a number of strengths and weaknesses. Here are our results:

The good: They are taking catastrophic risks seriously, plan to conduct risk evaluation, and are working with third parties.

The AGI Readiness Policy opens by affirming that the Magic team believes “AI development poses the possibility of serious negative externalities on society, including catastrophic risks to public security and wellbeing.”

Like it or not, these risks are real — not science-fiction — and so it’s important to say these things out loud. Magic gets high marks for even acknowledging the significant and catastrophic risks that this technology can pose to society. Their policy explicitly names four threat models that are understood by safety experts to be particularly relevant — cyber attacks, accelerating AI research, autonomous replication, and assisting bad actors in developing biological weapons.

In addition, they correctly recognize the importance of conducting risk evaluation to monitor their models for such dangers. Their policy reads: “Prior to publicly deploying models that exceed the current frontier of coding performance, we will evaluate them for dangerous capabilities and ensure that we have sufficient protective measures in place to continue development and deployment in a safe manner.”

This is a critical component of any AI lab’s safety policy. We would like to see them extend these sorts of commitments to cover non-public (internal or limited access) deployment as well, though this language may be coming in the full policy. We’d also like to see them clarify what these evaluations look like, how they will detect not only dangerous behavior but warning signs of such behavior, and precisely which safety mitigations will be required to address which specific risks and then continue development. We hope this will be present in the full policy.

Finally, it’s good to see that Magic has been working with third parties. They mentioned that their current policy was developed in collaboration with METR, a nonprofit that evaluates frontier AI systems for dangerous capabilities. They also say third parties may be involved in monitoring whether the thresholds to implement a full RSP have been met and in deciding to change those thresholds. While voluntary commitments aren’t binding, and wishy-washy language like this definitely isn’t binding, it’s good to see that Magic has been working with third-party safety experts, and we hope to see this continue in the future.

The bad: Their risk thresholds are extremely high, details are sparse, and there isn’t much information about what they will start doing today.

Despite these strengths, there are lots of flaws in the AGI Readiness Policy in its current state.

Perhaps the most glaring flaw is the extremely high risk thresholds that the policy specifies. The document says that the “critical capability thresholds” that would require safety mitigations include making cyber attacks 10x easier, causing a “dramatic acceleration” in AI progress, AI models autonomously executing cybercrimes, and/or allowing non-experts to synthesize a viable pandemic agent.

By the time each of these risk thresholds has been met, it may already be too late. These are examples of extreme dangers that we will face in the later stages of AGI development and adoption, and by the time such models exist, the cat may well be out of the bag.

They do say that these are “high-level, illustrative” examples. Perhaps they just intended to demonstrate what the threat models look like at their extremes, and the eventual RSP will have more near-term and practical thresholds. We hope this is the case.

This lack of clarity highlights another problem with version 1.0 of their AGI Readiness Policy: it is severely lacking in detail. The public can’t trust voluntary safety policies with no teeth. If companies want to prove that they intend to adhere to self-imposed evaluation regimes, they should specify (1) which evaluations will trigger the implementation of safeguards and a pause on further development/deployment, (2) which safeguards will be implemented at each threat level, and (3) which supervising committees, third parties, and other accountability mechanisms will help ensure that the policy is followed.

Many of these details are missing in Magic’s policy. Again, this may be related to its status as a pre-commitment rather than a full-blown RSP. Still, we would have liked to see more detail in this announcement.

Our final concern is that it remains unclear what Magic is doing today to help ensure its models’ safety, aside from conducting evaluations to determine whether they have met the industry frontier. We believe that non-frontier models may still pose risks due to specific features of their training and deployment. Even if they haven’t yet developed or released a frontier model, they should still be getting in the habit of practicing risk assessment and sharing the results of such assessments with the public.

The big picture

Despite the many flaws in version 1.0 of Magic’s AGI Readiness Policy, we think this is a meaningful step forward for the startup. We would like to see more AI startups discuss these risks and implement risk evaluation policies.

In particular, The Midas Project is currently worried about the startup Cognition. They are similar to Magic in many ways — developing AI coding agents with a small team and a lot of capital — but so far, they have seemingly refused to discuss AI safety or implement any public risk evaluation policy whatsoever.

If you believe Cognition should take AI safety seriously and talk to the public about how they will conduct risk evaluation, sign our petition today. If you’re looking for more ways to get involved, join our action hub.