Tech companies are locked in an all-out race to develop and deploy advanced AI systems. There’s a lot of money to be made, and indeed, plenty of opportunities to improve the world. But there are also serious risks — and racing to move as quickly as possible can make detecting and averting these risks a huge challenge.
In light of this situation, leading scientists and world governments have endorsed "red line" commitments to conduct risk evaluations, also known as "responsible scaling policies." These risk evaluation commitments have a few critical features:
- These policies specify, in advance of developing and deploying new AI products, the risk thresholds that would be unacceptable to cross without adequate safeguards in place.
- These policies describe how the company will use evaluations to determine whether those risk thresholds have been reached, and include commitments to share the results of these evaluations publicly.
- These policies describe what safety mitigations must be in place before continuing the development of advanced models, and before continuing their internal and external deployment. When these safety mitigations have not been adequately implemented, the company commits to pause and focus on putting them in place.
So how are companies doing when it comes to implementing these policies? This post will share everything we know so far. But first, a disclaimer: our analysis is not comprehensive, nor does it address the relative strengths of individual policies — for that, we recommend in-depth resources including the scorecards put out by the Leverhulme Centre for the Future of Intelligence, AI Lab Watch, and SaferAI.
Instead, we sought to answer a much simpler question: Which companies have actually implemented any form of a “red line” policy, or have made clear their plans to do so?
Released a “red line” risk evaluation policy:
The good news is that things are currently trending in the right direction. Since Anthropic released its responsible scaling policy late last year (the first of its kind), others have followed suit. OpenAI, Google, and Magic.dev have all released similar risk evaluation policies.
This doesn’t mean that these commitments are all perfect, or even sufficient. The SaferAI scorecard mentioned above compared OpenAI’s “Preparedness” policy to Anthropic’s “Responsible Scaling Policy,” and found that both frameworks “miss some key parts of risk management.”
Google and Magic.dev, on the other hand, have released policies that explicitly leave the details to be filled in at a later date. Google’s Frontier Safety Framework is targeting early 2025 as a date for full implementation, while Magic.dev’s AGI Readiness Policy will only be implemented once their models “exceed a threshold of 50% accuracy on LiveCodeBench,” a popular coding capability evaluation, or else when they reach critical thresholds on private, internal evaluations.
Despite these flaws, momentum is clearly building. Not only have these four companies released initial risk evaluation policies, but many more have promised to do so.
Committed to release a “red line” risk evaluation policy:
On May 21, 2024, the governments of the UK and South Korea announced that they had secured commitments from several leading AI companies in the United States, China, and the United Arab Emirates to implement “red line” risk evaluation policies.
The full text of the commitments can be read here. In short, the companies agreed to conduct risk assessments, define in advance the risk thresholds they consider acceptable for future models, specify the safety mitigations that must be in place at each threshold, describe how they will evaluate whether their models have reached each threshold, and continually monitor and update these practices as needed.
The companies also committed to providing public transparency concerning their progress on each of the above commitments, allowing governments, academia, nonprofits (like The Midas Project), and the general public to assess the progress they are making.
These commitments are still quite recent, and we have yet to see any of the companies mentioned release a policy that wasn’t already in place before the commitments were signed. Much will depend on how well these tech giants follow through on their promises, but The Midas Project will be watching them every step of the way, ready to call out shortcomings if and when they arise.
Failed to release, or even publicly discuss, a “red line” risk evaluation policy:
Despite the progress being made across the AI industry, there are still laggards. Our analysis found that one company in particular, Cognition AI (developers of Devin), has fallen behind the rest of the industry and failed to meet, or even discuss, the risk evaluation standards that experts have endorsed.
This is why The Midas Project is running a public awareness campaign calling upon Cognition to release an industry-standard “red line” risk evaluation policy. If you agree, consider signing our petition or sharing our campaign on social media.