
Watchtower
Watchtower tracks changes to corporate and government AI safety policies, both announced and unannounced.
Date: Feb 12, 2026
Violation
Moderate
Google released an updated version of Gemini 3’s “Deep Think” mode, a specialized reasoning mode marketed for breakthrough performance on complex scientific and engineering tasks. The reported numbers were striking, particularly a score of 84.6% on ARC-AGI-2 (a benchmark of abstract reasoning puzzles that LLMs have historically struggled with), compared with 31.1% for the weaker Gemini 3 Pro Preview.
Google’s Frontier Safety Framework (FSF) commits to evaluations whenever subsequent versions of a model introduce “meaningful new capabilities or a material increase in performance.” An ARC-AGI-2 score of 84.6%, more than double its predecessor’s 31.1%, should meet that bar.
When initially asked whether this release warranted new safety disclosures, a Google spokesperson told The Midas Project, “Gemini 3 Deep Think is a mode that uses the Gemini 3 Pro, and therefore our Gemini 3 Pro model card and safety evaluations apply to its capabilities.” The spokesperson added that while performance gains were evident, “these gains did not introduce any additional risks or cross any FSF thresholds beyond the evaluations which were run in December.” When we followed up, a spokesperson added that “[Google] conducted additional safety evaluations that confirmed that the model was safe to launch.”
The suggestion that the updated version of Deep Think was based solely on runtime improvements to the 3 Pro model appears to have been false. As the Feb. 19 release of Gemini 3.1 Pro later revealed, the updated Deep Think was not a mode running on the existing Gemini 3 Pro; it was running on an entirely new model. For seven days, users were running an unannounced frontier model without the safety evaluation disclosures Google had promised, while Google spokespeople privately pointed to the evaluation of its predecessor.
The model card for Gemini 3.1 Pro was thin. Zvi Mowshowitz described it as “offering a quick summary that mostly is ‘nothing to see here.’” Google reported, “Following FSF protocols, we conducted a full evaluation of Gemini 3.1 Pro (focusing on Deep Think mode). We found that the model remains below alert thresholds for the CBRN, harmful manipulation, machine learning R&D, and misalignment CCLs. As previous models passed the alert threshold for cyber, we performed more additional testing in this domain on Gemini 3.1 Pro with and without Deep Think mode, and found that the model remains below the cyber CCL.” No further supporting details were provided.
