2026-06-30

Gemini 2.5 Pro Deep Think Sets New Records on Science and Reasoning Benchmarks, Surpassing Fable 5 and GPT-5.5

Summary: Google's Gemini 2.5 Pro with Deep Think debuted on June 22 with the highest published benchmark scores of any public model on science and reasoning, capitalizing on Fable 5's ongoing suspension to claim the top spot.

Key Facts

GPQA Diamond: 82.4%, ahead of Claude Fable 5 (79.1%) and GPT-5.5 (76.3%); a new high among public models
MMLU-Pro: 89.8%, the highest score by any publicly available model
Also leads on Humanity's Last Exam (hardest multi-disciplinary benchmark) and LiveCodeBench V6
2-million-token context window: enough to ingest full codebases, hours of video, or months of conversation in a single session
Currently available only to Google AI Ultra subscribers ($250/month); developer API access is described as coming soon, with no confirmed date
Deep Think uses parallel thinking: extended inference that runs multiple reasoning chains simultaneously

Why It Matters

Google timed this launch to close the gap while Fable 5 remains suspended under export controls. The benchmark lead is real, but the moat is narrow: scoring above Fable 5 on GPQA Diamond does not automatically translate into dominance in agentic or coding workflows, where Fable 5 led before its suspension. The 2M-token context window is a genuine hardware-backed advantage; many competing models top out at 1M. The key caveat is access: Deep Think remains locked behind a $250/month paywall with no confirmed date for API general availability, which limits developer adoption while the benchmark halo lasts.

Read More

Google: Deep Think now rolling out (Google)
Benchmark deep-dive (FAQ.com)
Gemini 2.5 Pro complete guide (Ortemtech)
