2026-07-01

Anthropic, Amazon, Microsoft & Google Propose Shared AI Jailbreak Severity Scale

Summary: Anthropic, Amazon, Microsoft, and Google jointly proposed a standardized rubric for rating AI jailbreak severity, timed to coincide with Fable 5's global return.

Key Facts

Four axes score each jailbreak: (1) capability gain beyond existing tools; (2) breadth of that gain across task types; (3) ease of weaponization — single prompt vs. multi-step specialist effort; (4) discoverability — niche knowledge vs. already circulating online
Modeled on CVSS, the Common Vulnerability Scoring System used by security teams to prioritize CVE patches
The Fable 5 jailbreak that triggered the US export ban was retrospectively scored under the rubric to validate the framework
Glasswing is expanding its coordinated disclosure program; other labs invited to participate

Why It Matters

When a single jailbreak can prompt a government to suspend a global AI service overnight, the absence of a shared severity language is a governance gap. A CVSS equivalent for AI jailbreaks gives regulators, enterprises, and labs a common vocabulary — reducing the risk that future incidents escalate to sweeping shutdowns before technical severity is properly assessed. If widely adopted, it could become the baseline for AI vulnerability disclosure standards.

Expanding Project Glasswing — Anthropic
Cross-Lab Jailbreak Rubric — AI Weekly

Anthropic, Amazon, Microsoft & Google Propose Shared AI Jailbreak Severity Scale

Key Facts

Why It Matters

Read More

매주 핵심 AI 소식, 한 번에 받기