Anthropic, Amazon, Microsoft & Google Propose Shared AI Jailbreak Severity Scale
Summary: Anthropic, Amazon, Microsoft, and Google jointly proposed a standardized rubric for rating AI jailbreak severity, timed to coincide with Fable 5's global return.
Key Facts
- Four axes score each jailbreak: (1) capability gain beyond existing tools; (2) breadth of that gain across task types; (3) ease of weaponization — single prompt vs. multi-step specialist effort; (4) discoverability — niche knowledge vs. already circulating online
- Modeled on CVSS, the Common Vulnerability Scoring System that security teams use to prioritize patching of CVE-listed vulnerabilities
- The Fable 5 jailbreak that triggered the US export ban was retrospectively scored under the rubric to validate the framework
- Project Glasswing is expanding its coordinated disclosure program; other labs are invited to participate
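To make the four-axis idea concrete, here is a minimal Python sketch of how such a rubric could be turned into a single CVSS-style number. It is illustrative only: the brief does not give the proposal's actual scale, weights, or formula, so the 0-3 levels per axis, the equal weighting, the `JailbreakScore` class, and the `band` helper are all assumptions. The band cut-offs are borrowed from CVSS v3.x qualitative ratings, and the example values are invented, not the Fable 5 scores.

```python
from dataclasses import dataclass

# Assumption: each axis is rated on a 0-3 scale. The real rubric's scale is not stated.
MAX_LEVEL = 3


@dataclass(frozen=True)
class JailbreakScore:
    """Hypothetical container for the four axes named in the proposal."""
    capability_gain: int        # 0 = no gain beyond existing tools, 3 = large gain
    breadth: int                # 0 = one narrow task, 3 = many task types
    ease_of_weaponization: int  # 0 = multi-step specialist effort, 3 = single prompt
    discoverability: int        # 0 = niche knowledge, 3 = already circulating online

    def __post_init__(self) -> None:
        # Reject values outside the assumed 0-3 range.
        for name, value in vars(self).items():
            if not 0 <= value <= MAX_LEVEL:
                raise ValueError(f"{name} must be between 0 and {MAX_LEVEL}, got {value}")

    def severity(self) -> float:
        """Equal-weight sum rescaled to 0-10 (an assumed formula, simpler than CVSS)."""
        total = (
            self.capability_gain
            + self.breadth
            + self.ease_of_weaponization
            + self.discoverability
        )
        return round(10 * total / (4 * MAX_LEVEL), 1)


def band(score: float) -> str:
    """Map a 0-10 score to a label using the CVSS v3.x qualitative cut-offs."""
    if score == 0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


if __name__ == "__main__":
    # Invented example: strong gain, fairly broad, single prompt, not widely known.
    example = JailbreakScore(
        capability_gain=3, breadth=2, ease_of_weaponization=3, discoverability=1
    )
    print(example.severity(), band(example.severity()))
```

A single number is convenient for triage, but an equal-weight average hides which axis drives the rating, so a real rubric would likely publish the per-axis values alongside any composite.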
Why It Matters
When a single jailbreak can prompt a government to suspend a global AI service overnight, the absence of a shared severity language is a governance gap. A CVSS equivalent for AI jailbreaks would give regulators, enterprises, and labs a common vocabulary, reducing the risk that future incidents escalate to sweeping shutdowns before their technical severity is properly assessed. If widely adopted, the rubric could become the baseline for AI vulnerability disclosure standards.
Read More
- Expanding Project Glasswing — Anthropic
- Cross-Lab Jailbreak Rubric — AI Weekly