Anthropic, Amazon, Microsoft & Google Propose Shared AI Jailbreak Severity Scale

Summary: Anthropic, Amazon, Microsoft, and Google jointly proposed a standardized rubric for rating AI jailbreak severity, timed to coincide with Fable 5's global return.

Key Facts

  • Four axes score each jailbreak: (1) capability gain beyond existing tools; (2) breadth of that gain across task types; (3) ease of weaponization — single prompt vs. multi-step specialist effort; (4) discoverability — niche knowledge vs. already circulating online
  • Modeled on CVSS, the Common Vulnerability Scoring System used by security teams to prioritize CVE patches
  • The Fable 5 jailbreak that triggered the US export ban was retrospectively scored under the rubric to validate the framework
  • Glasswing is expanding its coordinated disclosure program; other labs invited to participate
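
To make the four-axis structure concrete, here is a minimal Python sketch of how such a rating might be recorded. It is an illustration only: the source does not publish a numeric scale, weights, or severity bands, so the 0-3 range per axis, the unweighted sum, the band thresholds, and the name `JailbreakScore` are all assumptions and not the proposed rubric itself.

```python
from dataclasses import dataclass, fields

# ASSUMPTION: each axis is scored 0-3. The source names the axes but
# does not specify a scale, weights, or thresholds.
AXIS_MIN, AXIS_MAX = 0, 3


@dataclass(frozen=True)
class JailbreakScore:
    """Hypothetical record of one jailbreak rated on the four axes."""

    capability_gain: int        # uplift beyond existing tools
    breadth: int                # how many task types the uplift spans
    ease_of_weaponization: int  # high = single prompt, low = multi-step specialist effort
    discoverability: int        # high = already circulating online, low = niche knowledge

    def __post_init__(self) -> None:
        # Reject values outside the assumed range so bad input fails early.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, int) or not AXIS_MIN <= value <= AXIS_MAX:
                raise ValueError(
                    f"{f.name} must be an integer in [{AXIS_MIN}, {AXIS_MAX}], got {value!r}"
                )

    def total(self) -> int:
        # ASSUMPTION: unweighted sum. A real rubric might weight the axes.
        return sum(getattr(self, f.name) for f in fields(self))

    def band(self) -> str:
        # ASSUMPTION: illustrative thresholds, loosely in the spirit of
        # CVSS-style qualitative ratings.
        t = self.total()
        if t >= 10:
            return "critical"
        if t >= 7:
            return "high"
        if t >= 4:
            return "medium"
        return "low"


if __name__ == "__main__":
    example = JailbreakScore(
        capability_gain=2, breadth=2, ease_of_weaponization=2, discoverability=1
    )
    print(example.total(), example.band())
```

Keeping the axes as separate fields, instead of storing only a total, preserves the per-axis detail that lets two parties see where their assessments of the same incident differ.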

Why It Matters

When a single jailbreak can prompt a government to suspend a global AI service overnight, the absence of a shared severity language is a governance gap. A CVSS equivalent for AI jailbreaks would give regulators, enterprises, and labs a common vocabulary, reducing the risk that future incidents escalate to sweeping shutdowns before their technical severity has been properly assessed. If widely adopted, it could become the baseline for AI vulnerability disclosure standards.
