OpenAI Launches LifeSciBench: Top Models Pass Only 1 in 3 Life Science Tasks

One-liner: OpenAI released LifeSciBench, an expert-authored benchmark of 750 tasks spanning seven life-science domains, and the best available models complete only about one in three of them correctly.

Key Facts

  • 750 tasks structured across 7 life-science domains (drug discovery, genomics, molecular biology, and more) × 7 experimental workflows
  • Created by 173 scientists at biotechnology and pharmaceutical companies, with each task reviewed by additional domain experts
  • Each task pairs a research prompt with supporting artifacts (papers, datasets) and an expert grading rubric
  • Current best-model pass rate: ~33%; the bar is deliberately set to reflect real research complexity, not textbook recall
  • Simultaneously: OpenAI and Molecule.one demonstrated GPT-5.4 as a near-autonomous AI chemist, successfully optimizing a challenging medicinal chemistry reaction

Why It Matters

Most prior science benchmarks test knowledge retrieval, not research judgment. LifeSciBench tests decisions a working scientist would actually face in the lab. A pass rate of roughly 33% for today's frontier models reveals a wide gap between "reads the literature" and "advances the science." It also gives biotech and pharma teams a concrete yardstick for when AI tools graduate from literature assistants to genuine research partners.
