2026-06-18

OpenAI Launches LifeSciBench: Top Models Pass Only 1 in 3 Life Science Tasks

One-liner: OpenAI released LifeSciBench, an expert-authored benchmark of 750 tasks spanning seven life-science domains, and the best available models pass only about one task in three.

Key Facts

750 tasks structured across 7 life-science domains (drug discovery, genomics, molecular biology, and more) × 7 experimental workflows
Created by 173 scientists at biotechnology and pharmaceutical companies, with each task reviewed by additional domain experts
Each task pairs a research prompt with supporting artifacts (papers, datasets) and an expert grading rubric
Current best-model pass rate: ~33%, on tasks deliberately pitched at real research complexity rather than textbook recall
Simultaneously: OpenAI and Molecule.one demonstrated GPT-5.4 as a near-autonomous AI chemist, successfully optimizing a challenging medicinal chemistry reaction
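
To make the task structure in the list above concrete, here is a minimal Python sketch of a task record (prompt, artifacts, rubric) and a pass-rate calculation. It is an illustration only: the field names, the weighted rubric, and the 0.7 pass threshold are all assumptions, not LifeSciBench's published schema or scoring rule.

```python
from dataclasses import dataclass, field


@dataclass
class RubricItem:
    description: str
    points: float  # assumed: each criterion carries an expert-assigned weight


@dataclass
class Task:
    domain: str    # e.g. "genomics"
    workflow: str  # hypothetical workflow label
    prompt: str
    artifacts: list[str] = field(default_factory=list)  # papers, datasets
    rubric: list[RubricItem] = field(default_factory=list)


def task_passes(task: Task, met: list[bool], threshold: float = 0.7) -> bool:
    """Return True if the share of rubric points earned reaches the threshold.

    `met` holds one boolean per rubric item; `threshold` is an assumed cutoff.
    """
    total = sum(item.points for item in task.rubric)
    earned = sum(item.points for item, ok in zip(task.rubric, met) if ok)
    return total > 0 and earned / total >= threshold


def pass_rate(results: list[bool]) -> float:
    """Fraction of tasks passed; 0.0 for an empty list."""
    return sum(results) / len(results) if results else 0.0


# Hypothetical task with a three-item rubric worth 4 points in total.
example = Task(
    domain="genomics",
    workflow="experimental design",
    prompt="Propose a follow-up experiment for the attached dataset.",
    artifacts=["paper.pdf", "counts.csv"],
    rubric=[
        RubricItem("Identifies the key confounder", 2.0),
        RubricItem("Chooses an appropriate control", 1.0),
        RubricItem("States a testable prediction", 1.0),
    ],
)
```

Under these assumptions, a response meeting the first two criteria earns 3 of 4 points and passes, while one meeting only the first earns 2 of 4 and fails. At a pass rate of about one in three, a model would pass roughly 250 of the 750 tasks.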

Why It Matters

Most prior science benchmarks test knowledge retrieval, not research judgment. LifeSciBench tests decisions a working scientist would actually face in the lab. A pass rate of roughly 33% from today's frontier models reveals a wide gap between "reads the literature" and "advances the science", and gives biotech and pharma teams a concrete yardstick for when AI tools graduate from literature assistants to genuine research partners.

Read More

Introducing LifeSciBench (OpenAI)
LifeSciBench detailed analysis (MarkTechPost)
