OpenAI Launches LifeSciBench: Top Models Pass Only 1 in 3 Life Science Tasks
One-liner: OpenAI released LifeSciBench, an expert-authored benchmark of 750 tasks spanning seven life-science domains, and the best available models pass only about one in three of them.
Key Facts
- 750 tasks structured across 7 life-science domains (drug discovery, genomics, molecular biology, and more) × 7 experimental workflows
- Written by 173 scientists at biotechnology and pharmaceutical companies; each task was reviewed by additional domain experts
- Each task pairs a research prompt with supporting artifacts (papers, datasets) and an expert grading rubric
- Current best-model pass rate: ~33%; task difficulty is deliberately pitched at real research complexity, not textbook recall
- At the same time, OpenAI and Molecule.one demonstrated GPT-5.4 working as a near-autonomous AI chemist that successfully optimized a challenging medicinal chemistry reaction
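For readers who want a concrete picture of the task structure in the list above (a research prompt, supporting artifacts, and an expert rubric) and of how a pass rate is derived from per-task scores, here is a minimal Python sketch. It is not LifeSciBench's actual schema or grading code: the class names, field names, point-weighted scoring, and the 0.7 passing threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class RubricItem:
    """One expert-written grading criterion (hypothetical shape)."""
    description: str
    points: float


@dataclass
class Task:
    """A benchmark task: prompt + artifacts + rubric (field names are assumptions)."""
    prompt: str
    artifacts: list[str]
    rubric: list[RubricItem]


def score(task: Task, met: set[int]) -> float:
    """Fraction of rubric points earned, given the indices of criteria judged met."""
    total = sum(item.points for item in task.rubric)
    earned = sum(item.points for i, item in enumerate(task.rubric) if i in met)
    return earned / total if total else 0.0


def pass_rate(scores: list[float], threshold: float = 0.7) -> float:
    """Share of tasks whose score reaches the assumed passing threshold."""
    if not scores:
        return 0.0
    return sum(s >= threshold for s in scores) / len(scores)


# Illustrative data only; none of it comes from the benchmark.
example_task = Task(
    prompt="Propose a follow-up experiment for the attached assay results.",
    artifacts=["assay_results.csv", "methods_paper.pdf"],
    rubric=[
        RubricItem("Identifies the main confounder", 2.0),
        RubricItem("Proposes an appropriate control", 1.0),
        RubricItem("Uses the supplied dataset correctly", 1.0),
    ],
)

example_score = score(example_task, met={0, 1})       # 3 of 4 points earned
example_rate = pass_rate([example_score, 0.5, 0.2])   # one of three tasks passes
```

Under these assumptions, a model that clears the threshold on one task in three has the roughly 33% pass rate reported for the best current models.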
Why It Matters
Most prior science benchmarks test knowledge retrieval, not research judgment. LifeSciBench tests decisions a working scientist would actually face in the lab. A pass rate of roughly 33% for today's frontier models reveals a wide gap between "reads the literature" and "advances the science", and it gives biotech and pharma teams a concrete yardstick for judging when AI tools graduate from literature assistants to genuine research partners.
Read More
- Introducing LifeSciBench — OpenAI
- LifeSciBench detailed analysis — MarkTechPost