In what year will AI achieve a score of 95% or higher on the PhysBench benchmark?
Market forecast (expected resolution: June 5, 2033; trading closes in 2041)

Cumulative probability of a ≥95% score by the end of each period:

• 2025-2026: 26%
• 2027-2028: 43%
• 2029-2030: 50%
• 2031-2032: 50%
• 2033-2034: 57%
• 2035-2036: 59%
• 2037-2038: 67%
• 2039-2040: 69%
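
Because the quoted percentages are cumulative, the implied probability that the milestone first lands in a given bracket is the difference between consecutive entries, with the leftover mass going to the "Not Applicable" outcome. A short Python sketch (an illustration, not part of the market mechanics):

```python
# Convert cumulative "achieved by end of period" probabilities into the
# implied probability that the milestone first occurs in each bracket.
# The figures are the market prices quoted above.
brackets = ["2025-2026", "2027-2028", "2029-2030", "2031-2032",
            "2033-2034", "2035-2036", "2037-2038", "2039-2040"]
cumulative = [0.26, 0.43, 0.50, 0.50, 0.57, 0.59, 0.67, 0.69]

prev = 0.0
for bracket, cum in zip(brackets, cumulative):
    print(f"{bracket}: {cum - prev:.0%}")  # first-time probability in this bracket
    prev = cum
print(f"2041 or never: {1 - prev:.0%}")    # residual mass -> "Not Applicable"
```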

Background

PhysBench is a 10,000-item benchmark of interleaved video, image, and text that tests whether a vision-language model (VLM) can reason about the real-world physics governing everyday objects and scenes. It covers four domains (object Properties, object Relationships, Scene understanding, and future-state Dynamics), split into 19 fine-grained tasks such as mass comparison, collision outcomes, and fluid behaviour. Unlike most other benchmarks, humans still outperform AI on PhysBench.
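
For intuition, here is a minimal sketch of how a PhysBench-style evaluation can be scored; the item fields and the `predict` hook are hypothetical stand-ins, not the official PhysBench toolkit:

```python
# Illustrative PhysBench-style scoring: each item pairs visual context with a
# multiple-choice physics question, and the headline number is plain accuracy.
# The Item fields and the `predict` callable are hypothetical stand-ins.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Item:
    domain: str          # "Property", "Relationships", "Scene", or "Dynamics"
    task: str            # one of the 19 fine-grained tasks, e.g. "mass comparison"
    question: str        # text of the multiple-choice question
    choices: list[str]   # candidate answers
    answer: int          # index of the correct choice

def accuracy(items: list[Item], predict: Callable[[Item], int]) -> float:
    """Fraction of items where the model's chosen index matches the key."""
    return sum(predict(it) == it.answer for it in items) / len(items)
```

Per-domain scores fall out of the same function by filtering items on their domain field.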

State of play:

• Human reference accuracy: 95.87%

• Best 2024 AI accuracy (OpenAI o1): 55.11%

Why reaching human‑level on PhysBench is a big milestone:

Physics-consistent video generation – a model that masters all four PhysBench domains should be able to generate long-form videos, ads, or even feature films in which liquids pour, cloth folds, and shadows move as they would in the real world, eliminating the physics mistakes common in today's AI-generated video. PhysBench is a litmus test for whether next-generation multimodal models can move from "smart autocomplete" to physically grounded intelligence, a prerequisite for everything from autonomous robots to cinematic movies.

Resolution Criteria

This market resolves to the year bracket in which a fully automated AI system first achieves an average accuracy of 95% or higher (human-level) on the PhysBench ALL metric; a minimal sketch of this check appears after the criteria below.

  • Verification – The claim must be confirmed by either

    1. a published paper or technical report (e.g., on arXiv), or

    2. a public leaderboard entry on the official PhysBench website or another credible source.

  • Compute resources – Unlimited.
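
As referenced above, a minimal sketch of the resolution check, assuming the ALL metric is the item-weighted average accuracy across the four domains (the per-domain scores and item counts below are illustrative, not real results):

```python
# Minimal resolution check, assuming the ALL metric is item-weighted average
# accuracy across the four domains. All numbers below are hypothetical.
def all_metric(domain_scores: dict[str, tuple[float, int]]) -> float:
    """domain_scores maps domain -> (accuracy %, number of items)."""
    total = sum(n for _, n in domain_scores.values())
    return sum(acc * n for acc, n in domain_scores.values()) / total

scores = {  # hypothetical per-domain results for a future model
    "Property":      (96.1, 2500),
    "Relationships": (95.4, 2500),
    "Scene":         (94.8, 2500),
    "Dynamics":      (95.0, 2500),
}
print(all_metric(scores) >= 95.0)  # resolves YES only if the 95% bar is met
```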

Fine Print:

If the resolution criteria are not satisfied by January 1, 2041, the market resolves to "Not Applicable."
