In what year will AI achieve a score of 95% or higher on the PhysBench benchmark?
• Before 2027: 17%
• Before 2029: 44%
• Before 2031: 50%
• Before 2033: 50%
• Before 2035: 50%
• Before 2037: 50%
• Before 2039: 59%
• Before 2040: 59%

Background

PhysBench is a benchmark of about 10,000 video, image and text items that tests whether a vision‑language model (VLM) can reason about the real‑world physics governing everyday objects and scenes. It covers four domains (object properties, object relationships, scene understanding and future‑state dynamics), which are split into 19 fine‑grained tasks such as mass comparison, collision outcomes and fluid behaviour.

State of play:

• Human reference accuracy: 95.87%

• Best AI result as of Dec 2024 (InternVL 2.5‑38B): 51.94%
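For scale, the distance between the two figures above, and between the Dec 2024 result and this market's 95% threshold, works out as follows (simple subtraction of the stated accuracies):

```latex
\begin{aligned}
95.87\% - 51.94\% &= 43.93 \text{ percentage points (gap to the human reference)}\\
95\% - 51.94\% &= 43.06 \text{ percentage points (gap to the 95\% threshold)}
\end{aligned}
```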

Why reaching human‑level accuracy on PhysBench is a major milestone:

Physics‑consistent video generation – A model that masters all four PhysBench domains should be able to create long‑form videos, ads or even feature films in which liquids pour, cloth folds and shadows move as they would in the real world, avoiding the “physics mistakes” common in today's AI‑generated videos. More broadly, PhysBench is a litmus test for whether next‑generation multimodal models can move from “smart autocomplete” to physically grounded intelligence, a prerequisite for applications ranging from autonomous robots to filmmaking.

Resolution Criteria

This market resolves to the year bracket in which a fully automated AI system first achieves an average accuracy of 95% or higher (“human‑level”) on the PhysBench ALL metric.

  • Verification – Must be confirmed by a peer‑reviewed paper, an arXiv preprint or an independent leaderboard entry (e.g. LM‑Eval Harness, PapersWithCode).

  • Compute resources – Unrestricted.

  • If no AI model reaches 95% by 31 Dec 2041, the market resolves to “Not Applicable.”
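The resolution rule above can be sketched in code. This is a minimal illustration under stated assumptions, not the market's official procedure. The helper names (`overall_accuracy`, `qualifies`, `resolve`) are hypothetical. The ALL metric is assumed to be plain accuracy over every item. Each "Before Y" answer is assumed to resolve YES when the first verified qualifying result is dated before 1 January of year Y. The threshold is the 95% stated in the title.

```python
from datetime import date
from typing import Dict, Optional, Sequence

# Threshold from the market title ("95% or higher").
THRESHOLD_PCT = 95.0
# Results dated after this day do not count.
DEADLINE = date(2041, 12, 31)
# Assumption: bracket years taken from the "Before Y" labels on the market page.
BRACKET_YEARS = (2027, 2029, 2031, 2033, 2035, 2037, 2039, 2040)


def overall_accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Percent of items answered correctly.

    Assumption: the ALL metric is plain accuracy over every item (micro-average).
    """
    if not answers or len(predictions) != len(answers):
        raise ValueError("need non-empty, equal-length predictions and answers")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def qualifies(accuracy_pct: float) -> bool:
    """True if a score meets the 95% threshold."""
    return accuracy_pct >= THRESHOLD_PCT


def resolve(first_qualifying: Optional[date]) -> Optional[Dict[str, bool]]:
    """Outcome of each "Before Y" answer, or None for "Not Applicable".

    Assumption: "Before Y" means the first verified qualifying result is
    dated before 1 January of year Y.
    """
    if first_qualifying is None or first_qualifying > DEADLINE:
        return None  # no qualifying result by 31 Dec 2041
    return {f"Before {y}": first_qualifying < date(y, 1, 1) for y in BRACKET_YEARS}


if __name__ == "__main__":
    print(overall_accuracy(["A", "B", "C", "D"], ["A", "B", "C", "A"]))  # 75.0
    print(resolve(date(2030, 6, 1)))
```

For example, a first verified qualifying result dated 1 June 2030 would make every answer from "Before 2031" onward resolve YES and the two earlier answers resolve NO.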
