What will be the best score (5/5 reliability) on ZeroBench by December 31st 2025?
Score range: current probability
0 - 10: 20%
11 - 20: 11%
21 - 30: 9%
31 - 40: 9%
41 - 50: 9%
51 - 60: 9%
61 - 70: 9%
71 - 80: 9%
81 - 90: 9%
91 - 100: 9%

ZeroBench is a benchmark for visual reasoning, introduced by Roberts et al. in "ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models" (https://arxiv.org/abs/2502.09696).

This market will use the variant of the benchmark frozen one week after the initial release (following the public benchmark red-teaming stage to identify flawed/ambiguous questions).

The temperature used for the 5/5 reliability evaluation will be the default setting of each LLM API provider. Where that default cannot be determined unambiguously, a temperature of 0.7 will be used.
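
As a rough illustration of how a score could be mapped to one of the answer ranges, here is a minimal sketch. It assumes that 5/5 reliability means a question counts only if all five sampled responses are correct, and that the score is the percentage of such questions. The function names and the input layout are hypothetical. The market does not say how a fractional score is bucketed, so rounding up is an assumption of this sketch.

```python
import math
from typing import Sequence


def k_of_k_reliability(results: Sequence[Sequence[bool]], k: int = 5) -> float:
    """Percentage of questions answered correctly in all k sampled attempts.

    `results` holds one inner sequence per question, with one boolean per
    attempt (hypothetical layout, chosen for this sketch).
    """
    if not results:
        raise ValueError("results must contain at least one question")
    for attempts in results:
        if len(attempts) != k:
            raise ValueError(f"expected {k} attempts per question")
    # A question counts only if every one of its k attempts is correct.
    solved = sum(all(attempts) for attempts in results)
    return 100.0 * solved / len(results)


def market_bucket(score: float) -> str:
    """Map a percentage score to one of the market's answer ranges.

    Assumption: fractional scores are rounded up before bucketing.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    n = math.ceil(score)
    if n <= 10:
        return "0 - 10"
    lower = ((n - 1) // 10) * 10 + 1
    return f"{lower} - {lower + 9}"


# Toy data: 4 questions, 5 attempts each; only two are correct every time.
toy = [
    [True, True, True, True, True],
    [True, True, True, True, False],
    [False, False, False, False, False],
    [True, True, True, True, True],
]
score = k_of_k_reliability(toy)
bucket = market_bucket(score)
```

A question with four correct attempts out of five contributes nothing under this definition, which is why 5/5 reliability is a stricter measure than single-attempt accuracy.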
