Will o1 score ≥60% on the REBUS benchmark?
Ṁ250 Dec 30
57% chance
Update 2024-12-22 (PST): This market refers to the REBUS benchmark as described in the paper "REBUS: A Robust Evaluation Benchmark of Understanding Symbols" (AI summary of creator comment)
This question is managed and resolved by Manifold.
@derikk After looking at the examples, not getting any correct, and then seeing 83% as the human baseline, I felt really bad until I read that humans were allowed to Google and use reverse image search.
Related questions
What will be the best normalized score achieved on the original 7 RE-Bench tasks by December 31st 2025?
Will any model get above human level on the Simple Bench benchmark before September 1st, 2025?
55% chance
Will GPT4/Opus report >50% score on ARC in 2024?
13% chance
What score will o1-pro achieve on FrontierMath?
Will an AI SWE model score higher than 50% on SWE-bench in 2024?
20% chance
Will any AI model score >80% on Epoch's Frontier Math Benchmark in 2025?
43% chance
Will an AI score over 30% on FrontierMath Benchmark in 2025?
85% chance
Before 2028, will any AI model achieve the same or greater benchmarks as o3 high with <= 1 million tokens per question?
48% chance
Will an AI score over 80% on FrontierMath Benchmark in 2025?
10% chance
Will a Grok AI get >90% on ARC in 2024?
4% chance