Will o1 score ≥60% on the REBUS benchmark?

5

Ṁ1kṀ1.9k

resolved Mar 11

Resolved

YES

Update 2024-12-22 (PST): This market refers to the REBUS benchmark as described in the paper "REBUS: A Robust Evaluation Benchmark of Understanding Symbols" (AI summary of creator comment)

🏅 Top traders

#	Trader	Total profit
1		Ṁ510
2		Ṁ101
3		Ṁ83

People are also trading

Best 8-hour AI score on RE-Bench >= 0.8 by what year?

Will AI achieve 85% or higher on the Humanity's Last Exam benchmark before 2030?

Will AI achieve 85% or higher on the Humanity's Last Exam benchmark before 2027?

Will AI achieve 85% or higher on the Humanity's Last Exam benchmark before 2028?

Will AI achieve 95% or higher on the Humanity's Last Exam benchmark before 2027?

Will AI achieve 95% or higher on the Humanity's Last Exam benchmark before 2030?

Will OpenAI's o4 get above 50% on humanity's last exam?

Will AI achieve 95% or higher on the Humanity's Last Exam benchmark before 2028?

In what year will AI achieve a score of 95% or higher on the PutnamBench leaderboard?

In what year will AI achieve a score of 95% or higher on the SWE-bench Verified benchmark?

I'll probably try running this sometime this week if I can automate the web interaction (unless the API comes out before then).

Referring, of course, to the famous https://arxiv.org/abs/2401.05604

bought Ṁ100 YES

For reference, the release version of 4o scored 42%, and the human baseline is 83%.

@derikk After looking at the examples and not getting any correct, and then seeing 83% as the human baseline, I felt really bad until I read that humans were allowed to Google and use reverse image search.
