Will any model get above human level on the Simple Bench benchmark before September 1st, 2025?


resolved Sep 6



Shocking - 56.7% for GPT-5 (High)

Positive (+)

GPT-5: hallucination rate is down ~80% across the board

GPT-5 dominates the Text Arena, ranking #1 in every major category: Hard Prompts, Coding, Math, Long Queries

The model will do well.

My prediction is 73% ± 2% (much better than I thought). My end-of-year optimistic estimate on Simple Bench was 74% just last month.

I think worst case the model does 68-69%, and my best case is around 78%.

We will likely have an answer to this market within a week.

Rumor: GPT-5 got 90%. Someone tested it through Copilot.

I’m going to come out and say the model won’t break 80%.

We will likely know the answer in 1-2 weeks.

@Mad2live That test was done on the 10 public questions, not the private dataset.

Do we have any info on GPT 5's knowledge cutoff date? Could it possibly have the public questions in its training data?

@TiagoChamba That is what I’m thinking, plus I think I remember another model getting 80% on that public version and failing to break 63%.

The best model will probably be around 72% give or take 2%.

I like the odds for EOY tho

bought Ṁ175 NO

I'd bet even odds by EOY, but it's highly unlikely by September 1st.

The human baseline is now 83.7%. It's unfortunate that the old baseline is in the name, but I will resolve to true if any model exceeds the human baseline published on https://simple-bench.com.
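For anyone who wants the resolution rule spelled out, here is a minimal sketch of it as code. It assumes "exceeds" means strictly greater than the baseline, and the function name and the score dictionary are hypothetical, made up for illustration. The 83.7 and 56.7 figures are the ones quoted in this thread.

```python
# Human baseline quoted in this thread (percent).
HUMAN_BASELINE = 83.7


def resolves_yes(model_scores, baseline=HUMAN_BASELINE):
    """Return True if any model score strictly exceeds the baseline.

    model_scores maps a model name to its Simple Bench score in percent.
    Treating "exceeds" as strictly greater-than is an assumption.
    """
    return any(score > baseline for score in model_scores.values())


# The only score quoted in this thread: GPT-5 (High) at 56.7%.
scores = {"GPT-5 (High)": 56.7}
print(resolves_yes(scores))
```

Under this reading, a model that exactly ties the baseline would not resolve the market to true.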

bought Ṁ50 YES

@HenryGeorge You can edit the name. Hover over it and a pen button will appear.

@NeuralBets done thx

We have a new reported human baseline (83.7%). Is this a question about 92% or about the human level?

@MikhailDoroshenko human baseline

bought Ṁ250 NO

Seems unlikely without a major paradigm shift. 27% is SOTA, and it doesn't seem to be increasing much with successive model generations.

Is it true that this benchmark can be anything, and can be changed at any point? There are no hashes, no large sample of problems, no error bars, no evaluation code, no specifics on what a model can or cannot use... How do we know what a true performance is, except what the author says?
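On the error bars point, here is a rough sketch of how much uncertainty a small sample carries. It uses a Wilson score interval, which is my choice of method and not anything the benchmark publishes, and it assumes each question is an independent pass/fail trial. The example plugs in the rumored 90% on the 10 public questions mentioned earlier in the thread, which would be 9 correct out of 10.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a pass rate; z=1.96 gives roughly 95%."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return center - half_width, center + half_width


# Rumored result from the thread: 90% on the 10 public questions.
low, high = wilson_interval(correct=9, total=10)
print(f"{low:.1%} to {high:.1%}")
```

With only 10 questions the interval spans roughly 60% to 98%, so a result like that says very little about where a model lands on the private set.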