SimpleBench is a multiple-choice text benchmark for large language models (LLMs) that tests whether they can match a human baseline of unspecialized (high-school-level) knowledge across various domains. The current human baseline is 83.7%, while the top-scoring LLM, o1-preview, achieved 41.7%. Notably, the other o1 variant, o1-mini, scored 18.1%.
Q: When the full version of o1 is released and tested on SimpleBench's questions, which range do you expect its score to fall within: the 40s, 50s, 60s, or 70s?
Note: The market will resolve N/A if, hypothetically, the score falls below the 40s range or above the 70s range. It will also resolve N/A if full o1 is not released, or its score on this benchmark has not been publicly disclosed, by December 1, 2024.
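For clarity, the resolution rule above can be sketched as a small function. This is a hypothetical illustration of the stated criteria, not part of the market's official mechanics; the function name and signature are my own.

```python
def resolve(score):
    """Map a SimpleBench score (in percent) to this market's resolution.

    Hypothetical helper: scores in the 40s, 50s, 60s, or 70s resolve to
    that decade's bucket; anything below 40 or at/above 80 resolves N/A,
    as does a missing score (no public disclosure by the deadline).
    """
    if score is None or score < 40 or score >= 80:
        return "N/A"
    # Truncate to the decade, e.g. 41.7 -> "40s", 79.9 -> "70s"
    return f"{int(score // 10) * 10}s"
```

For example, o1-preview's 41.7% would land in the "40s" bucket, while o1-mini's 18.1% would resolve N/A.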
Ex-OpenAI CTO Murati’s New Team Takes Shape — The Information
The full release of o1 is expected by the end of this year, so "soon" isn't as soon as one might think for OpenAI. This market will likely end up unresolved (N/A).
@JaundicedBaboon My expectation was that the full release of o1 would occur on or before the 15th. If they release it on the 15th, there will still be two weeks to test o1 and share the results by December 1st, which seems like sufficient time. That said, this timeline may slip, and that is a risk you will need to weigh, since I am unable to alter the date.