SimpleBench is a multiple-choice text benchmark for large language models (LLMs) that tests whether they can match a human baseline of unspecialized (high-school-level) knowledge across various domains. The current human baseline is 83.7%, while the top-scoring LLM, o1-preview, achieved 41.7%. Notably, the other o1 variant, o1-mini, scored 18.1%.
Q: When the full version of o1 is released and tested on SimpleBench's questions, which range do you expect its score to fall within: the 40s, 50s, 60s, or 70s?
Note: The market will resolve N/A if, hypothetically, the score falls below the 40s range or above the 70s range. It will also resolve N/A if full o1 is not released, or its score on this benchmark has not been publicly disclosed, by December 1, 2024.
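For clarity, the resolution rule above can be sketched as a small function. This is a hypothetical illustration of the stated criteria, not part of the market's official mechanics; the function name and signature are my own.

```python
def resolve(score):
    """Map a SimpleBench score (in percent) to this market's resolution.

    Hypothetical helper: scores in the 40s, 50s, 60s, or 70s resolve to
    that decade's bucket; anything below 40 or at/above 80 resolves N/A,
    as does a missing score (no public disclosure by the deadline).
    """
    if score is None or score < 40 or score >= 80:
        return "N/A"
    # Truncate to the decade, e.g. 41.7 -> "40s", 79.9 -> "70s"
    return f"{int(score // 10) * 10}s"
```

For example, o1-preview's 41.7% would land in the "40s" bucket, while o1-mini's 18.1% would resolve N/A.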
Ex-OpenAI CTO Murati’s New Team Takes Shape — The Information
The full release of o1 is expected by the end of this year, so "soon" isn't as soon as one might think for OpenAI. This market will likely end up unresolved (N/A).
@JaundicedBaboon My expectation was that the full release of o1 would occur on or before the 15th. If they release it on the 15th, there will still be two weeks to test o1 and share the results by December 1st, which seems like sufficient time. That said, this timeline may slip, and that is a risk you will need to weigh, since I am unable to alter the date.