How long until one of Gemini, Claude, etc. matches the capabilities of O1?
resolved Jul 24
  • April 12th 2025: 0.9%

  • Oct 12th 2024: 1.3%

  • Dec 12th 2024: 14%

  • September 12th 2025: 1.7%

  • April 12th 2026: 2%

  • Other

OpenAI's O1 model represents a new paradigm of LLMs. How long until a competitor catches up?

"Catches up" / "matches capabilities" is defined as matching or exceeding the O1 pass@1 benchmarks on AIME, Codeforces, and GPQA at the time of publication:

  • 74.4% accuracy on AIME

  • 89th percentile on Codeforces

  • 78% accuracy on GPQA
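The resolution rule is an all-of check: a model counts only if it meets or exceeds every threshold at once. Here is a minimal Python sketch. It assumes each benchmark can be compared as a single number, with AIME and GPQA treated as percentage scores and Codeforces as a percentile. It also assumes a benchmark that was never reported counts as not met. `ModelScores` and `matches_o1` are hypothetical names for illustration, not anything Manifold provides, and the example scores are made up.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the market's resolution criteria (o1 pass@1 at publication).
# Assumption: AIME and GPQA are percentage scores, Codeforces is a percentile rank.
O1_THRESHOLDS = {
    "aime": 74.4,
    "codeforces_percentile": 89.0,
    "gpqa": 78.0,
}


@dataclass
class ModelScores:
    """Hypothetical container for a model's reported pass@1 results."""
    name: str
    aime: Optional[float]
    codeforces_percentile: Optional[float]
    gpqa: Optional[float]


def matches_o1(scores: ModelScores) -> bool:
    """True only if every benchmark is reported and meets or exceeds o1's figure."""
    for key, threshold in O1_THRESHOLDS.items():
        value = getattr(scores, key)
        if value is None:
            # An unreported benchmark cannot satisfy the criterion under this strict reading.
            return False
        if value < threshold:
            return False
    return True


# Illustrative, made-up numbers (not real results): strong AIME and GPQA
# but no Codeforces figure, so the strict rule does not count this as a match.
example = ModelScores("hypothetical-model", aime=80.0, codeforces_percentile=None, gpqa=82.0)
print(matches_o1(example))  # False
```

Treating a missing benchmark as a failure is the strictest reading of the criteria. Accepting a substitute coding benchmark in place of Codeforces would be a different, looser rule.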

bought Ṁ300 YES

@SimonBerens I think April 12th 2025 should resolve YES. Gemini 2.5 Pro came out on 25 March 2025 and clearly wins on AIME, GPQA, and programming. I did not find a Codeforces benchmark, but it clearly wins on Aider Polyglot and LiveCodeBench (70% vs. 65%).

Can you operationalize the "etc."?

It's soooooo slow, though. And its results in real-world, day-to-day usage are rarely better than those of the new Sonnet 3.5, which are nearly instantaneous. (I ask both the same questions and use them on a nearly daily basis.)

The Oct 12th 2024 option should be resolved.

@Adamacki I can’t seem to find a way to partially resolve the market

bought Ṁ50 YES

Apparently o1's AIME score was pass@10000, not pass@1. The criteria should be updated accordingly.

@JaundicedBaboon I updated the benchmark to the pass@1 score
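The pass@1 vs. pass@10000 distinction in this exchange matters a lot. pass@k is the probability that at least one of k sampled answers to a problem is correct, so a large k can look far stronger than a single attempt. A commonly used unbiased estimator, introduced with OpenAI's Codex evaluation, works as follows. Draw n ≥ k samples per problem and count the c correct ones. The estimate is then pass@k = 1 - C(n-c, k) / C(n, k). Below is a minimal Python sketch of that estimator. It does not confirm how any particular lab computed its reported AIME number. `pass_at_k`, `mean_pass_at_k`, and the toy data are hypothetical.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples must contain a correct one.
        return 1.0
    # 1 minus the chance that a random size-k subset contains no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, each given as an (n, c) pair."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    if not scores:
        raise ValueError("need at least one problem")
    return sum(scores) / len(scores)


# Hypothetical toy data: 3 problems, 10 samples each, with 3, 0 and 10 correct.
toy = [(10, 3), (10, 0), (10, 10)]
print(mean_pass_at_k(toy, 1))
print(mean_pass_at_k(toy, 10))
```

On this toy set, pass@1 is just the average fraction of correct samples, (0.3 + 0 + 1) / 3. pass@10 rises to 2/3 because only the problem with zero correct samples still fails. That gap is why the market pins the criteria to pass@1.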

© Manifold Markets, Inc.