
OpenAI's o1 model represents a new paradigm for LLMs. How long until a competitor catches up?
"Catches up" / "matches capabilities" is defined as matching or exceeding the O1 pass@1 benchmarks on AIME, Codeforces, and GPQA at the time of publication:
- 74.4% on AIME (pass@1)
- 89th percentile on Codeforces
- 78% accuracy on GPQA
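For context on the metric: pass@1 means each problem gets a single sampled answer, so the score is simply the fraction of problems solved on the first attempt. In practice it is estimated with the unbiased pass@k estimator from Chen et al. (2021), of which pass@1 is the k=1 case. A minimal sketch in Python (the sample counts in the example are illustrative, not o1's actual evaluation settings):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total samples drawn per problem
    c: number of those samples that are correct
    k: number of samples we get credit for
    Returns the estimated probability that at least one of k
    randomly chosen samples (out of the n drawn) is correct.
    """
    if n - c < k:
        return 1.0  # too few failures for any k-subset to miss
    # 1 - C(n-c, k) / C(n, k), computed as a stable running product
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# For k=1 this reduces to plain single-attempt accuracy, c/n:
print(pass_at_k(n=64, c=48, k=1))  # 0.75
print(pass_at_k(n=64, c=48, k=4))  # ~0.997, credit for any of 4 tries
```

The k=1 case is what makes the thresholds above a fair bar for competitors: no consensus voting or best-of-n reranking, just one answer per problem.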
@SimonBerens I think April 12th 2025 should resolve YES, as Gemini 2.5 Pro came out 25 March 2025 and clearly wins on AIME and GPQA. For programming, I did not find a Codeforces benchmark, but it clearly wins on Aider Polyglot and on LiveCodeBench (70% vs 65%).
It's soooooo slow though. And in real-world day-to-day usage the results are rarely better than Sonnet 3.5 (new), which is nearly instantaneous. (I ask both the same questions, and use them on a nearly daily basis.)