How long until one of Gemini, Claude, etc. matches the capabilities of O1?


2026

1.1%

Oct 12th 2024

1.6%

Dec 12th 2024

66%

April 12th 2025

26%

September 12th 2025

April 12th 2026

Other

OpenAI's O1 model represents a new paradigm of LLMs. How long until a competitor catches up?

"Catches up" / "matches capabilities" is defined as matching or exceeding O1's pass@1 benchmark scores on AIME, Codeforces, and GPQA at the time of publication:

74.4% accuracy on AIME
89th percentile on Codeforces
78% accuracy on GPQA
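As a minimal sketch of the resolution rule above, assume that a single model must meet all three thresholds at once (the description implies this but does not state it outright) and that "matching" means greater than or equal to. The `BenchmarkResult` class and `matches_o1` function are hypothetical names used only for illustration. The threshold numbers are the ones listed in the criteria.

```python
from dataclasses import dataclass

# Thresholds taken from the market description (O1's pass@1 figures).
O1_BAR = {"aime": 74.4, "codeforces": 89.0, "gpqa": 78.0}


@dataclass
class BenchmarkResult:
    """Hypothetical container for one model's pass@1 results."""
    model: str
    aime: float        # AIME pass@1 score, in percent
    codeforces: float  # Codeforces percentile
    gpqa: float        # GPQA accuracy, in percent


def matches_o1(result: BenchmarkResult) -> bool:
    """Return True only if the model matches or exceeds O1 on all three benchmarks."""
    return (
        result.aime >= O1_BAR["aime"]
        and result.codeforces >= O1_BAR["codeforces"]
        and result.gpqa >= O1_BAR["gpqa"]
    )


# A hypothetical model that beats O1 on two benchmarks but falls short on GPQA.
print(matches_o1(BenchmarkResult("hypothetical-model", 80.0, 90.0, 77.9)))
```

Under this reading, a model that exceeds O1 on two benchmarks but falls short on the third does not count as catching up.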



Can you operationalize the "etc."?

It's soooooo slow, though. And in real-world, day-to-day usage its results are rarely better than those of Sonnet 3.5 new, which responds nearly instantaneously. (I ask both the same questions and use them on a nearly daily basis.)

The Oct 12th 2024 option should be resolved.

@Adamacki I can’t seem to find a way to partially resolve the market

bought Ṁ50 YES

Apparently o1's AIME score was pass@10000, not pass@1. The criteria should be updated accordingly.

@JaundicedBaboon I updated the benchmark to the pass@1 score
