Will o1 (not preview) achieve a better score on LiveBench coding than Claude 3.5 Sonnet 10/22?
Ṁ261 · resolved Dec 14
Resolved: NO
Per LiveBench.ai, Claude 3.5 Sonnet 10/22 scores 67.13 on coding, while o1-preview scores only 50.85.
Resolves when o1 is added to the LiveBench leaderboard.
Update 2024-11-12 (PST): Market will resolve based on API results from LiveBench, not manual additions to the leaderboard. (AI summary of creator comment)
This question is managed and resolved by Manifold.
Related questions
Will o1 score ≥60% on the REBUS benchmark?
57% chance
Will Claude 3.5 Opus beat OpenAI's best released model on the arena.lmsys.org leaderboard?
32% chance
Will Claude 3.5 Opus be able to draw me in tic-tac-toe while playing as O at least 1/3 of the time?
32% chance
Before February 2025, will a Gemini model exceed Claude 3.5 Sonnet 10/22's Global Average score on Simple Bench?
66% chance
Is Claude 3.5 Sonnet a distilled or quantized version of a larger model?
48% chance
What will Claude 3.5 Opus's reported 0-shot performance on GPQA Diamond be upon release?
Will I judge GPT-5 to be smarter than o1 (not preview) after both are released?
78% chance
Will GPT-5 perform better than o1 (not preview) at AIME 2024, Codeforces elo, GPQA, or the 2024 ioi?
63% chance
What will be the *first* ELO Rating of Claude 3.5 Opus in the LMSYS Arena?
Will Claude 3.5 Opus have a higher Chat Arena Elo than GPT-5?
6% chance