What will Claude 3.5 Opus's reported 0-shot performance on GPQA Diamond be upon release?
Ṁ105 · Mar 2
[0%, 60%): 13%
[60%, 70%): 28%
[70%, 80%): 39%
[80%, 90%): 13%
[90%, 100%]: 7%
Rather than create yet another market speculating on the exact release date of Claude 3.5 Opus, I find it more interesting to see how good people's projections of capabilities are at this point. To that end, I'm curious to get estimates of Claude 3.5 Opus's GPQA Diamond performance.
For context, Claude 3.5 Sonnet achieved 59.4% accuracy, and o1 (the unreleased version) reportedly achieves 77.3% accuracy (pass@1, rightmost).
If GPQA Diamond performance for Claude 3.5 Opus isn't reported within 3 months following release, either by Anthropic or a source I consider credible (e.g. the benchmark creators, Scale's benchmarking team, etc.), I'll resolve this N/A.
Note: I won't bet on this market.
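To make the bucket boundaries concrete, here is a minimal sketch of how a reported score would map onto the answer ranges, assuming the usual reading of the interval notation (a score exactly on a boundary, such as 70%, belongs to the higher bucket, and 100% is included in the last one). The function name `bucket_for` is hypothetical and this is only an illustration, not the resolution procedure itself.

```python
from bisect import bisect_right

# Lower edges of the answer buckets, in percent, copied from the market's options.
EDGES = [0, 60, 70, 80, 90]
LABELS = ["[0%, 60%)", "[60%, 70%)", "[70%, 80%)", "[80%, 90%)", "[90%, 100%]"]


def bucket_for(score_pct: float) -> str:
    """Return the answer bucket containing a reported accuracy (in percent).

    Assumption: buckets are half-open on the right, except the last one,
    which also includes 100%.
    """
    if not 0 <= score_pct <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score_pct}")
    # bisect_right counts the lower edges that are <= score_pct, so a score
    # sitting exactly on an edge lands in the bucket that starts at that edge.
    return LABELS[bisect_right(EDGES, score_pct) - 1]


if __name__ == "__main__":
    # The two reference scores quoted in the description.
    for name, score in [("Claude 3.5 Sonnet", 59.4), ("o1 (unreleased)", 77.3)]:
        print(f"{name}: {score}% -> {bucket_for(score)}")
```

Under that reading, the two reference scores above would fall in the [0%, 60%) and [70%, 80%) buckets respectively.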
This question is managed and resolved by Manifold.
Related questions
When will Claude 3.5 Opus be released?
Will Claude 3.5 Opus beat OpenAI's best released model on the arena.lmsys.org leaderboard?
29% chance
Will Claude 3.5 Opus be available via API by end of 2025?
82% chance
Will Claude 3.5 Opus be able to draw me in tic-tac-toe while playing as O at least 1/3 of the time?
68% chance
Will Claude 3.5 Opus have a higher Chat Arena Elo than GPT-5?
7% chance
Before February 2025, will a Gemini model exceed Claude 3.5 Sonnet 10/22's Global Average score on LiveBench?
55% chance
Before February 2025, will a Gemini model exceed Claude 3.5 Sonnet 10/22's Global Average score on Simple Bench?
55% chance
What will be the *first* ELO Rating of Claude 3.5 Opus in the LMSYS Arena?
Will o1 (not preview) achieve a better score on LiveBench coding than Claude 3.5 Sonnet 10/22?
75% chance
Will we get a video of claude 3.5 Sonnet running a very single minded competent minecraft agent before December 2024?
18% chance