What will Claude 3.5 Opus's reported 0-shot performance on GPQA Diamond be upon release? | Manifold

10

100Ṁ578

Jun 2

[0%, 60%): 0.6%

[60%, 70%): 5%

[70%, 80%): 88%

[80%, 90%): 5%

[90%, 100%]: 1.1%

Rather than yet another market speculating on the exact release date of Claude 3.5 Opus, I find it more interesting to see how good people's projections of capabilities are at this point. To that end, I'm curious to get estimates of Claude 3.5 Opus's GPQA Diamond performance.

For context, Claude 3.5 Sonnet achieved 59.4% accuracy, and o1 (unreleased version) reportedly achieves 77.3% accuracy (pass@1, rightmost).

If GPQA Diamond performance for Claude 3.5 Opus isn't reported within 3 months following release, either by Anthropic or a source I consider credible (e.g. the benchmark creators, Scale's benchmarking team, etc.), I'll resolve this N/A.
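
To illustrate how the answer ranges above partition possible scores, here is a minimal Python sketch. The function name `resolve_bucket` is hypothetical, and treating a missing score as N/A is a simplification of the resolution rule above; none of this is code used by the market or by Manifold.

```python
from typing import Optional

# Bucket edges copied from the answer options, in percent.
# Every bucket is half-open [lo, hi) except the last, which includes 100.
BUCKETS = [(0, 60), (60, 70), (70, 80), (80, 90), (90, 100)]


def resolve_bucket(score: Optional[float]) -> str:
    """Map a reported 0-shot GPQA Diamond accuracy (percent) to an answer range.

    `None` stands in for "no credible score reported within 3 months",
    which is assumed here to correspond to an N/A resolution.
    """
    if score is None:
        return "N/A"
    if not 0 <= score <= 100:
        raise ValueError("score must be a percentage between 0 and 100")
    for index, (lo, hi) in enumerate(BUCKETS):
        is_last = index == len(BUCKETS) - 1
        if lo <= score < hi or (is_last and score == hi):
            closing = "]" if is_last else ")"
            return f"[{lo}%, {hi}%{closing}"
    raise AssertionError("unreachable: buckets cover 0 to 100")


# The two reference scores quoted above:
print(resolve_bucket(59.4))  # Claude 3.5 Sonnet
print(resolve_bucket(77.3))  # o1 (unreleased version)
```

Under this reading, a score of exactly 60% falls in [60%, 70%) rather than [0%, 60%), since each range includes its lower edge and excludes its upper one.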

Note: I won't bet on this market.

People are also trading

Will Claude 3.5 Opus be available via API by end of 2025?

5% chance (-11% 1d)

Which will be released first: Claude 3.5 Opus or Claude 4.0 Sonnet?

Will Claude 4 achieve over 95% on the MMLU-Pro benchmark by end of 2025?

13% chance (-15% 1d)

Will Claude 3.5 Opus have a higher Chat Arena Elo than GPT-5?

What will be the *first* ELO Rating of Claude 3.5 Opus in the LMSYS Arena?

Will Claude 3.5 Opus beat OpenAI's best released model on the arena.lmsys.org leaderboard?

GPT-5 score on GPQA Diamond?

What score will GPT-5 achieve on GPQA?

Will Claude Opus be ranked in the top 20 on the Chatbot Arena Leaderboard two years from today (3/10/24)?

Will Claude 3.5 Opus be able to draw me in tic-tac-toe while playing as O at least 1/3 of the time?
