Will the gap between open-weights and frontier models on GPQA Diamond be at most 7%?

At the end of 2026, some model will hold the best score on GPQA Diamond, and some open-weights model will hold the best score among open-weights models.

This question resolves positively if and only if the best open-weights model's score on 0-shot CoT GPQA Diamond is at most 7 percentage points below the score of the best-performing model on 0-shot CoT GPQA Diamond.

At the time of writing, the best-performing model on GPQA Diamond is Claude 3.5 Sonnet, with a score of 59.4. The best-performing open-weights model is Llama 3.1-405B, with a score of 51.1. This would not be sufficient for a positive resolution, as the gap is 8.3 percentage points. If the gap is exactly 7 points, the question still resolves positively, but if it is 7.1 points, it resolves negatively. The question also resolves positively if open-weights models are at the frontier on GPQA Diamond (i.e. if the best open-weights model matches or beats the best closed-weights model).
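
The resolution rule can be sketched in a few lines of Python. This is only an illustration, not an official resolver: the function names are made up, the gap is read as a difference in percentage points (as in the 59.4 vs. 51.1 example), and scores are assumed to be reported to one decimal place, so the gap is rounded to one decimal before comparing.

```python
def gap_points(best_closed: float, best_open: float) -> float:
    """Gap in percentage points between the best overall score and the best open-weights score.

    Hypothetical helper. Rounds to one decimal place (an assumption) so that
    floating-point noise does not turn a gap of exactly 7.0 into 7.0000001.
    """
    best_overall = max(best_closed, best_open)  # an open-weights model may be the frontier
    return round(best_overall - best_open, 1)


def resolves_yes(best_closed: float, best_open: float, threshold: float = 7.0) -> bool:
    """True if the open-weights gap is at most `threshold` percentage points."""
    return gap_points(best_closed, best_open) <= threshold


if __name__ == "__main__":
    # Scores quoted in the question text at the time of writing.
    print(gap_points(59.4, 51.1), resolves_yes(59.4, 51.1))
```

With the scores quoted above, the gap comes out to 8.3 points, so the sketch reports a negative resolution; an open-weights model that leads the benchmark yields a gap of 0 and a positive resolution.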
