
BIG-bench accuracy 75% #5: Will SOTA for a single model on BIG-bench pass 75% by the start of 2028?
7
150Ṁ3072028
87%
chance
1H
6H
1D
1W
1M
ALL
Only the sub benchmarks that are scored as an accuracy (i.e. from 0-100%) will be included (I think that's all of them but I'm not sure)
It must be a single model. If Model A achieves 75% on half and Model B achieves 75% on the other half that does not resolve the question YES
Ensemble models are fine but something like "run Model A on this benchmark and model B on this other benchmark" is not. If there is model selection is must be learned and it cannot include the current benchmark as an input.
This question is managed and resolved by Manifold.
Get
1,000 to start trading!
What is this?
What is Manifold?
Manifold is the world's largest social prediction market.
Get accurate real-time odds on politics, tech, sports, and more.
Or create your own play-money betting market on any question you care about.
Are our predictions accurate?
Yes! Manifold is very well calibrated, with forecasts on average within 4 percentage points of the true probability. Our probabilities are created by users buying and selling shares of a market.
In the 2022 US midterm elections, we outperformed all other prediction market platforms and were in line with FiveThirtyEight’s performance. Many people who don't like betting still use Manifold to get reliable news.
Why use play money?
Mana (Ṁ) is the play-money currency used to bet on Manifold. It cannot be converted to cash. All users start with Ṁ1,000 for free.
Play money means it's much easier for anyone anywhere in the world to get started and try out forecasting without any risk. It also means there's more freedom to create and bet on any type of question.
People are also trading
What is this?
What is Manifold?
Manifold is the world's largest social prediction market.
Get accurate real-time odds on politics, tech, sports, and more.
Or create your own play-money betting market on any question you care about.
Are our predictions accurate?
Yes! Manifold is very well calibrated, with forecasts on average within 4 percentage points of the true probability. Our probabilities are created by users buying and selling shares of a market.
In the 2022 US midterm elections, we outperformed all other prediction market platforms and were in line with FiveThirtyEight’s performance. Many people who don't like betting still use Manifold to get reliable news.
Why use play money?
Mana (Ṁ) is the play-money currency used to bet on Manifold. It cannot be converted to cash. All users start with Ṁ1,000 for free.
Play money means it's much easier for anyone anywhere in the world to get started and try out forecasting without any risk. It also means there's more freedom to create and bet on any type of question.
Related questions
BIG-bench accuracy 75% #3: Will SOTA for a single model on BIG-bench pass 75% by the start of 2026?
86% chance
BIG-bench accuracy 75% #4: Will SOTA for a single model on BIG-bench pass 75% by the start of 2027?
86% chance
MMLU 99% #3: Will SOTA for MMLU (average) pass 99% by the start of 2026?
6% chance
MMLU 99% #5: Will SOTA for MMLU (average) pass 99% by the start of 2028?
44% chance
MMLU 99% #4: Will SOTA for MMLU (average) pass 99% by the start of 2027?
8% chance
Will any model get above human level on the Simple Bench benchmark before September 1st, 2025.
61% chance
What will be true of the SOTA AI on the FrontierMath benchmark, before 2026?
What will be the highest score achieved on SWE-Bench Verified in 2025?
What will be true of the SOTA AI on the FrontierMath benchmark, before 2028?
What will be true of the SOTA AI on the FrontierMath benchmark, before 2027?