BIG-bench accuracy 75% #1: Will SOTA for a single model on BIG-bench pass 75% by the start of 2024?

resolved Jan 3

Benchmark

Only the sub-benchmarks that are scored as an accuracy (i.e. from 0-100%) will be included (I think that's all of them, but I'm not sure).
It must be a single model. If Model A achieves 75% on half and Model B achieves 75% on the other half, that does not resolve the question YES.
Ensemble models are fine, but something like "run Model A on this benchmark and Model B on this other benchmark" is not. If there is model selection, it must be learned, and it cannot include the current benchmark as an input.
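A minimal sketch of one way to read these criteria, assuming the worst-case interpretation mentioned in the resolution comment (every accuracy-scored sub-benchmark must reach the threshold for the same model). The model names, task names, and scores are hypothetical, and treating "pass 75%" as "at least 75%" is also an assumption.

```python
from typing import Dict

THRESHOLD = 0.75  # "pass 75%", read here as >= 0.75 (an assumption)

# Maps sub-benchmark name -> accuracy in [0, 1] for one model.
Scores = Dict[str, float]


def passes_worst_case(scores: Scores, threshold: float = THRESHOLD) -> bool:
    """True if the model reaches the threshold on every sub-benchmark."""
    return bool(scores) and min(scores.values()) >= threshold


def passes_average(scores: Scores, threshold: float = THRESHOLD) -> bool:
    """True if the model's mean accuracy reaches the threshold."""
    return bool(scores) and sum(scores.values()) / len(scores) >= threshold


def resolves_yes(models: Dict[str, Scores]) -> bool:
    """YES only if a single model clears the bar on its own.

    Scores are never combined across models, so two models that each
    cover half of the sub-benchmarks do not count.
    """
    return any(passes_worst_case(scores) for scores in models.values())


# Hypothetical numbers, for illustration only.
models = {
    "model_a": {"task_1": 0.80, "task_2": 0.60},
    "model_b": {"task_1": 0.60, "task_2": 0.80},
    "model_c": {"task_1": 0.95, "task_2": 0.70},
}

print(resolves_yes(models))                  # no single model clears every task
print(passes_average(models["model_c"]))     # mean accuracy is above the bar
print(passes_worst_case(models["model_c"]))  # but the weakest task is not
```

Under this reading, a model with a high average can still fail if any one sub-benchmark falls below the threshold, which is the gap between reported averages and worst-case performance.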
Feb 8, 2:38pm: ~~BIG-bench accuracy 75%: Will SOTA for a single model on BIG-bench pass 75% by the start of 2024?~~ → BIG-bench accuracy 75% #1: Will SOTA for a single model on BIG-bench pass 75% by the start of 2024?

Technical AI Timelines

New Year's Resolutions 2024

This is resolving NO but not in a way I like - many of the benchmarks simply are not used anymore (BIG-bench-hard is more common now), so worst case performance is below 75% in a somewhat trivial way. Average accuracy on BIG-bench-hard is above 80% now, but GPT-4 and Gemini only report average, not worst case.

The link no longer works, but by the URL the new link appears to be this. @VincentLuczkow Is that right?

@Shump Full set of benchmarks is here: https://paperswithcode.com/dataset/big-bench

predicted NO

@vluzko Yeah but which one is this question referring to?

@Shump See description: "Only the sub benchmarks that are scored as an accuracy (i.e. from 0-100%) will be included (I think that's all of them but I'm not sure)"