BIG-bench accuracy 75% #2: Will SOTA for a single model on BIG-bench pass 75% by the start of 2025?
65% chance
  • Benchmarks

  • Only the sub-benchmarks that are scored as an accuracy (i.e. from 0-100%) will be included. (I think that's all of them, but I'm not sure.)

  • It must be a single model. If Model A achieves 75% on half and Model B achieves 75% on the other half, that does not resolve the question YES.

  • Ensemble models are fine, but something like "run Model A on this benchmark and Model B on this other benchmark" is not. If there is model selection, it must be learned, and it cannot include the current benchmark as an input.
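A rough sketch of how the single-model rule above could be checked, in Python. Everything here is an assumption for illustration: the function name `single_model_passes`, the shape of the `scores` dictionary, and the made-up numbers. The criteria do not say whether "pass 75%" means the average across sub-benchmarks or every sub-benchmark individually, so the sketch exposes both through a hypothetical `per_task` flag rather than picking one.

```python
from statistics import mean

THRESHOLD = 75.0  # percent, taken from the market title


def single_model_passes(scores, tasks, per_task=False):
    """Return the models that clear the threshold on their own.

    scores: {model_name: {task_name: accuracy_percent}} (assumed layout).
    tasks: the accuracy-scored sub-benchmarks to include.
    per_task: if True, every task must exceed the threshold;
              if False, the mean over tasks must exceed it.
    """
    if not tasks:
        return []
    winners = []
    for model, by_task in scores.items():
        # A model only counts if it has a score for every included task.
        # Scores from different models are never combined.
        if not all(t in by_task for t in tasks):
            continue
        accs = [by_task[t] for t in tasks]
        if per_task:
            ok = all(a > THRESHOLD for a in accs)
        else:
            ok = mean(accs) > THRESHOLD
        if ok:
            winners.append(model)
    return winners


if __name__ == "__main__":
    # Made-up numbers, not real BIG-bench results.
    tasks = ["task_1", "task_2"]
    scores = {
        "model_a": {"task_1": 90.0, "task_2": 40.0},
        "model_b": {"task_1": 40.0, "task_2": 90.0},
        "model_c": {"task_1": 80.0, "task_2": 78.0},
    }
    print(single_model_passes(scores, tasks))
```

In the made-up data, `model_a` and `model_b` each clear 75% on one task only, so neither qualifies under either reading; stitching their best results together is exactly what the criteria rule out.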


For this and the related BIG-bench markets: it seems like most groups are done publishing metrics on the individual tasks (as opposed to the average score), and that they're mostly publishing on BIG-Bench Hard. If that's the case, my current plan is to resolve these markets N/A, and I'll make new ones asking about average score on BIG-Bench Hard.