BIG-bench accuracy 75% #2: Will SOTA for a single model on BIG-bench pass 75% by the start of 2025?
Resolved N/A on Jan 9
  • Benchmarks

  • Only the sub-benchmarks that are scored as an accuracy (i.e., from 0-100%) will be included (I think that's all of them, but I'm not sure).

  • It must be a single model. If Model A achieves 75% on half and Model B achieves 75% on the other half, that does not resolve the question YES.

  • Ensemble models are fine, but something like "run Model A on this benchmark and Model B on this other benchmark" is not. If there is model selection, it must be learned, and it cannot include the current benchmark as an input.

  • Update 2025-05-01 (PST) (AI summary of creator comment): If no BIG-bench results are available for any major models by the resolution date, the market will be resolved as N/A.

    • NO will not be resolved based solely on SOTA results from 2023.

    • YES will not be resolved based on personal predictions.
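
The criteria above can be sketched as a small check. This is only an illustration, not the creator's actual procedure: it assumes "pass 75%" means a single model scoring strictly above 75% on every accuracy-scored sub-benchmark, and the names `scores`, `tasks`, and `resolve` are hypothetical. It also reduces the N/A rule to "no results available at all".

```python
from typing import Dict, List, Optional

THRESHOLD = 75.0  # percent, taken from the market title


def first_qualifying_model(
    scores: Dict[str, Dict[str, float]],
    tasks: List[str],
    threshold: float = THRESHOLD,
) -> Optional[str]:
    """Return one model that exceeds the threshold on every task, else None.

    Assumption: a task with no reported score counts as a failure, so
    results from different models cannot be combined to cover all tasks.
    """
    for model, per_task in scores.items():
        if all(per_task.get(task, 0.0) > threshold for task in tasks):
            return model
    return None


def resolve(scores: Dict[str, Dict[str, float]], tasks: List[str]) -> str:
    """Hypothetical resolution rule: N/A if nothing was published at all."""
    if not scores:
        return "N/A"
    return "YES" if first_qualifying_model(scores, tasks) is not None else "NO"


# Two models that each clear 75% on a different half do not resolve YES.
split = {
    "model_a": {"task_1": 80.0, "task_2": 60.0},
    "model_b": {"task_1": 60.0, "task_2": 80.0},
}
print(resolve(split, ["task_1", "task_2"]))
```

Checking each model independently is what enforces the single-model rule: the split case above fails even though every task is cleared by some model.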


I'm inclined to resolve this N/A: I can't find BIG-bench results for any major models currently. I think it would be extremely disingenuous to resolve NO based on SOTA results from 2023, but I won't resolve YES based on my personal guess that this could be done. Has anyone been able to find recent BIG-bench results?

For this and the related BIG-bench markets: it seems like most groups are done publishing metrics on the individual tasks (as opposed to average score), and that they're mostly publishing on BIG-bench hard. If that's the case then my current plan is to resolve these markets N/A, and I'll make new ones asking about average score on BIG-bench hard.

© Manifold Markets, Inc.