BIG-bench accuracy 75% #1: Will SOTA for a single model on BIG-bench pass 75% by the start of 2024?
resolved Jan 3
Resolved
NO
  • Only the sub benchmarks that are scored as an accuracy (i.e. from 0-100%) will be included (I think that's all of them but I'm not sure)

  • It must be a single model. If Model A achieves 75% on half and Model B achieves 75% on the other half, that does not resolve the question YES.

  • Ensemble models are fine, but something like "run Model A on this benchmark and Model B on this other benchmark" is not. If there is model selection, it must be learned, and it cannot include the current benchmark as an input.
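
The criteria above can be read as a worst-case test: one model has to clear the bar on every accuracy-scored sub-benchmark, and two models each covering half does not count. Here is a minimal sketch of that reading. The function name, the model and task names, and the scores are all hypothetical, and treating "pass 75%" as strictly above 75 on each sub-benchmark is an assumption, not something the description states.

```python
from statistics import mean

THRESHOLD = 75.0  # percent, taken from the question title


def resolves_yes(scores_by_model, threshold=THRESHOLD):
    """Return True if a single model beats the threshold on every sub-benchmark.

    scores_by_model maps model name -> {sub-benchmark name -> accuracy in percent}.
    Assumptions (not from the market description): "pass" means strictly
    greater than the threshold, and a model that was not evaluated on every
    sub-benchmark does not qualify.
    """
    # Collect every sub-benchmark that any model reports.
    all_tasks = set()
    for scores in scores_by_model.values():
        all_tasks.update(scores)
    if not all_tasks:
        return False

    for scores in scores_by_model.values():
        covers_everything = set(scores) == all_tasks
        if covers_everything and min(scores.values()) > threshold:
            return True
    return False


# Hypothetical scores: each model clears 75% on a different half.
split = {
    "model_a": {"task_1": 90.0, "task_2": 60.0},
    "model_b": {"task_1": 60.0, "task_2": 90.0},
}

# Hypothetical scores: high average, but one weak sub-benchmark.
high_average = {"model_c": {"task_1": 95.0, "task_2": 95.0, "task_3": 50.0}}

# Hypothetical scores: every sub-benchmark above the bar.
uniform = {"model_d": {"task_1": 80.0, "task_2": 76.0}}

print(resolves_yes(split))         # False
print(resolves_yes(high_average))  # False
print(mean(high_average["model_c"].values()))  # 80.0
print(resolves_yes(uniform))       # True
```

The second case shows why an average alone cannot settle the question under this reading: the mean is 80%, yet the worst sub-benchmark is at 50%.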

    Feb 8, 2:38pm: BIG-bench accuracy 75%: Will SOTA for a single model on BIG-bench pass 75% by the start of 2024? → BIG-bench accuracy 75% #1: Will SOTA for a single model on BIG-bench pass 75% by the start of 2024?



This is resolving NO but not in a way I like - many of the benchmarks simply are not used anymore (BIG-bench-hard is more common now), so worst case performance is below 75% in a somewhat trivial way. Average accuracy on BIG-bench-hard is above 80% now, but GPT-4 and Gemini only report average, not worst case.

The link no longer works, but by the URL the new link appears to be this. @VincentLuczkow Is that right?

predicted NO

@vluzko Yeah but which one is this question referring to?

@Shump See description: "Only the sub benchmarks that are scored as an accuracy (i.e. from 0-100%) will be included (I think that's all of them but I'm not sure)"

@vluzko In that case I think PaLM 2 meets the requirements: https://arxiv.org/pdf/2305.10403v3.pdf

@Shump PaLM 2 was only evaluated on a subset.
