Will published performance on GSM8K-test exceed 90% by 1st April 2023? | Manifold



Resolved YES on Mar 15


https://arxiv.org/abs/2110.14168
https://paperswithcode.com/dataset/gsm8k

State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning. To diagnose the failures of current models and support research, we introduce GSM8K, a dataset of 8.5K high quality linguistically diverse grade school math word problems. We find that even the largest transformer models fail to achieve high test performance, despite the conceptual simplicity of this problem distribution.

To increase performance, we propose training verifiers to judge the correctness of model completions. At test time, we generate many candidate solutions and select the one ranked highest by the verifier. We demonstrate that verification significantly improves performance on GSM8K, and we provide strong empirical evidence that verification scales more effectively with increased data than a finetuning baseline.
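The test-time procedure described in the abstract (generate many candidates, keep the one the verifier ranks highest) can be sketched roughly as follows. This is only an illustration, not the paper's implementation: the function name `best_of_n`, the candidate strings, and the hand-assigned scores in `TOY_SCORES` are all hypothetical stand-ins for a real generator and a trained verifier model.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str],
              verifier_score: Callable[[str], float]) -> str:
    """Return the candidate solution that the verifier scores highest.

    `verifier_score` is assumed to map a solution string to a number
    where higher means "more likely correct". Ties go to the earliest
    candidate, because max() returns the first maximal element.
    """
    if not candidates:
        raise ValueError("need at least one candidate solution")
    return max(candidates, key=verifier_score)


# Hypothetical example data: three sampled solutions to one problem,
# with made-up verifier scores (a real verifier would be a trained model).
TOY_SCORES = {
    "16 - 3 - 4 = 9 eggs, 9 * 2 = 18 dollars": 0.91,
    "16 - 3 = 13 eggs, 13 * 2 = 26 dollars": 0.22,
    "16 + 3 + 4 = 23 eggs, 23 * 2 = 46 dollars": 0.05,
}


def toy_verifier(solution: str) -> float:
    # Look up the hand-assigned score; unknown solutions score 0.0.
    return TOY_SCORES.get(solution, 0.0)


selected = best_of_n(list(TOY_SCORES), toy_verifier)
```

With the made-up scores above, `selected` is the first candidate. In this setup the generator only needs to produce one good solution among its samples, and the quality of the final answer then depends on how well the verifier ranks it.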

This question is managed and resolved by Manifold.


predicted YES

GPT-4 achieves 92.0% - https://cdn.openai.com/papers/gpt-4.pdf

predicted YES

We're at 87.3%

https://paperswithcode.com/sota/arithmetic-reasoning-on-gsm8k

https://paperswithcode.com/sota/arithmetic-reasoning-on-gsm8k We're at 83% now...

Minerva just got >50% on MATH...

Related questions

Will GPT4/Opus report >50% score on ARC in 2024?

What will be the best score on the GPQA benchmark before 2025?

What will be the best score on the GAIA benchmark before 2025?

MMLU 99% #2: Will SOTA for MMLU (average) pass 99% by the start of 2025?

MMLU 99% #3: Will SOTA for MMLU (average) pass 99% by the start of 2026?

Will any model get above human level (92%) on the Simple Bench benchmark before September 1st, 2025.

Will >50% of the tasks in the WebArena benchmark be solved by EOY 2024?

MMLU 99% #5: Will SOTA for MMLU (average) pass 99% by the start of 2028?

Will Grok achieve 98% or greater on ARC by the end of November 2024?

MMLU 99% #4: Will SOTA for MMLU (average) pass 99% by the start of 2027?

