Will a machine learning model score above 50.0% on the MATH dataset before 2025?

Resolved YES on Jun 30

From Hendrycks et al. (https://arxiv.org/abs/2103.03874):
> Many intellectual endeavors require mathematical problem solving, but this skill remains beyond the capabilities of computers. To measure this ability in machine learning models, we introduce MATH, a new dataset of 12,500 challenging competition mathematics problems. Each problem in MATH has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations. [...]
> Even though we are able to increase accuracy on MATH, our results show that accuracy remains relatively low, even with enormous Transformer models. Moreover, we find that simply increasing budgets and model parameter counts will be impractical for achieving strong mathematical reasoning if scaling trends continue. While scaling Transformers is automatically solving most other text-based tasks, scaling is not currently solving MATH. To have more traction on mathematical problem solving we will likely need new algorithmic advancements from the broader research community.

In addition,
> It's also worth mentioning the competition maths problems in MATH are designed under the assumption that competitors don't use calculators or script executors. That way, solving them requires making a clever observation or reducing the search space to make the problem tractable. With a script executor, competitors do not need to figure out how to succinctly reason to the conclusion and cleverness is rarely needed.
> There are other competition problems designed to be difficult even with calculators and script executors, but there are not nearly as many of these problems lying around.

The best model in the paper achieved an average accuracy of only 6.9% on the dataset.

This question resolves to YES if the state-of-the-art average accuracy on the MATH dataset, as reported before January 1st, 2025 Eastern Time, is above 50.0%. Credible reports include, but are not limited to, blog posts, arXiv preprints, and papers. Otherwise, it resolves to NO.

I will use my discretion in determining whether a result should be considered valid. Obvious cheating, such as including the test set in the training data, does not count. Only results that obey a no-calculator restriction will count.
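
As a rough illustration only, the resolution rule above can be read as a predicate over reported results. The sketch below is not an official resolution procedure: the `Report` fields, the function names, and the fixed UTC-5 offset for Eastern Time on January 1st are all assumptions made for this example, and the creator's discretion is reduced to two boolean flags.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumption: Eastern Time on January 1st is EST (UTC-5).
EASTERN = timezone(timedelta(hours=-5))
CUTOFF = datetime(2025, 1, 1, tzinfo=EASTERN)
THRESHOLD_PCT = 50.0


@dataclass
class Report:
    """Hypothetical record of one reported MATH result."""
    accuracy_pct: float          # average accuracy on MATH, in percent
    reported_at: datetime        # timezone-aware publication time
    uses_calculator: bool        # True if a calculator or script executor was used
    test_set_contaminated: bool  # True if test data leaked into training


def is_valid(report: Report) -> bool:
    """A report counts only if it is timely, calculator-free, and uncontaminated."""
    return (
        report.reported_at < CUTOFF
        and not report.uses_calculator
        and not report.test_set_contaminated
    )


def resolves_yes(reports: list[Report]) -> bool:
    """YES if any valid report is strictly above the 50.0% threshold."""
    return any(is_valid(r) and r.accuracy_pct > THRESHOLD_PCT for r in reports)
```

Note the strict inequality: under this reading, a result of exactly 50.0% would not be enough, and neither would a higher score obtained with a script executor.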

# 🏅 Top traders

# | Name | Total profit |
---|---|---|
1 | | Ṁ84 |
2 | | Ṁ80 |
3 | | Ṁ79 |
4 | | Ṁ67 |
5 | | Ṁ17 |

Eli Lifland bought Ṁ200 of YES

Nathan bought Ṁ200 of YES

## Related questions

Will an AI get gold on any International Math Olympiad by 2025?

Austin

Will at least $1M in XTX Markets AI-MO "Progress Prize(s)" be awarded by end of the 65th IMO in July 2024?

Joe Brenton

Will an AI win a Gold Medal on the International Math Olympiad by 2027?

Gigacasting

41. Will an image model win Scott Alexander’s bet on compositionality, to Edwin Chen’s satisfaction, in 2023?

ACX Bot

Will an AI solve any important mathematical conjecture before January 1st, 2030?

Matthew Barnett

Will MIT Forecasting Group write up a post about their project by the end of 2023?

Misha

Will an AI get bronze on any International Math Olympiad by 2025?

Dan

Will @Mira solve the 2023 Sudoku challenge?

Mira

Will there be a Forward-Forward Algorithm based neural network with >65% Top 1 Accuracy on Papers With Code's ImageNet leaderboard by 2024?

l8doku

🐕 Will AI Achieve Significantly More, "Embodiment" by end of 2023?

Patrick Delaney

Will there be a statistical test for catching superposition? (2023)

firstuserhere

Will an AI win a Gold Medal on the International Math Olympiad by 2029?

Gigacasting

Will an AI get gold on any International Math Olympiad by 2025?

Levi Finkelstein

Will any corporate quantum computing team publicly report the implementation of 2 or more simultaneous 2-qubit gates with average gate fidelity of at least 99.9% before Feb 2024?

Quantum Observer

Will @firstuserhere coauthor a NeurIPS or ICML conference publication before end of 2024? (10,000 Mana subsidy)

firstuserhere

Will Neel Nanda's or Lee Sharkey's SERI MATS streams (Summer 2023) produce a NeurIPS or ICML conference publication?

Ryan

Will human mathematicians be mostly obsolete by 2030?

Levi Finkelstein

Will a new best accuracy for ImageNet classification be achieved before the end of 2023?

Widden

Will Nonlinear post its response by EOY?

Rodeo

Will an AI get bronze or silver on any International Math Olympiad by end of 2025?

Forrest