Will a machine learning model score above 50.0% on the MATH dataset before 2025?
Resolved YES on Jun 30.
From Hendrycks et al. (https://arxiv.org/abs/2103.03874):
> Many intellectual endeavors require mathematical problem solving, but this skill remains beyond the capabilities of computers. To measure this ability in machine learning models, we introduce MATH, a new dataset of 12,500 challenging competition mathematics problems. Each problem in MATH has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations. [...]
> Even though we are able to increase accuracy on MATH, our results show that accuracy remains relatively low, even with enormous Transformer models. Moreover, we find that simply increasing budgets and model parameter counts will be impractical for achieving strong mathematical reasoning if scaling trends continue. While scaling Transformers is automatically solving most other text-based tasks, scaling is not currently solving MATH. To have more traction on mathematical problem solving we will likely need new algorithmic advancements from the broader research community.
In addition,
> It's also worth mentioning the competition maths problems in MATH are designed under the assumption that competitors don't use calculators or script executors. That way, solving them requires making a clever observation or reducing the search space to make the problem tractable. With a script executor, competitors do not need to figure out how to succinctly reason to the conclusion and cleverness is rarely needed.
> There are other competition problems designed to be difficult even with calculators and script executors, but there are not nearly as many of these problems lying around.
The best model in the paper achieved an average accuracy of only 6.9% on the dataset.
This question resolves to YES if the state-of-the-art average accuracy on the MATH dataset, as reported before January 1st, 2025, Eastern Time, is above 50.0%. Credible reports include, but are not limited to, blog posts, arXiv preprints, and papers. Otherwise, it resolves to NO.
I will use my discretion in determining whether a result should be considered valid. Obvious cheating, such as including the test set in the training data, does not count. Only results that use a no-calculator restriction will count.
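As a rough illustration of the resolution rule above, here is a minimal Python sketch. The `Report` class, its field names, and the `resolves_yes` helper are hypothetical names introduced only for this example; the sketch encodes the stated threshold, deadline, and exclusions, and cannot capture the discretionary judgment about whether a result is valid.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Eastern Standard Time (UTC-5) is in effect on January 1st.
EST = timezone(timedelta(hours=-5))
DEADLINE = datetime(2025, 1, 1, tzinfo=EST)
THRESHOLD_PCT = 50.0


@dataclass
class Report:
    """Hypothetical record of one reported MATH result."""
    accuracy_pct: float          # average accuracy on MATH, in percent
    reported_at: datetime        # timezone-aware time of the report
    uses_calculator: bool        # True if a calculator or script executor was used
    test_set_in_training: bool   # True if the test set leaked into training data


def counts(report: Report) -> bool:
    """A report counts only if it is on time, calculator-free, and uncontaminated."""
    return (
        report.reported_at < DEADLINE
        and not report.uses_calculator
        and not report.test_set_in_training
    )


def resolves_yes(reports: list[Report]) -> bool:
    """YES if any counting report is strictly above the 50.0% threshold."""
    return any(counts(r) and r.accuracy_pct > THRESHOLD_PCT for r in reports)


# The 6.9% figure is the paper's best model; the date here is illustrative.
baseline = Report(6.9, datetime(2021, 3, 5, tzinfo=EST), False, False)
print(resolves_yes([baseline]))  # False
```

Note that the comparison is strict, so a reported accuracy of exactly 50.0% would not resolve the question to YES under this reading.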
Eli Lifland bought Ṁ200 of YES

Nathan bought Ṁ200 of YES