Will SOTA on MATH in Sep 2024 utilize a hard-coded search/amplification procedure?
47% chance · closes 2024

Jeremy Gillen has bet Eli Lifland (myself) that (no-calculator) SOTA on MATH as of Sep 30, 2024 will utilize a hard-coded search/amplification procedure like MCTS. The bet is at 150:200 odds in Jeremy's favor: Eli will pay Jeremy $200 if this market resolves yes, otherwise Jeremy will pay Eli $150.
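As a rough check on what those stakes imply, the sketch below computes the probability of YES at which the bet is break-even for both sides. It uses only the $200 and $150 figures stated above; the function and constant names are illustrative, and it assumes both bettors care only about expected dollars.

```python
from fractions import Fraction

# Stakes taken from the bet description: Eli pays Jeremy $200 on YES,
# Jeremy pays Eli $150 on NO.
JEREMY_WINS_ON_YES = 200
JEREMY_LOSES_ON_NO = 150


def break_even_probability(win: int, loss: int) -> Fraction:
    """P(YES) at which the YES-side bettor's expected value is zero."""
    # Solve p * win - (1 - p) * loss = 0 for p.
    return Fraction(loss, win + loss)


def jeremy_expected_value(p_yes: Fraction) -> Fraction:
    """Jeremy's expected profit in dollars for a given P(YES)."""
    return p_yes * JEREMY_WINS_ON_YES - (1 - p_yes) * JEREMY_LOSES_ON_NO


threshold = break_even_probability(JEREMY_WINS_ON_YES, JEREMY_LOSES_ON_NO)
print(threshold, float(threshold))  # 3/7, roughly 0.429
```

Under these assumptions the bet favors Jeremy whenever P(YES) is above 3/7 (about 43%) and favors Eli whenever it is below, since the bet is zero-sum between the two.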

For current SOTA, see Minerva.

We've agreed on how a few boundary cases would resolve, and any disagreements about boundary cases will be judged by Thomas Larsen. We're reluctant to share the details of boundary cases publicly due to capability speedup concerns, and generally encourage commenters to be careful about infohazards.

JeanStanislasDenain

Would the approach in "Training Verifiers to Solve Math Word Problems" count as a "hard-coded search/amplification procedure"?

Jeremy Gillen is predicting YES at 49%

@JeanStanislasDenain This was one of our edge cases; we decided it would resolve in Eli's favor. So no.

JeanStanislasDenain

@JeremyGillen Sorry about that.

Gigacasting

Some amazing hubris in assuming that, out of the thousands of people working in the area, you’re the only ones who had some brilliant idea that might “speed things up”.

That said, augmentation and solution-checking aren’t quite MCTS but are vastly better than one-shot prompting (even restatement, reordering of answers, etc. would be obvious day-two work in any serious domain).

Gigacasting

“Bet on our weird bet of which we disclose no information about wtf it means, and please don’t talk about what it means because if the AI can do 12th-grade math the world will end 🤔”

Gigacasting

some say the world will end in fire, others ice, me I predict when someone leaks details about the qualifier to the qualifier to the people shown here
