Jeremy Gillen has bet Eli Lifland (myself) that the (no-calculator) SOTA on MATH as of Sep 30, 2024 will utilize a hard-coded search/amplification procedure like MCTS. The bet is at 150:200 odds in Jeremy's favor: Eli will pay Jeremy $200 if this market resolves YES; otherwise Jeremy will pay Eli $150.
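To make the stakes concrete, here is a minimal sketch of the break-even probabilities implied by those payouts. It assumes both bettors are risk-neutral and care only about expected dollars; the function name `break_even_probability` is hypothetical and just for illustration.

```python
from fractions import Fraction


def break_even_probability(stake: int, payout: int) -> Fraction:
    """Win probability at which risking `stake` to win `payout` has zero expected value.

    Solves p * payout - (1 - p) * stake = 0 for p.
    """
    return Fraction(stake, stake + payout)


# Jeremy risks $150 to win $200 if the market resolves YES.
jeremy_yes = break_even_probability(stake=150, payout=200)

# Eli risks $200 to win $150 if the market resolves NO.
eli_no = break_even_probability(stake=200, payout=150)

print(f"Jeremy breaks even at P(YES) = {jeremy_yes} (about {float(jeremy_yes):.3f})")
print(f"Eli breaks even at P(NO) = {eli_no} (about {float(eli_no):.3f})")
```

Under that assumption, the bet has positive expected value for Jeremy only if he puts P(YES) above 3/7 (about 43%), and for Eli only if he puts P(NO) above 4/7 (about 57%).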
For current SOTA, see Minerva.
We've agreed on how a few boundary cases would resolve, and any disagreements about boundary cases will be judged by Thomas Larsen. We're reluctant to share the details of boundary cases publicly due to capability speedup concerns, and generally encourage commenters to be careful about infohazards.
Resolving NO due to o1, as @JeremyGillen linked https://openai.com/index/learning-to-reason-with-llms/
https://openai.com/index/learning-to-reason-with-llms/
I think this is not looking great for me, unless I'm misunderstanding how this works.
@JeanStanislasDenain This was one of our edge cases; we decided it would resolve in Eli's favor. So no.
Some amazing hubris in assuming that, out of the thousands of people working in the area, you’re the only ones who had some brilliant idea that might “speed things up”.
That said, augmentation and solution-checking aren’t quite MCTS but are vastly better than one-shot prompting (even restatement, reordering of answers, etc. would be obvious day-two work in any serious domain).