Will SOTA on MATH in Sep 2024 utilize a hard-coded search/amplification procedure?


resolved Oct 1


Jeremy Gillen has bet Eli Lifland (myself) that (no-calculator) SOTA on MATH as of Sep 30, 2024 will utilize a hard-coded search/amplification procedure like MCTS. The bet is at 150:200 odds in Jeremy's favor: Eli will pay Jeremy $200 if this market resolves yes, otherwise Jeremy will pay Eli $150.
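
As a rough illustration of what these stakes imply, the sketch below computes the probability of YES at which each side's expected value is zero. It is not part of the original bet; the function name `break_even_probability` is hypothetical, and the calculation assumes both bettors are risk-neutral and ignores anything beyond the two dollar amounts stated above.

```python
from fractions import Fraction


def break_even_probability(win_amount, loss_amount):
    """Return the win probability p at which expected value is zero.

    Solves p * win_amount - (1 - p) * loss_amount = 0 for p,
    which gives p = loss_amount / (win_amount + loss_amount).
    """
    return Fraction(loss_amount, win_amount + loss_amount)


# Jeremy receives $200 if the market resolves YES and pays $150 otherwise.
jeremy_threshold = break_even_probability(win_amount=200, loss_amount=150)

# Eli receives $150 if the market resolves NO and pays $200 otherwise.
eli_threshold = break_even_probability(win_amount=150, loss_amount=200)

print("Jeremy breaks even at P(YES) =", jeremy_threshold)
print("Eli breaks even at P(NO) =", eli_threshold)
```

Under these assumptions, the bet favors Jeremy only if he puts the chance of YES above 3/7 (about 43%), and favors Eli only if he puts the chance of NO above 4/7 (about 57%).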

For current SOTA, see Minerva.

We've agreed on how a few boundary cases would resolve, and any disagreements about boundary cases will be judged by Thomas Larsen. We're reluctant to share the details of boundary cases publicly due to capability speedup concerns, and generally encourage commenters to be careful about infohazards.




Resolving NO due to o1, as @JeremyGillen linked https://openai.com/index/learning-to-reason-with-llms/

bought Ṁ250 NO

https://openai.com/index/learning-to-reason-with-llms/
I think this is not looking great for me, unless I'm misunderstanding how this works.

predicted YES

@JeanStanislasDenain This was one of our edge cases; we decided it would resolve in Eli's favor. So no.

@JeremyGillen Sorry about that.

Some amazing hubris in assuming that, out of the thousands of people working in the area, you’re the only ones who had some brilliant idea that might “speed things up”.

That said, augmentation and solution-checking aren’t quite MCTS but are vastly better than one-shot prompting (even restatement, reordering of answers, etc. would be obvious day-two work in any serious domain).

“Bet on our weird bet, about which we disclose no information on wtf it means, and please don’t talk about what it means, because if the AI can do 12th-grade math the world will end 🤔”

Some say the world will end in fire, others in ice; me, I predict it ends when someone leaks details about the qualifier to the qualifier to the people shown here.