Will reinforcement learning overtake LMs on math before 2028?


61% chance

Will a state-of-the-art model on Hendrycks' MATH be trained with more FLOP on RL than on LM objectives? A purely RL model also counts, of course.

RL here encompasses anything involving online learning, expert-iteration-like methods, and so on. If this ends up being difficult to call because of some breakthrough in decision-transformer-style conditional imitation learning (i.e., something between RL and LMs), I will probably cancel the market as ambiguous.

When models approach 100% accuracy on MATH, a similar successor natural-language math dataset will be used instead.
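The resolution criterion can be written as a small function. This is a hypothetical sketch: `TrainingBudget`, `resolve`, and the FLOP figures are illustrative assumptions, not part of the market's official rules. Ties count as NO because the question asks for *more* FLOP on RL.

```python
from dataclasses import dataclass


@dataclass
class TrainingBudget:
    """Hypothetical FLOP accounting for one state-of-the-art MATH model."""
    lm_flop: float           # FLOP spent on LM (offline next-token) objectives
    rl_flop: float           # FLOP spent on RL / online learning
    ambiguous: bool = False  # e.g. decision-transformer-style conditional imitation


def resolve(budget: TrainingBudget) -> str:
    """YES iff strictly more FLOP went to RL than to LM objectives."""
    if budget.ambiguous:
        return "N/A"  # the market would be cancelled as ambiguous
    return "YES" if budget.rl_flop > budget.lm_flop else "NO"


# A purely RL model (zero LM FLOP) resolves YES.
print(resolve(TrainingBudget(lm_flop=0.0, rl_flop=1e24)))
```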



https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/

I'd guess this took something like 1-10 trillion tokens' worth of FLOPs.
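One way to turn "tokens' worth of FLOPs" into a FLOP count is the common rule of thumb of roughly 6 FLOP per parameter per training token for a dense transformer. The parameter count below is a hypothetical placeholder; the comment does not state a model size.

```python
def training_flop(n_params: float, n_tokens: float) -> float:
    """Approximate dense-transformer training compute: ~6 FLOP per parameter per
    token (about 2 for the forward pass and 4 for the backward pass)."""
    return 6.0 * n_params * n_tokens


HYPOTHETICAL_PARAMS = 1e11  # assumed 100B-parameter model, for illustration only

for tokens in (1e12, 1e13):  # the guessed range: 1-10 trillion tokens
    flop = training_flop(HYPOTHETICAL_PARAMS, tokens)
    print(f"{tokens:.0e} tokens -> {flop:.1e} FLOP")
```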

predicted NO

Would you count LM regularization terms computed during the RL phase as part of the LM share? This may be hard to disentangle.

predicted YES

@Thomas42 That’s a bit tricky, but I’d say KL penalties toward the base LM should just be counted as part of the RL compute. A KL penalty is not an LM loss anyway.
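To illustrate why a KL penalty sits on the RL side of the ledger: in a common RLHF-style setup (an assumption here; the thread names no specific method), the penalty only reshapes the reward the policy is optimized against, and no next-token loss on a text corpus is computed. The function name and the `beta` value are hypothetical.

```python
import math
from typing import List


def kl_penalized_rewards(
    task_reward: float,
    policy_logprobs: List[float],
    ref_logprobs: List[float],
    beta: float = 0.1,  # hypothetical KL coefficient
) -> List[float]:
    """Per-token rewards with a KL penalty toward a frozen base LM.

    Each generated token is penalized by beta * (log pi(a|s) - log pi_ref(a|s)),
    a sample-based estimate of KL(pi || pi_ref). The scalar task reward
    (e.g. 1.0 for a correct MATH answer) is added at the final token.
    """
    if not policy_logprobs or len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("need equal-length, non-empty log-prob sequences")
    rewards = [-beta * (lp - lr) for lp, lr in zip(policy_logprobs, ref_logprobs)]
    rewards[-1] += task_reward
    return rewards


# Two generated tokens; the policy has drifted from the base LM on the second one.
policy = [math.log(0.5), math.log(0.9)]
ref = [math.log(0.5), math.log(0.6)]
print(kl_penalized_rewards(1.0, policy, ref))
```

The reference log-probs require extra forward passes through the base LM, but under this reading that compute is still part of the RL phase.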

If this question ends up hinging on some edge case, such as a method that continues LM training during RL and leaves the relative compute contributions unclear, I'll probably resolve N/A.

I think the first question to ask is whether anything will overtake LMs. The probability that one specific technology is the one doing the overtaking should then be below that base probability. I place the first probability at around 50%, so I am comfortable betting against this at the current price.
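The argument rests on the product rule: P(RL overtakes) = P(something overtakes) × P(it is RL | something overtakes), and the second factor is at most 1. A minimal sketch using the commenter's 50% estimate; the conditional shares are hypothetical values for illustration.

```python
def p_specific_overtakes(p_any_overtakes: float, p_rl_given_overtake: float) -> float:
    """P(RL overtakes) = P(something overtakes) * P(it is RL | something overtakes)."""
    return p_any_overtakes * p_rl_given_overtake


P_ANY = 0.50  # the commenter's estimate that anything overtakes LMs

# Whatever share of the "overtaking" probability RL gets, the product never exceeds P_ANY.
for share in (0.25, 0.5, 0.75, 1.0):  # hypothetical conditional shares
    print(f"P(RL | overtake) = {share:.2f} -> P(RL overtakes) = {p_specific_overtakes(P_ANY, share):.3f}")
```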

Why are you defining RL as online learning? Online learning encompasses more than RL. Why not define it in terms of actions, states, and rewards?

predicted YES

@vluzko I wanted to exclude decision-transformer-type methods. It might have been fairer to title the question 'Will online learning overtake offline learning for LMs on Math...', but I went for something more eye-catching.
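A toy sketch of expert iteration, one of the online methods the market counts as RL, can make the online/offline distinction concrete: the training data is sampled from the *current* policy and filtered by a verifier, rather than drawn from a fixed dataset. Everything here (the three arithmetic problems, the count-based "policy", the exact-match verifier) is a hypothetical stand-in for a real LM and grader.

```python
import random
from collections import Counter

# Hypothetical problems with known answers, and a shared candidate answer space.
PROBLEMS = {"2+3": 5, "4*6": 24, "9-7": 2}
CANDIDATES = list(range(30))


def init_policy():
    """Toy policy: per-problem pseudo-counts over candidate answers (uniform start)."""
    return {q: Counter({a: 1.0 for a in CANDIDATES}) for q in PROBLEMS}


def sample(policy, q, rng):
    answers, weights = zip(*policy[q].items())
    return rng.choices(answers, weights=weights, k=1)[0]


def p_correct(policy, q):
    return policy[q][PROBLEMS[q]] / sum(policy[q].values())


def expert_iteration(policy, rounds=20, k=32, seed=0):
    """Online loop: sample from the current policy, keep verified-correct samples,
    then imitate them. New data depends on the policy, unlike offline imitation."""
    rng = random.Random(seed)
    for _ in range(rounds):
        for q, truth in PROBLEMS.items():
            samples = [sample(policy, q, rng) for _ in range(k)]
            correct = [a for a in samples if a == truth]  # verifier / reward filter
            for a in correct:
                policy[q][a] += 1.0  # "train" on the filtered samples
    return policy


policy = init_policy()
expert_iteration(policy)
print({q: round(p_correct(policy, q), 3) for q in PROBLEMS})
```

Because only verified answers are reinforced, the probability of the correct answer can only stay flat or rise; an offline method would instead fit a fixed corpus once, regardless of what the model currently generates.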

I'm interested in this because of the potential shortage of available imitation-learning data. I also think offline learning has different safety properties.

predicted YES

I'd be curious to hear why everyone is betting NO on this. 2028 is 5 years out, and Epoch AI estimates compute scaling of 4x per year, with text data running out by the end of 2024. That leaves 3 years' worth of compute scaling that needs to go somewhere else.
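The arithmetic behind "3 years' worth of compute scaling" can be made explicit: at a constant 4x per year, three years compound to a 64x increase in compute. The horizon year below (end of 2024 through end of 2027) is an assumption read off the comment, not a stated figure.

```python
def compute_multiplier(growth_per_year: float, years: float) -> float:
    """Total compute growth after `years` of constant multiplicative scaling."""
    return growth_per_year ** years


GROWTH = 4.0           # Epoch AI estimate cited in the comment: ~4x per year
TEXT_EXHAUSTED = 2024  # text data assumed to run out by the end of 2024
HORIZON = 2027         # assumed: last full year before the 2028 deadline

years_after_text = HORIZON - TEXT_EXHAUSTED
print(years_after_text, compute_multiplier(GROWTH, years_after_text))
```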