Is “reasoning” mostly elicitation?
21% chance
One interesting research programme in 2025 suggests that RL on verifiable rewards (RLVR) doesn't actually add new capabilities to a base model, but instead makes existing capabilities easier to elicit.
Resolution: at the end of next year, will I put >66% credence on the claim that RLVR is bottlenecked on capabilities learned during pretraining?
My current credence (Dec 2025): 30%
If you want to use a model of me as well as your model of RLVR to answer, here are some of my views.
This question is managed and resolved by Manifold.
@GavinLeech I recommend adding some topics to this market to increase its discoverability
PS, relevant: https://www.lesswrong.com/posts/ZtQD8CmQRZKNQFRd3/faul_sname-s-shortform?commentId=ZHrzKtB4p3uZ7uSNk