Inverse Reinforcement Learning (IRL) will be a significant component of publicly known SOTA alignment techniques by 2028

Forward RL became a well-known component of most current SOTA LMs (through RLHF), against the expectations of many. Will IRL have a similarly significant, publicly known role in any key component of SOTA alignment techniques before Jan 1, 2028?

I have long felt conflicted about CHAI's IRL-heavy research agenda: on the one hand, I see a clear pathway of relevance from IRL to alignment that certainly deserves exploration; on the other, current IRL-based imitation-learning methods seem very hard to scale to the largest environments.

In some of my early writing (https://www.lesswrong.com/posts/wf83tBACPM9aiykPn/a-survey-of-foundational-methods-in-inverse-reinforcement), I concluded: "The current paradigm of IRL is simply unable to (a) robustly apply (b) sufficiently complex rewards in (c) sufficiently complex environments."
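To make the scalability concern concrete, here is a minimal, illustrative sketch of tabular maximum-entropy IRL in the spirit of Ziebart et al. (2008). The toy chain environment, horizon, demonstration data, and hyperparameters are hypothetical choices for illustration only, not anything from CHAI's agenda or my earlier writing. The point is that both the backward (soft value iteration) and forward (visitation-frequency) passes sweep the entire state space with a known transition model, which is exactly the part that becomes infeasible in large environments.

```python
# Illustrative sketch of tabular MaxEnt IRL on a hypothetical toy chain MDP.
# All names, sizes, and hyperparameters are made up for illustration.
import numpy as np

N_STATES, N_ACTIONS, HORIZON = 5, 2, 10   # toy chain: action 0 = left, 1 = right

# Deterministic transition model: T[s, a] = next state.
T = np.zeros((N_STATES, N_ACTIONS), dtype=int)
for s in range(N_STATES):
    T[s, 0] = max(s - 1, 0)
    T[s, 1] = min(s + 1, N_STATES - 1)

# "Expert" demonstrations: trajectories that walk to the rightmost state and stay.
expert_trajs = [[min(t, N_STATES - 1) for t in range(HORIZON)] for _ in range(20)]
expert_visits = np.zeros(N_STATES)
for traj in expert_trajs:
    for s in traj:
        expert_visits[s] += 1
expert_visits /= len(expert_trajs)        # per-trajectory state-visitation counts

theta = np.zeros(N_STATES)                # one reward weight per state (one-hot features)

for _ in range(200):
    # Backward pass: soft value iteration over the WHOLE state space.
    V = np.zeros(N_STATES)
    for _ in range(HORIZON):
        Q = theta[:, None] + V[T]                         # r(s) + V(next state)
        Qmax = Q.max(axis=1, keepdims=True)
        V = (Qmax + np.log(np.exp(Q - Qmax).sum(axis=1, keepdims=True))).ravel()
    policy = np.exp(Q - V[:, None])                       # stochastic MaxEnt policy

    # Forward pass: expected state-visitation frequencies under that policy,
    # again requiring the full transition model.
    d = np.zeros(N_STATES)
    d[0] = 1.0                                            # all trajectories start at state 0
    visits = np.zeros(N_STATES)
    for _ in range(HORIZON):
        visits += d
        d_next = np.zeros(N_STATES)
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                d_next[T[s, a]] += d[s] * policy[s, a]
        d = d_next

    # MaxEnt likelihood gradient: expert counts minus expected counts.
    theta += 0.05 * (expert_visits - visits)

print("learned reward weights:", np.round(theta, 2))
```

In this sketch the reward is recovered only because the state space is tiny and the dynamics are fully known; neither assumption holds for the environments that matter for SOTA language models.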

This question tries to gauge what I consider to be the crux of my intuition: that IRL will be widely considered relevant to alignment by 2028 if and only if it soon makes significant progress on scalability, to a level comparable with RLHF. If anyone at CHAI would be interested in a conversation about this, please let me know!

Since what counts as a "significant component" involves some subjective interpretation, I will not bet in this market.
