Forward RL has become a well-known component of most current SOTA LMs (through RLHF), against the expectations of many. Will IRL have a similarly significant, publicly known role in any key component of SOTA alignment techniques before Jan 1, 2028?
I have long felt conflicted about CHAI's IRL-heavy research agenda: on the one hand, there is a clear pathway of relevance from IRL to alignment that certainly deserves exploration; on the other, current IRL-based imitation learning methods seem very hard to scale to the largest environments.
In some of my early writing (https://www.lesswrong.com/posts/wf83tBACPM9aiykPn/a-survey-of-foundational-methods-in-inverse-reinforcement), I concluded: "The current paradigm of IRL is simply unable to (a) robustly apply (b) sufficiently complex rewards in (c) sufficiently complex environments."
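To make concrete what "the current paradigm of IRL" refers to above, here is a minimal maximum-entropy IRL sketch on a toy single-step decision problem: infer a linear reward from expert choices by matching feature expectations. All features, expert counts, and hyperparameters below are illustrative assumptions, not taken from this question or the linked post:

```python
import numpy as np

# Toy MaxEnt IRL: three actions with hand-made 2-d features, and made-up
# counts of how often an "expert" chose each action.
phi = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # action features
expert_counts = np.array([5, 1, 14])                    # expert's choices
expert_feat = expert_counts @ phi / expert_counts.sum() # empirical feature mean

theta = np.zeros(2)  # linear reward parameters: r(a) = theta . phi(a)
for _ in range(5000):
    logits = phi @ theta
    p = np.exp(logits - logits.max())
    p /= p.sum()                     # MaxEnt policy: p(a) proportional to exp(r(a))
    grad = expert_feat - p @ phi     # log-likelihood gradient = feature matching
    theta += 0.2 * grad

p_final = np.exp(phi @ theta)
p_final /= p_final.sum()             # inferred policy approximates expert frequencies
```

Even in this trivial setting, each gradient step requires computing the learner's feature expectations under the current reward, which is exactly the part that becomes intractable in large environments.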
This question tries to gauge what I consider to be the crux of my intuition: that IRL will be widely considered relevant to alignment by 2028 iff it soon makes significant progress on scalability, to a level comparable with RLHF. If anyone at CHAI would be interested in a conversation about this, please let me know!
Since judging what counts as a "significant" role involves some subjective interpretation, I will not bet in this market.