Will a single model achieve superhuman performance on all OpenAI gym environments by 2025?
"OpenAI gym environments" means all environments that come with the `gym` python package as of 2022-04-08 (this includes the MuJoCo environments even though they require third-party software).

Note: single *model*, not single *algorithm*. If I decide the model is actually many environment-specific models stuck together (e.g. a hierarchical model that predicts the current environment and delegates to a subagent), it will not count (a single model that *learns* a hierarchy without any inductive bias pushing it towards that *would* count, however).

Apr 8, 11:38pm: For the environments that don't have a human performance benchmark, I'll accept if the model is better than SOTA from two years before the publication date.

My guess is that a lab could train a model that achieves superhuman performance on all OpenAI gym environments (this includes Montezuma's Revenge, so my sense is that it would be hard for most individuals to do), but I don't think a lab actually would. What are the odds a lab would train their multi-game RL policy on every single environment, including https://www.gymlibrary.dev/environments/toy_text/? Maybe 20%?

So then the question is whether there will be a model that was trained on (or can generalize to) Montezuma's Revenge and also generalizes to these sillier toy text environments, or whether someone fine-tunes an open-source Atari-trained model on these sillier environments. 25%?

Then an extra 10% because there are worlds I haven't accounted for, and this tends to push towards this event happening instead of not happening.

20% + (100% - 20%) * 25% + 10% = 50%
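
The arithmetic above can be checked with a short sketch. The function and parameter names here are hypothetical labels chosen for illustration, and treating the final 10% as a flat additive adjustment simply mirrors how the comment writes it rather than any formal probability model.

```python
def combined_probability(p_direct: float, p_indirect: float, p_unmodeled: float) -> float:
    """Combine the three rough estimates from the comment.

    p_direct    -- chance a lab trains one policy on every gym environment (20%)
    p_indirect  -- chance of success via generalization or fine-tuning,
                   given the direct route did not happen (25%)
    p_unmodeled -- flat bump for scenarios not otherwise accounted for (10%)
    """
    # P(direct) + P(not direct) * P(indirect | not direct) + flat adjustment
    return p_direct + (1.0 - p_direct) * p_indirect + p_unmodeled


estimate = combined_probability(p_direct=0.20, p_indirect=0.25, p_unmodeled=0.10)
print(f"{estimate:.0%}")
```

Changing any one input shows how sensitive the total is to that guess; for example, halving the 20% figure lowers the first term but raises the weight on the second.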

@NoaNabeshima I think that if they can get 80% of environments then they can also get 100% of environments, and training on the extra 20% is a relatively low cost (especially if there are transfer gains, which seems likely), and in return they get to say "we solved all the environments" which has historically been a pretty big motivator for RL research.

@vluzko Gato didn't train on toy text (I think), I imagine because it doesn't seem that helpful/important.


predicts NO

@NoaNabeshima or just wasn't being tracked by DM engineers

predicts NO

@vluzko I mean they did train on control environments which seem similarly trivial. shrug

@NoaNabeshima I think Gato is a little weird as an example. More central RL papers often try to solve as many environments as they can, e.g. Agent57 cared a lot about beating the sub-benchmark of "all Atari environments", and I think there might be a similar push for a "GymX" that solves all Gym environments. Not guaranteed, of course, but if I were writing a paper that solved most gym environments I would definitely spend a weekend running it on all the tiny esoteric environments too.

predicts NO

@vluzko "all atari environments", "all control environments" seem like more natural kinds than "all gym environments" to me. Also, as a model learns to solve more environments, I think focusing on training on easy environments will be less of a big deal because it'll be more obvious that your model could solve them.