Benchmark Gap #2: Once we have an algorithm with human-level sample efficiency on major RL benchmarks, how many years will it be before there is an algorithm with human-level sample efficiency on essentially all AAA video game tasks?

For the purposes of this question, the major RL benchmarks are ALE, Minecraft, chess, Go, and StarCraft II.

Sample efficiency: the number of frames, games, or amount of play time required to reach a given level of performance. For this market, that level is average human performance: the algorithm must match average human performance (measured by score, Elo, completion time, etc.) given the same amount of data.
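To make the "same amount of data" comparison concrete, here is a minimal sketch. It assumes the agent's training run is logged as a learning curve of `(samples_used, score)` checkpoints sorted by sample count, that a higher score is better (a lower-is-better metric such as completion time would need to be negated first), and that performance is read off the latest checkpoint that fits within the human data budget. The function names are hypothetical illustrations, not part of the question's resolution rules.

```python
from bisect import bisect_right


def score_at_budget(curve, budget):
    """Return the agent's score at the latest checkpoint using <= budget samples.

    curve: list of (samples_used, score) pairs sorted by samples_used
    (a hypothetical logging format). Returns None if no checkpoint fits.
    """
    samples = [s for s, _ in curve]
    i = bisect_right(samples, budget)  # number of checkpoints within budget
    if i == 0:
        return None
    return curve[i - 1][1]


def is_human_level_efficient(curve, human_samples, human_avg_score):
    """True if, given the same data as the average human, the agent matches them.

    Assumes higher scores are better.
    """
    score = score_at_budget(curve, human_samples)
    return score is not None and score >= human_avg_score
```

For example, with checkpoints `[(1000, 5.0), (5000, 20.0), (10000, 40.0)]` and a human budget of 6,000 samples, the agent is judged on its 5,000-sample score of 20.0.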

Video game tasks could include: maximizing score, speedruns, challenge runs, or competing against human players.

I'm restricting resolution to AAA video games to rule out edge cases such as an indie developer making a Turing-test video game.

"Essentially all":

  • Can complete >=90% of AAA video games in <= mean human completion time

  • Can achieve a top-100 speedrun (according to whatever the largest speedrun website is at the time) on >=90% of AAA video games, given approximately the same amount of time as human speedrunners

  • Can complete popular challenge runs on >=90% of AAA video games
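The ">=90% of AAA video games" thresholds above amount to a per-criterion fraction check over a set of games. The sketch below assumes a hypothetical per-game record with one boolean per criterion; the field names and the `criteria_met` helper are illustrative only. The question text does not say whether one or all three criteria must hold, so this sketch simply reports each criterion separately rather than combining them.

```python
from dataclasses import dataclass

THRESHOLD = 0.9  # ">=90% of AAA video games"


@dataclass
class GameResult:
    # Hypothetical per-game record; field names are illustrative only.
    title: str
    completed_within_mean_human_time: bool
    top_100_speedrun: bool
    completed_popular_challenge_runs: bool


def criterion_fractions(results):
    """Fraction of games satisfying each criterion."""
    n = len(results)
    if n == 0:
        raise ValueError("need at least one game")
    return {
        "completion": sum(r.completed_within_mean_human_time for r in results) / n,
        "speedrun": sum(r.top_100_speedrun for r in results) / n,
        "challenge": sum(r.completed_popular_challenge_runs for r in results) / n,
    }


def criteria_met(results, threshold=THRESHOLD):
    """Map each criterion name to whether it clears the threshold."""
    return {name: frac >= threshold for name, frac in criterion_fractions(results).items()}
```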

The models used can include pretraining, as long as the training data does not include frames from the video games. Instructions, manuals, and guides can also be used, as long as they are available to human players (e.g. the contents of a speedrunning forum or a YouTube video explaining a trick can be part of the input).

Note: this question is about algorithms rather than models. There is no requirement that a single model be able to play multiple video games. In cases where a single model is trained to play multiple video games, I will use its average sample efficiency across all those games.
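One way to read "average sample efficiency across all those games" is sketched below, under stated assumptions: per-game efficiency is taken as the ratio of human samples to agent samples needed to reach average human performance (so a value of at least 1.0 means human-level or better), and the average is a plain arithmetic mean. Both choices are assumptions for illustration; a geometric mean, for instance, would weight very fast and very slow games differently.

```python
from statistics import mean


def efficiency_ratio(agent_samples, human_samples):
    """Human samples divided by agent samples to reach average human performance.

    > 1.0: the agent needed less data than humans; < 1.0: it needed more.
    """
    if agent_samples <= 0 or human_samples <= 0:
        raise ValueError("sample counts must be positive")
    return human_samples / agent_samples


def average_efficiency(per_game):
    """Arithmetic mean of per-game ratios.

    per_game: dict mapping a game name to (agent_samples, human_samples);
    this format is hypothetical.
    """
    return mean(efficiency_ratio(a, h) for a, h in per_game.values())
```

For a model that needs 500 samples on one game and 2,000 on another where humans need 1,000 on each, the ratios are 2.0 and 0.5, giving an average of 1.25.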

