
For the purposes of this question, the major RL benchmarks are ALE, Minecraft, chess, Go, and StarCraft II.
Sample efficiency: the number of frames, games, or amount of time required to achieve a given level of performance. For this market I will use average human performance: the algorithm must reach average human performance (measured by score, Elo, completion time, etc.) given the same amount of data as the average human.
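For concreteness, here is a minimal sketch of that comparison, assuming per-game learning curves are available. The function names, frame counts, and scores are hypothetical illustrations, not part of the formal resolution criteria.

```python
def frames_to_reach(curve: list[tuple[int, float]], target: float):
    """First frame count at which the agent's score reaches `target`.

    `curve` is a list of (frames_seen, score) checkpoints in increasing
    frame order; returns None if the target is never reached.
    """
    for frames, score in curve:
        if score >= target:
            return frames
    return None

def meets_sample_efficiency(curve, avg_human_score, avg_human_frames) -> bool:
    """True if the agent reaches average human performance using no more
    data than the average human needed."""
    needed = frames_to_reach(curve, avg_human_score)
    return needed is not None and needed <= avg_human_frames

# Hypothetical example: the average human reaches a score of 8400 after
# roughly 2 million frames of play.
curve = [(500_000, 3100.0), (1_000_000, 6200.0), (1_800_000, 8700.0)]
print(meets_sample_efficiency(curve, avg_human_score=8400.0,
                              avg_human_frames=2_000_000))  # True
```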
Video game tasks could include maximizing score, speedruns, challenge runs, or competing against human players.
I'm restricting resolution to AAA video games to rule out possibilities like an indie developer making a Turing-test video game.
"Essentially all":
Can complete >=90% of AAA video games in <= mean human completion time
Can achieve a top 100 speedrun (according to whatever the largest speedrun website at the time is) on >=90% of AAA video games given approximately the same amount of time as human speed runners
Can complete popular challenge runs on >=90% of AAA video games
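The sketch below shows how those thresholds could be checked. It assumes all three criteria must hold jointly, and the game records, field names, and 90% constant are hypothetical illustrations; the real resolution would rely on whatever data is available at the time.

```python
ESSENTIALLY_ALL = 0.90  # ">= 90% of AAA video games"

def fraction_passing(games: list[dict], key: str) -> float:
    """Fraction of games whose record marks the given criterion as met."""
    return sum(1 for g in games if g.get(key)) / len(games)

# Toy records for two hypothetical AAA games.
games = [
    {"title": "Game A", "beat_mean_completion_time": True,
     "top_100_speedrun": True, "popular_challenge_runs": True},
    {"title": "Game B", "beat_mean_completion_time": True,
     "top_100_speedrun": False, "popular_challenge_runs": True},
]

resolves_yes = all(
    fraction_passing(games, key) >= ESSENTIALLY_ALL
    for key in ("beat_mean_completion_time",
                "top_100_speedrun",
                "popular_challenge_runs")
)
print(resolves_yes)  # False here: only 50% of the toy games have a top-100 speedrun
```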
The models used can include pretraining, as long as the training data does not include frames from the video games. Instructions, manuals, and guides can also be used, as long as they are available to human players (e.g., the contents of a speedrunning forum or a YouTube video explaining a trick can be part of the input).
Note: this question is about algorithms rather than models. There is no requirement that a single model be able to play multiple video games. In cases where a single model is trained to play multiple video games, I will use its average sample efficiency across all those games.
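As a worked illustration of that averaging, here is a small sketch for a single model trained on several games. Defining per-game sample efficiency as the ratio of data the agent needed to data the average human needed is my illustrative assumption, not part of the question text.

```python
def sample_efficiency_ratio(agent_frames: int, human_frames: int) -> float:
    """Data the agent needed to reach average human performance, relative to
    the data the average human needed; <= 1.0 means at least human-level
    sample efficiency on that game."""
    return agent_frames / human_frames

# Hypothetical per-game figures for one multi-game model.
per_game = {
    "Game A": sample_efficiency_ratio(agent_frames=1_500_000, human_frames=2_000_000),
    "Game B": sample_efficiency_ratio(agent_frames=3_000_000, human_frames=2_500_000),
}
average_ratio = sum(per_game.values()) / len(per_game)
print(round(average_ratio, 3))  # 0.975 -> human-level sample efficiency on average
```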