Benchmark Gap #6: Once we have a transfer model that achieves human-level sample efficiency on many major RL environments, how many months will it be before we have a non-transfer model that achieves the same?
Expected: 12 months

Transfer model criteria:

  • The model can include pretrained non-RL components, e.g. a language or image model. Effort should have been made to keep states from the RL environments out of the training set of any pretrained component, but this doesn't have to be perfect.

  • The model can train for any amount of time on the training set of RL environments.

  • Once transferred, it must achieve mean human performance with human-level sample efficiency on >= 75% of the test environments.

Non-transfer model criteria:

  • Can include pretrained components in the same way as the transfer model.

  • Must achieve mean human performance with human-level sample efficiency on >= 75% of all the environments (there is no split into training and test environments).
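
The 75% threshold in both sets of criteria can be sketched as a small check. This is only an illustration under stated assumptions: the question does not define how "human-level sample efficiency" is measured, so the sketch assumes it means the model used no more environment interactions than a human needed. The names `EnvResult`, `passes_env`, and `meets_threshold` are hypothetical and do not come from the question.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class EnvResult:
    """Outcome of one model on one RL environment (hypothetical record)."""
    name: str
    model_score: float       # score the model reached
    human_mean_score: float  # mean human score on the same environment
    model_samples: int       # environment interactions the model used
    human_samples: int       # interactions a human needed (assumed measurable)


def passes_env(r: EnvResult) -> bool:
    # Assumption: "human-level sample efficiency" means the model used
    # no more interactions than the human baseline.
    return r.model_score >= r.human_mean_score and r.model_samples <= r.human_samples


def meets_threshold(results: Sequence[EnvResult], threshold: float = 0.75) -> bool:
    """True if at least `threshold` of the environments are passed."""
    if not results:
        return False
    passed = sum(passes_env(r) for r in results)
    return passed / len(results) >= threshold
```

For a transfer model, `results` would hold only the held-out test environments; for a non-transfer model it would hold every environment, since there is no training/test split.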


© Manifold Markets, Inc.