Benchmark Gap #6: Once we have a transfer model that achieves human-level sample efficiency on many major RL environments, how many months will it be before we have a non-transfer model that achieves the same? | Manifold

Expected: 12

Transfer model criteria:

The model can include pretrained non-RL components, e.g. a language or image model. Effort should have been made to avoid including states from the RL environments in the training set of any pretrained component, but this doesn't have to be perfect.
The model can train for any amount of time on the training set of RL environments.
Once transferred, it must achieve mean human performance with human-level sample efficiency on >=75% of the test environments.

Non-transfer model:

Can include pretrained components in the same way.
Must achieve mean human performance with human-level sample efficiency on >=75% of all the environments (there is no split into training and test environments).
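
As a rough illustration of how the 75% threshold could be checked, here is a minimal Python sketch. It is not the market's official resolution procedure. The `EnvResult` fields, the reading of "human-level sample efficiency" as using no more environment samples than a human, and the whole-month counting in `months_between` are all assumptions made for this example. For the transfer model, the list would hold only the test environments; for the non-transfer model, it would hold all environments.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EnvResult:
    """Outcome on one RL environment (all field names are hypothetical)."""
    name: str
    model_score: float       # score the model reached
    human_mean_score: float  # mean human score on the same environment
    model_samples: int       # environment samples the model used
    human_samples: int       # samples a human needed (assumed to be known)


def passes(result: EnvResult) -> bool:
    # Assumed reading of the criteria: at least mean human performance,
    # reached with no more samples than a human used.
    return (result.model_score >= result.human_mean_score
            and result.model_samples <= result.human_samples)


def meets_criterion(results: list[EnvResult], threshold: float = 0.75) -> bool:
    # True when the share of passing environments is at least the threshold.
    if not results:
        return False
    passed = sum(passes(r) for r in results)
    return passed / len(results) >= threshold


def months_between(first: date, second: date) -> int:
    # Whole calendar months from `first` to `second` (an assumed convention).
    months = (second.year - first.year) * 12 + (second.month - first.month)
    if second.day < first.day:
        months -= 1
    return months
```

Under these assumptions, the question's answer would be `months_between(transfer_date, non_transfer_date)`, where each date is the first day on which `meets_criterion` holds for the respective kind of model.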

Market context

Technical AI Timelines


People are also trading

Benchmark Gap #2: Once we have an algorithm with human level sample efficiency for major RL benchmarks, how many years will it be before there is an algorithm with human level sample efficiency on essentially all AAA video game tasks?

Benchmark Gap #1: Once we have a language model that achieves expert human performance on all *current* major NLP benchmarks, how many years will it be before we have an AI with human-level language skills?

Benchmark Gap #5: Once a single AI model solves >= 95% of miniF2F, MATH, and MMLU STEM, will it be less than two years before AI models are used as entry-level data science / data analysis / statistics workers?

Benchmark Gap #3: Once a model achieves superhuman performance on a competitive programming benchmark, will it be less than 2 years before there are "entry level" AI programmers in industry use?

Benchmark Gap #9: Once a model solves current software engineering benchmarks, how long until humans don't code?

Will a publicly known AI model achieve an 80% time horizon of 3 weeks by April 2027?

Benchmark Gap #4: Once a single AI model solves >= 95% of miniF2F, MATH, and MMLU STEM, how many months will it be before an AI is listed as a (co) first author on a published math paper?

Benchmark Gap #7: Once 10% of the medical Grand Challenges are "solved", how many months will it be before AI are in common use in hospitals for analyzing medical images with minimal human oversight?

Will models be able to do the work of an AI researcher/engineer before 2027?

When will any model achieve >=human performance on QuALITY?
