Benchmark Gap #9: Once a model solves current software engineering benchmarks, how long until humans don't code? | Manifold

2050

8.1 years expected

Current probability by answer (years):

0 - 1: 8%
1 - 2: 5%
2 - 3: 5%
3 - 4: 5%
4 - 5: 6%
5 - 6: 6%
6 - 7: 6%
7 - 8: 6%
8 - 9: 6%
9 - 10: 5%
Above 10: 45%

To expand on the title: once an AI beats our current hardest engineering benchmarks, how many years will it be before humans are not hired to do software engineering anymore?

The benchmarks I'll consider for this question are:

SWE-bench Lite: better than 90% resolved.
RE-Bench: mean normalized score >= 1.2*
CodeContests: pass@5 >= 0.9
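
As a minimal sketch, the three thresholds above can be written as a single check. The class and field names here are hypothetical, scores are assumed to be expressed as fractions, and "better than 90%" is read as a strict inequality; none of this is an official scoring harness for any of the benchmarks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkScores:
    """Scores for one model (hypothetical container, not an official format)."""
    swe_bench_lite_resolved: float   # fraction of issues resolved, 0.0 to 1.0
    re_bench_mean_normalized: float  # mean normalized score
    codecontests_pass_at_5: float    # pass@5, 0.0 to 1.0


def meets_all_thresholds(scores: BenchmarkScores) -> bool:
    """True only if a single model clears every threshold in the question."""
    return (
        scores.swe_bench_lite_resolved > 0.90      # "better than 90%": assumed strict
        and scores.re_bench_mean_normalized >= 1.2
        and scores.codecontests_pass_at_5 >= 0.9
    )
```

All three conditions must hold for the same model; a model that clears two of them does not start the clock.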

Once a single model achieves all of these, how many years will it be before "software engineer" (as understood in 2025) is not a job humans get hired for?

Some notes on resolution:

This market is about humans doing the core work of software engineering - opening tickets, pulling a branch, writing new code, testing it, submitting PRs, etc.
If "software engineer" is still a job title but means something different, the market still resolves.
If software engineers stay employed but their work changes, the market still resolves - e.g. if all software engineers switch from being ICs to "AI managers" of some sort.
If humans stop being hired for software engineering before the benchmarks are beaten, then the market resolves to 0.
If these benchmarks undergo minor variations, I'll allow the market to resolve based on either the original or the variant (e.g. if a question is added to RE-Bench, or a different subset of SWE-bench becomes popular).
If there are still some humans doing software engineering here and there, the market still resolves - I'm not really interested in whether some random small companies or government departments will refuse to change.
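
The resolution notes above can be sketched as a mapping from two dates to an answer bucket. This is only an illustration: the function name is hypothetical, the year length of 365.25 days is an assumption, and so is the handling of values that land exactly on a bucket boundary. Deciding when the job has actually "ended" remains a judgment call that the code does not attempt.

```python
from datetime import date


def resolution_bucket(benchmarks_beaten: date, job_ended: date) -> str:
    """Map the gap between the two events to one of the market's answer buckets."""
    # If the job ends before the benchmarks are beaten, the market resolves to 0,
    # which falls in the lowest bucket.
    if job_ended <= benchmarks_beaten:
        return "0 - 1"

    years = (job_ended - benchmarks_beaten).days / 365.25  # assumed year length
    if years >= 10:  # boundary handling is an assumption
        return "Above 10"

    lower = int(years)
    return f"{lower} - {lower + 1}"
```

For example, benchmarks beaten on 2027-01-01 and the job ending on 2030-07-01 gives a gap of about 3.5 years, so the sketch returns the "3 - 4" bucket.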

*The paper lists 0.98 as the average score for testers from METR's professional network, which was their best group of testers. They don't give a variance, so I'm adding a little wiggle room, but I think this is reasonably close to "peak human".

Technical AI Timelines

People are also trading

Benchmark Gap #3: Once a model achieves superhuman performance on a competitive programming benchmark, will it be less than 2 years before there are "entry level" AI programmers in industry use?

Benchmark Gap #5: Once a single AI model solves >= 95% of miniF2F, MATH, and MMLU STEM, will it be less than two years before AI models are used as entry-level data science / data analysis / statistics workers?

Benchmark Gap #1: Once we have a language model that achieves expert human performance on all *current* major NLP benchmarks, how many years will it be before we have an AI with human-level language skills?

Will the first AI model that saturates Humanity's Last Exam be employable as a software engineer?

Once a model achieves superhuman performance on a competitive programming benchmark, will it be less than 5 years before there are "entry level" AI programmers in industry use?

Will an AI model achieve superhuman ELO on Codeforces by the 31 December 2025?

Will an AI system be able to fully refactor a 10k+ line codebase before 2026 ?

Benchmark Gap #2: Once we have an algorithm with human level sample efficiency for major RL benchmarks, how many years will it be before there is an algorithm with human level sample efficiency on essentially all AAA video game tasks?

Benchmark Gap #8: Once a single AI gets >= 80% on FrontierMath Tier 4, how long until an AI publishes a math paper?

Benchmark Gap #4: Once a single AI model solves >= 95% of miniF2F, MATH, and MMLU STEM, how many months will it be before an AI is listed as a (co) first author on a published math paper?
