Benchmark Gap #9: Once a model solves current software engineering benchmarks, how long until humans don't code?
Expected: 7.2 years

0 - 1 years: 7%
1 - 2 years: 7%
2 - 3 years: 7%
3 - 4 years: 7%
4 - 5 years: 7%
5 - 6 years: 7%
6 - 7 years: 7%
7 - 8 years: 7%
8 - 9 years: 7%
9 - 10 years: 7%
Above 10 years: 30%

To expand on the title: once an AI beats our current hardest software engineering benchmarks, how many years will it be before humans are no longer hired to do software engineering?

The benchmarks I'll consider for this question are:

Once a single model achieves all of these, how many years will it be before "software engineer" (as understood in 2025) is not a job humans get hired for?

Some notes on resolution:

  • This market is about humans doing the core work of software engineering - opening tickets, pulling a branch, writing new code, testing it, submitting PRs, etc.

  • If "software engineer" is still a job title but means something different, the market still resolves.

  • If software engineers stay employed but their work changes, the market still resolves - e.g. if all software engineers switch from being ICs to "AI managers" of some sort.

  • If this happens before the benchmarks are beaten, the market resolves to 0.

  • If these benchmarks undergo minor variations, I'll allow the market to resolve based on either the original or the variant (e.g. if a question is added to RE-bench, or a different subset of SWE-bench becomes popular).

  • If there are still some humans doing software engineering here and there, the market still resolves - I'm not really interested in whether some random small companies or government departments will refuse to change.

*The paper lists 0.98 as the average score for testers from METR's professional network, which was their best group of testers. They don't give a variance, so I'm adding a little for wiggle room, but I think this is reasonably close to "peak human".
