To expand on the title: once an AI beats our current hardest engineering benchmarks, how many years will it be before humans are not hired to do software engineering anymore?
The benchmarks I'll consider for this question are:
SWE-bench Lite: more than 90% of issues resolved
RE-Bench: mean normalized score >= 1.2*
CodeContests: pass@5 >= 0.9
Once a single model achieves all of these, how many years will it be before "software engineer" (as understood in 2025) is not a job humans get hired for?
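To make the "all of these" condition concrete, here is a minimal sketch of the check for a single model. The class and field names are hypothetical, and I'm assuming scores are reported as fractions (0.9 rather than 90). "Better than 90%" is read as a strict inequality, while the other two thresholds are inclusive, as written above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkScores:
    """Scores for ONE model (hypothetical container, not an official format)."""
    swe_bench_lite_resolved: float   # fraction of issues resolved, 0.0 to 1.0
    re_bench_mean_normalized: float  # mean normalized score
    codecontests_pass_at_5: float    # pass@5, 0.0 to 1.0


def beats_all_benchmarks(scores: BenchmarkScores) -> bool:
    """Return True only if the same model clears every threshold."""
    return (
        scores.swe_bench_lite_resolved > 0.90        # "better than 90%"
        and scores.re_bench_mean_normalized >= 1.2   # ">= 1.2"
        and scores.codecontests_pass_at_5 >= 0.9     # "pass@5 >= 0.9"
    )
```

Note that the check applies to one model at a time: three different models each clearing one threshold would not count.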
Some notes on resolution:
This market is about humans doing the core work of software engineering - opening tickets, pulling a branch, writing new code, testing it, submitting PRs, etc.
If "software engineer" is still a job title but means something different, the market still resolves.
If software engineers stay employed but their work changes, the market still resolves - e.g. if all software engineers switch from being ICs to "AI managers" of some sort.
If humans stop being hired for this work before the benchmarks are beaten, the market resolves to 0.
If these benchmarks undergo minor variations, I'll allow the market to resolve based on either the original or the variant (e.g. if a question is added to RE-Bench, or a different subset of SWE-bench becomes popular).
If there are still some humans doing software engineering here and there, the market still resolves - I'm not really interested in whether some random small companies or government departments will refuse to change.
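As a rough sketch of how the resolution value falls out of these notes: the market resolves to the number of years between the benchmarks being beaten and the job disappearing, floored at 0. The function name is hypothetical, and dividing by 365.25 days to get fractional years is my assumption for illustration, not a fixed rule.

```python
from datetime import date


def resolution_years(benchmarks_beaten: date, hiring_ended: date) -> float:
    """Years from the benchmarks being beaten to humans no longer being hired.

    Resolves to 0 if hiring ended before the benchmarks were beaten.
    The 365.25-day year is an illustrative assumption.
    """
    elapsed_days = (hiring_ended - benchmarks_beaten).days
    return max(0.0, elapsed_days / 365.25)
```

For example, benchmarks beaten at the start of 2026 and hiring ending at the start of 2031 would give roughly 5 years.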
*The paper lists 0.98 as the average score for testers from METR's professional network, which was their best group of testers. They don't give a variance, so I'm adding a little wiggle room, but I think this is reasonably close to "peak human".