Will an autonomous agent resolve 90% of tasks on SWE-bench by 2025?
Resolved NO on Jan 1
Resolves "Yes" if, at the time of closure, there is an entry on the SWE-bench leaderboard (https://www.swebench.com/) with a score greater than or equal to 90%.
This question is managed and resolved by Manifold.
🏅 Top traders
# | Total profit |
---|---|
1 | Ṁ54 |
2 | Ṁ32 |
3 | Ṁ21 |
4 | Ṁ16 |
5 | Ṁ9 |
What if there's evidence that the training data is contaminated with the SWE-bench tasks somehow?
@DavidFWatson That's an excellent question. Let's explore the possibilities:
- This could be built into the question: only the number on the benchmark matters, regardless of whether it was gamed.
- I could wait a certain amount of time to see whether any controversy emerges. One month feels safe. The question would then resolve yes if, one month after the deadline, I judge that there is no consensus that the number was gamed. This makes the question more informative.
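To make the difference between the two options concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the `Entry` type, the function names, the agent names, and the 30-day approximation of "one month" are hypothetical, and the "gamed" judgment is passed in by hand because it is a human call, not something read off the leaderboard.

```python
from dataclasses import dataclass
from datetime import date, timedelta

THRESHOLD = 90.0                   # percent, from the resolution criterion
GRACE_PERIOD = timedelta(days=30)  # "one month", approximated as 30 days (assumption)


@dataclass
class Entry:
    """Hypothetical leaderboard entry; not the real leaderboard schema."""
    name: str
    score: float  # percent of tasks resolved


def resolve_by_number(entries):
    """Option 1: only the leaderboard number matters."""
    return "YES" if any(e.score >= THRESHOLD for e in entries) else "NO"


def resolve_with_grace_period(entries, deadline, today, gamed_consensus):
    """Option 2: wait one month, then ignore entries judged to be gamed.

    gamed_consensus is a set of entry names that the market creator
    judges, by hand, to be widely regarded as gamed.
    """
    if today < deadline + GRACE_PERIOD:
        return "PENDING"
    clean = [e for e in entries if e.name not in gamed_consensus]
    return resolve_by_number(clean)


# Hypothetical data: one entry above the threshold, one below.
entries = [Entry("agent-a", 91.0), Entry("agent-b", 72.5)]
deadline = date(2025, 1, 1)

print(resolve_by_number(entries))
print(resolve_with_grace_period(entries, deadline, date(2025, 1, 15), set()))
print(resolve_with_grace_period(entries, deadline, date(2025, 2, 15), {"agent-a"}))
```

Under the first option the hypothetical 91% entry settles the question immediately. Under the second, the same entry leaves the question pending during the grace period, and it stops counting if a consensus forms that it was gamed.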
Related questions
Will an AI achieve >85% performance on the FrontierMath benchmark before 2028?
72% chance
Will an autonomous agent resolve 90% of tasks on SWE-bench by 2026?
63% chance
Will an AI achieve >85% performance on the FrontierMath benchmark before 2027?
55% chance
Will an AI score over 80% on the FrontierMath benchmark in 2025?
20% chance
Will an autonomous agent resolve 90% of tasks on SWE-bench by 2027?
72% chance
Will OpenAI models achieve ≥90% on SimpleBench by the end of 2025?
41% chance
By what factor will the cost for SotA SWE-agents drop from 2024 to 2025?
AI resolves at least X% on SWE-bench without any assistance, by 2028?
AI resolves at least X% on SWE-bench WITH assistance, by 2028?
Will any AI solve more than four of AI 2027 Marcus-Brundage tasks in 2025?
28% chance