Someone will achieve 80% on SWE-Bench by Jan 1, 2025. Current SoTA is ~20%. The result must be announced by Jan 1.
Update (Aug 12): current SoTA is now 30%.
https://arxiv.org/pdf/2310.06770
This market is now about reaching 80% on SWE-Bench Verified by EOY.
https://openai.com/index/introducing-swe-bench-verified/
Given the uncertainty in this market with respect to resolution criteria, I have sold all my shares and will merely judge it.
The OpenAI system card for GPT-4o shows 20% on SWE-Bench, but it used an open-source scaffold?
If we are talking about SWE-Bench Full (not SWE-Bench Lite), this is impossible in its current state. The benchmark contains a number of unsolvable tasks, either because the GitHub issues are ambiguously written or because the unit tests fail due to bugs unrelated to the GitHub issue itself. Only the issues on the Lite leaderboard are properly vetted.