SWE-bench is a benchmark developed to evaluate whether language models can resolve real-world GitHub issues. The leaderboard ranks models by the percentage of SWE-bench instances they resolve; each instance corresponds to a single GitHub issue. The leaderboard is split into two main sections: Unassisted and Assisted.
Unassisted: In this category, models are evaluated without assistance. In particular, they do not get the "oracle" retrieval setting, where the correct files to edit are given to them directly.
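For reference, the leaderboard number is simply the share of benchmark instances whose generated patch resolves the underlying issue. A minimal sketch of that computation in Python (the results-file layout here is a hypothetical stand-in, not the official evaluation harness format):

```python
import json

# Hypothetical results file: maps each SWE-bench instance ID to whether
# the model's generated patch resolved the issue, e.g.
# {"django__django-11099": true, "sympy__sympy-13480": false, ...}
with open("results.json") as f:
    resolved = json.load(f)

# The leaderboard metric: percentage of instances resolved.
pct_resolved = 100 * sum(resolved.values()) / len(resolved)
print(f"Resolved: {pct_resolved:.2f}% of {len(resolved)} instances")
```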
This question is only about the Unassisted category of this benchmark.
http://www.swebench.com/#
Current SOTA is <2%
The prediction market will resolve based on the SWE-bench leaderboard standings as of 11th October 2024.
In the extremely unlikely case that the number fits in two intervals, the lower interval will be chosen.
@MikhailDoroshenko why does that count given that it wasn't submitted to the official leaderboard or verified?
@Fay42 It counts because all of their solutions are available on GitHub and nobody has objected to their claim.
I will be using the full dataset, because the question was about the full dataset.
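For anyone who wants to check the size difference between the full dataset and the SWE-bench Lite subset, both can be loaded from Hugging Face. A quick sketch, assuming the `princeton-nlp/SWE-bench` dataset ID is current:

```python
from datasets import load_dataset

# Full SWE-bench test split (2,294 instances), not SWE-bench Lite (300).
full = load_dataset("princeton-nlp/SWE-bench", split="test")
print(len(full))  # the denominator for the leaderboard percentage
```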
@Sss19971997 I am in a bit of a pickle. I am happy that you bet on my market, but I am not sure I understand your bets. Which resource would you name as a reputable source if you believe swebench.com is fake?
@EliLifland @vluzko Wanna bet against me? I put a huge limit order for YES on 0-15%
I don't believe the leaderboard. Those results were fake.