When will an AI model be better than me at competitive programming?
Q4 2023: 0.7%
Q1 2024: 2%
Q2 2024: 9%
Q3 2024: 40%
Q4 2024: 33%
Q1 2025: 2%
Q2 2025: 1.7%
Q3 2025: 1.3%
Later: 10%

Resolution criteria: When an AI model competes in contests on Codeforces, reaches a rating higher than my all-time high (ATH), and stays above it for more than a month, this market resolves 100% to the quarter in which the model first reached that rating. (For example, if a model comes out on 20 December 2023, does a lot of contests on 23, 24, 25 and 26 December, has a higher rating than my ATH by 27 December, and its rating then stays above mine through 27 January 2024, the market resolves Q4 2023.)
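To make the rule concrete, here is a rough Python sketch of how the check could be done. It is only an illustration, not how the market is actually resolved: the function names, the input format (a date-sorted list of the model's rating after each contest), and the choice of 31 days as "more than a month" (matching the 27 December to 27 January example) are all assumptions. It also treats my ATH as a fixed number.

```python
from datetime import date, timedelta


def quarter_label(d: date) -> str:
    """Format a date as a quarter label such as 'Q4 2023'."""
    return f"Q{(d.month - 1) // 3 + 1} {d.year}"


def resolution_quarter(history, my_ath, as_of, hold=timedelta(days=31)):
    """Return the quarter the market would resolve to, or None if not yet.

    history: list of (date, rating) pairs for the model, sorted by date
             (hypothetical input format).
    my_ath:  my all-time-high rating, assumed fixed here.
    as_of:   the date on which we are checking.
    hold:    how long the model must stay above my_ath (assumed 31 days).
    """
    crossed_on = None  # start of the current streak above my_ath
    for day, rating in history:
        if rating > my_ath:
            if crossed_on is None:
                crossed_on = day
        else:
            # The streak ended; it still counts if it already lasted long enough.
            if crossed_on is not None and day - crossed_on >= hold:
                return quarter_label(crossed_on)
            crossed_on = None
    # The streak is still running; check it against the current date.
    if crossed_on is not None and as_of - crossed_on >= hold:
        return quarter_label(crossed_on)
    return None


# The example from the description: above my ATH on 27 Dec 2023,
# still above on 27 Jan 2024.
example = [
    (date(2023, 12, 23), 1500),
    (date(2023, 12, 24), 1620),
    (date(2023, 12, 26), 1690),
    (date(2023, 12, 27), 1750),
]
print(resolution_quarter(example, my_ath=1700, as_of=date(2024, 1, 27)))  # Q4 2023
```

Checking the same history a week earlier returns None, since the month has not yet passed.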

Context:

AlphaCode 2 was released:

https://news.ycombinator.com/item?id=38544935

https://codeforces.com/profile/AdamantChicken2

https://codeforces.com/blog/entry/99566

I'm currently rated ~1700 on Codeforces, which should be slightly better than what AlphaCode 2 does. According to the report, it is better than 85% of participants, and from what I gather, 1700 is better than 89.7% of participants. The resolution criteria mean I can panic and do a lot of contests if results come out that imply resolution, and then maybe do better, to push out the date at which the market resolves. But obviously, if a model comes out that is grandmaster level, that would be nigh impossible to do.


I'm currently rated 1900, so I am quite confident that AlphaCode 2 couldn't cause a YES resolution.

Edit: I also updated what I realized might've been an ambiguity in the resolution criteria. Resolution is based on when models are released / published. For example, if someone got access to AlphaCode 2 now and ran tests satisfying the criteria in the description, it'd still be a Q4 2023 resolution, even though we're in Q2 2024.

bought Ṁ40 Later NO

Has anyone seen Devin used on CF-like problems before?

sold Ṁ34 of Later YES

Why so much on Later? I think if we project from AlphaCode 2 to an AlphaCode 3, AlphaCode 3 would be very difficult for me to beat. Even AlphaCode 2, if the results in the technical report understate what it would get on the benchmark in the description, could resolve this market to Q1 2024.
