Will "Will AI outcompete best humans in competitive programming before the end of 2023?" resolve consistently with "AlphaCode AI is as good as the median human competitor in competitive programming" in its description?
Resolved YES (Jan 2)

https://manifold.markets/PeterHro%C5%A1%C5%A1o/will-ai-outcompete-best-humans-in-c-c91105439712 has the following original description:

"DeepMind has recently published a pre-print stating that their AlphaCode AI is as good as a median human competitor in competitive programming. See https://deepmind.com/blog/article/Competitive-programming-with-AlphaCode . Will DeepMind, or anyone else provide evidence in 2023 they can beat the best human competitors?"

The AlphaCode paper's testing setup involves submitting to Codeforces (the main competitive programming platform) and computing the score the model would have received if it had participated in that contest.

DeepMind's evaluation overestimates AlphaCode's performance because they copy example outputs from human competitors in testing (see page 50 of the AlphaCode paper), but this is less likely to matter at the "best human competitor" level.
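To make the threshold in the YES criterion below concrete, here is a minimal sketch of how the "beats 99.9% of competitors on average over several contests" check could be computed from virtual-contest standings. The function names and the contest data are hypothetical illustrations, not taken from the AlphaCode paper or from any actual evaluation.

```python
# Hypothetical sketch of the percentile check described below; the ranks and
# participant counts are made-up example data, not real contest results.

def percentile_beaten(model_rank: int, num_participants: int) -> float:
    """Fraction of human competitors the model finishes ahead of in one contest.

    model_rank is the model's virtual standing (1 = first place) among
    num_participants human competitors in an unseen Codeforces round.
    """
    return (num_participants - model_rank) / num_participants


def meets_criterion(results: list[tuple[int, int]], threshold: float = 0.999) -> bool:
    """results holds (model_rank, num_participants) pairs, one per contest."""
    average = sum(percentile_beaten(rank, n) for rank, n in results) / len(results)
    return average >= threshold


# Made-up example: three Div1 rounds where the model places first each time.
contests = [(1, 1200), (1, 950), (1, 2000)]
print(meets_criterion(contests))  # True: average percentile ~99.92% >= 99.9%
```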

Resolves YES if the original market resolves:

  • YES, if a reputable research group publishes an evaluation of a model using the same testing protocol (virtual submission to unseen Codeforces contests) before 1 Jan 2024, and the model beats 99.9% of competitors in Div1 or Div1+Div2 rounds on average over several contests;

  • NO, if no one publishes a paper with these results;

  • N/A, if a reputable research group claims a similar result but there is significant controversy regarding their evaluation setup.

Resolves NO if any of the following happen:

  • the original market resolves N/A or YES before anything like the above happens;

  • the original market resolves NO when it should have resolved YES according to the above criteria.

Resolves N/A if:

  • the AlphaCode evaluation setup stops being possible; for example, Codeforces goes offline for a long time in late 2023;

  • most other cases.

    Dec 7, 3:02pm: Will "Will AI outcompete best humans in competitive programming before the end of 2023?" resolve correctly according to its original description? → Will "Will AI outcompete best humans in competitive programming before the end of 2023?" resolve consistently with "AlphaCode AI is as good as the median human competitor in competitive programming" in its descripion?

    Dec 7, 3:03pm: Will "Will AI outcompete best humans in competitive programming before the end of 2023?" resolve consistently with "AlphaCode AI is as good as the median human competitor in competitive programming" in its descripion? → Will "Will AI outcompete best humans in competitive programming before the end of 2023?" resolve consistently with "AlphaCode AI is as good as the median human competitor in competitive programming" in its description?

predicted NO

Is there any source for the resolution of this?

predicted YES

@JoeCharlier The original market resolved NO. Is there any evidence that it was resolved contrary to the criteria in this market description?

predicted NO

@dp My bad; thought I was betting on original market.

I mostly agree with the criteria you propose as one operationalization for the original question, but I think they are far too stringent for asking if it "resolved correctly". I think YES resolution is correct as soon as another contest (say IOI or ICPC) is solved that is very similar to Codeforces, even if Codeforces isn't tested. I believe the original question clearly is intended to allow for other forms of evidence beyond the specific Codeforces benchmark.

If researchers outperform the best humans on IOI/ICPC/etc., in my opinion the question "Will AI outcompete best humans in competitive programming" should resolve YES immediately. As we discussed on the Discord, it shouldn't be hard in principle to also run the same AI on Codeforces, but that might happen afterwards, or it might not happen if the model isn't easy for anybody to run and the researchers don't bother to do it. One possible improvement to the criteria here is to allow a period of, say, a few months for someone to demonstrate the same feat on Codeforces, although this still isn't perfect.

bought Ṁ30 of NO

@jack Sure, I edited the question title to better reflect the description. No one except me has bet on this market yet, so I think it's fine to iterate on the exact question/description until people bet. Thanks for the feedback.

predicted NO

@jack My interpretation is that we should anchor on the "AlphaCode is as good as the median human competitor" claim, as it's the only sentence in the description of the original market which implies any objective criteria for competitive programming performance.


The problem with IOI as an evaluation benchmark is that it is a high schooler contest. Moreover, since both IOI and the ICPC finals happen once a (non-pandemic 😭) year, it's not easy to test on enough unseen problemsets to make sure it's not an accident.
Submitting to Codeforces virtual contests is also much easier than reproducing the IOI evaluation setup.

Another thing is that both IOI and ICPC are no-internet-access contests, which can give the model a large advantage compared to human competitors if the same problem appeared before. Codeforces is open-internet-access, third-party-code-allowed, which makes the comparison between human and AI competitors "fairer" on that platform.

@dp I think one of the worst but also best things about Manifold is that the criteria don't have to be objective. Clearly ambiguity is bad, but on the other hand it's very easy to write objective resolution criteria that don't match the intent of the question. (Some good examples on real-money markets: https://polymarket.com/market/will-volodymyr-zelenskyy-be-the-2022-time-person-of-the-year and https://kalshi.com/events/SDEBT/markets/SDEBT-23JAN01)

The description gives Codeforces as a representative example, and I agree we should anchor on it, but I don't think we should stick to it exclusively. I largely agree with your points, and they're a good reason to operationalize a question based on Codeforces, but I think the fundamental issue is that the question becomes a two-part question: will AI have the technical capability, and will anyone actually run the test on Codeforces? The original question introduces subjectivity, which is bad because some things are ambiguous, but good because it helps remove the less relevant second part of the question.

"The problem with IOI as an evaluation benchmark is that it is a high schooler contest." IOI is a flagship programming contest and I think the problem difficulty is not too different from non-high-school contests. But it is true that "best high schoolers" isn't quite the same as "best competitive programmers". I'm not sure how big the difference is but I don't think it's huge.

The fact that it only runs once a year is a significant issue, but we can sidestep that issue by allowing AIs to benchmark their performance on past contests, as I did in my question https://manifold.markets/jack/will-an-ai-win-a-gold-medal-on-the-91c577533429 (although this makes some forms of cheating easier).

@dp My interpretation is that since the original market didn't provide narrow objective criteria, it is based on broad subjective criteria. Especially because the original market said "provide evidence", not "provide proof". I.e. if there is any reasonably standard competitive programming task where the AI is shown to outperform the best humans, then the original market should resolve yes.

Will "Will AI outcompete best humans in competitive programming before the end of 2023?" resolve correctly according to its original description?, 8k, beautiful, illustration, trending on art station, picture of the day, epic composition