Will "Will AI outcompete best humans in competitive programming before the end of 2023?" resolve consistently with "AlphaCode AI is as good as the median human competitor in competitive programming" in its description?
Resolved YES (Jan 2)

https://manifold.markets/PeterHro%C5%A1%C5%A1o/will-ai-outcompete-best-humans-in-c-c91105439712 has the following original description:

"DeepMind has recently published a pre-print stating that their AlphaCode AI is as good as a median human competitor in competitive programming. See https://deepmind.com/blog/article/Competitive-programming-with-AlphaCode . Will DeepMind, or anyone else provide evidence in 2023 they can beat the best human competitors?"

The AlphaCode paper's testing setup involves submitting to Codeforces (the main competitive programming platform) and computing the score the model would have received if it had participated in that contest.

DeepMind's evaluation overestimates AlphaCode's performance because they copy example outputs from human competitors in testing (see page 50 of the AlphaCode paper), but this is less likely to matter at the "best human competitor" level.
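To make the threshold in the YES criterion below concrete, here is a minimal sketch of how the "beats 99.9% of competitors on average over several contests" check could be computed from virtual-contest standings. The function names and the contest data are hypothetical illustrations, not taken from the AlphaCode paper or from any actual evaluation.

```python
# Hypothetical sketch of the percentile check described below; the ranks and
# participant counts are made-up example data, not real contest results.

def percentile_beaten(model_rank: int, num_participants: int) -> float:
    """Fraction of human competitors the model finishes ahead of in one contest.

    model_rank is the model's virtual standing (1 = first place) among
    num_participants human competitors in an unseen Codeforces round.
    """
    return (num_participants - model_rank) / num_participants


def meets_criterion(results: list[tuple[int, int]], threshold: float = 0.999) -> bool:
    """results holds (model_rank, num_participants) pairs, one per contest."""
    average = sum(percentile_beaten(rank, n) for rank, n in results) / len(results)
    return average >= threshold


# Made-up example: three Div1 rounds where the model places first each time.
contests = [(1, 1200), (1, 950), (1, 2000)]
print(meets_criterion(contests))  # True: average percentile ~99.92% >= 99.9%
```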

Resolves YES if the original market resolves:

  • YES, if a reputable research group publishes an evaluation of a model using the same testing protocol (virtual submission to unseen Codeforces contests) before 1 Jan 2024, and the model beats 99.9% of competitors in Div1 or Div1+Div2 rounds on average over several contests;

  • NO, if no one publishes a paper with these results;

  • N/A, if a reputable research group claims a similar result but there is significant controversy regarding their evaluation setup.

Resolves NO if any of the following happen:

  • the original market resolves N/A or YES before anything like the above happens;

  • the original market resolves NO when it should have resolved YES according to the above criteria.

Resolves N/A if:

  • the AlphaCode evaluation setup stops being possible; for example, Codeforces goes offline for a long time in late 2023;

  • most other cases.

    Dec 7, 3:02pm: Will "Will AI outcompete best humans in competitive programming before the end of 2023?" resolve correctly according to its original description? → Will "Will AI outcompete best humans in competitive programming before the end of 2023?" resolve consistently with "AlphaCode AI is as good as the median human competitor in competitive programming" in its descripion?

    Dec 7, 3:03pm: Will "Will AI outcompete best humans in competitive programming before the end of 2023?" resolve consistently with "AlphaCode AI is as good as the median human competitor in competitive programming" in its descripion? → Will "Will AI outcompete best humans in competitive programming before the end of 2023?" resolve consistently with "AlphaCode AI is as good as the median human competitor in competitive programming" in its description?

predicted NO

Is there any source for the resolution of this?

predicted YES

@JoeCharlier The original market resolved NO. Is there any evidence that it was resolved contrary to the criteria in this market description?

predicted NO

@dp My bad; thought I was betting on original market.

I mostly agree with the criteria you propose as one operationalization for the original question, but I think they are far too stringent for asking if it "resolved correctly". I think YES resolution is correct as soon as another contest (say IOI or ICPC) is solved that is very similar to Codeforces, even if Codeforces isn't tested. I believe the original question clearly is intended to allow for other forms of evidence beyond the specific Codeforces benchmark.

If researchers outperform the best humans on IOI/ICPC/etc., in my opinion the question "Will AI outcompete best humans in competitive programming" should resolve YES immediately. As we discussed on the Discord, it shouldn't be hard in principle to also run the same AI on Codeforces, but that might happen afterwards, or it might not happen if the model isn't easy for anybody to run and the researchers don't bother to do it. One possible improvement to the criteria here is to allow a period of, say, a few months for someone to demonstrate the same feat on Codeforces, although this still isn't perfect.

bought Ṁ30 of NO

@jack Sure, I edited the question title to better reflect the description. No one except me has bet on this market yet, so I think it's fine to iterate on the exact question/description until people bet. Thanks for the feedback.

predicted NO

@jack My interpretation is that we should anchor on the "AlphaCode is as good as the median human competitor" claim, as it's the only sentence in the description of the original market which implies any objective criteria for competitive programming performance.


The problem with IOI as an evaluation benchmark is that it is a high schooler contest. Moreover, since both IOI and the ICPC finals happen once a (non-pandemic 😭) year, it's not easy to test on enough unseen problemsets to make sure it's not an accident.
Submitting to Codeforces virtual contests is also much easier than reproducing the IOI evaluation setup.

Another thing is that both IOI and ICPC are no-internet-access contests, which can give the model a large advantage compared to human competitors if the same problem appeared before. Codeforces is open-internet-access, third-party-code-allowed, which makes the comparison between human and AI competitors "fairer" on that platform.

@dp I think one of the worst but also best things about Manifold is that the criteria don't have to be objective. Clearly ambiguity is bad, but on the other hand it's very easy to write objective resolution criteria that don't match the intent of the question. (Some good examples on real-money markets: https://polymarket.com/market/will-volodymyr-zelenskyy-be-the-2022-time-person-of-the-year and https://kalshi.com/events/SDEBT/markets/SDEBT-23JAN01)

The description gives Codeforces as a representative example, and I agree we should anchor on it, but I don't think we should stick to it exclusively. I largely agree with your points, and they're a good reason to operationalize a question based on Codeforces, but I think the fundamental issue is that the question becomes a two-part question: will AI have the technical capability, and will anyone actually run the test on Codeforces? The original question introduces subjectivity, which is bad because some things are ambiguous, but good because it helps remove the less relevant second part of the question.

"The problem with IOI as an evaluation benchmark is that it is a high schooler contest." IOI is a flagship programming contest and I think the problem difficulty is not too different from non-high-school contests. But it is true that "best high schoolers" isn't quite the same as "best competitive programmers". I'm not sure how big the difference is but I don't think it's huge.

The fact that it only runs once a year is a significant issue, but we can sidestep that issue by allowing AIs to benchmark their performance on past contests, as I did in my question https://manifold.markets/jack/will-an-ai-win-a-gold-medal-on-the-91c577533429 (although this makes some forms of cheating easier).

@dp My interpretation is that since the original market didn't provide narrow objective criteria, it is based on broad subjective criteria. Especially because the original market said "provide evidence", not "provide proof". I.e. if there is any reasonably standard competitive programming task where the AI is shown to outperform the best humans, then the original market should resolve yes.

Will "Will AI outcompete best humans in competitive programming before the end of 2023?" resolve correctly according to its original description?, 8k, beautiful, illustration, trending on art station, picture of the day, epic composition