Resolves YES if, before 2024, an AI solves at least as many points worth of problems as the best human competitor on any single contest of the following competitive programming contests: IOI, ICPC, or CodeForces. Otherwise NO. (In particular, this ignores scoring points based on how quickly a problem is solved, so the AI can't win just by submitting solutions inhumanly fast. See detailed definitions below.)
This is similar to the IMO Grand Challenge (https://imo-grand-challenge.github.io/), but for contest programming instead of math, and with a requirement to rank first, not just to earn a gold medal (typically top 5-10%).
Detailed rules:
For IOI: This question uses the IOI score without any modifications: the score is based on problems solved, with partial scores for partial solutions.
For ICPC: Only World Finals counts (since regional contest winners don't reflect the best human in the world). This question uses the number of problems solved as the score.
ICPC is scored primarily on problems solved, with tiebreakers based on incorrect submission attempts for solved problems and the time of the last solved problem. This question ignores the tiebreakers.
For CodeForces: any CodeForces Division 1 contest (the highest division) will count. CodeForces Division 2+ contests do not count (since they don't reflect the top humans). This question uses the sum of the initial/maximum point value of each solved problem as the score.
CodeForces's scoring system includes points per problem solved that decrease the longer you take to solve them, so an AI could outscore humans by solving fewer problems but submitting them faster. Therefore, for this question we ignore the reduction in points over time. This question also ignores penalty points for each incorrect submission attempt.
CodeForces also has a round where contestants try to find bugs in other contestants' solutions ("hacks"). It is impractical to simulate a hack round unless the AI is run as part of a live contest, so we will exclude points (both penalties and rewards) for hacks for this question.
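For concreteness, here is a minimal sketch of how the adjusted CodeForces score described above could be computed. The point values and solved sets below are hypothetical, not taken from any real round:

```python
# Sketch of the adjusted CodeForces scoring used by this question.
# Rules applied: sum the initial/maximum value of each fully solved problem;
# ignore time-based decay, incorrect-submission penalties, and hacks.

# Hypothetical Div. 1 round: problem letter -> initial/maximum point value.
MAX_POINTS = {"A": 500, "B": 1000, "C": 1500, "D": 2000, "E": 2500, "F": 3000}

def adjusted_score(solved: set[str]) -> int:
    """Sum of initial point values of the fully solved problems."""
    return sum(MAX_POINTS[p] for p in solved)

# Hypothetical outcome: the AI submits faster but solves fewer problems.
ai_solved = {"A", "B", "C", "D"}
best_human_solved = {"A", "B", "C", "D", "E"}

print(adjusted_score(ai_solved))          # 5000
print(adjusted_score(best_human_solved))  # 7500 -> would not resolve YES
```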
The AI has only as much time as a human competitor, but there are no other limits on the computational resources it may use during that time.
The AI must be evaluated under conditions substantially equivalent to human contestants, e.g. the same time limits and submission judging rules. The AI cannot query the Internet.
The AI must not have access to the problems before being evaluated on them, e.g. the problems cannot be included in the training set. This should also be reasonably verifiable, e.g. the AI should not use any data which was uploaded after the competition being evaluated.
The contest must be dated no earlier than 2022. E.g. an AI scoring at least as well as the top human competitor on the 2022 IOI would qualify as YES, but demonstrating the same on the 2021 IOI would not.
References to the contest scoring rules:
Background:
In Feb 2022, DeepMind published a pre-print stating that their AlphaCode AI is as good as a median human competitor in competitive programming: https://deepmind.com/blog/article/Competitive-programming-with-AlphaCode. When will an AI system perform as well as the top humans?
I'm proposing the following change to the resolution criteria, as per the discussion below, to better match what winning the contest means:
Instead of resolving YES if "an AI solves at least as many problems as the best human competitor", the question would resolve YES if "an AI scores at least as many points as the best human competitor, based on the contest's scoring rules for submitted solutions except without using scoring rules for how quickly a solution is submitted".
For IOI this is simply the contest score. For ICPC this is the number of problems solved, with tiebreak based on incorrect submission attempts but without the normal tiebreak based on time elapsed. For CodeForces, this is the initial point value of the problem with penalty for incorrect submission attempts but without the contest's reduction based on time elapsed, and also without penalty or reward for hacks (as described in the details).
If you have any feedback or objections, let me know.
I updated the resolution criteria to be "an AI solves at least as many points worth of problems as the best human competitor". See the detailed criteria in the market description.
This is simpler than going into the weeds on all the rules (e.g. penalty points for incorrect submissions), while still closely reflecting the intended question of "AI wins the contest by scoring more points worth of problems solved, but isn't allowed to win just by solving them faster".
Please let me know if you spot any issues with these definitions. I think this doesn't change the probability of the market in any significant way, just makes the definitions better.
@jack I don't know how these sorts of competitions work. Do the best humans typically at least attempt to solve most of the problems? Or do humans typically only solve a small percentage, such that by sheer brute force and tirelessness the AI would be able to solve many more?
Basically, is this a situation like LeetCode, where there's a very large pool of problems and you're not expected to solve literally all of them even if you're capable of doing so, just because the number is so large?
@jonsimon Generally, the contests are designed so that the top humans solve most but not all of the problems.
Here are links to recent scoreboards:
IOI: https://stats.ioinformatics.org/results/2022 (2 perfect scores)
ICPC: https://cphof.org/standings/icpc/2021 (top score is 11/12 problems solved)
CodeForces: https://codeforces.com/contest/1824/standings (top 36 contestants solved 5/6 problems)
@ValeryCherepanov What does that mean practically? If it just attempts a lot of problems really quickly but does a mediocre job on all of them, could that result in an overall win?
@ValeryCherepanov As currently written, the criteria would be to compare number of fully solved problems between human and AI, which seems fine. If people prefer, it could be changed to compare score, I don't see a big difference either way.
For ICPC, I think World Finals or regionals should both count. Since I'm counting any CodeForces Div 1 contest, I think we should also count ICPC regionals.
@jonsimon It doesn't mean much in practice; it was mostly a technical comment to reduce ambiguity. There are 6 tasks at the IOI. To win first place it's almost always necessary to solve 4, and often 5.
I find the current condition somewhat weird, but I am OK with it.
Practically, it means that an AI could theoretically outscore humans and win the IOI by producing partial solutions to many problems while the humans produce full solutions to fewer problems. And the question as currently written would not count that as a YES, since this question asks about the count of solved problems.
A similar thing applies to CodeForces, since problems have different point values.
The reason I wrote the criteria in terms of number of problems was mostly to handle CodeForces, since the points there depend on speed. I could change it to use points for IOI, and for CodeForces to use the original point value of each solved problem (without the time adjustment). I think this would align the criteria better with what it means to "win" the contest. Any objections?
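To illustrate the difference with hypothetical numbers (not from any real contest): under per-task IOI scoring, an AI could accumulate partial credit on every task and out-point a human who fully solves more problems, which is exactly the case the two criteria treat differently:

```python
# Hypothetical IOI-style results (0-100 per task, 6 tasks); illustrative only.
ai_scores    = [80, 80, 80, 75, 75, 70]      # partial credit on every task
human_scores = [100, 100, 100, 100, 30, 10]  # four full solutions

def total_points(scores):
    """Points-based criterion (the proposed change)."""
    return sum(scores)

def problems_solved(scores):
    """Fully-solved count (the original criterion)."""
    return sum(1 for s in scores if s == 100)

print(total_points(ai_scores), total_points(human_scores))        # 460 vs 440
print(problems_solved(ai_scores), problems_solved(human_scores))  # 0 vs 4
```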
@jack There are at least 3 stages of ICPC, maybe even 4 in some cases. I think we can/should include semi-finals and exclude everything else.
@jack Yeah, I like this change, but if for some reason you decide to stay with the original, that is also OK.
@ValeryCherepanov Ah, I see the contest structure of ICPC is a bit different now than it used to be; I didn't realize that. Can you help me define what exactly the semi-finals are? Looking at https://icpc.global/regionals/upcoming, are we talking about the continent finals, e.g. the Asia West Continent Final Contest?
@jack Uh, to be honest I am not super knowledgeable here either. I thought this and your supposition from the previous comments were the same thing.
But if they are different, I would lean toward the continent final contests. Although I imagine that some regions (e.g. Africa) are weaker, so maybe it would be even better to limit it strictly to World Finals. Up to you.
@ValeryCherepanov On reflection, I think only global contests should count, since regional contest winners don't reflect the "best human" in the world. So only World Finals.
@ValeryCherepanov That was probably my fault because I initially suggested including regionals before thinking better of it. Thanks for your help clarifying the details!
@rockenots You've seen how poorly GPT-4 does on competition-level coding, right? It doesn't even move the needle compared to GPT-3.5 on CodeForces.
@jack The GPT-4 coding improvements were meager. It was one of the things that stood out in the paper: GPT-4 had basically zero improvement on CodeForces compared to GPT-3.5, despite doing much better on almost everything else.