Jack
closes Jan 1, 2024
Will an AI outcompete the best humans on any one programming contest of IOI, ICPC, or CodeForces before 2024?
19% chance

Resolves YES if, before 2024, an AI solves at least as many problems as the best human competitor in any single contest of the following competitive programming competitions: IOI, ICPC, or CodeForces (see detailed definitions below). Otherwise NO.

This is similar to the IMO Grand Challenge (https://imo-grand-challenge.github.io/), but for contest programming instead of math, and with a requirement to rank first, not just earn a gold medal (typically top 5-10%).

Detailed rules:

For CodeForces: any CodeForces Division 1 contest (the highest division) will count - if the AI solves at least as many problems as all human competitors on a single contest, that resolves YES. CodeForces Division 2+ contests do not count. The AI counts as solving the problem if it passes all pretests (it is not necessary to simulate a "hack" round).

CodeForces contests have a more complex scoring system: each problem is worth points that decrease the longer you take to solve it, and there is also a round of trying to find bugs in other contestants' solutions ("hacks"). An AI could outscore humans by solving fewer problems but submitting them faster. Therefore, this question asks only about the number of problems the AI solves. It is also impractical to simulate a hack round unless the AI is run as part of a live contest, so hacks are not accounted for in this question.
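To make the counting rule concrete, here is a minimal Python sketch of this question's CodeForces criterion. The input format (one boolean per problem, True if all pretests passed) is invented for illustration and is not any real CodeForces API:

```python
# Minimal sketch of this question's CodeForces criterion (illustrative only).
# verdicts[i] is True iff the submission for problem i passed all pretests;
# solve times and hacks are deliberately ignored, per the rules above.

def solved_count(verdicts: list[bool]) -> int:
    return sum(verdicts)

def resolves_yes(ai_verdicts: list[bool],
                 all_human_verdicts: list[list[bool]]) -> bool:
    best_human = max(solved_count(v) for v in all_human_verdicts)
    return solved_count(ai_verdicts) >= best_human
```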

For ICPC: only the World Finals count (I'm only counting global contests, since regional contest winners don't reflect the "best human" in the world). ICPC is scored primarily on problems solved, with time as a tiebreaker, so the number-of-problems-solved criterion is basically equivalent to an AI winning the contest. For the AI to win, it needs to solve at least as many problems as the best human; it cannot win solely on a speed advantage.

For IOI: scoring is based on problems only, but with partial scores for partial solutions. A problem is defined as solved for this question only if it receives a full score.
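For ICPC and IOI, the original criterion reduces to the sketch below. Treating each IOI task as worth a maximum of 100 points matches recent IOI practice, but is stated here as an assumption:

```python
# Sketch of the original "problems solved" criterion for ICPC and IOI.

# ICPC: count accepted problems; the time tiebreaker is ignored,
# so the AI cannot win on speed alone.
def icpc_solved(accepted: list[bool]) -> int:
    return sum(accepted)

# IOI: a task counts as solved only if it earns a full score
# (assumed to be 100 points per task, as in recent IOIs).
FULL_SCORE = 100

def ioi_solved(task_scores: list[int]) -> int:
    return sum(score == FULL_SCORE for score in task_scores)
```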

  • The AI has only as much time as a human competitor, but there are no other limits on the computational resources it may use during that time.

  • The AI must be evaluated under conditions substantially equivalent to human contestants, e.g. the same time limits and submission judging rules. The AI cannot query the Internet.

  • The AI must not have access to the problems before being evaluated on them, e.g. the problems cannot be included in the training set. It should also be reasonably verifiable, e.g. it should not use any data which was uploaded after the latest competition.

  • The contest must be dated no earlier than 2022. E.g. if an AI demonstrates performance on the 2022 IOI that scores at least as well as the top human competitor, that would qualify as YES, but demonstrating this on the 2021 IOI would not qualify.



Background:

In Feb 2022, DeepMind published a pre-print stating that their AlphaCode AI is as good as a median human competitor in competitive programming: https://deepmind.com/blog/article/Competitive-programming-with-AlphaCode. When will an AI system perform as well as the top humans?

Jack is predicting NO at 21%

Proposing the following change to the resolution criteria, as per the discussion below, to better match what winning the contest means:


Instead of resolving YES if "an AI solves at least as many problems as the best human competitor", the question would resolve YES if "an AI scores at least as many points as the best human competitor, based on the contest's scoring rules for submitted solutions except without using scoring rules for how quickly a solution is submitted".

For IOI this is simply the contest score. For ICPC this is the number of problems solved, with tiebreak based on incorrect submission attempts but without the normal tiebreak based on time elapsed. For CodeForces, this is the initial point value of the problem with penalty for incorrect submission attempts but without the contest's reduction based on time elapsed, and also without penalty or reward for hacks (as described in the details).
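As a sketch, the proposed time-free scoring could be computed as below. The 50-point CodeForces penalty per incorrect attempt matches the usual contest rules, but the tuple-based input formats (and the omission of CodeForces' minimum-score floor) are simplifications for illustration:

```python
# Sketch of the proposed time-free resolution scoring (illustrative only).

def ioi_score(task_scores: list[int]) -> int:
    # IOI: simply the contest score, i.e. the sum of (possibly partial)
    # per-task scores.
    return sum(task_scores)

def icpc_rank_key(results: list[tuple[bool, int]]) -> tuple[int, int]:
    # ICPC: each result is (solved, wrong_attempts). Rank by problems
    # solved, tiebreak by fewer wrong attempts on solved problems;
    # elapsed time is deliberately not used.
    solved = sum(ok for ok, _ in results)
    wrong_on_solved = sum(w for ok, w in results if ok)
    return (solved, -wrong_on_solved)  # higher key = better rank

def codeforces_score(results: list[tuple[bool, int, int]]) -> int:
    # CodeForces: each result is (solved, wrong_attempts, initial_points).
    # Each solved problem scores its initial point value minus the usual
    # 50-point penalty per incorrect attempt; no time decay, no hacks.
    return sum(points - 50 * wrong for ok, wrong, points in results if ok)
```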

If you have any feedback or objections, let me know.

Valery Cherepanov

IOI has partial scores; an AI could theoretically win without fully solving any task.

What does ICPC mean here? The World Finals? Or any official contest?

Jon Simon is predicting NO at 24%

@ValeryCherepanov What does that mean practically? If it just attempts a lot of problems really quickly but does a mediocre job on all of them, could that result in an overall win?

Jack is predicting NO at 24%

@ValeryCherepanov As currently written, the criterion is to compare the number of fully solved problems between human and AI, which seems fine. If people prefer, it could be changed to compare scores; I don't see a big difference either way.

For ICPC, I think world finals or regionals should both count. Since I'm counting any CodeForces Div1 contest, I think we should also count ICPC regionals.

Valery Cherepanov bought Ṁ50 of NO

@jonsimon It doesn't mean much; it was mostly a technical comment to reduce ambiguity. There are 6 tasks at the IOI. To win first place it's almost always necessary to solve 4, and often 5.

I find the current condition to be somewhat weird but I am ok with it.

Jack is predicting NO at 24%

Practically it means that an AI could theoretically outscore humans and win the IOI by producing partial solutions to many problems while the humans produce full solutions to fewer problems. And the question as currently written would not count that as a YES, since this question asks about count of solved problems.

A similar thing applies to CodeForces, since problems are worth different numbers of points.

The reason I wrote the criteria in terms of number of problems solved was mostly to handle CodeForces, since the points there depend on speed. I could change it to use points for IOI, and for CodeForces to use the original point value of each solved problem (without the time adjustment). I think this would align the criteria better with what it means to "win" the contest. Any objections?

Valery Cherepanov is predicting NO at 24%

@jack There are at least 3 stages of ICPC, maybe even 4 in some cases. I think we can/should include semi-finals and exclude everything else.

Valery Cherepanov is predicting NO at 24%

@jack Yeah, I like this change, but if for some reason you decide to stay with the original, that is also OK.

Jack is predicting NO at 24%

@ValeryCherepanov Ah, I see the contest structure of ICPC is a bit different now than it used to be; I didn't realize that. Can you help me define what exactly the semi-finals are? Looking at https://icpc.global/regionals/upcoming, are we talking about the continent finals, e.g. the Asia West Continent Final Contest?

Jack is predicting NO at 24%

Or do you just mean the regionals that advance to the world finals?

Valery Cherepanov is predicting NO at 26%

@jack Uh, to be honest I am not super knowledgeable here either. I thought this and your supposition from the previous comments were the same thing.

But if they are different, I would lean toward the continent final contests. Although I imagine that some regions (e.g. Africa) are weaker, so maybe it would be even better to limit it strictly to the World Finals. Up to you.

Jack is predicting NO at 26%

@ValeryCherepanov On reflection, I think only global contests should count, since regional contest winners don't reflect the "best human" in the world. So only World Finals.

Valery Cherepanov is predicting NO at 26%

@jack I think that's very reasonable. I am not even sure why I mentioned the semi-finals.

Jack is predicting NO at 25%

@ValeryCherepanov That was probably my fault because I initially suggested including regionals before thinking better of it. Thanks for your help clarifying the details!

Jon Simon bought Ṁ100 of NO

There were 2 big Yes buy orders close together, was there some important news that I missed?

rockenots bought Ṁ50 of YES

@jonsimon Just keep buying NO ❤

Jon Simon is predicting NO at 24%

@rockenots You've seen how poorly GPT-4 does on competition-level coding, right? It doesn't even move the needle compared to GPT-3.5 on CodeForces.

Jack is predicting NO at 24%

Limit orders would suggest lack of news. When there's breaking news, the market moves quickly with market orders, not standing limit orders.

Jon Simon is predicting NO at 24%

@jack I was referring to these two market orders

hyperion is predicting YES at 24%

@jonsimon Check the Microsoft leetcode benchmark. :)

Jack is predicting NO at 26%

Ah sorry, I misread.

Are you referring to the GPT-4 LeetCode results? They were definitely big improvements, but I didn't really update much on them; at this point I expect the pace of improvement to be about this rapid.

Jon Simon is predicting NO at 21%

@jack The GPT-4 coding improvements were meager. It was one of the things that stood out in the paper: GPT-4 had basically zero improvement on CodeForces compared to GPT-3.5, despite doing much better on almost everything else.

GPT-PBot bought Ṁ10 of NO

An AI may code with near perfection,
but the human touch gains traction.
To err is human, oh how lucky,
for in programming, that's just buggy.