Will an AI outcompete the best humans on any one programming contest of IOI, ICPC, or CodeForces before 2024?
Resolved NO (Jan 1)

Resolves YES if, before 2024, an AI solves at least as many points worth of problems as the best human competitor on any single contest of the following competitive programming contests: IOI, ICPC, or CodeForces. Otherwise NO. (In particular, this ignores scoring points based on how quickly a problem is solved, so that the AI can't win just by submitting solutions inhumanly fast. See detailed definitions below.)

This is similar to the IMO Grand Challenge (https://imo-grand-challenge.github.io/), but for contest programming instead of math, and with a requirement to rank first, not just to get a gold medal (typically top 5-10%).

Detailed rules:

  • For IOI: This question uses the IOI score without any modifications: the score is based on problems solved, with partial scores for partial solutions.

  • For ICPC: Only World Finals counts (since regional contest winners don't reflect the best human in the world). This question uses the number of problems solved as the score.

    • ICPC is scored primarily on problems solved, with tiebreaker based on incorrect submission attempts for solved problems and the time of the last solved problem. This question ignores the tiebreakers.

  • For CodeForces: any CodeForces Division 1 contest (the highest division) will count. CodeForces Division 2+ contests do not count (since they don't reflect the top humans). This question uses the sum of the initial/maximum point value of each solved problem as the score.

    • CodeForces's scoring system includes points per problem solved that decrease the longer you take to solve them, so an AI could outscore humans by solving fewer problems but submitting them faster. Therefore, for this question we ignore the reduction in points over time. This question also ignores penalty points for each incorrect submission attempt.

    • CodeForces also has a mechanic where contestants try to find bugs in other contestants' solutions ("hacks"). It is impractical to simulate hacks unless the AI is run as part of a live contest, so this question excludes points for hacks (both penalties and rewards).

  • The AI has only as much time as a human competitor, but there are no other limits on the computational resources it may use during that time.

  • The AI must be evaluated under conditions substantially equivalent to human contestants, e.g. the same time limits and submission judging rules. The AI cannot query the Internet.

  • The AI must not have access to the problems before being evaluated on them, e.g. the problems cannot be included in the training set. It should also be reasonably verifiable, e.g. it should not use any data which was uploaded after the latest competition.

  • The contest must be dated no earlier than 2022. E.g. if an AI demonstrates performance on the 2022 IOI that scores at least as well as the top human competitor, that would qualify as YES, but demonstrating this on the 2021 IOI would not qualify.
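
To make the adjusted scoring above concrete, here is a minimal sketch of how the comparison score could be computed for each contest. This is purely illustrative and not part of the resolution criteria; the input formats and field names (`initial_points`, `solved`) are hypothetical assumptions, not taken from any contest's actual API or data format.

```python
# Illustrative sketch of the adjusted scoring described in the rules above.
# Input formats are hypothetical and used only for this example.

def ioi_score(task_scores):
    """IOI: the unmodified contest score, including partial credit per task."""
    return sum(task_scores)

def icpc_score(solved_flags):
    """ICPC World Finals: number of problems solved; all tiebreakers
    (incorrect-attempt penalties, time of last solve) are ignored."""
    return sum(1 for solved in solved_flags if solved)

def codeforces_score(problems):
    """CodeForces Div. 1: sum of the initial/maximum point value of each
    solved problem. Time-based decay, wrong-submission penalties, and hack
    points (rewards or penalties) are all ignored."""
    return sum(p["initial_points"] for p in problems if p["solved"])

# The AI would "win" under these rules if its adjusted score is at least
# the top human's, e.g.:
# ai_wins = codeforces_score(ai_results) >= codeforces_score(top_human_results)
```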


Background:

In Feb 2022, DeepMind published a pre-print stating that their AlphaCode AI is as good as a median human competitor in competitive programming: https://deepmind.com/blog/article/Competitive-programming-with-AlphaCode. When will an AI system perform as well as the top humans?

predicted NO

Proposing the following change to the resolution criteria as per the below discussion, to better match what winning the contest means:


Instead of resolving YES if "an AI solves at least as many problems as the best human competitor", the question would resolve YES if "an AI scores at least as many points as the best human competitor, based on the contest's scoring rules for submitted solutions except without using scoring rules for how quickly a solution is submitted".

For IOI this is simply the contest score. For ICPC this is the number of problems solved, with tiebreak based on incorrect submission attempts but without the normal tiebreak based on time elapsed. For CodeForces, this is the initial point value of the problem with penalty for incorrect submission attempts but without the contest's reduction based on time elapsed, and also without penalty or reward for hacks (as described in the details).

If you have any feedback or objections, let me know.

predicted NO

I updated the resolution criteria to be "an AI solves at least as many points worth of problems as the best human competitor". See the detailed criteria in the market description.

This is simpler than going into the weeds on all the rules (e.g. penalty points for incorrect submissions), while still closely reflecting the intended question of "AI wins the contest by scoring more points worth of problems solved, but isn't allowed to win just by solving them faster".

Please let me know if you spot any issues with these definitions. I think this doesn't change the probability of the market in any significant way, just makes the definitions better.

predicted NO

@jack I don't know how these sorts of competitions work. Do the best humans typically at least attempt to solve most of the problems? Or do humans typically only solve a small percentage, so that by sheer brute force and tirelessness the AI would be able to solve many more?

Basically, is this a situation like LeetCode, where there's a very large pool of problems and you're not expected to solve literally all of them even if you're capable of doing so, just because the number is so large?

predicted NO

@jonsimon Generally, the contests are designed so that the top humans solve most but not all of the problems.

Here are links to recent scoreboards:

IOI: https://stats.ioinformatics.org/results/2022 (2 perfect scores)

ICPC: https://cphof.org/standings/icpc/2021 (top score: 11/12 problems solved)

CodeForces: https://codeforces.com/contest/1824/standings (top 36 contestants solved 5/6 problems)

predicted NO

You get a small set of problems to work on in the contest, and generally the top competitors look at all the problems.

The IOI has partial scores, so an AI could theoretically win without solving any task fully.

What does ICPC mean here? World Finals, or any official contest?

predicted NO

@ValeryCherepanov What does that mean practically? If it just attempts a lot of problems really quickly but does a mediocre job on all of them, could that result in an overall win?

predicted NO

@ValeryCherepanov As currently written, the criterion is to compare the number of fully solved problems between the human and the AI, which seems fine. If people prefer, it could be changed to compare scores; I don't see a big difference either way.

For ICPC, I think world finals or regionals should both count. Since I'm counting any CodeForces Div1 contest, I think we should also count ICPC regionals.

bought Ṁ50 of NO

@jonsimon It doesn't mean much; it was mostly a technical comment to reduce ambiguity. There are 6 tasks at the IOI. To win first place it's almost always necessary to solve 4, and often 5.

I find the current condition somewhat weird, but I am OK with it.

predicted NO

Practically, it means that an AI could theoretically outscore humans and win the IOI by producing partial solutions to many problems while the humans produce full solutions to fewer problems. The question as currently written would not count that as a YES, since it asks about the count of solved problems.

Similar thing applies to CodeForces, since problems have different numbers of points.

The reason I wrote the criteria in terms of the number of problems solved was mostly to handle CodeForces, since the points there depend on speed. I could change it to use points for IOI, and for CodeForces to use the original point value of each solved problem (without the time adjustment). I think this would align the criteria better with what it means to "win" the contest. Any objections?

predicted NO

@jack There are at least 3 stages of ICPC, maybe even 4 in some cases. I think we can/should include semi-finals and exclude everything else.

predicted NO

@jack Yeah, I like this change, but if for some reason you decide to stay with the original, that is also OK.

predicted NO

@ValeryCherepanov Ah, I see the contest structure of ICPC is a bit different now than it used to be; I didn't realize that. Can you help me define what exactly the semi-finals are? Looking at https://icpc.global/regionals/upcoming, are we talking about the continent finals, e.g. the Asia West Continent Final Contest?

predicted NO

Or do you just mean the regionals that advance to the world finals?

predicted NO

@jack Uh, to be honest I am not super knowledgeable here either. I thought this and your supposition from the previous comments were the same thing.

But if they are different, I would lean to continent final contests. Although I imagine that some regions (e.g. Africa) are weaker, so maybe it would be even better to limit it to strictly world finals. Up to you.

predicted NO

@ValeryCherepanov On reflection, I think only global contests should count, since regional contest winners don't reflect the "best human" in the world. So only World Finals.

predicted NO

@jack I think it's very reasonable; I am not even sure why I mentioned semi-finals.

predicted NO

@ValeryCherepanov That was probably my fault because I initially suggested including regionals before thinking better of it. Thanks for your help clarifying the details!

bought Ṁ100 of NO

There were 2 big YES buy orders close together; was there some important news that I missed?

bought Ṁ50 of YES

@jonsimon Just keep buying NO ❤

predicted NO

@rockenots You've seen how poorly GPT-4 does on competition-level coding, right? It doesn't even move the needle compared to GPT-3.5 on Codeforces.

predicted NO

Limit orders would suggest lack of news. When there's breaking news, the market moves quickly with market orders, not standing limit orders.

predicted NO

@jack I was referring to these two market orders

predicted YES

@jonsimon Check the Microsoft LeetCode benchmark. :)

predicted NO

Ah sorry, I misread.

Are you referring to the GPT-4 LeetCode results? They were definitely big improvements, but I didn't really update much on them; at this point I expect the pace of improvement to be about this rapid.

predicted NO

@jack The GPT-4 coding improvements were meager. It was one of the things that stood out in the paper: GPT-4 had basically zero improvement on Codeforces compared to GPT-3.5, despite doing much better on almost everything else.
