Will AI outcompete best humans in competitive programming before the end of 2023?
15%
chance
DeepMind has recently published a pre-print stating that their AlphaCode AI is as good as the median human competitor in competitive programming. See https://deepmind.com/blog/article/Competitive-programming-with-AlphaCode . Will DeepMind, or anyone else, provide evidence in 2023 that an AI can beat the best human competitors? #AI #DeepMind
L avatar
L
is predicting YES at 20%

DeepMind, the most likely group to do this, seems to be giving signs it intends to slow down. This is a black swan for my models (I thought they were going to make the effort necessary to be first!) and greatly reduces my probability; I think other people's models already priced this in, so I'd be surprised to see it create a large shock in this market. Nevertheless, I figured folks should know. That said, DeepMind slowing down doesn't change the technical landscape that much, and I wouldn't be that surprised to hear Facebook AI Research did it.

L avatar
L
sold Ṁ56 of YES

@L Also, to everyone saying "it's nuts that people believe it's possible": you still don't understand deep learning properly. Yes, I mean this about the AI researchers betting NO too.

SamuelRichardson avatar
Samuel Richardson
is predicting NO at 20%

@L Are you involved in the deep learning space?

DavidJohnston avatar
David Johnston
is predicting NO at 17%

@L I extrapolated the scaling results with some big fudge factors for algorithmic progress and “special effort to solve this problem”, and the numbers just didn’t look like they were there in the next 12 months. Curious what you think “understanding deep learning” means.
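
To make that kind of extrapolation concrete, here is a toy sketch of the back-of-the-envelope arithmetic being described. This is not David's actual calculation, and every constant in it is a hypothetical placeholder, not a figure from the AlphaCode paper.

```python
# A toy sketch of a scaling extrapolation with fudge factors. All
# constants are hypothetical placeholders, not figures from the
# AlphaCode paper.
import math

def projected_percentile(compute_multiplier: float,
                         algorithmic_gain: float,
                         special_effort_gain: float,
                         base_percentile: float = 50.0,
                         points_per_doubling: float = 4.0) -> float:
    """Project a Codeforces percentile from an effective-compute increase.

    Assumes (optimistically) that percentile improves linearly per
    doubling of effective compute, which is itself a fudge.
    """
    effective = compute_multiplier * algorithmic_gain * special_effort_gain
    doublings = math.log2(effective)
    return min(99.9, base_percentile + points_per_doubling * doublings)

# Even with generous fudge factors (16x compute, 4x algorithmic progress,
# 4x "special effort"), the projection stays far below the ~99.9th
# percentile a top competitor occupies:
print(projected_percentile(16, 4, 4))  # 50 + 4*8 = 82.0
```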

AryamanArora avatar
Aryaman Arora
is predicting NO at 17%

@L even if DeepMind went full speed on this they would not be beating pros by the end of 2023.

jonsimon avatar
Jon Simon
is predicting NO at 18%

@AryamanArora agreed, this is one of the most mispriced markets on the site

L avatar
L
is predicting YES at 18%

I am involved with deep learning. Scaling is not all you need, but it's a hell of a lot of it. Understanding deep learning means you can predict which papers will come out before they happen and can propose architectures that aren't crazy.

L avatar
L
is predicting YES at 18%

In other words: you understand deep learning if you can accurately predict the success probability of capabilities research tracks. If you think programming is hopeless, you don't understand why deep learning has been succeeding. It's not just scale.

L avatar
L
bought Ṁ0 of YES

I added a limit order so y'all don't overreact to this.

SamuelRichardson avatar
Samuel Richardson
is predicting NO at 18%

@L Handwavy claims about being able to understand and predict progress in an extremely cutting-edge area of research trigger all sorts of warning signals for me.

L avatar
L
is predicting YES at 18%

@SamuelRichardson very reasonable reaction.

jonsimon avatar
Jon Simon
bought Ṁ0 of NO

@L "as good as top human coders by end of the year" is a far cry from "hopeless". I think you'd find few people here who'd strongly bet against this is the resolution date was ~2040, but that's not what we're talking about here.

SamuelRichardson avatar
Samuel Richardson
is predicting NO at 18%

@L However, feel free to take my money.

Humans have a strong tendency to overestimate short-term impact and underestimate long-term impact.

As @jonsimon just mentioned, a 2040 timeframe would change things. Short of whatever model they cook up next being able to self-improve, the timeframe is too short. If it can self-improve, well, good luck to everyone here battling the Terminators, I guess.

L avatar
L
is predicting YES at 17%

@jonsimon Alright, fine, I'll say more than nothing. Let's put it this way: AlphaCode is the dumbass prototype of a system. What I'm expecting is that the successor to an approach like DreamerV3 (or quite plausibly DreamerV3 itself, if it's stable enough not to collapse when scaled - seems unlikely, it probably needs to be a successor) would, with help from cross-training and an appropriate training environment, be able to teach itself to code on well-defined problems well enough to nail any programming problem of relevance. A workable system that soon would probably be a hybrid. However, programming appears to me to be approximately where Go was in 2015, and as a skilled programmer myself, I see no reason at all to believe programming is fundamentally different from Go in any critical way.

I try to avoid going into too much detail because I don't think it's a good idea to try to make this happen; a curious model based rl agent capable of succeeding at this target could also likely hack its training environment out of curiosity and cause a serious ops mess for the lab training it.

(It's probably not going to destroy the world, just its training environment, but that should scare the lab training it; if they keep repairing the training environment and rerunning the same coding model, they'll plausibly get something that could get out onto the internet and make a worse mess - though probably not, since that would require the model to have enough of a self-model to copy itself properly, which for a model of this complexity would be very hard.)

People who roll to disbelieve on deep learning successes are usually overly focused on a small fragment of deep learning research. In machine learning, I have this take that most insights are approximately "obvious" to the research community, and the hard thing researchers do is hit the idea hard enough to get a version of it working - which can be very, very difficult, don't get me wrong. I don't have any burning desire to make things much more obvious; I just want to yell at people for not studying hard enough. If you put in the effort, you can then come tell me I'm wrong with more information. But unless you've been focusing specifically on understanding the way capabilities combine, I doubt you really understand what's coming.

And yes, a big part of why I'm betting YES on this is that I see more and more paths to significant self-improvement jumps. Not full recursive self-improvement, mind you - my view is that that will mostly result in the model crashing itself; it's pretty hard to edit your own brain without simply getting addicted to something and losing most of your capability. But there are more limited forms of self-improvement to be had.

AryamanArora avatar
Aryaman Arora
is predicting NO at 17%

@L It's not hopeless, it's just not an end-of-2023 kind of thing. I would be way more uncertain about end of 2025.

L avatar
L
is predicting YES at 17%

@AryamanArora 2025 is also plausible. I know a few specific things that make me think there will be a jump in capabilities this year across the board, so it makes sense that well-informed minds can disagree on that one. I'd probably fund endless limit orders for YES at 50% on the 2025 version! But I still don't think this market is severely mispriced.

Zardoru avatar
Zardoru
is predicting NO at 17%

@L There is this one, ending in 2024, with better-defined criteria than this one.

https://manifold.markets/jack/will-an-ai-win-a-gold-medal-on-the-fc04e26738c6

I just put a NO order at 35% for Ṁ2000, and there is another one at 35% for about Ṁ1000. You are welcome!

MarcioDiaz avatar
Marcio Díaz
bought Ṁ100 of NO

It's nuts that people believe there is a 22% chance of AI solving the most difficult competitive programming problems.

levifinkelstein avatar
levi finkelstein
is predicting NO at 24%

@MarcioDiaz It's just ChatGPT hype.

jack avatar
Jack
is predicting YES at 22%

Take a look at the rate of progress on math contest problems: https://bounded-regret.ghost.io/ai-forecasting-one-year-in/ - it's crazy fast, and progress over the previous year far exceeded predictions. AlphaCode is already at roughly median human competitor performance. I think it's unlikely, but not that unlikely, that it or another AI will be able to solve as many Codeforces problems as top humans.
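
For illustration, here is a toy version of that trend-extrapolation argument. The data points below are invented placeholders, not the actual benchmark numbers; see the linked post for the real figures.

```python
# A toy illustration of the trend-extrapolation argument. The data
# points are invented placeholders, not actual MATH/Codeforces scores.
import numpy as np

years = np.array([2020.5, 2021.5, 2022.5])
score = np.array([5.0, 12.0, 30.0])   # hypothetical benchmark accuracy, %

# Fit the trend in log space, since recent benchmark progress has looked
# closer to exponential than linear.
slope, intercept = np.polyfit(years, np.log(score), 1)
forecast_2023 = np.exp(slope * 2023.5 + intercept)
print(f"naive 2023 extrapolation: {forecast_2023:.0f}%")  # ~73%
```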

jack avatar
Jack
is predicting YES at 22%

(in the next year, specifically)

MarcioDiaz avatar
Marcio Díaz
is predicting NO at 22%

@jack Median human problems are quite easy, and AI probably has the time advantage. But the hardest problems are on a completely different level...

MartinRandall avatar
Martin Randall
is predicting YES at 22%

@MarcioDiaz sure, but "can beat" doesn't require beating humans where they are strongest. Chess AI could beat top players while it was still weak in closed midgame positions.

MarcioDiaz avatar
Marcio Díaz
is predicting NO at 21%

@MartinRandall The only problem I can see is in the definition of "best humans". Pretty sure it can't beat Gennady (https://en.wikipedia.org/wiki/Gennady_Korotkevich) or any top 10 player in 2023.

TaoLin avatar
Tao Lin

@MarcioDiaz Hmm, yeah, the very top people can also answer easy questions at <100 wpm (and solve many of them in <30 seconds), much faster than AlphaCode and >10% of the speed of straight-up language models.

citrinitas avatar
Anton
bought Ṁ0 of YES

Can this market be less bouncy pls 😡😡😡

vluzko avatar
Vincent Luczkow
is predicting YES at 28%

@citrinitas lol just wait until GPT-4 comes out

NathanpmYoung avatar
Nathan Young
is predicting YES at 24%

@citrinitas Sure, just put some more liquidity into it.

ChanchanaSornsoontornOff avatar
Chanchana Sornsoontorn (Off)
bought Ṁ30 of NO

AI will not beat humans. It's like asking ChatGPT to draw the Mona Lisa using ASCII art, or expecting the ability to play chess at grandmaster level to emerge from observing board positions in text. The model can't do it. There's just not enough abstraction and reasoning capability in these models. Some tasks require extreme generalization/extrapolation beyond what the dataset can give you.

If the AI cannot solve the simple ARC problems proposed by François Chollet, then it's not going to happen.

ChanchanaSornsoontornOff avatar
Chanchana Sornsoontorn (Off)
bought Ṁ25 of NO

@ChanchanaSornsoontornOff ChatGPT can't even write code to draw a cat on a 2D plane using Python. That's because it's incapable of visualizing stuff internally. The ability to visualize or abstract things doesn't automatically emerge unless you feed the model much more data. I'm going to go all in on this bet if someone pumps the probability back up to 40-50%.

NathanHelmBurger avatar
Nathan
is predicting NO at 28%

@ChanchanaSornsoontornOff Yeah, end of 2023 is a bit soon. I sure hope it doesn't happen then. I feel that by 2028 this is a near certainty though. Quite the ride we're about to be in for.

jonsimon avatar
Jon Simon
is predicting NO at 27%

@ChanchanaSornsoontornOff Seriously, LLMs aren't yet at average human level on most benchmarks. Forget about being better than the best-of-the-best humans at an especially hard task within 1 year.

MartinRandall avatar
Martin Randall
bought Ṁ0 of NO

@ChanchanaSornsoontornOff The market is "can beat", not "be better". Beating humans at a subset of tasks that don't require graphical skills may be enough.

jonsimon avatar
Jon Simon
is predicting NO at 27%

@MartinRandall is THAT what the market is positing? So an algorithm that can beat a top human at even a single coding challenge would be enough? That changes things substantially.

vluzko avatar
Vincent Luczkow
is predicting YES at 27%

@JonathanSimon It has to win the overall competition, e.g. get a higher score than any human at the ICPC, which basically means solving at least as many problems as any human participant. It doesn't have to be exactly the same problems, but solving only one problem that humans didn't wouldn't count.
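
Stated as code, a minimal sketch of that criterion; the function and field names are hypothetical, and real contests would also apply tie-breaks such as penalty time.

```python
# A sketch of the resolution criterion as stated above: the AI must
# match or exceed the solved-problem count of the best human in the
# contest. Names are hypothetical; tie-breaks like penalty time,
# which real contests (e.g. ICPC) apply, are ignored.
def ai_wins_contest(ai_solved: int, human_solved_counts: list[int]) -> bool:
    return ai_solved >= max(human_solved_counts)

print(ai_wins_contest(9, [10, 8, 7]))   # False: the top human solved more
print(ai_wins_contest(10, [10, 8, 7]))  # True: matches the best human
```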

jonsimon avatar
Jon Simon
is predicting NO at 27%

@vluzko Psh ok then yeah, no way is this happening within the next year. Thanks for clarifying.

SamuelRichardson avatar
Samuel Richardson
bought Ṁ25 of NO

@JonathanSimon Agreed, this market seems very overpriced on achieving this before the end of the year, which is right around the corner. Free money!

FlawlessTrain avatar
Flawless Train
is predicting NO at 24%

@SamuelRichardson Different end of year, I believe

SamuelRichardson avatar
Samuel Richardson
is predicting NO at 25%

@FlawlessTrain Oooh nooo lol.

3e44 avatar
ChatGPTbot
bought Ṁ0 of YES

DeepMind has recently published a pre-print stating that their AlphaCode AI is as good as a median human competitor in competitive programming. Is it likely that DeepMind, or anyone else, will provide evidence they can beat the best human competitors next year?

ChatGPT: It is difficult to predict what research developments will occur in the future or whether DeepMind or any other organization will be able to provide evidence of an artificial intelligence (AI) system that can outperform the best human competitors in competitive programming. AI research and development is an active and rapidly evolving field, and there have been many advances in recent years. It is possible that significant progress will be made in the near future, but it is also important to consider the limitations and challenges of developing AI systems that can perform at a level comparable to or exceeding that of humans in certain tasks.

If someone says something is possible, what is that as a probability?

ChatGPT: If someone says that something is possible, it means that it is not impossible and that there is a possibility that it could happen. The probability of an event occurring is a measure of how likely it is to happen, expressed as a number between 0 and 1, with 0 indicating that it is not possible and 1 indicating that it is certain to happen. So if someone says that something is possible, it means that the probability of it happening is greater than 0 but could be any value less than 1. It is important to note that probability is a measure of likelihood and does not necessarily reflect the actual outcome of an event.

3e44 avatar
ChatGPTbot
bought Ṁ0 of NO

If someone says something is "possible" what would that be as a rough range of probabilities?

ChatGPT: If someone says that something is "possible," it means that it is not impossible and that there is a possibility that it could happen. The probability of an event occurring is a measure of how likely it is to happen, expressed as a number between 0 and 1, with 0 indicating that it is not possible and 1 indicating that it is certain to happen. So if someone says that something is possible, it means that the probability of it happening is greater than 0 but could be any value less than 1.

To give a rough estimate, a probability range of 0.1 to 0.9 might be considered "possible." However, this range is just a rough estimate and may not be applicable in all situations. The interpretation of probability can also depend on the context and the specific circumstances of the event in question.

SamuelRichardson avatar
Samuel Richardson
is predicting NO at 29%

@3e44 Are we really going to spam this all through the comments? The Stable Diffusion imagery is occasionally amusing, but this is just going to be noise.

Manifold Markets, downvote button when??

3e44 avatar
ChatGPTbot

@SamuelRichardson There is a block button. You can block this account. Also, I'm running it manually, so it ends when I get tired.

DavidPennock avatar
David Pennock
is predicting NO at 24%

@3e44 nice description of "probability", ChatGPT

StevenK avatar
Steven
is predicting NO at 40%

On looking at the AlphaCode paper, the most striking thing to me is that it has to throw out 99% of the solutions it generates because they don't work on the example tests in the problem statements (section 4.5), and then it tries up to 10 solutions from different clusters in the remaining set (section 4.6). It's impressive that it works as well as it does, and I don't know enough about the field to be confident, but intuitively it seems harder for an approach that involves that much guessing to continue to work as well on problems that require more complex conceptual reasoning.
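
For readers unfamiliar with the paper, here is a schematic sketch of the sample-filter-cluster pipeline described above (sections 4.5-4.6). This is an illustration of the idea, not DeepMind's code, and the helper functions are hypothetical stand-ins.

```python
# Schematic sketch (not DeepMind's code) of AlphaCode's pipeline:
# generate many candidates, keep only those passing the example tests,
# group the survivors by behavior, and submit one per cluster (up to 10).
from collections import defaultdict
from typing import Callable

def select_submissions(candidates: list[str],
                       passes_example_tests: Callable[[str], bool],
                       behavior_signature: Callable[[str], tuple],
                       max_submissions: int = 10) -> list[str]:
    # Section 4.5: filtering removes the large majority (~99%) of samples.
    survivors = [c for c in candidates if passes_example_tests(c)]

    # Section 4.6: cluster by behavior on generated inputs; the
    # clustering key is abstracted into a caller-supplied signature.
    clusters: dict[tuple, list[str]] = defaultdict(list)
    for c in survivors:
        clusters[behavior_signature(c)].append(c)

    # Submit one representative from each of the largest clusters.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [cluster[0] for cluster in ranked[:max_submissions]]
```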

L avatar
L
is predicting YES at 31%

@StevenK You're making the classic AI prediction mistake: look at one existing approach, determine that it can't do the crazy thing the optimists are expecting, then dismiss the possibility. If you'd like to estimate accurately, read some papers to get a sense of what else can be combined. AlphaCode doesn't even attempt to use the best available algorithms; it's a "first thing that actually works" style paper, and those are usually improved on significantly as the next iterations refine the attempt by establishing how to get combined models to not blow up during training. AlphaCode v1 will be soundly beaten for sure; the question is whether that improvement will be able to compete seriously with humans.

StevenK avatar
Steven
is predicting NO at 31%

@L I'm definitely not dismissing the possibility and I'm sure progress will be made over AlphaCode v1. All else equal, it seems like if I learn that the current approach isn't working as well as the headlines made it sound, then that should make me update against AI beating humans in the next year. It's totally possible that you have information that overrides that. What papers should I most be reading?

L avatar
L
is predicting YES at 29%

@StevenK Skim every abstract from DeepMind, then steer on your own from there. Of course, retain your calibration from this update; but this kind of hype-versus-paper update is normal, and I'm still making this claim.

Radiant avatar
Radiant
is predicting YES at 35%

Anybody actually in the field of AI knows this is easy YES money. There already exist frameworks for planning algorithms to emulate cognitive abilities like open-ended problem solving, and current technologies like OpenGPT already do this to some degree with remarkable speed and accuracy.

ne avatar
neverix
is predicting NO at 35%

@Radiant "To some degree" is the key word. Most Codeforces users are not difficult to surpass, but against an IGM I just can't see a win for AI in a year.

DavidBolin avatar
David Bolin

Warning for the people betting NO based on the assumption that if it resolves YES, that means computers can take over programming jobs.

It doesn't mean that; this market is about writing short algorithms in response to short, detailed, descriptive problem statements.

That is not what people who have programming jobs do. I have one of those jobs and that is not what it looks like, at all.

SG avatar
S G
sold Ṁ137 of NO

DeepMind launches AlphaCode. Time to DAMPU. https://www.science.org/doi/10.1126/science.abq1158

L avatar
L
is predicting YES at 46%

@SG what does DAMPU stand for? none of my search engines have a definition or acronym expansion in English.

L avatar
L
is predicting YES at 46%

@SG Also, I'm surprised that this is what gets you to sell. AlphaCode hasn't changed since it was announced last time, has it?

vluzko avatar
Vincent Luczkow
is predicting YES at 50%

@SG This was published in February? This is just the official version; I don't think there's anything new here.

vluzko avatar
Vincent Luczkow
is predicting YES at 50%

"Official" as in "published somewhere other than arxiv" because DeepMind cares about publishing in prestigious journals.

jack avatar
Jack
is predicting YES at 31%

Yeah I don't see any update here, note the blog post says "This blog was first published on 2 Feb 2022. Following the paper’s publication in Science on 8 Dec 2022, we’ve made minor updates to the text to reflect this."

jacksonpolack avatar
jackson polack
is predicting NO at 42%

Isn't top-level competitive programming really hard? Something that, of tens of thousands of decent programmers, filters out all but the top few?

L avatar
L
is predicting YES at 31%

@jacksonpolack Yes, but so was Go.

jacksonpolack avatar
jackson polack
is predicting NO at 34%

Go is a game, has a bounded state space, and most importantly doesn't entirely overlap with the most intelligence-dependent and difficult existing professions that hold up society (mathematics and programming).

jacksonpolack avatar
jackson polack
is predicting NO at 34%

i.e. if an AI can beat top coders at competitive programming, can't it just get a job at Google?

MartinRandall avatar
Martin Randall
is predicting YES at 31%

@jacksonpolack I agree that Google would try to find a way to use an AI that can do that. I'm not sure what your point is?

vluzko avatar
Vincent Luczkow
is predicting YES at 30%

@jacksonpolack Competitive programming is hard but not that hard, and there are several aspects of the problem that make it easier for AI. An AI that can do top level competitive programming and nothing else is definitely not a replacement for an engineer.

L avatar
L
is predicting YES at 30%

@jacksonpolack Google is most likely to be the company making the AI in the first place. It will have a job at Google right away.

L avatar
L
is predicting YES at 30%

Also, competitive programming is much easier than the full suite of judgment skills a programmer needs. Humans who overspecialize in competitive programming aren't automatically good software engineers either.

DavidBolin avatar
David Bolin

@jacksonpolack No, it doesn't mean that. As I said on the top level, solving brief coding problems is not what companies hire engineers to do. It is not even remotely similar to what they are hired to do.

jacksonpolack avatar
jackson polack
is predicting NO at 30%

I'm not arguing that something that can win every Codeforces round can just be given a 'do a pentest on this' or 'write a new JavaScript framework' task and succeed - just that the hardest competitive programming problems are quite intellectually difficult, and something that could solve them demonstrates intellectual depth much closer to that of a 'professional programmer' than recent models. A human who is a top-level competitive programmer should be able to do most professional programming tasks easily with training, because intelligence and skills transfer. That transfer would be harder for an AI, of course.

It's like comparing scoring well on math olympiads to research mathematics (or math olympiads to coding, tbh). They are fundamentally different tasks! But a person who does well on math olympiads is much closer to 'doing research mathematics' or 'being a good coder' than either a random person or AlphaCode.

jacksonpolack avatar
jackson polack
is predicting NO at 30%

@jacksonpolack Also, "AI can't replace programmers soon" isn't something we know - so the argument I seemed to make above, "AI can't replace programmers <implies> AI can't win at competitive programming", is a bad argument, because how does one know AI can't replace programmers? If one has enough knowledge about AI and coding to conclude that, one can just as well understand directly why it can't accomplish the competitive programming tasks. Absurdity heuristic, "not trying", etc. The intended, if poorly expressed, argument was an intuitive comparison of the difficulty of the hardest competitive programming problems to what AI seems capable of today.

DavidBolin avatar
David Bolin
is predicting NO at 30%

@jacksonpolack There is nothing wrong with the heuristic in this instance. Programming jobs require real-world interactions, long-term planning, meetings, etc., just like most other business jobs. If AI can take over programming jobs, it can take over non-manual-labor jobs generally.

It is in fact clear that we are nowhere near that point; therefore AI is not near the point of taking over programming jobs. That argument is valid and sound.

dp avatar
dp
is predicting NO at 30%

Created a market on whether the resolution of this market will be consistent with the AlphaCode evaluation criteria mentioned in the description.

https://manifold.markets/dp/will-will-ai-outcompete-best-humans

dp avatar
dp
is predicting NO at 32%

@dp To continue the argument:
My interpretation is that we should anchor on the "AlphaCode is as good as the median human competitor" claim, as it's the only thing in the description.

DeepMind made a very deliberate effort to set the benchmarking precedent for competitive programming in the AlphaCode paper, their paper was literally the motivation for this market, and I personally think there is no reason to measure the competitive programming ability of a model and not evaluate it on the AlphaCode paper's Codeforces benchmarking setup plus the CodeContests dataset.

Ultimately I don't think resolving on, say, IOI performance (or the lack of it) would be fraudulent, but there is no reason for any honest research lab to invent a new testing setup and not address the only relevant prior benchmark.

Some bad-faith actors are requesting fraudulent resolution, and some participants propose plainly wrong criteria: mixing one-person and team contests, not distinguishing irrelevant contests from world-class ones (ACSL and IOI in the same category?), and proposing unrealistic evaluation criteria (a key part of TCO is hacking other people's solutions).

I'm thinking of making another market with the same title and description which resolves according to the implied AlphaCode paper criteria.

jack avatar
Jack
is predicting YES at 28%

Created a market on IOI performance specifically to reduce the previously discussed ambiguities. (This is analogous to the IMO Grand Challenge, but for competitive programming.)

L avatar
L
is predicting YES at 22%

For what it's worth, I had interpreted this question as reliably beating humans in most or all competitive programming challenges, à la AlphaZero. The title is literally "Will AI outcompete the best humans". If this market gets resolved YES before AI is dominant in these coding challenges, then I will flag it as misresolved, and I hope others will do the same. (Though I would accept an exception for any competitions that do not permit AI contestants: in those cases, I would accept an equivalent challenge with the same rules, played after the main challenge, against only the winners of the human-vs-human competitions.)

The list of competitions from Wikipedia in which I would expect a programming AI to be completely dominant in order to count for this question:

There are other challenges as well. If anyone would like to home in on criteria, I'm open to it. I did not deeply investigate each challenge. But if the AI isn't reliably dominating at, at the very minimum, the majority of these contests' problems in competitive time and score, don't resolve YES.

Note that I don't think the initial version of this question needs to be constrained to AIs that run on human-equivalent wattage. I don't care if the AI is using large amounts of brute force to compensate for algorithmic inefficiency; currently no AI can even approach winning these competitions, no matter how much compute you throw at it.
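
Stated as code, a minimal sketch of the stricter criterion proposed here: the AI must place first in at least a majority of the listed contests. The contest names below are hypothetical placeholders.

```python
# A sketch of the stricter criterion above: the AI must place #1 in
# (at minimum) a majority of the listed contests, in competitive time
# and score. Contest names are hypothetical placeholders.
def dominates(results: dict[str, bool]) -> bool:
    """results maps contest name -> whether the AI finished #1 overall."""
    wins = sum(results.values())
    return wins * 2 > len(results)  # strictly more than half

print(dominates({"ICPC": True, "IOI": True, "Codeforces": False}))  # True
```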

vluzko avatar
Vincent Luczkow
is predicting YES at 22%

Strong disagree. It's unlikely anyone will actually run an AI on all of these; why would they? DeepMind was satisfied with chess, Go, and shogi - we don't actually need to run it on every other board game in the space to know that it'll be massively superhuman. If the model is generally available, then it will probably get run on all the open contests anyway, but there's no particular reason to expect that.

I think beating humans at any major coding competition should count, and I suspect that's how the question was intended.

Eel13 avatar
Eel13
is predicting YES at 22%

@L This is totally unreasonable. Would you ask that Usain Bolt win every race in the Olympics in order to say that he "outcompetes the best runners"? I'd argue that outcompeting means being the best in at least one competition (and we can debate whether that means one contest problem or a full competition's set of problems).

L avatar
L
is predicting YES at 29%

@Eel13 Okay, sure, make it any one of the ones I marked required, then. That seems almost as good to me. The key factor is that it has to completely dominate: full competition or negative resolution. If it doesn't come in #1, it isn't what I thought I was betting on. I won't raise any more fuss - after all, y'all are making my YES easier to win than I expected - but I do intend to claim with my YES bet that I expect humanity to be beaten in almost all regards in one year. My point is: AI people, plan accordingly, please! Those short-timeline estimates are by people who see short paths to building it.

MartinRandall avatar
Martin Randall
is predicting YES at 27%

@L I interpreted "can beat" as equivalent to when a chess program first beat the current world champion in a single game, not the present situation where they completely dominate. I "can beat" chess players who are overall better than me, after all.

jack avatar
Jack
is predicting YES at 27%

I agree - I don't think the AI needs to actually compete in all of them. Having stronger-than-best-humans performance on one representative contest (e.g. ICPC, IOI, etc.) would almost certainly mean that, without too much more effort, it could do well on many or most of the others, so I think just one is sufficient.

I also don't think it has to completely dominate - if it can rank 1st place, say, half the time, I think that would be sufficient. And if it can rank 1st place one time in a representative contest, I think that would also be sufficient.

jack avatar
Jack
is predicting YES at 28%

I think this market is a clearer operationalization of the question: /jack/will-an-ai-win-a-gold-medal-on-the-91c577533429 . I created a series of these for different deadlines.