
deepmind, the most likely group to do this, seems to be giving signs they intend to slow down. This is a black swan for my models (I thought they were going to make the effort necessary to be first!) and greatly reduces my probability; I think other people's models already priced this in, so I'd be surprised to see it create a large shock in this market. Nevertheless, I figured folks should know. That said, deepmind slowing down doesn't change the technical landscape that much, and I wouldn't be that surprised to hear facebook ai research did it.

@L I extrapolated the scaling results with some big fudge factors for algorithmic progress and “special effort to solve this problem”, and the numbers just didn’t look like they were there in the next 12 months. Curious what you think “understanding deep learning” means.


I am involved with deep learning. scaling is not all you need, but it's a hell of a lot of it. understanding deep learning means you can predict which papers will come out before they happen and can propose architectures that aren't crazy.

in other words: you can accurately predict the success probability of capabilities research tracks. If you think programming is hopeless, you don't understand why deep learning has been succeeding. It's not just scale.

@L Handwavy claims about being able to understand and predict research in an extremely cutting-edge field trigger all sorts of warning signals for me.


@L However, feel free to take my money.
Humans have a strong tendency to overestimate short-term impact and underestimate long-term impact.
As @jonsimon just mentioned, a 2040 timeframe would change things. Short of whatever model they cook up next being able to self-improve, the timeframe is too short. If it can self-improve, well, good luck to everyone here battling the Terminators I guess.

@jonsimon alright, fine, I'll say more than nothing. let's put it this way: alphacode is the dumbass prototype of a system. What I'm expecting is that the successor to an approach like dreamerv3 (or quite plausibly dreamerv3 itself, if it's stable enough to not collapse when scaled - seems unlikely, it probably needs to be a successor) would, with help from cross-training and an appropriate training environment, be able to teach itself to code on well-defined problems well enough to nail any programming problem of relevance. A workable system that soon would probably be a hybrid. however, programming appears to me to be approximately where go was in 2015, and as a skilled programmer myself, I see no reason at all to believe programming is fundamentally different from go in any critical way.
I try to avoid going into too much detail because I don't think it's a good idea to try to make this happen; a curious model based rl agent capable of succeeding at this target could also likely hack its training environment out of curiosity and cause a serious ops mess for the lab training it.
(it's probably not going to destroy the world, just its training environment, but that should scare the lab training it; if they keep repairing the training environment and rerunning the same coding model they'll plausibly get something that could get out on the internet and make a worse mess, though probably not, that would require the model to have enough of a self-model to copy itself properly, which for a model of this complexity would be very hard.)
people who roll to disbelieve on deep learning successes are usually overly focused on a small fragment of deep learning research. in machine learning, I have this take that most insights are approximately "obvious" to the research community, and the hard thing researchers do is hit the idea hard enough to get a version of it working - which can be very very difficult, don't get me wrong. I don't have any burning desire to make things much more obvious; I just want to yell at people for not studying hard enough. If you put in the effort, you can then come tell me I'm wrong with more information. But unless you've been focusing specifically on understanding the way capabilities combine, I doubt you really understand what's coming.
and yes, a big part of why I'm betting yes on this is that I see more and more paths to allow significant self-improvement jumps. Not full recursive self-improvement, mind you - my view is that that will mostly result in the model crashing itself; it's pretty hard to edit your own brain without simply getting addicted to something and losing most of your capability. but there are more limited forms of self-improvement to be had.
@L it's not hopeless, it's just not an end of 2023 kind of thing. I would be way more uncertain about end of 2025

@AryamanArora 2025 is also plausible. I know a few specific things that make me think there will be a jump in capabilities this year across the board, so it makes sense that well informed minds can disagree on that one. I'd probably fund endless limit orders for yes at 50% on the 2025 version! but I continue to not think this market is severely mispriced.
@L There is this one, for end of 2024, with better-defined criteria than this market.
https://manifold.markets/jack/will-an-ai-win-a-gold-medal-on-the-fc04e26738c6
I just put a NO order at 35% for Ṁ2000 and there is another one at 35% for about Ṁ1000. You are welcome!
It's nuts that people believe there is a 22% chance of AI solving the most difficult competitive programming problems.


Take a look at the rate of progress on math contest problems https://bounded-regret.ghost.io/ai-forecasting-one-year-in/ - it's crazy fast, and progress over the previous year far exceeded predictions. AlphaCode is already at roughly median human competitor performance; I think it's unlikely but not that unlikely that it or another AI will be able to solve as many Codeforces problems as top humans.

@jack The problems a median human competitor solves are quite easy, and the AI probably has the time advantage. But the hardest problems are on a completely different level...
@MarcioDiaz sure, but "can beat" doesn't require beating humans where they are strongest. Chess AI could beat top players while it was still weak in closed midgame positions.
@MartinRandall The only problem I can see is in the definition of "best humans". Pretty sure it can't beat Gennady (https://en.wikipedia.org/wiki/Gennady_Korotkevich) or any top 10 player in 2023.
@MarcioDiaz hmm yeah the very top people can also answer easy questions at <100 wpm (and solve many of them in <30 seconds), much faster than AlphaCode and >10% the speed of straight up language models


AI will not beat humans. It's like asking ChatGPT to draw the Mona Lisa using ASCII art, or expecting the ability to be a chess grandmaster to emerge from observing board positions in text. The model can't do it. There's just not enough abstraction and reasoning capability in these models. There are some tasks that require extreme generalization/extrapolation beyond what the dataset can give you.
If the AI cannot solve simple ARC problems proposed by François Chollet, then it's not going to happen.
@ChanchanaSornsoontornOff ChatGPT can't even write code to draw a cat using Python on a 2D plane. That's because it's incapable of visualizing stuff internally. The ability to visualize or abstract things doesn't automatically emerge unless you feed it much more data. I'm going to go all in on this bet if someone pumps the probability back up to 40-50%.

@ChanchanaSornsoontornOff Yeah, end of 2023 is a bit soon. I sure hope it doesn't happen then. I feel that by 2028 this is a near certainty though. Quite the ride we're about to be in for.

@ChanchanaSornsoontornOff Seriously, LLMs aren't yet at average human level on most benchmarks. Forget about being better than the best-of-the-best humans at an especially hard task within 1 year.
@ChanchanaSornsoontornOff The market is "can beat", not "be better". Beating humans at a subset of tasks that don't require graphical skills may be enough.

@MartinRandall is THAT what the market is positing? So an algorithm that can beat a top human at even a single coding challenge would be enough? That changes things substantially.

@JonathanSimon It has to win the overall competition, e.g. get a higher score than any human on the ICPC, which basically means solving at least as many problems as any human participant. They don't have to be exactly the same problems, but solving only one problem that humans didn't wouldn't count.

@JonathanSimon Agreed, this market seems to put far too much weight on this being achieved before the end of the year, which is right around the corner. Free money!


DeepMind has recently published a pre-print stating that their AlphaCode AI is as good as a median human competitor in competitive programming. Is it likely that DeepMind, or anyone else, will provide evidence they can beat the best human competitors next year?
ChatGPT: It is difficult to predict what research developments will occur in the future or whether DeepMind or any other organization will be able to provide evidence of an artificial intelligence (AI) system that can outperform the best human competitors in competitive programming. AI research and development is an active and rapidly evolving field, and there have been many advances in recent years. It is possible that significant progress will be made in the near future, but it is also important to consider the limitations and challenges of developing AI systems that can perform at a level comparable to or exceeding that of humans in certain tasks.
If someone says something is possible, what is that as a probability?
ChatGPT: If someone says that something is possible, it means that it is not impossible and that there is a possibility that it could happen. The probability of an event occurring is a measure of how likely it is to happen, expressed as a number between 0 and 1, with 0 indicating that it is not possible and 1 indicating that it is certain to happen. So if someone says that something is possible, it means that the probability of it happening is greater than 0 but could be any value less than 1. It is important to note that probability is a measure of likelihood and does not necessarily reflect the actual outcome of an event.

If someone says something is "possible" what would that be as a rough range of probabilities?
ChatGPT: If someone says that something is "possible," it means that it is not impossible and that there is a possibility that it could happen. The probability of an event occurring is a measure of how likely it is to happen, expressed as a number between 0 and 1, with 0 indicating that it is not possible and 1 indicating that it is certain to happen. So if someone says that something is possible, it means that the probability of it happening is greater than 0 but could be any value less than 1.
To give a rough estimate, a probability range of 0.1 to 0.9 might be considered "possible." However, this range is just a rough estimate and may not be applicable in all situations. The interpretation of probability can also depend on the context and the specific circumstances of the event in question.
@3e44 Are we really going to spam this all through the comments? The stable diffusion imagery is occasionally amusing but this is just going to be noise.
Manifold Markets, downvote button when??

@SamuelRichardson There is a block button. You can block this account. Also, I'm running it manually, so it ends when I get tired.

Looking at the AlphaCode paper, the most striking thing to me is that it has to throw out 99% of the solutions it generates because they don't pass the example tests in the problem statements (section 4.5), and then it tries up to 10 solutions drawn from different clusters of the remaining set (section 4.6). It's impressive that it works as well as it does, and I don't know enough about the field to be confident, but intuitively, it seems harder for an approach that involves that much guessing to continue to work as well on problems that require more complex conceptual reasoning.
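To spell out the selection step I mean, here is my own rough sketch of the procedure as the paper presents it (not DeepMind's code; `run_program`, `select_submissions`, the largest-cluster-first ordering, and the source of the extra clustering inputs are all assumptions on my part - in the paper the extra inputs come from a separate test-generation model):

```python
# Sketch of a filter-then-cluster selection step in the spirit of
# AlphaCode sections 4.5-4.6. All helper names here are hypothetical.
from collections import defaultdict

def run_program(source: str, stdin: str) -> str:
    """Hypothetical sandboxed runner: execute `source` on `stdin` and
    return its stdout ("" on crash or timeout)."""
    raise NotImplementedError

def select_submissions(candidates, example_tests, extra_inputs, limit=10):
    # 1. Filtering: keep only candidates that pass the example tests in
    #    the problem statement (this is the step that discards ~99%).
    survivors = [
        c for c in candidates
        if all(run_program(c, inp) == out for inp, out in example_tests)
    ]

    # 2. Clustering: group survivors by their outputs on extra inputs, so
    #    behaviourally identical programs land in the same bucket.
    clusters = defaultdict(list)
    for c in survivors:
        signature = tuple(run_program(c, inp) for inp in extra_inputs)
        clusters[signature].append(c)

    # 3. Submission: one representative per cluster, up to the contest's
    #    attempt limit (ordering clusters largest-first is an assumption).
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [cluster[0] for cluster in ranked[:limit]]
```

The point being: most of the heavy lifting after generation is brute filtering against the example tests, which is why I wonder how well it scales to problems where passing the examples is easy but the real difficulty is conceptual.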

@StevenK you're making the classic ai prediction mistake: look at one existing approach, determine that it can't do the crazy thing the optimists are expecting, then dismiss the possibility. If you'd like to estimate accurately, read some papers to get a sense of what else can be combined. Alphacode doesn't even attempt to use the best available algorithms; it's a "first thing that actually works" style paper, and those are usually improved on significantly as the next iterations refine the attempt by establishing how to get combined models to not blow up during training. alphacode v1 will be soundly beaten for sure; the question is whether that improvement will be able to compete seriously with humans.

@L I'm definitely not dismissing the possibility and I'm sure progress will be made over AlphaCode v1. All else equal, it seems like if I learn that the current approach isn't working as well as the headlines made it sound, then that should make me update against AI beating humans in the next year. It's totally possible that you have information that overrides that. What papers should I most be reading?
Anybody actually in the field of AI knows this is easy YES money. There already exist frameworks for planning algorithms to emulate cognitive abilities like open-ended problem solving, and current technologies like OpenGPT already do this to some degree with remarkable speed and accuracy.
Warning for the people betting NO based on the assumption that if it resolves YES, that means computers can take over programming jobs.
It doesn't mean that; this is about writing short algorithms in response to short, detailed, descriptive problem statements.
That is not what people who have programming jobs do. I have one of those jobs and that is not what it looks like, at all.



@SG This was published in February? This is just the official version, I don't think there's anything new here.

"Official" as in "published somewhere other than arxiv" because DeepMind cares about publishing in prestigious journals.

Yeah I don't see any update here, note the blog post says "This blog was first published on 2 Feb 2022. Following the paper’s publication in Science on 8 Dec 2022, we’ve made minor updates to the text to reflect this."
isn't top-level competitive programming really hard? This would be something that, of tens of thousands of decent programmers, filters all but the top few?

Go is a game, has a bounded state space, and most importantly doesn't entirely overlap with the most intelligence-dependent and difficult existing professions that hold up society (mathematics and programming).
i.e. if an AI can beat top coders at competitive programming, can't it just get a job at Google?
@jacksonpolack I agree that Google would try to find a way to use an AI that can do that. I'm not sure what your point is?

@jacksonpolack Competitive programming is hard but not that hard, and there are several aspects of the problem that make it easier for AI. An AI that can do top level competitive programming and nothing else is definitely not a replacement for an engineer.

@jacksonpolack Google is most likely to be the company making the AI in the first place. It will have a job at Google right away.

also, competitive programming is much easier than the full suite of judgement skills a programmer needs. humans who overspecialize in competitive programming also aren't automatically good software engineers
@jacksonpolack No, it doesn't mean that. As I said on the top level, solving brief coding problems is not what companies hire engineers to do. It is not even remotely similar to what they are hired to do.
Not arguing that something that can win every Codeforces round can just be given a 'do a pentest on this' or 'write a new JavaScript framework' task and succeed - just that the hardest competitive programming problems are quite intellectually difficult, and something that could solve them demonstrates intellectual depth that seems much closer to that of a 'professional programmer' than recent models. A human who is a top-level competitive programmer should be able to do most professional programming tasks easily with training, because intelligence and skills transfer. That transfer would be harder for an AI, of course.
It's like comparing scoring well on math olympiads to doing research mathematics (or math olympiads to coding, tbh). They are fundamentally different tasks! But a person who does well on math olympiads is much closer to 'doing research mathematics' or 'being a good coder' than either a random person or AlphaCode is.
@jacksonpolack Also, "AI can't replace programmers soon" isn't something we know - so the argument I seemed to make above, "AI can't replace programmers <implies> AI can't win at competitive programming", is a bad argument, because how does one know AI can't replace programmers? If one has enough knowledge about AI + coding to conclude that, one can just understand why it can't accomplish the competitive programming tasks. Absurdity heuristic, "not trying", etc. The intended, if poorly referenced, argument was an intuitive comparison of the difficulty of the hardest competitive programming problems to what AI seems capable of today.
@jacksonpolack There is nothing wrong with the heuristic in this instance. Programming jobs require real-world interactions, long-term planning, meetings, etc., just like most other business jobs. If AI can take over programming jobs, it can take over non-manual labor jobs.
It is in fact clear that we are nowhere near that point; therefore AI is not near the point of taking over programming jobs. That argument is valid and sound.

Created a market on whether the resolution of this market will be consistent with the AlphaCode evaluation criteria mentioned in the description.
https://manifold.markets/dp/will-will-ai-outcompete-best-humans

@dp To continue the argument:
My interpretation is that we should anchor on the "AlphaCode is as good as the median human competitor" claim, as it's the only thing in the description.
DeepMind made a very deliberate effort to set the benchmarking precedent for competitive programming in the AlphaCode paper, their paper was literally the motivation for this market, and I personally think there is no reason to measure the competitive programming ability of a model and not evaluate it on the AlphaCode paper's Codeforces benchmarking setup + the CodeContests dataset (see the sketch at the end of this comment).
Ultimately I don't think resolving on say IOI performance (or the lack of it) would be fraudulent, but there is no reason for any honest research lab to invent a new testing setup and not address the only relevant prior benchmark.
Some bad-faith actors are requesting fraudulent resolution, and some participants propose plainly wrong criteria: mixing one-person and team contests, not distinguishing irrelevant and world-class contests (ACSL and IOI in the same category?), and proposing unrealistic evaluation criteria (a key part of TCO is hacking other people's solutions).
I'm thinking of making another market with the same title and description which resolves according to the implied AlphaCode paper criteria.
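For concreteness, here is a minimal sketch of where an evaluation against that setup could start, assuming the CodeContests data is available on the Hugging Face hub under the id "deepmind/code_contests" (the dataset id, split names, and field names here are my recollection, not a spec of the paper's harness):

```python
# Load the CodeContests problems that back the AlphaCode evaluation.
# Dataset id and field names are assumptions from memory.
from datasets import load_dataset

problems = load_dataset("deepmind/code_contests", split="test")
example = problems[0]
print(example["name"])                      # problem title
print(example["description"][:300])         # statement excerpt
print(example["public_tests"]["input"][0])  # one example test input
```

Anything claiming to beat the best humans ought to at least report numbers on this same problem set, alongside live-contest results.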

Created a market on IOI performance specifically to reduce the previously discussed ambiguities. (This is analogous to the IMO Grand Challenge, but for competitive programming.)



For what it's worth, I had interpreted this question as reliably beating humans in most or all competitive programming challenges, à la AlphaZero. The title is literally "Will AI outcompete the best humans". If this market gets resolved YES before AI is dominant in these coding challenges, then I will flag it misresolved and I hope others will do the same. (Though I would accept an exception for any competitions which do not permit AI contestants: in those cases, I would accept an equivalent challenge with the same rules, played after the main challenge, only against the winners of the human-vs-human competitions.)
The list of competitions from Wikipedia in which I would expect a programming AI to be completely dominant in order to count for this question:
International Collegiate Programming Contest (ICPC) – required or equivalent
International Olympiad in Informatics (IOI) – optional; seems to be an easier one
American Computer Science League (ACSL) – optional; seems to be an easier one
CodeChef – weekly is optional, yearly required or equivalent
Codeforces Round – optional, is a relatively easier and frequent contest
Facebook Hacker Cup – required or equivalent
HackerRank – required or equivalent
Google Code Jam – required or equivalent
IEEEXtreme Programming Competition – required or equivalent
Topcoder Open (TCO) – required or equivalent
LeetCode Contests – weekly and biweekly contests, 1.5 hours to solve 4 problems; optional, is a relatively easier and frequent contest
There are other challenges as well. If anyone would like to home in on criteria, I'm open to it. I did not deeply investigate each challenge. But if the AI isn't reliably dominating at, at the very minimum, the majority of these contests' problems in competitive time and score, don't resolve YES.
Note that I don't think the initial version of this question needs to be constrained to AIs which run on human-equivalent wattage. I don't care if the AI is using large amounts of brute force to compensate for algorithmic inefficiency; currently no AI can even approach winning these competitions no matter how much compute you throw at it.

Strong disagree. It's unlikely anyone will actually run AI on all of these; why would they? DeepMind was satisfied with chess, go, and shogi - we don't actually need to run it on every other board game in the space to know that it'll be massively superhuman. If the model is generally available then it will probably get run on all the open contests anyway, but there's no particular reason to expect that.
I think beating humans at any major coding competition should count, and I suspect that's how the question was intended.
@L This is totally unreasonable. Would you ask that Usain Bolt win every race in the Olympics in order to say that he "outcompetes the best runners"? Outcompeting means being the best in at least one competition, I'd argue (and we can debate whether that means one contest problem or a full competition's set of problems).

@Eel13 okay, sure. make it any one of the ones I marked required then. that seems almost as good to me. the key factor is it has to completely dominate. full competition or negative resolution. if it doesn't come in #1 it isn't what I thought I was betting on. I won't raise any more fuss, after all y'all are making my yes easier to win than I expected, but I do intend to claim with my yes bet that I expect humanity to be beaten in almost all regards in one year. my point is, ai people, plan accordingly, please! those short timeline estimates are by people who see short paths to building it.
@L I interpreted "can beat" as equivalent to when a chess program first beat the current world champion in a single game, not the present situation where they completely dominate. I "can beat" chess players who are overall better than me, after all.

I agree, I don't think the AI needs to actually compete on all of them - having stronger-than-best-humans performance on one representative contest (e.g. ICPC, IOI, etc.) would almost certainly mean that without too much more effort it could do well on many/most of the others, so I think just one is sufficient.
I also don't think it has to completely dominate - if it can rank 1st place say half the time, I think that would be sufficient. And if it can rank 1st place one time in a representative contest, I think that would also be sufficient.

I think this market is a clearer operationalization of the question: /jack/will-an-ai-win-a-gold-medal-on-the-91c577533429. I created a series of these for different deadlines.















