Will a large language model beat a super grandmaster playing chess by 2028?
52% chance

If a large language model beats a super grandmaster (classical Elo above 2,700) while playing blind chess by 2028, this market resolves to YES.

I will ignore fun games, at my discretion (say, a game where Hikaru loses to ChatGPT because he played the Bongcloud).

Some clarification (28th Mar 2023): This market grew fast with an unclear description. My idea is to check whether a general intelligence can play chess without being created specifically for that purpose (just as humans aren't chess-playing machines). Here are some of my previous comments:

1- To decide whether a given program is an LLM, I'll rely on the media and the nomenclature its creators give it. If they choose to call it an LLM or some related term, I'll consider it one. Conversely, a model that markets itself as a chess engine (or is called one by the mainstream media) is unlikely to qualify as a large language model.


2- The model can write as much as it wants to reason about the best move, but it can't have external help beyond what is already in the weights of the model. For example, it can't access a chess engine or a chess game database.

I won't bet on this market, and I will refund anyone who feels betrayed by this new description and had open bets as of 28th Mar 2023. This market will require judgement.

  • Update 2025-01-21 (PST) (AI summary of creator comment):

    • LLM identification: A program must be recognized by reputable media outlets (e.g., The Verge) as a Large Language Model (LLM) to qualify for this market.

    • Self-designation insufficient: Simply labeling a program as an LLM without external media recognition does not qualify it as an LLM for resolution purposes.

  • Update 2025-06-14 (PST) (AI summary of creator comment): The creator has clarified their definition of "blind chess". The game must be played with the grandmaster and the LLM communicating their respective moves using standard notation.

Wait, why does the description say blind chess? That's much harder for a human. The title just says chess.

bought Ṁ1,000 NO

@IsaacKing I agree that this is misleading. A super GM blind is still GM-level, though

@IsaacKing it's a language model.

Notation is sufficient for an LLM (as small as 50M parameters) to have a perfect world model of the chess board (per Adam Karvonen: https://adamkarvonen.github.io/machine_learning/2024/01/03/chess-world-models.htm )

I doubt vision or some workaround like printing the board state helps, since PGN notation is structured data that allows for more (narrow) intelligence about chess (and the way grandmasters/engines play), whereas unstructured multimodal data would require the model to attend to lots of unrelated information, worsening performance on the specialized task of finding the best move.

@ChinmayTheMathGuy the question is why the human super GM is not given access to the board

@LuisPedroCoelho oh I misinterpreted.

Up to the creator. That's the difference between 2500 and 2700. I guess they're trying to level the playing field. 200 rating points is the difference between a 24% and a 50% win probability, so maybe it shifts this market's odds by 1 or 2% (because I'm guessing multiple super GMs would play multiple games, assuming the model is good (>2200) and can explain its moves).

For me, it's like 1000 rating points. I got castle-checkmated by the aforementioned LLM chess bot because I played it on a bad GUI in pygame (letters for pieces, no flipped board; it was hard for me to visualize and I didn't think too much).
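For reference, the 24%/50% figures above come from the standard Elo expected-score formula (strictly an expected score, which the comment treats as win probability). A minimal sketch in Python:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

print(expected_score(2500, 2700))  # ~0.24: a 200-point underdog's expected score
print(expected_score(2700, 2700))  # 0.5: evenly matched
```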

@ChinmayTheMathGuy The way it impacts this market is less about the impact on the skill of the GM, and more about the likelihood of this particular scenario happening

@JimHays that is a very good point too

I can definitely see a situation where an LLM is so good that it beats super GMs in normal play, and then nobody would take it on blind because there would be no point in doing so. In that case, I think the market would awkwardly have to resolve NO, per the repeated insistence that the super GM should be playing blind.

A well overdue correction. In my humble opinion as a machine learning student, this should be sitting at 25% or less. We are just nowhere near something like this happening, and short of a major paradigm shift, there is zero indication that we are even on track for it. In my opinion, two things would need to happen by 2028: 1) we get AGI or something close to it; 2) this AGI is an LLM. Neither seem very likely to me, and the combination of both seems out of the question.

Hedge:

Here's a sort of derivative of this question: if LLMs can beat super GMs by 2028, by when would you expect them to beat 2000 Elo?

I also like the implicit definition of LLM in the question below - whatever's top 3 on lmsys - that's much better!

Dumb question: is the LLM allowed to write and execute python code? (As long as it doesn't use a chess library)

@DavidFWatson The answer is clearly in the description

@BrunoJ where

@OscarGarciaAps5

2- The model can write as much as it wants to reason about the best move, but it can't have external help beyond what is already in the weights of the model. For example, it can't access a chess engine or a chess game database.

No

I'm glad people with more liquidity are finally here to drop the chance to something reasonable.

Is the engine allowed to use a grammar of just valid moves?

Could anybody explain to me why this market is >50%? Right now even SOTA models like o3 will hallucinate moves when I try playing against them. That would be an automatic disqualification.

Let's say one year from now (mid 2026) we solve the hallucination problem. It might have an elo of around 800 then? Then we need 1.5 years to get to grandmaster level, a climb of ~2000 elo. That is a lot! It also seems like it would require a lot of RL on chess specifically, but I don't see why the labs would prioritize chess when they could be focusing their RL budget on coding and mathematical proofs.

@SorenJ Have you tried discarding hallucinated moves and seeing what ELO you get? It's probably best to start a new chat for each move so that hallucinated moves don't clog up the context window. Or you could ask for a list of top 5 move options and select the first valid one.

@placebo_username It plays the opening well, and then its performance completely collapses once the opening phase is over.

@SorenJ Interesting, wonder if this is a context window issue or just that openings are more standardized. Are you using one of the two techniques I suggested?

@placebo_username I haven't tried your technique yet. But I don't think the collapse after the opening is surprising: the publicly available chess data necessarily has a lot more regularity near the opening.
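For anyone who wants to try the move-filtering idea suggested above, here is a minimal sketch using the python-chess library. The candidate list is hard-coded for illustration; in an actual test it would be the model's top-5 suggestions:

```python
import chess

def first_legal_move(board: chess.Board, candidates: list[str]) -> chess.Move | None:
    """Return the first candidate SAN string that is legal in the current position."""
    for san in candidates:
        try:
            return board.parse_san(san)  # raises ValueError for illegal or malformed SAN
        except ValueError:
            continue
    return None

# Toy usage: pretend the model proposed these moves from the starting position.
board = chess.Board()
move = first_legal_move(board, ["Qxf7#", "Nf6", "e4", "d4"])
if move is not None:
    print(board.san(move))  # "e4": the first legal suggestion
    board.push(move)
```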

@MP does it have to be a blind chess game? Imo, standard blitz/bullet should also be enough to satisfy it; otherwise this market would be more of a bet on whether such a specific event setup will happen and less about AI capabilities.

@Quillist by blind chess, in spirit, I meant that the GM would tell the AI his move in standard notation and the AI would tell the GM its move back in standard notation, without the GM having to keep the state of the board to help the AI.
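To make that format concrete, here is a rough sketch of the exchange (again using python-chess; the board object is only the operator's record for checking legality and game state, and `query_model` with canned replies is a hypothetical stand-in for the actual LLM call):

```python
import chess

# Hypothetical stand-in for the LLM: a real run would send the SAN history to
# the model and read a SAN reply back. Canned replies keep the sketch runnable.
CANNED_REPLIES = iter(["e5", "Nc6", "Nf6"])

def query_model(san_history: list[str]) -> str:
    return next(CANNED_REPLIES)

board = chess.Board()            # the operator's record, never shown to the model
history: list[str] = []
gm_moves = ["e4", "Nf3", "Bc4"]  # example GM input; normally entered move by move

for gm_san in gm_moves:
    board.push_san(gm_san)        # operator records and validates the GM's move
    history.append(gm_san)
    reply = query_model(history)  # the model sees only the notation exchanged so far
    board.push_san(reply)         # operator validates the model's reply
    history.append(reply)
    print(f"GM: {gm_san}   LLM: {reply}")
```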

@MP I'm more curious about the situation where a GM plays the LLM, where the LLM is "blindfolded" but the GM is not himself blindfolded.

This competition would likely happen in a livestream or YouTube video, where the exact conditions of the match may not fit the specifics of this market.

If the LLM wins under less favourable conditions, would you still resolve YES, or does the match specifically have to be blindfolded?

@MP Hold the phone. Do you mean that only the AI is playing blind chess? As in the GM is allowed to have a visual to keep track of the game state? Or must the GM also be playing blind?

Previously you have said they must both play blind.

@JimHays Genuinely, why should it matter whether or not the GM is playing blind, so long as the LLM is? Also, if we permit the LLM to take notes, it could just record and update the board state itself.

opened a Ṁ1,000 NO at 64% order

@Quillist Because OP has consistently said it was one of the constraints, and I think it makes this substantially less likely to occur.

@JimHays Alright, so you agree with me then that if we go with a dual-blind mandate, this market is more about whether this specific event setup happens and not about AI capabilities.

@Quillist Which is unfortunate, but also part of why I hold a large NO position and believe this market is overvalued

@JimHays pretty clear it's just the LLM who's playing blind and they need to communicate using standard notation.

"If a large language models beats a super grandmaster (Classic elo of above 2,700) while playing blind chess"

I feel like this could be part of a Winograd test🤣

@Mactuary Reread MP’s clarifying comment that I posted a screenshot of above

@MP If this is the spirit you want the market to go by, you should change the title to say: "Will a super grandmaster play blind chess against an LLM and lose by 2028?"

Alternatively, you can just revise it to not include a dual-blind stipulation

@Quillist I agree with changing the title, but the description has been there the whole time and dual blind was clarified multiple times in the comments a long time ago

this is so absurd

@Bayesian What precisely are you finding absurd? That most people have bet on this market without understanding it? That the market creator has chosen to clarify themselves in the comments but not update the description? That the creator seems to be waffling on their intended resolution rules?

The possible discordance of the spirit with the rule, mostly, unless we have misinterpreted the market creator's clarification.

@MP Something one might have had in mind for resolving this market YES: the super GM is presented with a chess board on which they can make moves; after their move, someone enters it into the LLM's UI (in standard notation, for example), the LLM returns a move, and this back-and-forth ends with the super GM losing. Would this not count? It seems to me like that is an interesting question to forecast, whereas "will a super GM play chess against an LLM in some highly unusual format" would not be. My reading of your clarification was mostly focused on the AI's input and output being standard notation, but maybe not? Please confirm what game formats are acceptable; preferably there would be many different ones, so that we could resolve this positively in worlds where LLMs are superhuman at chess.
