Will a large language model beat a super grandmaster playing chess by 2028?
1.5k · Ṁ790k · 2029 · 66% chance

If a large language model beats a super grandmaster (classical Elo above 2,700) while playing blind chess by 2028, this market resolves to YES.

I will ignore fun games, at my discretion. (Say, a game where Hikaru loses to ChatGPT because he played the Bongcloud.)

Some clarification (28th Mar 2023): This market grew fast with an unclear description. My idea is to check whether a general intelligence can play chess without being created specifically for that purpose (just as humans aren't chess-playing machines). Some clarifications from my previous comments:

1- To decide whether a given program is an LLM, I'll rely on the media and the nomenclature its creators give it. If they choose to call it an LLM or a related term, I'll consider it one. Conversely, a model that markets itself as a chess engine (or is called one by the mainstream media) is unlikely to qualify as a large language model.


2- The model can write as much as it wants to reason about the best move, but it can't have external help beyond what is already in its weights. For example, it can't access a chess engine or a chess game database.

I won't bet on this market, and I will refund anyone who feels betrayed by this new description and had open bets as of 28th Mar 2023. This market will require judgement.

  • Update 2025-01-21 (PST) (AI summary of creator comment): - LLM identification: A program must be recognized by reputable media outlets (e.g., The Verge) as a Large Language Model (LLM) to qualify for this market.

    • Self-designation insufficient: Simply labeling a program as an LLM without external media recognition does not qualify it as an LLM for resolution purposes.


This may be a tougher challenge than expected. It has been 2 years since 2023 and no LLM has come close to that Elo; the only LLM that came semi-close at chess, just predicting moves based on its dataset, was the mysterious gpt-3.5-turbo-instruct. If LLMs don't start playing chess without failing to keep track of the board state, I will have to sell by late 2026 or 2027. This is very concerning, since that is only 2 to 3 years away.

@Blocksterpen3 Related, o1 pro lost to me easily (which was only the 2nd game of chess I played in years.) It also repeatedly got confused about the state of the board.

https://chatgpt.com/share/675e2bbb-2e88-8009-8382-b72bd610253c

@DavidBolin yeah, I hope o3 or even o4 can play a coherent game of chess. Even DeepSeek R1 fails around move 13

@DavidBolin

LLMs get better at chess when given three examples of legal moves and their results and asked to repeat the entire previous set of moves before each turn. This can likely be applied to any game.

https://dynomight.net/more-chess/
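The trick described above can be sketched as a prompt builder. This is a minimal illustration, not the linked post's actual code; the few-shot examples and prompt wording are assumptions:

```python
# Few-shot examples of legal moves and their results (illustrative,
# not taken from the dynomight post).
FEW_SHOT = (
    "Example legal moves and results:\n"
    "1. e4 e5 -> pawns face off in the center\n"
    "2. Nf3 Nc6 -> both sides develop a knight\n"
    "3. Bb5 a6 -> the Ruy Lopez; Black questions the bishop\n"
)

def build_turn_prompt(moves_so_far):
    """Build the prompt for the model's next turn.

    moves_so_far: list of SAN moves, e.g. ["e4", "e5", "Nf3"].
    The model is asked to restate the entire move list before
    answering, which reportedly helps it track the board state.
    """
    numbered = []
    for i in range(0, len(moves_so_far), 2):
        pair = " ".join(moves_so_far[i:i + 2])
        numbered.append(f"{i // 2 + 1}. {pair}")
    game = " ".join(numbered)
    return (
        FEW_SHOT
        + "\nCurrent game so far: " + (game or "(no moves yet)")
        + "\nFirst, repeat the entire move list above."
        + "\nThen give your next move in SAN on its own line."
    )

# Usage: pass the prompt to whatever LLM API you are testing.
prompt = build_turn_prompt(["e4", "e5", "Nf3"])
```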

Is this with no prompting?

bought Ṁ15 YES

Question: if o3 does this, would it resolve as yes? Also what does blind chess mean in context of a language model?

@RossTaylor I'm very confident that o3 will not beat a super GM

@AdamK Doesn’t answer the question - would that resolve as a yes if it did?

@RossTaylor Assuming o3 is also text-only, then yes. The "blind" criterion just means the model doesn't get to see pictures of the board

@AdamK Thanks for clarification! Is model allowed to imagine board states in its chain of thought?

@RossTaylor That's what it would have to be doing implicitly for the CoT to be useful. o3's CoT is almost certainly just text.

It's starting to look like this market is just a countdown to whenever one of the frontier labs decides to apply reasoning post-training to chess.

@AdamK and to whether a super grandmaster is bored enough to play vs an AI at chess.

@AdriaGarrigaAlonso I think that is actually a rather significant component of this question. You could reframe the resolution as "Will a super grandmaster play a serious game of chess against an LLM by 2030?". Even if LLMs continue to improve at chess (they currently aren't any good), this other contingency has to hold as well. Current market seems high.

Has anyone tested o1 or o1-pro on chess? Might be expensive to do, but I feel like it would be interesting. I predict it would not be significantly better than the 1600ish rating of gpt3.5-instruct

@dominic I think the more RLHFed the model is, the worse it is at chess. That's probably why 3.5 instruct is better than 4, 4o, and probably o1.

I might be wrong.

It should be better if the output is constrained to PGN format and the model is fine-tuned on Stockfish analysis (available in the lichess PGN files).

There is already a transformer at ~2700 that just predicts Stockfish:

http://arxiv.org/pdf/2402.04494
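The PGN-constrained fine-tuning idea above can be sketched as a data-prep step: turn each game's movetext into (prefix, next-move) training pairs so the model only ever learns to emit the next SAN token. The pair format and sample game here are illustrative assumptions, not from the comment or the paper:

```python
# Minimal sketch: split a PGN movetext string into supervised
# fine-tuning pairs of (game prefix, next move). Move numbers
# like "1." are kept in the prefix but never used as targets.
def pgn_to_pairs(movetext):
    """Return (prefix, next_token) pairs from PGN movetext."""
    tokens = movetext.split()
    pairs = []
    for i, tok in enumerate(tokens):
        if tok.endswith("."):  # skip move-number tokens as targets
            continue
        prefix = " ".join(tokens[:i])
        pairs.append((prefix, tok))
    return pairs

# Usage: each pair becomes one fine-tuning example.
pairs = pgn_to_pairs("1. e4 e5 2. Nf3 Nc6")
```

A real pipeline would also attach the Stockfish evaluations stored in the lichess PGN comments as training signal; that part is omitted here.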

"To decide whether a given program is a LLM, I'll rely in the media and the nomenclature the creators give to it."

No. If I shit in a box and write "LLM" on the side, does that make it an LLM? Same logic applies if they give slight chatbot functionality to a chess engine and call it an LLM.

@BrandonNorman If you shit in a box and it beats a grandmaster, I for one will respect whatever you call it.

@RiskComplex We already have chess engines that can beat a grandmaster. The bet here is that specifically an LLM will do it.

@BrandonNorman If you can manage to get The Verge to report on your shit in a box that beats a super GM as an LLM, this market resolves to YES.

@MP You're making argument from authority about an authority I do not respect. The Verge will print stories about what makes them the most money, without regard to its truthfulness.

Also, from the Bongcloud reference: only a single game is required, not a match?

What about a universal strategic game engine that can play arimaa and chess etc?

bought Ṁ50 YES

Probably resolves YES, someone will fine tune some C-tier open-source model for chess specifically, and by 2028 the C-tier models will be good enough that it'll crush.

bought Ṁ700 NO from 61% to 59%

@JS_81 fine-tune it to what? recall real chess games very well? supergms can do that too, and more besides

© Manifold Markets, Inc.