Will a large language model beat a super grandmaster playing chess by 2028?

If a large language model beats a super grandmaster (classical Elo above 2,700) while playing blind chess by 2028, this market resolves to YES.

I will ignore fun games, at my discretion (say, a game where Hikaru loses to ChatGPT because he played the Bongcloud).

Some clarifications (28th Mar 2023): This market grew fast with an unclear description. My idea is to check whether a general intelligence can play chess without being created specifically for that purpose (humans, after all, aren't chess-playing machines). Some points from my previous comments follow.

1- To decide whether a given program is an LLM, I'll rely on the media and on the nomenclature its creators use. If they call it an LLM or some related term, I'll consider it one. Conversely, a model that markets itself as a chess engine (or is called one by the mainstream media) is unlikely to qualify as a large language model.

2- The model can write as much as it wants to reason about the best move, but it can't have external help beyond what is already in its weights. For example, it can't access a chess engine or a chess game database.

I won't bet on this market, and I will refund anyone who feels betrayed by this new description and had open bets as of 28th Mar 2023. This market will require judgement.

  • Update 2025-01-21 (PST) (AI summary of creator comment): - LLM identification: A program must be recognized by reputable media outlets (e.g., The Verge) as a Large Language Model (LLM) to qualify for this market.

    • Self-designation insufficient: Simply labeling a program as an LLM without external media recognition does not qualify it as an LLM for resolution purposes.

  • Update 2025-06-14 (PST) (AI summary of creator comment): The creator has clarified their definition of "blind chess". The game must be played with the grandmaster and the LLM communicating their respective moves using standard notation.

  • Update 2025-09-06 (PST) (AI summary of creator comment): - Time control: No constraints. Blitz, rapid, classical, or casual online games all count if other criteria are met.

    • “Fun game” clause: Still applies, but the bar to exclude a game as "for fun" is high; unusual openings or quick, unpretentious play alone don't make it a "fun" game.

    • Super grandmaster: The opponent must have the GM title and a classical Elo rating of 2700 or higher.

  • Update 2025-09-11 (PST) (AI summary of creator comment): - Reasoning models are fair game (subject to all other criteria).

  • Update 2025-09-13 (PST) (AI summary of creator comment): Sub-agents/parallel self-calls

    • An LLM may spawn and coordinate multiple parallel instances of itself (same model/weights) to evaluate candidate moves or perform tree search, including recursively. This is considered internal reasoning and is allowed.

    • Using non-LLM tools or external resources (e.g., chess engines like Stockfish, databases) remains disallowed.

I have to say I am impressed by the Carlsen vs. ChatGPT game. ChatGPT has gone way longer than I was expecting without illegal moves.

What do you think should happen if spinning up sub-agents is a default behavior of general-purpose AIs? They could use many agents in parallel to run a breadth-first search.

That's fair game, right? The LLM would only be bottlenecked by its own parallelism capacity and the AI lab's willingness to serve that many tokens.

Thoughts??

@MP This wouldn't be fair game, because it wouldn't be just one LLM? I mean, what if the LLM queried Stockfish?

@SorenJ An LLM querying itself seems a little different from an LLM querying some non-LLM program. But I agree that it's a tricky edge case.

I'm inclined to say this behavior should be allowed, because I see a good chance that all leading models will have it in a few years; if it's ruled out, the market resolves NO for an uninteresting reason (people stop developing qualifying LLMs before 2028). It's not much of a step from the current reasoning models.

@placebo_username I disagree that because the leading models may have this behavior in a few years, that is a good reason to allow it. (Let's say the leading models all had the ability to query Stockfish in a few years-- does that mean we should allow it?)

I guess as an analogy, this is like saying, "Couldn't a human ask and query another human?" But that wouldn't be a fair match. Sure, a team of 100 grandmasters could all be sub-queried and analyze top candidate moves, but I wouldn't consider that a fair chess match of one human vs. one LLM.

@MP in this context what is a sub-agent? Would this be another instance of the "prime" LLM, running on/competing for the same hardware/resources as the "prime" agent?

I assume calling on stockfish is off limits much in the same way it would be considered cheating if a human were calling on stockfish to feed them moves during a game.

@ShitakiIntaki I am quite sure you can do this today. You send a position to Claude, it selects a handful of candidate moves, and Claude asks parallel instances of Claude to evaluate them. You do this recursively until some stopping rule. At no point did you do anything dirty like querying Stockfish.

This is, in very broad strokes, what GPT-5 Pro does anyways.
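
A minimal sketch of that loop, assuming a hypothetical query_llm function that calls one more instance of the same model; the prompt formats, scoring scale, and depth rule are all illustrative assumptions, not any lab's real API:

```python
from concurrent.futures import ThreadPoolExecutor

def query_llm(prompt: str) -> str:
    # Hypothetical stand-in: one API call to another instance of the same model.
    raise NotImplementedError

def candidate_moves(fen: str, n: int = 4) -> list[str]:
    # One instance proposes a handful of candidate moves in standard notation.
    reply = query_llm(f"Position (FEN): {fen}\nList {n} candidate moves, comma-separated.")
    return [m.strip() for m in reply.split(",")][:n]

def evaluate(fen: str, move: str, depth: int) -> float:
    # A parallel instance scores one candidate; recurse until the depth budget runs out.
    if depth == 0:
        reply = query_llm(f"Position (FEN): {fen}\nScore {move} for the side to move, -1 to 1.")
        return float(reply)
    next_fen = query_llm(f"Position (FEN): {fen}\nApply {move} and return the resulting FEN.")
    return -best_move(next_fen, depth - 1)[1]  # negamax: the opponent's best is our worst

def best_move(fen: str, depth: int = 1) -> tuple[str, float]:
    # Breadth-first step: fan out one parallel self-instance per candidate move.
    moves = candidate_moves(fen)
    with ThreadPoolExecutor(max_workers=len(moves)) as pool:
        scores = list(pool.map(lambda m: evaluate(fen, m, depth), moves))
    best = max(range(len(moves)), key=lambda i: scores[i])
    return moves[best], scores[best]
```

Note that only the model's own weights are consulted at every node, which is what puts this on the allowed side of the line drawn in the 2025-09-13 update.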

Here's the latest blindfolded video, Magnus Carlsen vs (presumably) GPT-5:

https://www.youtube.com/watch?v=3Fk_ihy4lIc

No change from status quo, plays illegal moves like crazy, and accepts illegal moves from the human.
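
For what it's worth, enforcing legality in such a match is mechanically easy. Here is a minimal referee sketch using the open-source python-chess package; the get_white_move and get_black_move callables are hypothetical stand-ins for however the human and the LLM submit moves in standard notation:

```python
import chess  # pip install chess

def referee(get_white_move, get_black_move, max_retries: int = 3) -> str:
    board = chess.Board()
    while not board.is_game_over():
        ask = get_white_move if board.turn == chess.WHITE else get_black_move
        for _ in range(max_retries):
            san = ask(board.fen())  # the callable receives the current position as FEN
            try:
                board.push_san(san)  # raises a ValueError subclass on unparseable or illegal SAN
                break
            except ValueError:
                continue  # illegal move: ask the same side again
        else:
            # Too many illegal attempts: the side to move forfeits.
            return "0-1" if board.turn == chess.WHITE else "1-0"
    return board.result()
```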

@pietrokc But, the match did happen, so there’s that! I hadn’t expected such a match by now

DeepMind seems to have achieved this already: Grandmaster-Level Chess Without Search

"Unlike traditional chess engines that rely on complex heuristics, explicit search, or a combination of both, we train a 270M parameter transformer model with supervised learning on a dataset of 10 million chess games. We annotate each board in the dataset with action-values provided by the powerful Stockfish 16 engine, leading to roughly 15 billion data points. Our largest model reaches a Lichess blitz Elo of 2895 against humans, and successfully solves a series of challenging chess puzzles, without any domain-specific tweaks or explicit search algorithms."

@TimothyJohnson5c16 I don't think this counts, since it isn't a general purpose LLM, it's a chess-specific transformer model.

The author says: "My idea is to check whether a general intelligence can play chess, without being created specifically for doing so (like humans aren't chess playing machines)."

@NathanHelmBurger Ah, okay, good point.

@TimothyJohnson5c16 Besides the obvious issue already mentioned, there is another:

> blitz Elo
Blitz chess is played with extremely tight time controls (5/3 or less on Lichess), giving humans little time to think. However, the problem description only makes a requirement about classical chess (70/30) Elo, and states "the model can write as much as it want to reason about the best move", which again is much closer to classical chess. Perhaps we need a clarification from the market creator @MP but I have been interpreting this market as about classical chess.

@TimothyJohnson5c16 Seems like an interesting article, though "We annotate each board in the dataset with action-values provided by the powerful Stockfish 16 engine, leading to roughly 15 billion data points" is one hell of a bootstrap. The model is basically just a compression of Stockfish at that point. Imo this is less impressive than AlphaZero, which learns to play from scratch. Not sure this really counts as "without being created specifically [for chess]" as per the author's comment.

@Jasonb Yeah, it definitely doesn't satisfy the requirements for this market.

But I was still surprised to see that a model can compress Stockfish into only 270M parameters and still be extremely strong without any search.

@pietrokc just to clarify, I have put no constraints on the timing of the match. If Magnus had lost in the funny chess.com video they recently put out, this market would resolve to YES.

I understand that this might mean resolving to YES just because Hikaru was playing unpretentiously, blitzing out moves without much thinking.

The fun game clause still applies, but expect a high hurdle: just because someone played the Bogo-Indian against ChatGPT doesn't mean they were playing for fun.

Where I did put a requirement is the Elo rating: the person must be a super grandmaster, meaning they have the GM title and a classical Elo rating of 2700 or more.

@MP I think that's the wrong clarification. I understood this market to be fundamentally about whether LLMs will be better than super-GMs at the "thinking" part of chess. (Hence the classical Elo requirement, as opposed to rapid, bullet, etc.) If you give each player a total time of, say, 10 seconds, the human is not gonna be doing much thinking, and this market may resolve YES for boring reasons.

@pietrokc Idk. At least as of today, a Super GM would beat GPT-5 with 1 minute on the clock while the AI has hours. I'd think it's a remarkable outcome either way.

@MP True with one minute, but probably false with 10 seconds, right?

@placebo_username I don't think it's feasible to play ultra bullet in what we are calling blind chess here

I worry that a grandmaster would fraternize deez nutz

☝️🤓 there's no rule stating all the moves need to be legal

@Quillist yes there is. The model needs to "beat" the human "at chess". A game where one or more players can make up rules during play is not "chess". And if you are playing chess, you've not "beaten" your opponent if you've broken the rules.

@Fion ☝️🤓 GothamChess is the most popular chess content producer, making him the ultimate authority on what the media thinks the rules of chess are, and in his tournaments AI hallucinations are valid moves
