Will a large language model beat a super grandmaster playing chess by 2028?
2029
70% chance

If a large language model beats a super grandmaster (classical Elo above 2,700) while playing blind chess by 2028, this market resolves to YES.

I will ignore fun games, at my discretion (say, a game where Hikaru loses to ChatGPT because he played the Bongcloud).

Some clarification (28th Mar 2023): This market grew fast with an unclear description. My idea is to check whether a general intelligence can play chess without being created specifically for that purpose (the way humans aren't chess-playing machines). Some previous comments I made:

1- To decide whether a given program is an LLM, I'll rely on the media and on the nomenclature its creators give it. If they choose to call it an LLM or some related term, I'll consider it. Conversely, a model that markets itself as a chess engine (or is called one by the mainstream media) is unlikely to qualify as a large language model.


2- The model can write as much as it wants to reason about the best move, but it can't have external help beyond what is already in the weights of the model. For example, it can't access a chess engine or a chess game database.
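A minimal sketch (not the creator's official procedure) of what a compliant game loop could look like, assuming the python-chess library for bookkeeping and a hypothetical ask_llm_for_move() that only ever sees the move list as text; no engine or database is consulted:

```python
import chess

def ask_llm_for_move(moves_so_far: str) -> str:
    """Hypothetical LLM call: given the game so far as text, return a move in SAN (e.g. 'Nf3')."""
    raise NotImplementedError  # plug in the model's API here

def play_game(get_human_move):
    board = chess.Board()
    history = []  # SAN moves played so far
    while not board.is_game_over():
        if board.turn == chess.WHITE:  # LLM plays White in this sketch
            san = ask_llm_for_move(" ".join(history))
            try:
                move = board.parse_san(san)  # rejects illegal or ambiguous moves
            except ValueError:
                return "0-1"  # an illegal move by the LLM forfeits the game
            board.push(move)
        else:
            san = get_human_move(board)  # the human's move, already in SAN
            board.push(board.parse_san(san))
        history.append(san)
    return board.result()
```

Here python-chess only tracks the position and checks legality; it never suggests moves, so the model is limited to what is in its weights.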

I won't bet on this market and I will refund anyone who feels betrayed by this new description and had open bets by 28th Mar 2023. This market will require judgement.

  • Update 2025-01-21 (PST) (AI summary of creator comment): - LLM identification: A program must be recognized by reputable media outlets (e.g., The Verge) as a Large Language Model (LLM) to qualify for this market.

    • Self-designation insufficient: Simply labeling a program as an LLM without external media recognition does not qualify it as an LLM for resolution purposes.


I think LLMs like o3 can now play chess. I tested it and it can beat roughly an 800-Elo bot, which is impressive considering that o1 could barely beat 250-Elo chess bots, while o3 beats much higher-rated bots and rarely makes illegal moves. I think this might become more possible.

If the LLM writes the code to train a chess engine, requisitions the compute, trains it, and then runs inference on it, does that count?

@DavidSpies "The model can write as much as it want to reason about the best move. But it can't have external help beyond what is already in the weights of the model. For example, it can't access a chess engine or a chess game database." Not op but sounds like a no to me.

@bananaLtsnh Hi! I see that you're fairly new here, welcome!

For your information, it's a violation of the community guidelines to advertise your markets on unrelated markets -- see https://manifoldmarkets.notion.site/Comment-guidelines-c4b4e6b0b0064b268970388f692c4745

What if an LLM uses Stockfish?

@DanielJohnston Also, what if the SGM uses Stockfish?

@CraigDemel 😂😂

bought Ṁ70 NO

@DanielJohnston The description said the model has to use only its own weights.

@MaxE What if the model does not use Stockfish at inference time (own weights only), but is trained on data generated by Stockfish?

@DanielLawson I'm not sure a pure language model can grasp the logic that chess engines employ without efficiency losses so immense that it's never worth making an LLM play high-level chess before we make something better at logic.

@MaxE I don't agree; I think a scaled-up version of this paper, trained on Stockfish-level data, would work: Grandmaster-Level Chess Without Search: https://arxiv.org/html/2402.04494v1

You go from GM to super GM with just more data, a larger model, and a similar architecture.
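For context, a rough sketch of the kind of Stockfish-distillation data pipeline that paper points toward: label positions with the engine's preferred move and train the model to predict it from the FEN string. This assumes a local Stockfish binary on PATH and the python-chess package; the paper's actual pipeline differs in its details.

```python
import chess
import chess.engine
import chess.pgn

def label_positions(pgn_path: str, out_path: str, depth: int = 16) -> None:
    """Write (FEN, Stockfish-preferred move) pairs, one per line, as supervised training text."""
    engine = chess.engine.SimpleEngine.popen_uci("stockfish")
    with open(pgn_path) as pgn, open(out_path, "w") as out:
        while (game := chess.pgn.read_game(pgn)) is not None:
            board = game.board()
            for move in game.mainline_moves():
                info = engine.analyse(board, chess.engine.Limit(depth=depth))
                if "pv" in info:  # principal variation, i.e. the engine's preferred line
                    best = info["pv"][0]
                    out.write(f"{board.fen()}\t{board.san(best)}\n")
                board.push(move)  # follow the game to the next position
    engine.quit()
```

The paper itself distills Stockfish action values for every legal move rather than only the top move, but the flavor of "engine-labeled text in, move prediction out" is the same.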

opened a Ṁ100 NO at 69% order

@DanielLawson great, buy my limit order!

@CraigDemel I think it's 100% technically feasible, but maybe 40-50% that someone actually does it. For example, OpenAI could do it, but it's not worth it for them to spend time and resources making a specialist chess-trained LLM. My claim is not as broad as saying every commercial LLM will be a chess super GM. It is that one could train an LLM specialized in chess, and it would reach super GM level with sufficient high-quality data.

@DanielLawson I wonder if super GMs will change their strategy when playing against such a model, e.g. playing weird positions it wasn't trained on and playing super passively to wait for a mistake.

@DanielLawson that would be totally fair: grandmasters probably train by playing with stockfish too

reposted

Unless there is a loophole I'm missing here (train an ML system, fool the NYT into calling it an LLM), this is currently the most mispriced market on Manifold.

If I can be assured this only applies to OpenAI's o models, Gemini's regular models, Grok, Claude, etc., I would put half my mana into it. Is there a market about those specifically? (e.g. "will Claude beat a super grandmaster by 2028")

I just played gemini pro-2.5-exp; it gave me its queen after 15 moves. I think most bettors here don't play chess or don't use LLMs 😕

@FergusArgyll here's the game for the record. I thought for < 5 seconds per move and played deliberately passively.

@FergusArgyll Consider

  1. Current SoTA models are not trained for chess (evidenced by GPT-3.5 Turbo being much better than GPT-4o)

  2. Agents playing games is the next step for AI companies as reasoning models stabilize (OpenAI's roadmap says this explicitly; I don't know if Anthropic has publicly said the same, but ClaudePlaysPokemon suggests it's top-of-mind for employees)

  3. Chess is a great game for #2, requiring substantial computation while being trivial to evaluate against stockfish and requiring little context

  4. "Computers beat top humans at chess... again" is great PR for a company that is promising investors AGI

  5. Two years is a long time in this space

@FergusArgyll I think users assume exponential progress

bought Ṁ100 NO

@FergusArgyll I also think it is mispriced but note that it requires the human to play blind and a single win by an LLM is sufficient to resolve YES...

Blind chess should be added to the market title.
"Will a large language model beat a super grandmaster playing blind chess by 2028?"

Hallucinations are getting rarer with the latest models, and average centipawn loss is occasionally hitting this level. I'd expect the next round of releases will be worth playing for the super grandmasters. I bet this up to 76%, but I'm 90%+ confident.

https://www.lesswrong.com/posts/gNFixvxw7JxzvMjCJ/personal-evaluation-of-llms-through-chess
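For anyone who wants to reproduce that kind of evaluation, a minimal sketch of average centipawn loss for one side of a game, again assuming a local Stockfish binary and the python-chess package:

```python
import chess
import chess.engine
import chess.pgn

def avg_centipawn_loss(game: chess.pgn.Game, color: chess.Color, depth: int = 15) -> float:
    """Mean drop in Stockfish evaluation caused by `color`'s moves (0 = engine-perfect play)."""
    engine = chess.engine.SimpleEngine.popen_uci("stockfish")
    board = game.board()
    losses = []
    for move in game.mainline_moves():
        if board.turn == color:
            before = engine.analyse(board, chess.engine.Limit(depth=depth))
            best_cp = before["score"].pov(color).score(mate_score=10000)
            board.push(move)
            after = engine.analyse(board, chess.engine.Limit(depth=depth))
            played_cp = after["score"].pov(color).score(mate_score=10000)
            losses.append(max(0, best_cp - played_cp))
        else:
            board.push(move)
    engine.quit()
    return sum(losses) / len(losses) if losses else 0.0
```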

@Mactuary If you're gonna play the Ruy Lopez, it's gonna know the first 15 moves. Open with 1. h4 or something and it's toast.

opened a Ṁ2,000 YES at 64% order

RL finally works at scale with LLMs and this market doesn't shoot to 100%? what are we doing

@Bayesian yeah seems like a lock

@Bayesian a lot of people don't realize how much better a super grandmaster is than a regular human.
