If a large language model beats a super grandmaster (classical Elo above 2,700) while playing blind chess by 2028, this market resolves to YES.
I will ignore fun games, at my discretion. (Say, a game where Hikaru loses to ChatGPT because he played the Bongcloud.)
Some clarification (28th Mar 2023): This market grew fast with an unclear description. My idea is to check whether a general intelligence can play chess without being created specifically for doing so (just as humans aren't chess-playing machines). Some points from my previous comments:
1- To decide whether a given program is an LLM, I'll rely on the media and the nomenclature its creators give it. If they choose to call it an LLM, or some related term, I'll consider it one. Conversely, a model that markets itself as a chess engine (or is called one by the mainstream media) is unlikely to qualify as a large language model.
2- The model can write as much as it wants to reason about the best move. But it can't have external help beyond what is already in the weights of the model. For example, it can't access a chess engine or a chess game database.
I won't bet on this market, and I will refund anyone who feels betrayed by this new description and had open bets as of 28th Mar 2023. This market will require judgement.
Update 2025-01-21 (PST) (AI summary of creator comment):
- LLM identification: A program must be recognized by reputable media outlets (e.g., The Verge) as a large language model (LLM) to qualify for this market.
- Self-designation insufficient: Simply labeling a program as an LLM, without external media recognition, does not qualify it as an LLM for resolution purposes.
If the LLM writes the code to train a chess engine, requisitions the compute, trains it, and then runs inference on it, does that count?
@DavidSpies "The model can write as much as it want to reason about the best move. But it can't have external help beyond what is already in the weights of the model. For example, it can't access a chess engine or a chess game database." Not op but sounds like a no to me.
@bananaLtsnh Hi! I see that you're fairly new here, welcome!
For your information, it's a violation of the community guidelines to advertise your markets on unrelated markets -- see https://manifoldmarkets.notion.site/Comment-guidelines-c4b4e6b0b0064b268970388f692c4745
@MaxE What if the model does not use Stockfish at inference time (only its own weights), but is trained on data generated by Stockfish?
@DanielLawson I'm not sure a pure language model can grasp the logic that chess engines employ without efficiency losses so immense that it's never worth making an LLM to play high-level chess before we make something better at logic.
@MaxE I don't agree; I think a scaled-up version of this paper's approach, trained on Stockfish-level data, would work: Grandmaster-level Chess without Search: https://arxiv.org/html/2402.04494v1
You go from GM to super-GM with just more data and a larger model of similar architecture.
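For illustration, here is a minimal sketch of how Stockfish-labeled training pairs could be generated with python-chess, in the spirit of that paper's setup; the engine path, search depth, dataset size, and random-walk position sampling are all illustrative assumptions, not the paper's actual pipeline:

```python
import random

import chess
import chess.engine

# Minimal sketch: generate (FEN, best-move) training pairs with Stockfish.
# Assumes a local Stockfish binary is on PATH; depth 12 is illustrative.
engine = chess.engine.SimpleEngine.popen_uci("stockfish")

def random_position(max_plies: int = 40) -> chess.Board:
    """Sample a position by playing random legal moves from the start."""
    board = chess.Board()
    for _ in range(random.randrange(max_plies)):
        if board.is_game_over():
            break
        board.push(random.choice(list(board.legal_moves)))
    return board

dataset = []
for _ in range(1_000):  # scale up by orders of magnitude for real training
    board = random_position()
    if board.is_game_over():
        continue
    best = engine.play(board, chess.engine.Limit(depth=12))
    dataset.append((board.fen(), best.move.uci()))

engine.quit()
```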
@CraigDemel I think it's 100% technically feasible, but maybe 40-50% that someone actually does it. For example, OpenAI could do it, but it's not worth it for them to spend time/resources making a specialist-trained LLM for chess. My claim is not as broad as saying every commercial LLM will be a chess super-GM. It is that one could train an LLM specialized in chess, and it would reach super-GM level with sufficient high-quality data.
@DanielLawson I wonder if super-GMs will change their strategy when playing against such a model, e.g., steering into weird positions it wasn't trained on and playing super passively to wait for a mistake.
Unless there is a loophole I'm missing here (train an ML system, fool the NYT into calling it an LLM), this is currently the most mispriced market on Manifold.
If I could be assured this only applies to OpenAI's o-series models, Gemini's regular models, Grok, Claude, etc., I would put half my mana into it. Is there a market about those specifically? (e.g., "Will Claude beat a super grandmaster by 2028?")
I just played Gemini 2.5 Pro (exp); it gave me its queen after 15 moves. I think most bettors here don't play chess or don't use LLMs 😕
@FergusArgyll Here's the game for the record. I thought for < 5 seconds per move and played deliberately passively.

@FergusArgyll Consider:
1. Current SoTA models are not trained for chess (evidenced by GPT-3.5 Turbo being much better than GPT-4o).
2. Agents playing games is the next step for AI companies as reasoning models stabilize (OpenAI's roadmap says this explicitly; I don't know if Anthropic has publicly said the same, but ClaudePlaysPokemon suggests it's top-of-mind for employees).
3. Chess is a great game for #2: it requires substantial computation while being trivial to evaluate against Stockfish (see the sketch after this list) and needing little context.
4. "Computers beat top humans at chess... again" is great PR for a company that is promising investors AGI.
5. Two years is a long time in this space.
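A minimal sketch of the automated evaluation point 3 alludes to, pitting an LLM against Stockfish via python-chess; `llm_propose_move` is a hypothetical stand-in for whatever API call fetches the model's move, and the engine path and time limit are illustrative assumptions:

```python
import chess
import chess.engine

def llm_propose_move(board: chess.Board) -> str:
    """Hypothetical stand-in: query your LLM for a move in UCI notation."""
    raise NotImplementedError

# Play the LLM (White) against Stockfish (Black), flagging hallucinations.
engine = chess.engine.SimpleEngine.popen_uci("stockfish")  # path is an assumption
board = chess.Board()
while not board.is_game_over():
    if board.turn == chess.WHITE:
        move = chess.Move.from_uci(llm_propose_move(board))
        if move not in board.legal_moves:
            print("LLM hallucinated an illegal move:", move)
            break
        board.push(move)
    else:
        board.push(engine.play(board, chess.engine.Limit(time=0.1)).move)
print(board.result())
engine.quit()
```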
@FergusArgyll I also think it is mispriced but note that it requires the human to play blind and a single win by an LLM is sufficient to resolve YES...
Hallucinations are getting rarer with the latest models, and average centipawn loss is occasionally reaching this level. I'd expect the next round of releases to be worth the super grandmasters' time. I bet this up to 76%, but I'm 90%+ confident.
https://www.lesswrong.com/posts/gNFixvxw7JxzvMjCJ/personal-evaluation-of-llms-through-chess
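For anyone wanting to reproduce that kind of measurement, here is a minimal sketch of average centipawn loss over a game, using python-chess and a local Stockfish binary; the search depth and mate-score values are illustrative assumptions:

```python
import chess
import chess.engine

def average_centipawn_loss(moves_uci, engine_path="stockfish", depth=15):
    """Average centipawn loss across all moves of a game, where each move's
    loss is eval(before) - eval(after) from the mover's point of view."""
    engine = chess.engine.SimpleEngine.popen_uci(engine_path)
    board = chess.Board()
    losses = []
    for uci in moves_uci:
        mover = board.turn
        before = engine.analyse(board, chess.engine.Limit(depth=depth))
        score_before = before["score"].pov(mover).score(mate_score=10_000)
        board.push(chess.Move.from_uci(uci))
        after = engine.analyse(board, chess.engine.Limit(depth=depth))
        score_after = after["score"].pov(mover).score(mate_score=10_000)
        losses.append(max(0, score_before - score_after))
    engine.quit()
    return sum(losses) / len(losses) if losses else 0.0
```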
@Mactuary If you're gonna play the Ruy Lopez, it's gonna know the first 15 moves. Open with 1. h4 or something and it's toast.
@Bayesian a lot of people don't realize how much better a super grandmaster is than a regular human.