Will a large language model beat a super grandmaster playing chess by 2028?

If a large language model beats a super grandmaster (classical Elo above 2,700) while playing blind chess by 2028, this market resolves to YES.

I will ignore fun games, at my discretion. (Say, a game where Hikaru loses to ChatGPT because he played the Bongcloud.)

Some clarification (28th Mar 2023): This market grew fast with an unclear description. My idea is to check whether a general intelligence can play chess without having been created specifically for doing so (just as humans aren't chess-playing machines). Some previous comments I made:

1- To decide whether a given program is an LLM, I'll rely on the media and the nomenclature its creators give it. If they choose to call it an LLM or some related term, I'll consider it one. Conversely, a model that markets itself as a chess engine (or is called one by the mainstream media) is unlikely to qualify as a large language model.


2- The model can write as much as it wants to reason about the best move. But it can't have external help beyond what is already in the weights of the model. For example, it can't access a chess engine or a chess game database.

I won't bet on this market and I will refund anyone who feels betrayed by this new description and had open bets by 28th Mar 2023. This market will require judgement.


Would you consider an LLM with search (not specifically created for chess or games, at all) an LLM?

@louis i think the answer to your question is in the market description

1- To decide whether a given program is an LLM, [market creator will] rely on the media and the nomenclature its creators give it.

opened a Ṁ1,000 YES at 50% order

Big limit order at 50% for anyone who wants to buy it down

Has anyone created a chess engine using a similar architecture to LLMs (transformers, tokens correspond to moves/positions, next token prediction)?

@ChinmayTheMathGuy chess is more prone to good old reinforcement learning. See Leela Zero

@MP yeah I know, but just to see if it was possible.

I think I heard something about DeepMind building a neural network that played at a super-GM level without any search.

https://www.reddit.com/r/chess/comments/1alx0t0/google_deepmind_presents_grandmasterlevel_chess/

New market! Can GPT-5 play NYT Connections?

LLMs by 2028 will outperform superGMs with just one forward pass per move. This market is so overdetermined.

bought Ṁ10 NO at 48%

An LLM? Win at Chess? That's like asking if I can use a table saw to tighten a screw. I mean like, yes that's possible.

bought Ṁ50 NO

@Tea it's already quite ok on parrotchess

For reference, here's where we're holding now....

Gemini vs ChatGPT

https://www.chess.com/article/view/chatgpt-gemini-play-chess

bought Ṁ100 NO

I feel like if this does happen, then we've probably reached AGI. It's a pretty good benchmark.

@BrunoJ Why do you think this might be the case?

How many tasks would the "LLM" need to succeed at to be considered a general intelligence?

My idea is to check whether a general intelligence can play chess, without being created specifically for doing so

@ShitakiIntaki
1- To decide whether a given program is an LLM, I'll rely on the media and the nomenclature its creators give it. If they choose to call it an LLM or some related term, I'll consider it one. Conversely, a model that markets itself as a chess engine (or is called one by the mainstream media) is unlikely to qualify as a large language model.

@MP I know that the resolution is centered on LLM nomenclature... I was just curious about the general intelligence part, and what would make you feel that the resolution aligns with checking a general intelligence. Based on your criteria above for what might qualify as an LLM, I see "LLM" and "general intelligence" as potentially two different things.

bought Ṁ100 YES

If an LLM comes out that beats a GM and is pretrained largely on natural language, but happens to have a medium to large amount of chess data in its pretraining set, how will this resolve? It has clearly been "trained for chess" in some sense, but it's not clear whether it fits the criterion of "being created specifically for doing so".

Can you clarify?

@1111111 That would resolve to YES. I am fairly certain that there's a good chunk of PGN data in GPT-4's dataset.

@MP OK, to be a bit more pointed: if they deliberately included the whole lichess DB in some parseable format - does this still resolve to YES?

@1111111 to add on to that - some chess formats are much harder to learn for language models than others. If we learn that the developer reformatted the lichess DB to be more conducive to learning for the LLM (say, by using a different notation system or including special tokens in the tokenizer) - does this still resolve to YES? (in all these cases, its main purpose is still as an LLM and the bulk of its training data is still natural language)

@MP Could you please confirm that you mean an LLM trained on natural language, and not on a chess-based or otherwise abstract "language"? Thank you.

@BrunoJ but even grandmasters train on chess-based languages. How do you think they prepare lines?

People getting more bearish here is sad

wait we can't delete comments :(, this one was sent by error

https://arxiv.org/abs/2402.04494

TLDR:

... we train a 270M parameter transformer model on a dataset of 10 million chess games.

Lichess blitz Elo of 2895 against humans

It's specifically trained on chess but is much smaller than current LLMs

@Butanium yes and no: it is trained specifically on chess positions (encoded in FEN notation) which have been evaluated by Stockfish. But it does not rely on natural language at all, so it is a murky question whether this would qualify as an LLM. Note that the authors themselves are careful to avoid calling their method one, and instead describe it as a transformer model that distills Stockfish knowledge.

@Zozo001CoN yes, but this shows that you can learn great chess knowledge with a small transformer.

@Butanium

This.

Most importantly, it demonstrates that you can get grandmaster performance w/o a search algo.

This is, by technical standards, an LLM.

It uses the same underlying architecture as the more popular chat LLMs;

The benchmarking even compares it w/ GPT-3.5-turbo-instruct

From the paper:

    "This is as close as possible to the (tried and tested) standard LLM setup."

If it really comes down to it, you could just transfer this model's weights into a larger, more general language model, or even find some clever way to retrofit it into an MoE.

@AnilJason> This is, by technical standards, an LLM.

Only if you ignore the part which says that LM stands for Language Model ;-(.
The performance was achieved by using the transformer architecture (i.e. a generic NN method) as a tool to extract Stockfish knowledge with supervised learning - rather than via any language modelling.


@AnilJason> From the paper:    "This is as close as possible to the (tried and tested) standard LLM setup."

Yeah, only in terms of setting it up as "a classification problem and thus train by minimizing cross-entropy loss" (rather than treating it as language modelling).
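
For readers unfamiliar with the distinction, here is a minimal Python sketch of what "a classification problem" trained with cross-entropy can look like: an engine's win probability for a position is bucketed into discrete classes, and the model's logits are scored against the target bucket. The function names, the equal-width binning, and the bin count of 4 are illustrative assumptions, not the paper's actual code or hyperparameters.

```python
import math


def bin_index(win_prob: float, num_bins: int) -> int:
    """Map a win probability in [0, 1] to one of num_bins equal-width classes."""
    if not 0.0 <= win_prob <= 1.0:
        raise ValueError("win_prob must be in [0, 1]")
    # A probability of exactly 1.0 falls into the last bin.
    return min(int(win_prob * num_bins), num_bins - 1)


def cross_entropy(logits: list[float], target: int) -> float:
    """Cross-entropy of softmax(logits) against a one-hot target class."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


NUM_BINS = 4  # hypothetical; chosen small for readability
target = bin_index(0.62, NUM_BINS)    # engine says 62% win chance
logits = [0.0] * NUM_BINS             # an untrained model: uniform prediction
loss = cross_entropy(logits, target)  # equals log(NUM_BINS) for uniform logits
```

Nothing in this setup predicts the next token of a text sequence, which is the point being argued: the architecture is shared with LLMs, but the training target is a label supplied by an engine.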

@Zozo001CoN By technical standards, PGN is a language. It's a context-free grammar that was constructed by humans in order to compress a game of chess into a readable, serialized notation. If you had ever spent time with a GM, you'd know that they spend more time reading PGNs from chess engines than normal sentences from humans.
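
To make the "PGN as a language" framing concrete, here is a rough Python sketch of turning PGN movetext into a move-token sequence and then into next-token prediction examples, the same shape of data a language model is trained on. The helper names are hypothetical, and the parsing is deliberately simplified: it assumes plain movetext with no comments, annotations, or variations.

```python
import re

# Move-number markers such as "1." or "1..." and game-result tokens are
# notation overhead rather than moves, so they are dropped.
MOVE_NUMBER = re.compile(r"^\d+\.(\.\.)?$")
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def movetext_to_tokens(movetext: str) -> list[str]:
    """Split simple PGN movetext into one token per move (SAN)."""
    tokens = []
    for part in movetext.split():
        if MOVE_NUMBER.match(part) or part in RESULTS:
            continue
        tokens.append(part)
    return tokens


def next_move_examples(tokens: list[str]) -> list[tuple[list[str], str]]:
    """Build (prefix, next move) pairs, as in next-token prediction."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


tokens = movetext_to_tokens("1. e4 e5 2. Nf3 Nc6 1-0")
examples = next_move_examples(tokens)
```

Each example pairs the game so far with the move that followed it, so a model trained this way never sees the board directly, only the move sequence.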
