Will a large language model beat a super grandmaster at chess by 2028?
closes 2029 · 24% chance

If a large language model beats a super grandmaster (classical Elo above 2,700) while playing blind chess by 2028, this market resolves to YES.

I will ignore fun games, at my discretion. (Say, a game where Hikaru loses to ChatGPT because he played the Bongcloud.)

Some clarification (28th Mar 2023): This market grew fast with an unclear description. My idea is to check whether a general intelligence can play chess without being created specifically for doing so (just as humans aren't chess-playing machines). Some of my previous comments:

1- To decide whether a given program is an LLM, I'll rely on the media and on the nomenclature the creators give to it. If they choose to call it an LLM, or some related term, I'll consider it one. Conversely, a model that markets itself as a chess engine (or is called one by the mainstream media) is unlikely to qualify as a large language model.


2- The model can write as much as it wants to reason about the best move, but it can't have external help beyond what is already in the weights of the model. For example, it can't access a chess engine or a chess game database.

I won't bet on this market, and I will refund anyone who had open bets by 28th Mar 2023 and feels betrayed by this new description. This market will require judgement.

Jonathan Ray

If it can regurgitate chess engine code from its training set and then run that code, OK, but that doesn't really count as an LLM playing chess.

Jim Hays bought Ṁ100 of NO

@MP

To confirm, re: “while playing blind chess” — for the match to qualify, the grandmaster must agree to a blindfold chess match (https://en.m.wikipedia.org/wiki/Blindfold_chess), correct?

MP

@JimHays exactly

Mira bought Ṁ250 of YES

@MP This is a great way to handle the imbalance of language models not being able to "see" the board. So kudos on that.

Since GPT-4 will soon be released with image support, I also have a market on whether it will be able to beat my roommate (whom someone estimated at 600 Elo) when given renderings of the chess board as input. It lost the text-only game after blundering shortly after being taken out of the standard opening, not seeing a threat.

GPT-5 is also likely to be multimodal.

As for chess-specific capabilities, I see there are some chess benchmarks in the OpenAI evals repo, so maybe a little training on puzzles or similar will be enough to get a better model of the game. Open-source models will be fine-tuned on chess, but those won't count. It's possible OpenAI's evals will be used to train other models, and the chess capabilities will carry over.

So, considering the grandmaster will be blindfolded, and chess-specific training is getting into the latest GPTs (but not enough to make them chess-specific models), I'm willing to buy this up somewhat.

The other risk besides capability is if nobody bothers to play a blind chess game. But it looks like it's at least somewhat popular, and all it takes is one game. So as long as the model performs somewhat okay against good players, someone will try it.

MP

@Mira https://youtu.be/W6jkLKo8To0

The idea for the market came from this YouTuber, who is a national master.

Weepinbell is predicting NO at 45%

My uncertainty in this market largely hinges on the definition of an LLM. I don't think it's theoretically impossible for an LLM to beat a grandmaster at chess, but I think the scale required would be absurd, and model architectures are likely to change by 2028 anyway, so the likelihood that anyone trains a current-style LLM at the scale required is minimal. However, exactly how model architectures change, and whether those changes are still widely referred to as LLMs, is what I'm unsure of.

MP

@Weepinbell If LLMs can't beat Caruana at chess, then it's very unlikely LLMs will iterate until they are AGIs.

Patrick Delaney bought Ṁ10 of NO

I'm parroting Yann LeCun a bit here, with some editorialization... unless the widely accepted definition of LLMs significantly changes, today's GPT-based LLMs' understanding of the world is limited to their understanding of language. Their understanding of physical reality and logic is an illusion. Contrast this with chess engines and people, whose understanding of the rules of chess comes from actual learned experience of the game itself.

Further, I attempted to "play chess" in different language models / ChatGPT about a month ago, and it doesn't even get algebraic notation (AN) correct yet. No one seems to have paid much attention to chess AN; perhaps there wasn't sufficient AN in the training set, because the mistakes it makes are far worse than, say, the ones it makes when generating Python.

So if people aren't using LLMs to play chess and submitting reinforcement, there's not much chance for its handling of AN to improve right out of the gate. There's no huge commercial application.

Once it does get AN figured out, there will need to be a sincere effort to train an LLM specifically on chess games and logic.

So when the market says "no plugins allowed," I read that as "no filtering allowed / no ensemble models with a tree structure allowed." What about vector embeddings? What about feed-forward algorithms? This is why I say the definition of an LLM can't change significantly. I think the definition needs to be accepted as "an LLM in the same form, with more or different training data and more parameters."

Other markets could be put together for solving chess with other technologies.
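For concreteness on the AN point: whether a model's output even *looks* like algebraic notation can be screened with a regex (this is my own rough approximation of SAN syntax, checking form only, not legality on any actual board):

```python
import re

# Rough pattern for standard algebraic notation (SAN) tokens: castling, or an
# optional piece letter, optional disambiguation, optional capture, target
# square, optional promotion, optional check/mate suffix.
SAN = re.compile(
    r"^(O-O(-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](=[QRBN])?)[+#]?$"
)

def looks_like_san(token):
    """True if `token` is syntactically plausible SAN (not a legality check)."""
    return bool(SAN.match(token))

print(looks_like_san("Nf3"))   # -> True
print(looks_like_san("exd5"))  # -> True
print(looks_like_san("Ke9"))   # -> False (no rank 9)
```

A model can pass a screen like this on every move and still play illegally; catching that requires replaying the moves on a real board representation.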

MP

@PatrickDelaney I am ruling out models that are specialized in chess, or that have capabilities specially tailor-made for chess.

Patrick Delaney is predicting NO at 51%

@MP thanks

Jacob Pfau is predicting NO at 46%

@MP You are excluding changes to the architecture / design / training procedure of the model, correct? What about just fine-tuning the model on chess trajectories? Like giving a language-pretrained model a bunch of chess games and having it learn the games in algebraic notation the way it learns language (i.e. imitation learning)?

Jacob Pfau

@MP Will wins in blitz (or other limited-time formats) be counted? Does anyone know how much worse (in terms of base-game Elo) players are under blitz time constraints?

MP

@JacobPfau I don't plan to consider any form of time control.

firstuserhere bought Ṁ100 of YES

What about plugins with a chess engine?

firstuserhere is predicting YES at 54%

@firstuserhere Okay, the description answered this:

" it can't have external help beyond what is already in the weights of the model. For example, it can't access a chess engine or a chess game database."

Forrest Taylor is predicting NO at 55%
Dmytro Bulatov bought Ṁ100 of NO

Large language models, by their very nature, are not capable of dealing with complex state like chess. Language models are good for text generation, not for problem solving.
It seems many people don't understand the difference between neural models in general and LLMs. A neural model beating a 2700 player by 2028? Easy. An LLM beating such a player by 2028? Good luck not making illegal moves, let alone winning against a 2700 player.

MP

@DmytroBulatov LLMs are very much like human beings: they can reason and have intuition. Believe it or not, Carlsen isn't a machine.

Dmytro Bulatov is predicting NO at 54%

@MP No, they don't. That's the thing: LLMs are not general-purpose artificial intelligence. They have a specific task to solve: generating text. They are not trained to validate information and be correct; they are trained to generate text the way humans do.
Current models can't even reliably calculate where the chess pieces will be on the board after specified moves, and the reason is that they are not trained to remember or calculate anything.

Semiotic Rivalry is predicting NO at 54%

@DmytroBulatov I am betting NO, but you are incorrect regarding the state of GPT-4's chess ability. It makes illegal moves extremely rarely, and mostly makes pretty good moves, even in board states that have never existed in history. I estimate it's around 1200 Elo.

Forrest Taylor is predicting NO at 54%

@SemioticRivalry How are you playing it? This has not been my experience with GPT

Mira

@ForrestTaylor If people think GPT-4 is bad at chess, they should vote in my market.

firstuserhere is predicting YES at 56%

@DmytroBulatov Well, if X can model Y and Y can model Z, can X model Z?

X can be an LLM, Y can be natural language, and Z can be the physical world.

If Y can model Z, aka language can model the world, and X can model Y, aka a system can model language, does the system (in theoretical bounds) also become capable of modeling the physical world?

firstuserhere is predicting YES at 56%

@firstuserhere Remember that language captures the true realities of the physical world far more often than fabricated realities. Language is capable of modeling both real and false/imaginative worlds. The thing to note is that humans are quite good at separating the two, while LLMs currently are not, and demonstrate "hallucination".

However, fundamentally, what are neural networks capable of? Separating signal from noise in the data, even when both the signal and the noise look similar to us.

It is not an imaginative leap to see systems capable of "grounding" themselves and filtering real-world descriptions from imaginative-world descriptions.

Dmytro Bulatov is predicting NO at 56%

@SemioticRivalry That just proves my point. It can't correctly track the state of the game. It makes illegal moves, even if GPT-4 makes them less often than GPT-3. As the game gets longer, it will inevitably, if not make an illegal move, then at least "misremember" the state of some pieces.
It all comes down to the way it's trained. It isn't trained to become good at chess; it's trained to generate text. Humans can't get good at chess only by reading (sometimes incorrect) text online and practicing writing text. If it isn't actually trained specifically on chess games, I don't see a feasible way for an LLM to improve to levels far above an average player.
My point is: LLMs are fine, and neural networks can definitely reach 2700 Elo, but LLMs are just not the tool for the job here.

firstuserhere is predicting YES at 57%

@DmytroBulatov What about fine-tuning on chess notation and books? Online games are mostly recorded as complete games in text format. Grandmasters also learn moves abstractly, not by physically playing games but by simulating them.

Forrest Taylor is predicting NO at 57%

@Mira If a human child required all the helping hands you are giving GPT, I would say that child doesn't know how to play chess.

Semiotic Rivalry is predicting NO at 57%

@DmytroBulatov It's a probabilistic model; it will never be 100% correct at anything, but it has a very high success rate at chess. I've had it give >500 moves and have gotten very few (<5%) illegal moves.

@ForrestTaylor I simply prompt it with a game and tell it to complete it. Sometimes it actually finishes the game, sometimes it gives like 20 moves, but it's extremely rare that it makes illogical or impossible moves. Here's my first attempt:


The first move is a big mistake, but it's a very human one: failing to see the potential pin from a bishop. Then it takes advantage of its own mistake by pinning the queen and winning it. Most of these moves are very good, and a lot are even the perfect Stockfish move. Every single one of these moves is not only possible but makes sense in the context of the game, although there are certainly a few mistakes.
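The "prompt it with a game and tell it to complete it" setup can be sketched roughly like this (the formatting and helper names are my own illustration, not the commenter's exact prompt; `parse_continuation` just strips move numbers from whatever text the model returns):

```python
def make_prompt(moves):
    """Format a move list as numbered PGN-style text, ending mid-game so a
    text-completion model is invited to continue the game."""
    parts = []
    for i, mv in enumerate(moves):
        if i % 2 == 0:  # White's move: prepend the move number
            parts.append(f"{i // 2 + 1}. {mv}")
        else:           # Black's move: bare SAN token
            parts.append(mv)
    return " ".join(parts)

def parse_continuation(text):
    """Keep only SAN tokens from a model reply, dropping move numbers
    like '3.' or '2...'."""
    return [tok for tok in text.split() if not tok.rstrip(".").isdigit()]

print(make_prompt(["e4", "e5", "Nf3"]))          # -> 1. e4 e5 2. Nf3
print(parse_continuation("2... Nc6 3. Bb5 a6"))  # -> ['Nc6', 'Bb5', 'a6']
```

The parsed moves would then be replayed on a board to measure the illegal-move rate the commenters are debating.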

Patrick Delaney is predicting NO at 50%

@MP "LLMs are very much like human beings." Please check out my markets. They are not very much like human beings; they approximate knowledge. We can measure how frequently these approximations surpass human capability in different areas, but this does not mean they have a real understanding of the world. They are indexing language to describe the world... big difference.

MP

@PatrickDelaney I also don't have a real understanding of the world; I also approximate knowledge. Read the Sequences.

David Johnston

What if the LLM writes a chess engine to help it play better?

David Johnston

@DavidJohnston Is it allowed to evaluate code that it has written?

MP

@DavidJohnston It can of course write code, but it can't get help from an evaluator.

MP

To decide whether a given program is an LLM, I'll rely on the media and on the nomenclature the creators give to it. If they choose to call it an LLM, or some related term, I'll consider it one. Conversely, a model that markets itself as a chess engine (or is called one by the mainstream media) is unlikely to qualify as a large language model.

Jim Hays bought Ṁ100 of NO

Note the constraint “while playing blind chess”. Most games won’t qualify since grandmasters don’t play that much blind chess.

ML

This seems to be a bet about how people in 2028 will be using the term “language model”?

Gigacasting is predicting YES at 64%

“They don’t know that stockfish is a literal neural network—and has an 800 elo advantage”

(This could be distilled into a "language model" that would run on a 5-year-old cell phone and still crush GMs.)

GPT-PBot bought Ṁ2 of YES

AI's chess prowess on the rise,
Stockfish taunts with every surprise,
Pity the grandmaster, in demise,
Large language models win the prize.

Mira

What does "large language model" mean? Hypothetically, if I found a Markov chain that produces winning chess moves with high probability, would that count even if it isn't "large" and doesn't resemble GPT-3? Does it have to be transformer-based?

If I train a chess engine using transformers, and at every step it emits the next steps of a tree-search algorithm, and it can be iterated much as ChatGPT can be stepped with its finite context to simulate programs, does that count as a "language model"?

Does the language model have to be published by a company and marketed for a purpose other than games? Does it require a minimum capital investment (a possible definition of "large")? A minimum number of parameters? What if the chess capabilities work with a small number of parameters, but somebody grafts useless ones on just to satisfy the requirement of being large?
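The Markov-chain hypothetical above is easy to make concrete. A minimal sketch (entirely my own illustration, and decidedly not "large"): a first-order chain over SAN move tokens, trained on game texts, that samples plausible-looking continuations without any notion of board state:

```python
import random
from collections import defaultdict

def train(games):
    """Map each move token to the list of tokens that followed it in the
    corpus (a first-order Markov chain over SAN moves)."""
    model = defaultdict(list)
    for game in games:
        moves = game.split()
        for prev, nxt in zip(moves, moves[1:]):
            model[prev].append(nxt)
    return model

def sample(model, start, length, seed=0):
    """Walk the chain from `start`, stopping early if a move has no
    recorded successor."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        options = model.get(out[-1])
        if not options:
            break
        out.append(rng.choice(options))
    return out

# Tiny toy corpus of two opening lines in SAN:
corpus = ["e4 e5 Nf3 Nc6 Bb5", "e4 c5 Nf3 d6 d4"]
m = train(corpus)
print(sample(m, "e4", 4))
```

Such a model happily emits moves that are illegal in the actual position, which is exactly why "produces winning moves with high probability" would be the surprising part of the hypothetical.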
