If a large language model beats a super grandmaster (classical Elo above 2,700) while playing blind chess by 2028, this market resolves to YES.
At my discretion, I will ignore fun games (say, a game where Hikaru loses to ChatGPT because he played the Bongcloud).
Clarification (28th Mar 2023): This market grew fast with an unclear description. My idea is to check whether a general intelligence can play chess without being created specifically for doing so (just as humans aren't chess-playing machines). Some previous comments I made:
1- To decide whether a given program is an LLM, I'll rely on the media and on the nomenclature its creators give it. If they call it an LLM or some related term, I'll consider it one. Conversely, a model that markets itself as a chess engine (or is called one by the mainstream media) is unlikely to qualify as a large language model.
2- The model can write as much as it wants to reason about the best move, but it can't have external help beyond what is already in the model's weights. For example, it can't access a chess engine or a chess game database.
I won't bet on this market and I will refund anyone who feels betrayed by this new description and had open bets by 28th Mar 2023. This market will require judgement.
Update 2025-01-21 (PST) (AI summary of creator comment):
- LLM identification: A program must be recognized by reputable media outlets (e.g., The Verge) as a large language model (LLM) to qualify for this market.
- Self-designation insufficient: Simply labeling a program as an LLM, without external media recognition, does not qualify it as an LLM for resolution purposes.
Update 2025-06-14 (PST) (AI summary of creator comment): The creator has clarified their definition of "blind chess". The game must be played with the grandmaster and the LLM communicating their respective moves using standard notation.
@Wott I am not the market poster, but I think that would not be allowed. Writing and running a chess engine would imo be the same as accessing one.
Also, I personally doubt it could even do that successfully without heavily using external libraries, which I feel would go against the spirit of the market.
I think a better question to ask @MP is whether the model is allowed to run code at all.
@Bayesian That’s correct, my message was not literal. I guess my point was that relative to an everyday person, these super GMs are nowhere near the same order of magnitude in terms of skill.
Traditional chess engines are superhuman; they are another order of magnitude better than super GMs.
LLMs are nowhere near as efficient at chess as chess engines: they are generalised language-completion functions, not programs specialised for chess. The energy and processing power required to emulate a chess engine with an LLM would make it extremely inefficient.
With current technology, I could see an LLM using chain of thought to emulate a chess engine, but this would be so inefficient that I can’t see it being fast enough.
If some new model architecture comes out that can play chess as well as do all of the current LLM functions, I believe it would be different enough to warrant not being labelled an LLM for the purposes of this question.
I agree with @JussiVilleHeiskanen: the time frame is irrelevant here, as imo the LLM in its current form will never be suited for chess.
@KeithManning You should make the 2035 market, I think you could get a much better price. If you also dropped the weird blindness restriction, I think people would bet it up to like 80%.
@Blocksterpen3 There is this:
@JussiVilleHeiskanen I don't think that includes ChatGPT agent; also, that question disallows external tools, which ChatGPT agent uses.
Very relevant: these are just about the exact conditions required for this market. It did not go well for ChatGPT.
@gamedev Bearish for people who think the primary concern here is whether AI will be able to do this; bullish for those who think the concern is whether a grandmaster will compete against that AI in blind chess.
@Frogswap Agreed, although I really do believe that non-chess players will have a hard time understanding the difference in ability being demonstrated here. Even with continued exponential growth in LLMs, there is an enormous amount of ground to cover. ChatGPT says the difference between 1400 and 2800 is a factor of about 10,000.
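For a rough sense of scale (a back-of-the-envelope check, assuming the standard logistic Elo model; chess.com, FIDE, and any LLM "ratings" are not strictly comparable), a player's expected score is E_A = 1 / (1 + 10^((R_B - R_A)/400)). The ratio of the two players' expected scores works out to 10^((R_A - R_B)/400), so a 1,400-point gap gives roughly 10^3.5, or about 3,162 : 1, by this formula. That is a smaller factor than the 10,000 quoted, but it still supports the point that the gap is enormous. The function names below are just for illustration.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_odds_ratio(rating_a: float, rating_b: float) -> float:
    """How many times larger A's expected score is than B's: 10 ** (gap / 400)."""
    return 10 ** ((rating_a - rating_b) / 400)


# The 1400 vs 2800 gap mentioned above.
print(f"1400 expected score vs 2800: {elo_expected_score(1400, 2800):.6f}")  # 0.000316
print(f"2800:1400 odds ratio: {elo_odds_ratio(2800, 1400):,.0f}")  # 3,162
```

Because the odds ratio depends only on the rating difference, the same 1,400-point gap would give the same factor at any absolute rating level, at least in this idealised model.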
@Frogswap If there was any doubt which way this was going to go, Pragg would not have done it in this jokey way
@FergusArgyll He doesn't, but chess.com gets all the major GMs to frequently provide content that they use for their channel around major events
@pietrokc It resurrected its dead queen, which was funny.
Obviously, if it had been competitive, Pragg wouldn't have allowed it; he even asked whether it wanted just a free queen or a rook too.
@pietrokc I think it's pretty obvious that if the LLM is making illegal moves, then it has NOT defeated a super GM in chess.
@gamedev Ugh, you AI haters are all the same. Have you considered the possibility that the LLM actually understands chess better than humans and knows the rules that we can only imagine? No, of course not. Now crawl back into your little hole while the rest of us enjoy necromancer's chess as she was meant to be played.
@gamedev I agree that if the LLM is allowed to make an illegal move it doesn't count. However, what if the LLM tries an illegal move, a human tells it the move is illegal, and prompts for another one, which is then legal and played? How many rounds of this are allowed until we declare the LLM lost? Does it lose on the first attempted illegal move? Fifth? We keep prompting for legal moves until they appear?
@pietrokc The natural implementation of that, to my mind, is to make the illegal-move penalty for the LLM the same as whatever a human would be hit with for an illegal move in that position. (This depends on the ruleset the game is played under; different rulesets handle illegal moves differently.) See e.g. https://chess.stackexchange.com/questions/181/i-made-an-illegal-move-what-happens for a discussion of how a few particular rulesets answer that question.
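To make the question above concrete, here is a minimal sketch of the "same penalty as a human" approach under loudly stated assumptions: every illegal move the LLM submits counts as one completed illegal move, even if it is rejected and the LLM is re-prompted, and the numbers in each policy are illustrative only. As I understand FIDE's standard Laws, a first completed illegal move gives the opponent two extra minutes and a second one loses the game, but rapid, blitz, and online rulesets differ, and this sketch ignores edge cases such as the opponent lacking mating material. `IllegalMovePolicy`, `Ruling`, and `adjudicate` are hypothetical names, not any existing library's API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class IllegalMovePolicy:
    """Hypothetical encoding of how a ruleset punishes completed illegal moves."""
    name: str
    losing_offence: int   # the Nth illegal move forfeits the game
    bonus_seconds: int    # extra time the opponent gets for each earlier offence


@dataclass(frozen=True)
class Ruling:
    forfeited: bool
    forfeit_at: Optional[int]      # index of the attempt that lost the game, if any
    opponent_bonus_seconds: int    # total extra time awarded before any forfeit


# Illustrative parameters only (assumed); check the ruleset actually in force.
FIDE_STANDARD = IllegalMovePolicy("FIDE standard (assumed)", losing_offence=2, bonus_seconds=120)
STRICT = IllegalMovePolicy("first illegal move loses", losing_offence=1, bonus_seconds=0)


def adjudicate(attempts: Sequence[bool], policy: IllegalMovePolicy) -> Ruling:
    """Scan the LLM's submitted moves in order (True = legal, False = illegal).

    Every illegal submission counts as one completed illegal move, including
    ones that were rejected and re-prompted (an assumption, see above).
    """
    offences = 0
    bonus = 0
    for index, legal in enumerate(attempts):
        if legal:
            continue
        offences += 1
        if offences >= policy.losing_offence:
            return Ruling(forfeited=True, forfeit_at=index, opponent_bonus_seconds=bonus)
        bonus += policy.bonus_seconds
    return Ruling(forfeited=False, forfeit_at=None, opponent_bonus_seconds=bonus)


submissions = [True, False, True, True, False, True]
print(adjudicate(submissions, FIDE_STANDARD))
# Ruling(forfeited=True, forfeit_at=4, opponent_bonus_seconds=120)
print(adjudicate(submissions, STRICT))
# Ruling(forfeited=True, forfeit_at=1, opponent_bonus_seconds=0)
```

Keeping the penalty in a separate policy object means the "how many rounds?" question reduces to choosing which ruleset's numbers apply, without changing the adjudication logic.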