
If a large language model beats a super grandmaster (classical Elo above 2,700) while playing blind chess by 2028, this market resolves YES.
I will ignore fun games, at my discretion. (Say, a game where Hikaru loses to ChatGPT because he played the Bongcloud.)
Some clarification (28th Mar 2023): This market grew fast with an unclear description. My idea is to check whether a general intelligence can play chess without being created specifically for that purpose (just as humans aren't chess-playing machines). Here are some of my previous comments:
1- To decide whether a given program is an LLM, I'll rely on the media and the nomenclature its creators give it. If they call it an LLM, or some related term, I'll consider it one. Conversely, a model that markets itself as a chess engine (or is called one by the mainstream media) is unlikely to qualify as a large language model.
2- The model can write as much as it wants to reason about the best move, but it can't have external help beyond what is already in its weights. For example, it can't access a chess engine or a chess game database.
I won't bet on this market, and I will refund anyone who feels betrayed by this new description and had open bets by 28th Mar 2023. This market will require judgement.
Seems like this will be mostly determined by the number of games super grandmasters play against the best models.
Levy played a game recently and the bot did amazingly well. We are getting close. I give it a few more months

@Fion can't find the video now, but Levy has lost the great majority of the blitz games he played against Hikaru when they used to play together (save for one where Hikaru wasn't fully concentrating), even when Hikaru played joke openings.

@JoaoBoscodeLucena undoubtedly, but is the "great majority" seen in practice greater or less than the "great majority" implied by their elos?
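For reference, the "great majority" implied by two Elo ratings follows from the standard Elo expected-score formula. A minimal sketch (the ratings below are illustrative, not the players' actual ratings):

```python
# Expected score (win probability, counting a draw as half a point) under the Elo model.
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B, given their Elo ratings."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

# A 200-point rating gap implies roughly a 76% expected score:
print(round(expected_score(2800, 2600), 2))  # -> 0.76
```

So a 200-point gap predicts about three points out of every four, which can be compared against the win rate actually seen in their games.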
The structure of an algorithm for playing chess efficiently is very different from the structure of a language algorithm. Even though both are machine-learning methods, to obtain significant results, such as beating a super grandmaster, the algorithm needs to be built for that purpose. Today, ChatGPT is not capable of playing an entire game of chess without making illegal moves. It may win the occasional game, but it doesn't follow the rules of chess; this happens because the algorithm wasn't created with that purpose.
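The illegal-move failure mode described above is easy to detect programmatically. A minimal sketch using the third-party python-chess library (an assumption here; any move validator would do) to check a model's proposed moves:

```python
import chess  # third-party: pip install python-chess

def try_move(board: chess.Board, san: str) -> bool:
    """Play a SAN move if legal; return False instead of raising on bad input."""
    try:
        board.push_san(san)
        return True
    except ValueError:  # push_san raises a ValueError subclass on illegal/malformed SAN
        return False

board = chess.Board()
print(try_move(board, "e4"))   # True: legal opening move
print(try_move(board, "Ke4"))  # False: Black has no legal king move to e4 here
```

A harness like this, looped over an LLM's output, is essentially how people measure how often a model goes off the rails mid-game.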
@RenatoCoelho1987 there's a market for this! https://manifold.markets/Tomoffer/will-manifold-deal-with-weird-singl?r=VG9tb2ZmZXI

I placed a bet and then immediately retracted it, at a loss, once I realized how lax the resolution criteria are.
Technically speaking, if a no-name grandmaster played 1-minute blind blitz chess 1,000 times against an LLM that has been fine-tuned to be good at chess, and our grandmaster loses once by accidentally playing an illegal move, then this question ought to resolve YES. That is not (I think) in the spirit of the question, but it does satisfy the written resolution criteria as of September 24, 2023.
@Lsusr I don't think there's such a thing as a no-name super-GM. There are only 35 of them.
The criteria are very similar to Kasparov vs. Deep Blue.
Also, I have the discretion to eliminate fun games.
Yes, the player losing one game in 1,000 counts, but I don't see a super GM wasting their time doing so.

@Mira it doesn't say that in the description... it just says it has to be marketed as an LLM and not primarily as a chess engine. An LLM can be fine-tuned on chess and still be marketed primarily as an LLM.
@DylanSlagh yeah, but you'd need someone to fine-tune and NOT call it a chess engine for it not to count.

@MP Ah, gpt-3.5-turbo-instruct is very likely to have been fine-tuned on chess and not called a chess engine. So that scenario is more likely than it may sound.
Six months ago someone checked a chess eval into the repo, and probably someone then trained the model on chess to improve the benchmarks.
https://github.com/openai/evals/commit/44295630b0c3f6c9befa6bd81586b54d1f334510
@MP Some chess will inevitably be in the training set. The question is whether it is enough to achieve the stated performance.

I like the fact that this question is illustrated by an AI-generated picture of a surreal chessboard with a strange human eye attached to it.
@EliLifland have you directly used GPT-3.5-turbo to do this? Or just parrotchess.com? I'm worried about a Mechanical Turk situation haha

@BenjaminShindel It’s legit. There’s now an open source lichess bot, see https://nicholas.carlini.com/writing/2023/chess-llm.html