Will a large language model beat a super grandmaster playing chess by 2028?
2029
50% chance

If a large language model beats a super grandmaster (classical Elo above 2,700) while playing blind chess by 2028, this market resolves to YES.

I will ignore fun games, at my discretion (say, a game where Hikaru loses to ChatGPT because he played the Bongcloud).

Some clarification (28th Mar 2023): This market grew fast with an unclear description. My idea is to check whether a general intelligence can play chess without having been created specifically to do so (just as humans aren't chess-playing machines). Here are some points from my earlier comments:

1- To decide whether a given program is an LLM, I'll rely on the media and the nomenclature its creators give it. If they choose to call it an LLM or some related term, I'll consider it one. Conversely, a model that markets itself as a chess engine (or is called one by the mainstream media) is unlikely to qualify as a large language model.


2- The model can write as much as it wants to reason about the best move, but it can't have external help beyond what is already in the model's weights. For example, it can't access a chess engine or a chess game database.

I won't bet on this market, and I will refund anyone who feels betrayed by this new description and had open bets as of 28th Mar 2023. This market will require judgement.

GCS bought Ṁ10 of YES

Seems like this will be mostly determined by the number of games super grandmasters play against the best models.

Michael Darmousseh predicts YES

Levy played a game recently and the bot did amazingly well. We are getting close. I give it a few more months.

Al Quinn predicts NO

@MichaelDarmousseh Levy is like 200 points below a weak grandmaster, right?

MP

@AlQuinn the odds of Levy beating a super grandmaster are 2%.

Fion predicts NO

@MP that may be what their Elos imply (I'll take your word for it; I haven't checked) but I'd put the odds even lower. If there's a serious game between Levy and a super GM, let me know and I'll bet a market down to 1% or lower.

Paul predicts NO

@Fion Usually Elo is miscalibrated in the other direction. Eric Rosen for example has a 15% win rate against Danya despite being 600 points lower than him in blitz rating.
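
For reference, here is a minimal sketch of what Elo itself implies for rating gaps like the ones discussed above. It assumes the standard logistic Elo formula with a 400-point scale; online sites may use different rating systems, and the function name is only illustrative. Note that the result is an expected score (a draw counts as half a point), so it is an upper bound on the win probability, not the win rate itself.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected score for the lower-rated player under the standard Elo model.

    rating_gap: how many points the player trails the opponent by.
    Score convention: win = 1, draw = 0.5, loss = 0.
    """
    return 1.0 / (1.0 + 10.0 ** (rating_gap / 400.0))


# Illustrative gaps only; these are not the actual ratings of any player.
for gap in (200, 400, 600):
    print(f"{gap} points lower -> expected score {elo_expected_score(gap):.3f}")
```

Under this assumption, a 600-point gap corresponds to an expected score of roughly 3%, which is the baseline a reported 15% win rate would be compared against.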

Fion predicts NO

@Paul interesting. Is that FIDE rating or like Lichess or something?

My intuition would be that people would take online chess less seriously so maybe it would be different.

Or maybe I'm just wrong. :)

stoneocean

@Fion can't find the video now, but Levy has lost the great majority of blitz games he played against Hikaru when they used to play together (save for one where Hikaru was not fully concentrating), even when Hikaru played joke openings.

Fion predicts NO

@JoaoBoscodeLucena undoubtedly, but is the "great majority" seen in practice greater or less than the "great majority" implied by their Elos?

Renato Coelho bought Ṁ10 of NO

The structure of an algorithm for playing chess efficiently is very different from the structure of a language algorithm. Although both are machine learning methods, obtaining significant results, such as beating a super grandmaster, requires an algorithm built for that purpose. Today, ChatGPT is not capable of playing an entire game of chess without making illegal moves. It wins any game, but it doesn't follow the rules of chess; this happens because the algorithm wasn't created for that purpose.

kcs bought Ṁ150 of YES

IMO, the only way this doesn't happen is if no one bothers to do it

Tater predicts NO

@kcs chess youtubers will make sure it happens if the model is public.

kcs predicts YES

@Tater Good point lol

Lsusr sold Ṁ135 of NO

I placed a bet and then immediately retracted my bet, at a loss, once I realized how lax the resolution criteria are.

Technically speaking, if a no-name grandmaster played 1-minute blind blitz chess 1,000 times against an LLM that has been fine-tuned to be good at chess, and our grandmaster loses once by accidentally playing an illegal move, then this question ought to resolve to YES. This is not (I think) in the spirit of the question, but does satisfy the written resolution criteria as of September 24, 2023.
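
To make the "many games, one fluke" concern concrete, here is a rough sketch of how the chance of at least one win grows with the number of games. It assumes every game is independent with the same per-game win probability; the 0.1% figure is a made-up illustrative value, not an estimate of any real model's strength, and the function name is hypothetical.

```python
def prob_at_least_one_win(p_per_game: float, n_games: int) -> float:
    """Chance of winning at least once in n_games independent games,
    each with the same win probability p_per_game."""
    return 1.0 - (1.0 - p_per_game) ** n_games


# Hypothetical per-game win probability of 0.1% (an assumption for illustration).
p = 0.001
for n in (1, 10, 100, 1000):
    print(f"{n} games -> P(at least one win) = {prob_at_least_one_win(p, n):.3f}")
```

Even with a tiny per-game chance, the probability of a single qualifying win rises quickly once hundreds of games are played, which is why the number of games matters so much for resolution.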

Mira predicts YES

@Lsusr Models specialized on chess aren't allowed.

MP

@Lsusr I don't think there's such a thing as a no-name super-GM. There are only 35 of them.

The criteria are very similar to Kasparov and Deep Blue.

Also, I have the discretion to exclude fun games.

Yes, the player losing one game in 1,000 counts, but I don't see a super GM wasting their time doing so.

Dylan Slagh

@Mira it doesn't say that in the description... it just says it has to be marketed as an LLM and not primarily a chess engine. An LLM can be fine-tuned on chess and still be marketed primarily as an LLM.

MP

@DylanSlagh yeah, but you'd need someone to fine-tune it and NOT call it a chess engine for it to count.

Mira predicts YES

@MP Ah, gpt-3.5-turbo-instruct is very likely to have been finetuned on chess and not called a chess engine. So that scenario is more likely than it may sound.

6 months ago someone checked in a chess eval, and probably someone trained it on chess to improve the benchmarks.

https://github.com/openai/evals/commit/44295630b0c3f6c9befa6bd81586b54d1f334510

MP

@Mira I think we're on the same page. It's obvious they put some chess in the training set. This market would already have resolved if the bar were a 1,700 rating or whatever.

Дмитрий Зеленский bought Ṁ7 of NO

@MP Some chess will inevitably be in the training set. The question is whether it is enough to achieve the stated performance.

Ion Marqvardsen predicts NO

I like the fact that this question is illustrated by an AI-generated picture of a surreal chessboard, with a strange human eye attached to it.

Ben Shindel

@EliLifland have you directly used GPT-3.5-turbo to do this? Or just parrotchess.com? I'm worried about a Mechanical Turk situation haha

Ben Shindel

@EliLifland That's amazing!