Will a large language model beat a super grandmaster playing chess by 2028?
1.5k traders · Ṁ780k volume · 2029 · 68% chance

If a large language model beats a super grandmaster (classical Elo above 2,700) while playing blind chess by 2028, this market resolves to YES.

I will ignore fun games, at my discretion (say, a game where Hikaru loses to ChatGPT because he played the Bongcloud).

Some clarifications (28th Mar 2023): This market grew fast with an unclear description. My idea is to check whether a general intelligence can play chess without being created specifically for that purpose (just as humans aren't chess-playing machines). Here are some points from my earlier comments.

1- To decide whether a given program is an LLM, I'll rely on the media and on the nomenclature its creators give it. If they choose to call it an LLM, or some related term, I'll consider it one. Conversely, a model that markets itself as a chess engine (or is called one by the mainstream media) is unlikely to qualify as a large language model.


2- The model can write as much as it wants to reason about the best move, but it can't have external help beyond what is already in its weights. For example, it can't access a chess engine or a chess game database.
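
For illustration, here's a minimal sketch of how a game under this rule could be run, assuming the python-chess library and a hypothetical ask_llm_for_move() wrapper around whichever model is being tested. The model only ever sees the move history as plain text, its reply is checked for legality, and no engine or database is involved:

```python
# Minimal sketch, assuming python-chess and a hypothetical ask_llm_for_move()
# wrapper; the model sees only the move history as text. No engine,
# tablebase, or game database is consulted.
import chess


def ask_llm_for_move(move_history: str) -> str:
    """Hypothetical: send the moves played so far to the LLM and return its
    reply, e.g. 'Nf3'. The model may reason at length first; a real harness
    would extract the final move token from that text."""
    raise NotImplementedError


def play_llm_move(board: chess.Board, history: list[str]) -> None:
    reply = ask_llm_for_move(" ".join(history))
    move = board.parse_san(reply.strip())   # raises ValueError if illegal
    history.append(board.san(move))         # record in canonical SAN
    board.push(move)
```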

I won't bet on this market, and I will refund anyone who had open bets by 28th Mar 2023 and feels betrayed by this new description. This market will require judgement.


Has anyone tested o1 or o1-pro on chess? Might be expensive to do, but I feel like it would be interesting. I predict it would not be significantly better than the 1600ish rating of gpt3.5-instruct

@dominic I think the more RLHFed the model is, the worse it is at chess. That's probably why 3.5 instruct is better than 4, 4o, and probably o1.

I might be wrong.

It should do better if the output is constrained to PGN format and the model is fine-tuned on Stockfish analysis (available in the Lichess PGN database files).

There's already a transformer that's at 2700 just predicting Stockfish:

http://arxiv.org/pdf/2402.04494
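
For a concrete sense of what that fine-tuning data could look like, here's a rough sketch assuming the python-chess library: Lichess database dumps annotate some games with Stockfish evaluations as [%eval ...] comments, and this walks a PGN file and yields (prompt, completion) pairs. The pair format is illustrative, not a known-good recipe.

```python
# Rough sketch, assuming python-chess. Walks a Lichess PGN dump and emits
# (movetext-so-far, next-move-plus-eval) pairs from the [%eval ...] comments.
import re
import chess.pgn

EVAL_RE = re.compile(r"\[%eval ([^\]]+)\]")


def training_pairs(pgn_path: str):
    with open(pgn_path, encoding="utf-8") as f:
        while (game := chess.pgn.read_game(f)) is not None:
            board = game.board()
            moves_so_far = []
            for node in game.mainline():
                san = board.san(node.move)
                match = EVAL_RE.search(node.comment or "")
                if match and moves_so_far:
                    # Prompt: the game so far; completion: the next move plus
                    # the Stockfish eval attached to it in the PGN comment.
                    yield " ".join(moves_so_far), f"{san} {{eval {match.group(1)}}}"
                moves_so_far.append(san)
                board.push(node.move)
```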

"To decide whether a given program is a LLM, I'll rely in the media and the nomenclature the creators give to it."

No. If I shit in a box and write "LLM" on the side, does that make it an LLM? The same logic applies if they give slight chatbot functionality to a chess engine and call it an LLM.

@BrandonNorman If you shit in a box and it beats a grandmaster, I for one will respect whatever you call it.

@RiskComplex We already have chess engines that can beat a grandmaster. The bet here is that specifically an LLM will do it.

Also, given the Bongcloud reference, is only a single game required, not a match?

What about a universal strategy-game engine that can play Arimaa, chess, etc.?

bought Ṁ50 YES

Probably resolves YES: someone will fine-tune some C-tier open-source model specifically for chess, and by 2028 the C-tier models will be good enough that it'll crush.

bought Ṁ700 NO from 61% to 59%

@JS_81 fine-tune it to what? recall real chess games very well? supergms can do that too, and more besides

Never mind, I didn't fully read

bought Ṁ450 YES

> Additionally, both internal and external search indeed improve win-rates against state-of-the-art bots, even reaching Grandmaster-level performance in chess while operating on a similar move count search budget per decision as human Grandmasters.

2 traders bought Ṁ2,289 NO
bought Ṁ500 YES from 60% to 64%

@ismellpillows @dlin007 I am really curious as to why you two reacted bearishly to this paper.

@NeuralBets literally just strong priors and vibes given the ambitious nature of the market. you shouldn't put much stock in my bets cause i have zero contrarian expertise

@dlin007 oh. i thought it had something to do with that paper, since you replied with a trade to my comment.

@NeuralBets Bc the market requires a general intelligence

@ismellpillows I don't think the market creator meant literal AGI. It would play chess well, by definition. But fair point.

I’m not really sure how AGI is defined. Current LLMs are “general”, can’t beat super GM, and not AGI, right? My understanding is that the market requires an LLM that maintains generality and can beat super GM. So, for example, if someone made an LLM that’s equal to GPT-4 in every way except super good at chess, that would qualify. But it still wouldn’t be AGI, right?

Anyway, the model in the paper isn’t general because it only plays chess

https://dynomight.net/chess/
> I can only assume that lots of other people are experimenting with recent models, getting terrible results, and then mostly not saying anything. I haven’t seen anyone say explicitly that only gpt-3.5-turbo-instruct is good at chess. No other LLM is remotely close.

To be fair, a year ago many people did notice that gpt-3.5-turbo-instruct was much better than gpt-3.5-turbo. Many speculated at the time that this was because gpt-3.5-turbo had been subject to additional tuning to be good at chatting.
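
Those experiments typically use PGN-completion prompting: hand the legacy completion model a partial PGN and read its continuation as the next move. A minimal sketch of that setup, assuming the OpenAI Python SDK (openai>=1.0) and an OPENAI_API_KEY in the environment; the PGN headers and sampling settings here are illustrative:

```python
# Minimal sketch: prompt gpt-3.5-turbo-instruct (a completion model) with a
# PGN prefix and treat its continuation as the next move. Assumes the OpenAI
# Python SDK and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

pgn_prefix = (
    '[Event "Casual game"]\n'
    '[Result "*"]\n\n'
    "1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 4."
)

resp = client.completions.create(
    model="gpt-3.5-turbo-instruct",  # legacy completions endpoint
    prompt=pgn_prefix,
    max_tokens=8,
    temperature=0,
)
print(resp.choices[0].text)  # e.g. " Ba4 Nf6"; the first token is White's move
```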

@RaulCavalcante Old news, see also the comments below.

bought Ṁ50 NO

Grandmasters have defeated chess programs in the past by deliberately avoiding common lines. This would seem to be a particularly good approach against LLMs, which are just predicting the next move rather than evaluating possible positions.

and what if an LLM can accurately predict what stockfish would do for its next move? Or predict what would be played in order to beat stockfish even?

@CaelumForder Then it would have to be a Super AGI. It would have to model Stockfish (or something like it), and that is precisely what LLMs do not do, despite all the hope and hype. I'm not saying it is definitely impossible, but all the bizarre failures we see in LLMs today come from them extrapolating outside their training data. Even if LLMs could beat Stockfish, they will never do it by predicting Stockfish: to do that they would have to actually emulate Stockfish, which would necessarily be far less time-efficient than Stockfish itself, so they couldn't search as deeply, and moves in GM chess games are time-limited. If an LLM were ever to beat a computer program at chess, it would look more like AlphaGo, but even that would require something that doesn't exist in current LLMs. Of course, beating a GM is easier than beating Stockfish. I think the main problem here is a misunderstanding of the sense in which LLMs "predict" the next token.

So if it's a language model, then we're good? Does it have to be transformer-based, or can it be any architecture?
