By when will Kenshin9000 (or anyone else) “defeat all chess bots” using LLMs? (Permanent)
2024, by Election Day
2025 or earlier
2026 or earlier
2027 or earlier
2028 or earlier
2029 or earlier
2030 or earlier
2040 or earlier
2050 or earlier
2100 or earlier

This market resolves each option as NO if the date passes and Kenshin9000 (or anyone else) has not defeated Stockfish with an LLM-based chess engine.

All remaining options resolve YES once an LLM-based engine defeats Stockfish (or whatever the top engine is at the time).

My resolution criteria are more strict than Mira’s:

  1. The LLM engine must have a higher Elo rating than the latest Stockfish (or whatever the top engine is at resolution time) at blitz time controls with 99.9% confidence, and the result must be reproduced by 3+ people.

  2. The LLM engine must not use another chess engine at runtime.
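Criterion 1's 99.9% confidence bound can be sanity-checked directly from raw match results. A minimal sketch, assuming a one-sided normal approximation on the head-to-head match score (the function name and the z-threshold convention are illustrative choices, not part of the market's rules):

```python
import math

def elo_gap_significant(wins, draws, losses, confidence_z=3.29):
    """Check whether a head-to-head match score exceeds 50% with roughly
    99.9% confidence, using a one-sided normal approximation (z ~ 3.29).
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n  # match score in [0, 1]
    # Per-game variance of the score, from the win/draw/loss mix.
    var = (wins * (1 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    if se == 0:
        return score > 0.5
    return (score - 0.5) / se > confidence_z
```

For example, a 1000-game match scored +380 =400 -220 (a 58% score) clears the bound, while a dead-even +300 =400 -300 match does not. Serious engine testing typically uses sequential tests (e.g. SPRT) rather than a fixed-sample z-test, but the fixed-sample version above is the simplest reading of the criterion.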

For the purposes of this market, Large Language Models are 100M+ parameter general-purpose generative text models. A fine-tune of an LLM is acceptable, but the model cannot be trained solely on chess data. An LLM-based engine may use search, but node evaluation must be performed by invoking the LLM on each node (similar to AlphaZero, which combines a DNN with search).
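The "search is allowed, but the LLM must score every node" rule can be sketched as a plain negamax where the leaf evaluator is a model call. This is a hypothetical illustration: `llm_evaluate` stands in for a real LLM invocation, and the game interface is abstract, none of it is prescribed by the market.

```python
def llm_evaluate(position):
    """Placeholder for the real evaluator: a production engine would
    prompt the LLM with the position (e.g. as FEN) and parse a score
    from its output. Returning 0.0 keeps this sketch runnable."""
    return 0.0

def negamax(position, depth, legal_moves, apply_move, evaluate=llm_evaluate):
    """Plain negamax search. Every leaf is scored by `evaluate`, so the
    LLM is invoked once per node, mirroring AlphaZero's DNN-per-node
    evaluation; no hand-written evaluation function is involved."""
    moves = legal_moves(position)
    if depth == 0 or not moves:
        return evaluate(position)
    # Scores are negated because each ply flips the side to move.
    return max(-negamax(apply_move(position, m), depth - 1,
                        legal_moves, apply_move, evaluate)
               for m in moves)
```

Under this reading, swapping `evaluate` for a classical engine's evaluation function (or calling Stockfish inside the loop) would violate criterion 2.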

The LLM engine and Stockfish will run on the same hardware with the same time controls. The testing hardware should be either a commodity desktop or equivalent to the TCEC (or other popular computer-chess tournament) standards.


No incentive for long term bets anymore, but I think this market should be much lower

@someonec5dd the jig is up, no more free Mana 📉

@jgyou what?

@jgyou lol, that's not the end of free mana. More like the end of loans.

What is the definition of an LLM engine? A model trained on generic textual data? Can it have other components, or must it be a pure LLM?

@Weezing An LLM engine is a chess engine which uses an LLM for node evaluation. It may still use search, but can’t use a non-LLM evaluation function.

@Paul And what is LLM in this context? Can it be trained just on chess specific text (for example chess notation)? Or just generic text?

@Weezing Great question. I'd say that it has to be a /language/ model, meaning general-purpose, not chess-only, training. A fine-tune of a general-purpose language model is fine, but a chess-only transformer model is not.

@Paul Wait a sec, this is completely different from what I thought the market was about when I bet! I thought we were betting on whether an LLM by itself could defeat stockfish, not a search engine that uses an LLM just for node eval. I wouldn't think of that as an LLM engine.

Like, taking AlphaGo as an example, it uses a neural net to direct the monte carlo tree search, so it's like half a neural net engine - the other half being the monte carlo tree search of course, which is also crucial to its success. I think calling AlphaGo a "neural network engine" would still be misleading. But using an LLM just for node eval is far less an LLM engine than AlphaGo is a neural network engine.

Also, what's stopping someone from just running the LLM engine with a ton more compute = more depth than stockfish and "winning" that way? Are you requiring that they use the same amount of compute?

Btw I think the question of whether LLM+search can beat stockfish is much more interesting (because it's more plausible to actually happen), I just think it's extremely unclear from the question description.

@jack thanks for the feedback. I have updated the description to clarify the engine definitions and hardware/timing constraints.

