Did kenshin9000 "beat all chess engines" with GPT-4?
resolved Apr 1

Tweet embed:

3. Approximately 10 days later, I will release a full chess engine based on GPT4, whose code/prompts anyone will be able to inspect and run against any other chess engine. GPT4's "performant output" will beat every other chess engine in existence in a tournament of any size.

Resolves YES or NO if there's consensus.

If it's ambiguous, the resolution will be "Does Kenshin's code win a game against Stockfish?"

If he never releases code (March 31, 2024 is the maximum cutoff) or he releases code that isn't a functioning chess engine, resolves NO.

If he releases his code and a moderator sees this market, please close it for trading.

I will be lenient on what counts as "using GPT-4". Any code that beats Stockfish and uses GPT-4 even tangentially will count. If he uses GPT-4 to generate slightly different reward functions but calls out to a chess engine to do heavy lifting, I will resolve YES if it wins a single game vs. Stockfish.

See also: /Mira/will-kenshin9000-release-a-function




Kenshin9000's work was used as a basis to solve a $10,000 prompting challenge recently.

The NO bettors here failed to solve this challenge, so their opinions on the power of LLM prompting techniques may be underdeveloped compared to Kenshin's and thus more likely to be wrong.

@Mira_ Kenshin may be very knowledgeable about LLM prompting techniques, but for this application, that is overwhelmed by a lack of knowledge about how chess engines work!

AlphaZero is a machine-learning model trained SPECIFICALLY for chess, and even it is only marginally better than Stockfish (although I have no doubt that ML engines will continue to improve)… it’s lunacy to think that GPT-4 contains something better than AlphaZero within it.

@benshindel GPT-4 can’t even multiply numbers reliably; it’s very obvious that it has blind spots for the kind of precision and evaluation that a chess engine requires. There’s no way around calculating lines in certain chess positions, and even maximally effective prompting can’t engineer a way around that!

@Mira_ If you'd asked me whether the A B challenge is solvable with prompts I would've said yes, even when GPT 3.5 was the strongest engine around (I fully believe the challenge is solvable on GPT 3.5, maybe fine-tuned 3.5 - too busy to try though). I still heavily doubt kenshin's chess engine claim.

The main reason I doubt kenshin's claim is not really that I think LLMs are in principle incapable of being decent (if expensive-to-run) chess engines, but that I think they don't currently have an advantage over specialized chess engines in reasoning about chess and searching for proper moves. It is possible that GPT(4+n) will be able to beat current stockfish engines, but I don't see it happening with GPT4, assuming bounded computation time.

Follow up permanent market, with timings up to 2100 and stricter criteria: https://manifold.markets/Paul/by-when-will-kenshin9000-or-anyone?r=UGF1bA

@someonec4dd starting to get worried yet?

@jim it's Manifold who should be worried, this is all leveraged on house Mana.

@jim I already knew he probably wasn't releasing in March. But it doesn't really matter to me, since I am not actually going to lose anything. It's just fake money so doesn't matter to me.

@someonec4dd so u log on just to buy YES on a market that you know will resolve NO and also even if it did resolve YES you wouldn't care because it's just fake money that doesn't matter to u 🤔

i respect it

@someonec4dd I mean, you know that there IS monetary backing to mana, to some degree, right? Like, you did a significant amount of harm by betting with leveraged mana, this makes it more likely that manifold will stop allowing people to donate to charity using mana. There comes a time when it becomes a little rude to be this bad at betting

@NeoPangloss Unless I'm missing something, the total amount of mana increases every day since they give away so much for free. So, I'm not sure what your point is.

@someonec4dd This is actually the real reason I come to this website less often now.

@someonec4dd you get a quarter a day in mana equivalent when you come to the site and bet, which is what Manifold is for. The loans are zero sum, unless you use them to go into the red, in which case you inflate the currency, because mana isn't mathematically well founded.

You inflated mana, which is partially cash backed, by a degree equivalent to hundreds of days of normal allowances. Moreover, you threw off the calibration of the site by making bets you knew would be bad.

Yes, there is a degree to which intentionally bad predictions become at least a little rude. It makes loans, allowances, and charitable donations less sustainable.

@benshindel The grift is so real

@benshindel Why would they post this?

@Paul I for one can't wait for computer chess to be revolutionized by our long-winded friend. Only a man who can write tweets as long as Kenshin's could understand what it means to anchor a concept by repeating it 100 times in different ways.


All: Kenshin has explicitly stated to me on Twitter that he is not using a chess engine "to help it play or plan".

It's possible he's lying or he doesn't know if he's using an engine, so I'll leave the Stockfish dependency market open. But the presumption should be that he's not just calling out to an engine.

I thought that he wasn't, based on his posts. But since there's been so much discussion, I decided to ask him.

@Mira_ are you changing the resolution criteria of the market? This comment seems to be in contradiction with the description, specifically

If he uses GPT-4 to generate slightly different reward functions but calls out to a chess engine to do heavy lifting, I will resolve YES if it wins a single game vs. Stockfish


But the presumption should be that he's not just calling out to an engine.

@Kire_ I don't see a contradiction. "kenshin9000 is not using a chess engine" is a factual claim, which may affect execution of existing written policy but doesn't change the policy.

If he lies and releases an engine, or if he didn't realize he was using an engine (because it was wrapped in a library), I would still accept it.

@Mira_ in my opinion that lowers the chance of him winning.

Not using engines at all is like relying on the abstract thinking of a grandmaster, but all of them lose to Stockfish, and neural networks do not use memory optimally.

@Mira_ yeah, so he's talking nonsense when he talks about "searching to a specified depth" and all that.

That is not happening purely through LLMs for the same reason it doesn't happen with humans.

@DavidBolin it might be a non-branching search.

At each node the LLM says what the next move would be, but only one such "most promising" move per position down the tree. It starts with something like 5 possible moves in the current position.

Having 5 nominations and looking 5 moves deep would be 25 calls. One more to nominate those moves and one more to summarise the results from branches and make a decision. 27 total in this example.

Such a thing would not be able to see traps, but it can "look 5 moves deep".

Or it could nominate the 2 best moves for each node, so it is a not-so-fast-growing search tree.
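
To make the call counting concrete, here is a minimal Python sketch of the scheme described in this comment. It is not Kenshin's code and it does not talk to any real model: `CountingStubLLM`, `extend`, and `choose_move` are hypothetical names, and the stub only counts how often it is asked. I'm also assuming that "looking 5 moves deep" means 5 further LLM calls along each nominated line.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CountingStubLLM:
    """Hypothetical stand-in for an LLM; it only counts how often it is asked."""
    calls: int = 0

    def nominate(self, position: str, k: int) -> List[str]:
        # One call: ask for the k "most promising" moves in this position.
        self.calls += 1
        return [f"{position}/m{i}" for i in range(k)]

    def summarise(self, lines: List[str]) -> str:
        # One call: compare the explored lines and pick one (placeholder choice).
        self.calls += 1
        return lines[0]


def extend(llm: CountingStubLLM, position: str, plies: int, per_node: int) -> List[str]:
    """Follow the top `per_node` continuations for `plies` further plies."""
    if plies == 0:
        return [position]
    leaves: List[str] = []
    for nxt in llm.nominate(position, per_node):
        leaves.extend(extend(llm, nxt, plies - 1, per_node))
    return leaves


def choose_move(llm: CountingStubLLM, root: str, candidates: int = 5,
                depth: int = 5, per_node: int = 1) -> str:
    first_moves = llm.nominate(root, candidates)      # 1 call to nominate
    lines: List[str] = []
    for mv in first_moves:
        lines.extend(extend(llm, mv, depth, per_node))
    return llm.summarise(lines)                       # 1 call to decide


single = CountingStubLLM()
choose_move(single, "start")                 # 5 candidates, 5 deep, 1 move per node
print(single.calls)                          # 5 * 5 + 2 = 27

double = CountingStubLLM()
choose_move(double, "start", per_node=2)     # 2 moves per node instead of 1
print(double.calls)                          # 5 * (1 + 2 + 4 + 8 + 16) + 2 = 157
```

With one move per node the count matches the 27 above. Under the same assumptions, nominating 2 moves per node costs 157 calls for a single decision, so the tree grows slower than a full search but is still far from free.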