Will a Large Language Model beat me at chess this year?
Jan 1 · 24% chance

I’m rated around 1900 FIDE. At the end of 2024, I’ll play a rapid game against an LLM chosen from the top 3 models on the leaderboard (https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard). Resolves YES if I lose, NO if I win, and 50% if the game is drawn.
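The resolution rule maps a finished game onto a YES probability. Here is a minimal sketch, assuming the result is taken from the standard PGN result tag ("1-0", "0-1", "1/2-1/2") and that `resolve_probability` and `my_color` are hypothetical names for illustration, not anything Manifold or Lichess provides:

```python
def resolve_probability(result: str, my_color: str) -> float:
    """Map a PGN result tag to this market's YES probability.

    Assumption: my_color is the creator's side, "white" or "black".
    YES (1.0) if the creator loses, NO (0.0) if they win, 0.5 on a draw.
    """
    if my_color not in ("white", "black"):
        raise ValueError(f"unknown color: {my_color!r}")
    if result == "1/2-1/2":
        return 0.5  # draw -> resolves 50%
    if result == "1-0":
        winner = "white"
    elif result == "0-1":
        winner = "black"
    else:
        # "*" (game in progress) or anything non-standard can't resolve the market
        raise ValueError(f"game not finished or unknown result: {result!r}")
    # The creator won -> NO; the LLM won -> YES.
    return 0.0 if winner == my_color else 1.0


if __name__ == "__main__":
    print(resolve_probability("0-1", "white"))  # 1.0: creator (White) lost, so YES
```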

we can only hope.

What prompt will you be using? I imagine that changes their performance quite a bit.

Good point! On each move, I’ll give it the moves played so far in PGN notation, plus the current position in FEN notation. That way both representations of the position are in context, each in a standard format.
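A per-move prompt carrying both representations could look roughly like this. It's a hypothetical sketch: the prompt wording, `pgn_movetext`, and `build_prompt` are assumptions (only "PGN movetext plus FEN" comes from the plan above), and it assumes the game starts from the standard initial position so move numbering begins at 1 with White.

```python
def pgn_movetext(sans: list[str]) -> str:
    """Number SAN moves PGN-style: ["e4", "e5", "Nf3"] -> "1. e4 e5 2. Nf3"."""
    parts = []
    for i, san in enumerate(sans):
        if i % 2 == 0:  # White's move opens a new move number
            parts.append(f"{i // 2 + 1}. {san}")
        else:
            parts.append(san)
    return " ".join(parts)


def build_prompt(sans: list[str], fen: str) -> str:
    """Hypothetical prompt text containing both PGN movetext and FEN."""
    # The second space-separated FEN field is the side to move: "w" or "b".
    side = "White" if fen.split()[1] == "w" else "Black"
    return (
        f"You are playing {side}.\n"
        f"Moves so far (PGN): {pgn_movetext(sans) or '(none)'}\n"
        f"Current position (FEN): {fen}\n"
        "Reply with your next move in SAN only."
    )


if __name__ == "__main__":
    fen = "rnbqkbnr/pppp1ppp/8/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R b KQkq - 1 2"
    print(build_prompt(["e4", "e5", "Nf3"], fen))
```

Reading the side to move from the FEN rather than passing it separately keeps the two representations from disagreeing.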

I think that makes the model significantly worse than it could otherwise be. I'd recommend using the prompt from whoever claims "SOTA LLM chess" or something similar.

I’m planning to use Lichess to play the game, and those are the representations it provides. That might change in a future market.

bought Ṁ43 NO

When I tested this with ChatGPT 3.0 a while back, it couldn't even remember the board position and kept making illegal moves. How will you resolve if it does this?

Let’s say three illegal moves will result in a loss. Disambiguation slips like Rad1 vs. Rd1 won’t count toward this; in those cases I’ll ask it for clarification.