Will a Large Language Model beat me at chess this year?
Resolved NO (Jan 1)

I’m rated around 1900 FIDE. At the end of 2024 I’ll play a game at a rapid time control against an LLM selected from the top 3 of the leaderboard (https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard). Resolves YES if I lose, NO if I win, and 50% for a draw.

Thanks to all traders who participated in this market. I played a game against o1, which I won quite easily. Here is the PGN:

1. e4 e5 2. Nf3 Nc6 3. Bb5 Nf6 4. O-O Nxe4 5. Re1 Nd6 6. Nxe5 Nxe5 7. Rxe5+ Be7 8. d4 Nxb5 9. c4 Nd6 10. c5 Nc4 11. Re2 O-O 12. b3 Na5 13. Nc3 d6 14. Bf4 Bg4 15. Nd5 Bxe2 16. Qxe2 Nc6 17. cxd6 Bxd6 18. Rd1 Re8 19. Bxd6 Rxe2 20. Nxc7 Qxd6 21. Nb5 Rae8 22. Nxd6 Re1+ 23. Rxe1 Rxe1#

If you're interested in markets like these, please check out my new market which includes GPT-5, Grok 3, Claude 3.5 Opus, and others:

Is this going to be resolved?

@Blocksterpen3 working on it today

we can only hope.

What prompt will you be using? I imagine that changes their performance quite a bit

Good point! On each move, I’ll provide it with the moves played so far in PGN notation, as well as the current position in FEN notation. This way both representations of the position will be in context, each in a standard format.
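For anyone who wants to reproduce this setup, here is a minimal sketch of how such a per-move prompt could be assembled from the PGN movetext and the FEN string. The function names (`format_movetext`, `build_prompt`) and the prompt wording are my own assumptions for illustration, not the exact text used in the game.

```python
def format_movetext(san_moves):
    """Join SAN moves into PGN movetext, e.g. ['e4', 'e5', 'Nf3'] -> '1. e4 e5 2. Nf3'."""
    parts = []
    for i, move in enumerate(san_moves):
        if i % 2 == 0:
            # White's move starts a new move number.
            parts.append(f"{i // 2 + 1}.")
        parts.append(move)
    return " ".join(parts)


def build_prompt(san_moves, fen):
    """Build a prompt containing both the PGN move list and the FEN position.

    The wording below is hypothetical; only the idea of sending both
    representations on every move comes from the description above.
    """
    movetext = format_movetext(san_moves) or "(no moves yet)"
    return (
        "We are playing a game of chess.\n"
        f"Moves so far (PGN): {movetext}\n"
        f"Current position (FEN): {fen}\n"
        "Reply with your next move in standard algebraic notation."
    )


if __name__ == "__main__":
    # Position after 1. e4 e5 2. Nf3, with Black to move.
    fen = "rnbqkbnr/pppp1ppp/8/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R b KQkq - 1 2"
    print(build_prompt(["e4", "e5", "Nf3"], fen))
```

Both strings can be copied from the lichess board, so no chess library is needed just to build the prompt.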

I think that makes the model significantly worse than it could otherwise be. I'd recommend using whatever prompt was devised by someone claiming "SOTA LLM chess" or similar.

I’m planning to use lichess to play the game, and those are the representations it provides. In a future market this might change.

bought Ṁ43 NO

When I tested this with ChatGPT 3.0 a while back, it couldn't even remember the board position and kept making illegal moves. How will you resolve if it does this?

Let’s say three illegal moves will result in a loss. Distinctions like Rad1 vs. Rd1 won’t count towards this, but I’ll ask it for clarification.
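A rough sketch of how this rule could be applied mechanically, assuming the set of legal moves in SAN is available from somewhere else (a chess library or the board itself). The names `classify_reply` and `StrikeCounter` are hypothetical, and the disambiguation check is a simplification that only covers piece moves such as Rad1 vs. Rd1.

```python
import re

MAX_ILLEGAL = 3  # from the rule above: three illegal moves result in a loss

# Piece move: piece letter, optional file/rank disambiguation, optional capture, destination.
_PIECE_MOVE = re.compile(r"^([KQRBN])([a-h]?[1-8]?)(x?)([a-h][1-8])$")


def strip_disambiguation(san):
    """Reduce 'Rad1' or 'R1d1+' to 'Rd1'; leave pawn moves and castling unchanged."""
    san = san.rstrip("+#")
    match = _PIECE_MOVE.match(san)
    if not match:
        return san
    piece, _, capture, dest = match.groups()
    return piece + capture + dest


def classify_reply(reply, legal_sans):
    """Return 'legal', 'clarify' (differs only in disambiguation), or 'illegal'."""
    if reply in legal_sans:
        return "legal"
    if strip_disambiguation(reply) in {strip_disambiguation(m) for m in legal_sans}:
        return "clarify"  # ask the model which piece it meant; no strike
    return "illegal"


class StrikeCounter:
    """Counts illegal replies and reports when the limit is reached."""

    def __init__(self, limit=MAX_ILLEGAL):
        self.limit = limit
        self.count = 0

    def record(self, verdict):
        """Record a verdict; return True once the model has forfeited."""
        if verdict == "illegal":
            self.count += 1
        return self.count >= self.limit
```

With legal moves `{"Rad1", "Rfd1", "e4"}`, a reply of `Rd1` would be classified as `clarify` rather than counted as a strike, while `Qh5` would count as illegal.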
