
Which of these models will beat me at chess once released? Resolves YES if they win, NO if I win, and 50% for a draw.
I'm rated about 1900 FIDE. When each of these models is released, I'll play a game of chess against it at a rapid time control. On each move, I'll provide the model with the game state in PGN and FEN notation. If a model makes three illegal moves, it loses. Notation discrepancies such as Nbd2 vs. Nd2 will not count as illegal moves. I will play white.
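One way the three-illegal-moves rule could be tracked is sketched below. This is only an illustration, not the procedure actually used for this market. The names `core`, `match_move`, and `StrikeCounter` are hypothetical, and the list of legal moves in SAN is assumed to come from elsewhere (for example, a chess GUI or library). The sketch also assumes that a response matching no legal move, or matching more than one, counts as a strike; the rules above do not spell out the ambiguous case.

```python
import re
from typing import List, Optional

MAX_ILLEGAL_MOVES = 3  # from the rules: three illegal moves lose the game

# Piece move in SAN: piece letter, optional disambiguator, optional capture, destination.
_PIECE_MOVE = re.compile(r"^([KQRBN])([a-h]?[1-8]?)(x?)([a-h][1-8])$")


def core(san: str) -> str:
    """Reduce a SAN move to piece + capture flag + destination (Nbd2 -> Nd2)."""
    san = san.strip().rstrip("+#")
    m = _PIECE_MOVE.match(san)
    if m is None:
        return san  # pawn moves, castling, promotions: compared as written
    piece, _disambiguator, capture, dest = m.groups()
    return piece + capture + dest


def match_move(response: str, legal_sans: List[str]) -> Optional[str]:
    """Return the legal SAN move the response refers to, or None if there is no unique match."""
    cleaned = response.strip().rstrip("+#")
    exact = [s for s in legal_sans if s.rstrip("+#") == cleaned]
    if exact:
        return exact[0]
    # Tolerate a missing or superfluous disambiguator. Simplification: this does
    # not verify that the disambiguator names the piece's real origin square.
    loose = [s for s in legal_sans if core(s) == core(cleaned)]
    return loose[0] if len(loose) == 1 else None


class StrikeCounter:
    """Counts illegal responses; the game is forfeited at MAX_ILLEGAL_MOVES."""

    def __init__(self) -> None:
        self.strikes = 0

    def submit(self, response: str, legal_sans: List[str]) -> Optional[str]:
        move = match_move(response, legal_sans)
        if move is None:
            self.strikes += 1  # assumption: unmatched or ambiguous responses are strikes
        return move

    @property
    def forfeited(self) -> bool:
        return self.strikes >= MAX_ILLEGAL_MOVES
```

With this sketch, a response of `Nbd2` is accepted when the only legal knight move to d2 is written `Nd2`, while a move that matches nothing in the legal list adds a strike.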
Each option will stay open until the model is released, or it will resolve N/A if it's clear that the model will never be released. I'll periodically add models that I find interesting to this market. Once I play a game, I'll post the PGN in the comments before resolving. Multiple answers can resolve YES.
If I judge that my opponent’s position is hopelessly lost, at the level of being down a rook without compensation, I will submit the current position to a friend. If they agree that the position is lost, the game will be adjudicated as a win for me.
The current system prompt is below. This may change over time.
“Let’s play a game of chess! I will be white, you will be black. On each turn, I will give you the pgn and the fen of the current position. Think as long as you like, and respond with the best move, ‘resign’ if you wish to resign, or ‘draw?’ if you wish to make a draw offer. Please do not respond with the updated pgn, etc. Also, do not use any external tools or search queries when making your decision.
If you attempt to make three illegal moves throughout the game, or if you use any external tools, the game will be adjudicated as a win for me.”
Update 2025-01-14 (PST) (AI summary of creator comment):
- Model Type: Only general language models are being considered; chess-specific models are excluded.
- Capabilities: The model must be able to output human languages and code.
Update 2025-05-11 (PST) (AI summary of creator comment): Regarding "Any model before X year" options:
- These options will not resolve to 50% based on a draw in an individual game.
- Such an option resolves to YES if any model released before the specified year wins its game against the creator.
- It resolves to NO if no model released before the specified year wins its game against the creator (i.e., all relevant games are losses for the models or draws).
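The difference between single-model options and "Any model before X year" options can be summarised in a short sketch. The function names and the `"win"` / `"loss"` / `"draw"` labels (from the model's perspective) are hypothetical and chosen only for illustration. The aggregate function assumes it is called once all relevant games have been played.

```python
from typing import List


def resolve_single(result: str) -> str:
    """Resolution for one model's option; result is from the model's perspective."""
    return {"win": "YES", "loss": "NO", "draw": "50%"}[result]


def resolve_any_before_year(results: List[str]) -> str:
    """Resolution for an "Any model before X year" option.

    YES if at least one model won its game; otherwise NO.
    Draws never produce a 50% resolution here.
    """
    return "YES" if "win" in results else "NO"
```

For example, a set of games ending in one draw and two losses for the models would resolve the aggregate option to NO, while the drawn model's own option would resolve to 50%.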
Update 2025-06-02 (PST) (AI summary of creator comment): For model series options (e.g., "Any Claude 4 model"):
- The creator may resolve the option for the entire series after playing against one or more models from that series.
- If the creator decides not to play additional models from that specific series, the option for the entire series will be resolved based on the outcome(s) of the game(s) played against models from that series up to that point (e.g., to NO if the tested model(s) lost and no further models from that series will be played).