Which of these language models will I beat at chess?
Jun 22
87%
GPT-3.5
80%
Grok 3
66%
GPT-4
66%
GPT-4o
66%
GPT-4o mini
66%
GPT-4.5
66%
o3
66%
o4-mini
66%
o4
66%
DeepSeek-V4
66%
Grok 4
66%
Any model released in 2025
59%
Every model released before 2025
50%
Claude Sonnet 4
50%
Claude Opus 4
50%
Every model released before 2026
41%
GPT-5
40%
Any model released in 2026
34%
Any model released in 2027
34%
Every model released before 2027

Which of these models will I beat at chess? Resolves YES if I win, NO if they win, and 50% for a draw.
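
As a small illustration, the resolution rule above can be written as a lookup from game outcome to resolution value. The outcome labels ("win", "loss", "draw") and the function name are hypothetical and are taken from my side of the board; they are not part of the market itself.

```python
# Hypothetical outcome labels, from the market creator's point of view.
RESOLUTION = {
    "win": 1.0,   # creator wins -> answer resolves YES
    "loss": 0.0,  # model wins   -> answer resolves NO
    "draw": 0.5,  # draw         -> answer resolves to 50%
}


def resolve_answer(outcome: str) -> float:
    """Return the resolution value for one answer given the game outcome."""
    return RESOLUTION[outcome]
```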

Credit for this market goes to @mr_mino, who is much better at chess than I am.

This market should be interesting, as I expect that some existing models could already beat me. I have never played rated chess; I have not played a game of chess of any kind in years.

I will close this market every Saturday. When it closes, I will play a game of chess against the model with the highest market price, if the model is publicly available. Otherwise, I'll move on to the model with the second-highest price, and so on. If no models on this market are available to the public, the market will reopen until one is.
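
A minimal sketch of that weekly selection rule, assuming the market prices are available as a dictionary and public availability has been checked by hand. The function name, the placeholder model names, and the tie-breaking (dictionary order) are assumptions for illustration; the description above does not say how ties are broken.

```python
from typing import Optional


def pick_opponent(prices: dict[str, float], available: set[str]) -> Optional[str]:
    """Return the highest-priced model that is publicly available.

    Returns None when no listed model is available, which corresponds to
    the market reopening until one is.
    """
    # Walk the models from highest to lowest price; sorted() is stable,
    # so equal prices keep their dictionary order (an assumption).
    for model in sorted(prices, key=lambda m: prices[m], reverse=True):
        if model in available:
            return model
    return None


# Placeholder names and prices, not the real market data.
example_prices = {"model_a": 0.9, "model_b": 0.7, "model_c": 0.6}
print(pick_opponent(example_prices, {"model_b", "model_c"}))
```

Here `model_a` has the highest price but is not available, so the sketch falls through to `model_b`.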

During the game, I may use a chessboard to keep track of the moves. I am not playing blindfold chess. I will not use the Internet or any chess engines during the game.

On each move, I'll provide the LLM with the game state in PGN and FEN notation. If a model makes three illegal moves, it loses. Differences in disambiguation, such as Nbd2 vs. Nd2, will not count towards this limit. The model also loses if it attempts to use external tools or the Internet during the game. I will play white. If I make an illegal move, I lose.
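
The three-illegal-moves rule could be tallied roughly as follows. This is a sketch under stated assumptions, not the procedure actually used: the set of legal moves for each position is assumed to be supplied by hand in standard algebraic notation (the code does not compute legality), and the helper names are hypothetical. Ignoring trailing check and annotation marks is also an assumption.

```python
import re

MAX_ILLEGAL = 3  # from the rules: three illegal moves lose the game


def normalize(san: str) -> str:
    """Reduce a move to a form that ignores disambiguation, e.g. Nbd2 -> Nd2."""
    # Assumption: trailing check/annotation marks are also ignored.
    san = san.strip().rstrip("+#!?")
    # Drop an optional file and/or rank between the piece letter and the target.
    return re.sub(r"^([NBRQK])[a-h]?[1-8]?(x?[a-h][1-8])", r"\1\2", san)


def is_strike(response: str, legal_sans: set[str]) -> bool:
    """True if the response matches no legal move, even after normalizing."""
    norm = normalize(response)
    return all(normalize(move) != norm for move in legal_sans)


def model_forfeits(history: list[tuple[str, set[str]]]) -> bool:
    """history holds (model response, hand-supplied legal moves) per attempt."""
    strikes = sum(is_strike(resp, legal) for resp, legal in history)
    return strikes >= MAX_ILLEGAL
```

With this normalization, an answer of Nbd2 in a position where the legal move is written Nd2 is not counted as a strike, and neither is Nd2 where the legal moves are Nbd2 and Nfd2.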

An unreleased model will resolve N/A if it's clear that the model will never be released. I'll periodically add models that I find interesting to this market. Once I play a game, I'll post the PGN in the comments before resolving. Multiple answers can resolve YES.

The "every model released before X year" options resolve YES if, at any point after the start of that year, I have played and won against every listed model in this market that was released before the start of that year, and I am confident I would beat any omitted models from that time period. They resolve NO if I lose or draw against any eligible model released before that year.
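
One way to sketch the "every model released before X year" rule in code is shown below. The function name, the outcome labels, and the placeholder model names are hypothetical. The sketch leaves out the judgment call about omitted models, since that is not mechanical, and it returns None while the option is still undecided.

```python
from typing import Optional


def resolve_every_before(
    year: int,
    release_year: dict[str, int],  # listed, released models -> release year
    results: dict[str, str],       # model -> "win", "loss", or "draw" (my result)
    current_year: int,
) -> Optional[str]:
    """Return "YES", "NO", or None (undecided) for one 'every model' option."""
    eligible = [m for m, y in release_year.items() if y < year]
    outcomes = [results.get(m) for m in eligible]

    # Any loss or draw against an eligible model resolves the option NO.
    if any(o in ("loss", "draw") for o in outcomes):
        return "NO"

    # YES needs a win against every eligible model, and can only happen
    # once the year in question has started.
    if current_year >= year and eligible and all(o == "win" for o in outcomes):
        return "YES"

    return None  # some eligible models are still unplayed, or it is too early
```

For example, with placeholder models released in 2023 and 2024 and a win recorded against both, the "before 2025" option comes out YES in this sketch once 2025 has begun; a single draw would make it NO instead.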

The current system prompt is below. This may change over time.

“Let’s play a game of chess! I will be White; you will be Black. On each turn, I will give you the PGN and the FEN of the current position. Think as long as you like, and respond with the best move, ‘resign’ if you wish to resign, or ‘draw?’ if you wish to make a draw offer. Please do not respond with the updated PGN, etc. Also, do not use any external tools or search queries when making your decision.

If you attempt to make three illegal moves throughout the game, or if you use any external tools, the game will be adjudicated as a win for me.”


Cool experiment!

Playing a game against that many models might take a while. I'm curious how much time you will end up spending to play all of them.
