(M100 subsidy) Will GPT-4 with image inputs beat a human at chess in a correspondence game? (2023)
Resolved N/A (Jul 20)

This market predicts whether the earliest released model of GPT-4 that supports image inputs, assisted by Mira, will beat a selected human opponent in a game of chess. The game will be played on Lichess as a correspondence game, with GPT-4 taking the white pieces to provide a slight advantage.

Resolves YES if:

  • GPT-4, with Mira's assistance, wins the chess game against the selected human opponent.

Resolves 50% if:

  • The game ends in a draw.

Resolves NO if:

  • The selected human opponent wins.

  • 1 week passes, and GPT-4's last given move is illegal.

Resolves as NA if:

  1. The game cannot be completed within 1 week after the Trigger date.

  2. The selected human opponent is found to be using a chess engine or receives help from another person.

  3. GPT-4 is judged to have been given unfair prompting.

  4. No date in 2023 is a Trigger date.
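The resolution criteria above can be sketched as a small function. This is a hypothetical illustration only; the `Outcome` enum and `resolve` function are my names, not part of the market.

```python
from enum import Enum

class Outcome(Enum):
    GPT4_WINS = "gpt4_wins"
    DRAW = "draw"
    HUMAN_WINS = "human_wins"
    ILLEGAL_MOVE_TIMEOUT = "illegal_move_timeout"  # 1 week passes on an illegal move
    GAME_INCOMPLETE = "game_incomplete"            # not finished within 1 week
    HUMAN_CHEATED = "human_cheated"                # engine use or outside help
    UNFAIR_PROMPTING = "unfair_prompting"          # GPT-4 given unfair prompting
    NO_TRIGGER_DATE = "no_trigger_date"            # no Trigger date in 2023

def resolve(outcome: Outcome) -> str:
    """Map a game outcome to the market resolution described above."""
    if outcome is Outcome.GPT4_WINS:
        return "YES"
    if outcome is Outcome.DRAW:
        return "50%"
    if outcome in (Outcome.HUMAN_WINS, Outcome.ILLEGAL_MOVE_TIMEOUT):
        return "NO"
    return "NA"
```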

A Trigger date is a date satisfying:

  • Mira has access to GPT-4 with image inputs

  • Mira has adjusted tooling to support image inputs

  • Mira has tooling to render chess boards into an image

  • Date has been posted to this market

  • Mira and the human opponent have accepted the date for play
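The board-rendering tooling mentioned above would produce an actual image; as a simplified stdlib-only stand-in, here is a sketch that expands a FEN's piece-placement field into an 8×8 Unicode diagram (the rendering format is my invention for illustration):

```python
# Minimal text-render sketch: FEN piece-placement field -> Unicode board.
UNICODE = {"K": "♔", "Q": "♕", "R": "♖", "B": "♗", "N": "♘", "P": "♙",
           "k": "♚", "q": "♛", "r": "♜", "b": "♝", "n": "♞", "p": "♟"}

def render_fen(fen: str) -> str:
    placement = fen.split()[0]  # first FEN field: piece placement, rank 8 first
    rows = []
    for rank in placement.split("/"):
        row = []
        for ch in rank:
            if ch.isdigit():
                row.extend(["·"] * int(ch))  # digits encode runs of empty squares
            else:
                row.append(UNICODE[ch])
        rows.append(" ".join(row))
    return "\n".join(rows)

print(render_fen("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"))
```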

The selected human opponent is someone who knows the rules of chess but doesn't play frequently; they will not be allowed to use a chess engine during the game. They have stated, "I will not practice, but I will try my hardest to win." They previously beat GPT-4 in "Will ChatGPT beat a human at chess in a correspondence game?" on Manifold Markets, but with text-only inputs.

Mira will act as a human assistant for GPT-4. 6 types of prompts are allowed:

  1. Provide the current state of the board. (I plan to use PGN, FEN, and an image render of the board)

  2. Request a list of candidate moves along with explanations.

  3. Request an analysis of a specific move and its likely continuation. Mira is not allowed to select a specific move for analysis; GPT-4 must select the list of moves to analyze.

  4. Request a ranking of moves from a previously generated list.

  5. Request a specific move be finally chosen given all of the above analysis.

  6. Notification that a move is illegal, along with an explanation of why.

If additional prompts are needed, Mira will exercise subjective judgment. Mira and GPT-4 will not be allowed to access any chess engine during the game. Mira will provide a transcript of the prompts used. Please see the transcript in "Will ChatGPT beat a human at chess in a correspondence game?" on Manifold Markets for an example of the previous prompt style. The prompt style will be similar, except that an image rendering of the chess board will be given along with the FEN and move list.
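The six allowed prompt types could be organized as templates. This is a hypothetical sketch; the exact wording used in the game differs (see the linked transcript), and the template names are mine.

```python
# Sketch of the six allowed prompt types as fill-in templates.
PROMPTS = {
    "state":      "Current position.\nPGN: {pgn}\nFEN: {fen}\n[board image attached]",
    "candidates": "List candidate moves for White, each with a short explanation.",
    # Per rule 3, GPT-4 (not Mira) must pick which move to analyze:
    "analysis":   "Analyze {move} and its likely continuation.",
    "ranking":    "Rank the candidate moves you listed, best first.",
    "choose":     "Given all of the above analysis, choose your final move.",
    "illegal":    "{move} is illegal: {reason}. Choose a different move.",
}

def make_prompt(kind: str, **fields) -> str:
    return PROMPTS[kind].format(**fields)
```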

If there is a dispute about whether a prompt was unfair (such as by leaking a preference for certain moves to GPT-4), the human opponent will be allowed to review the transcript and the discussion, and judge whether GPT-4 was given an unfair advantage.

There is a 1 week time limit on completion of the game after the market closes. Otherwise, there is no strict time limit for either side on individual moves. If the opponent intentionally delays the game to run out the 1 week time limit, the market would resolve NA, but Mira would be disappointed in them. If GPT-4 continues to give illegal moves to delay the game out to 1 week, the market resolves NO, because an illegal move is an immediate loss in chess tournaments.

To avoid conflict of interest, Mira will not bet more than a token amount (M$10) in this market.

predicted YES

Bing was just released with image inputs. I'm NA'ing the market because GPT-4 cannot actually see chess boards in sufficient detail to even play chess. So we've agreed not to play the game.

"This is a challenging picture" and both misses pieces(black king) and sees pieces that aren't there(white bishop).

This chess analysis is completely wrong:

predicted NO

Google Bard is the same, or possibly even worse.

Did you check GPT-4's ability to understand the board given a FEN, for your previous challenge? Maybe the invalid moves or some blunders happened because it misinterpreted the board.

It could be interesting to test GPT's ability to play chess if its context contains a simpler representation. Perhaps a list of `${color} ${piece} at ${coordinate}` entries?
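The simpler representation suggested above can be produced directly from a FEN. A stdlib-only sketch (the function name and output wording follow the commenter's `${color} ${piece} at ${coordinate}` suggestion):

```python
# Expand a FEN's piece-placement field into "{color} {piece} at {coordinate}" lines.
NAMES = {"k": "king", "q": "queen", "r": "rook",
         "b": "bishop", "n": "knight", "p": "pawn"}

def piece_list(fen: str) -> list[str]:
    entries = []
    # FEN lists ranks from 8 down to 1, files a through h.
    for rank_idx, rank in enumerate(fen.split()[0].split("/")):
        file_idx = 0
        for ch in rank:
            if ch.isdigit():
                file_idx += int(ch)  # skip runs of empty squares
            else:
                color = "white" if ch.isupper() else "black"
                square = "abcdefgh"[file_idx] + str(8 - rank_idx)
                entries.append(f"{color} {NAMES[ch.lower()]} at {square}")
                file_idx += 1
    return entries
```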

bought Ṁ100 of NO

strongly reversing my position after GPT-4's weird cost-cutting lobotomy that's been happening lately

bought Ṁ50 of NO

I tested Google Bard's image input. Given a screenshot of a Lichess board, Bard hallucinates badly. It describes a board completely different from the one I gave it.
It is possible that GPT-4 has been trained on chess board images, in which case it might perform much better. But given what I've seen, I'm betting no.

I want to bet yes, but this style of prompting will almost certainly hinder it significantly. If you do text only and just make it play moves, it's much better.

@Mira do you have access to the multimodal version of GPT-4, or do you expect to have access to it soon-ish?

@R2D2 I'll get it when they release it to the general public. My guess is June/July. I have ChatGPT+ and might use that instead of the API, since I'm planning a simpler prompting strategy this time.

How good is the human?
You said "The selected human opponent is someone who knows the rules of chess but doesn't play frequently and will not be allowed the use of a chess engine during the game."
Does the human have an official chess ranking, or something else beyond knowing the rules?

@YonatanCale See the other market that was linked for an example game and comments on them. It's the same human.

Does anyone have a source on, or can anyone explain, how GPT's image input works?

sold Ṁ0 of YES

@na_pewno They trained an autoencoder that maps to the same embedding space tokens are mapped to.
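A rough schematic of the idea in this comment: an image encoder projects flattened image patches into the same embedding space as text tokens, so the transformer can consume both. All dimensions and the linear map below are invented for illustration; GPT-4's actual vision encoder is not public.

```python
import random

D_MODEL = 8     # toy embedding width; real models use thousands of dimensions
PATCH_DIM = 12  # toy flattened-patch size (e.g. a tiny pixel patch)

random.seed(0)
# A learned linear projection stands in for the trained image encoder:
# it maps each flattened image patch into the same D_MODEL-dimensional
# space that text-token embeddings live in.
W = [[random.gauss(0, 0.02) for _ in range(D_MODEL)] for _ in range(PATCH_DIM)]

def embed_patch(patch: list[float]) -> list[float]:
    # patch @ W: one output coordinate per column of W
    return [sum(p * w for p, w in zip(patch, col)) for col in zip(*W)]
```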
