(M20000 subsidy!) Will a prompt that enables GPT-4 to solve easy Sudoku puzzles be found? (2023)
36% chance. This market predicts whether GPT-4 will be able to solve "easy" Sudoku puzzles by December 31, 2023. I've tried some prompts, but it mostly pretends to do logical deduction while actually guessing, ends up in an inconsistency (or doesn't notice one), and eventually gets stuck in a loop making mistakes.
Consider this market a challenge. Also see the group for this and related markets (GPT-4 Sudoku Challenge (2023) | Manifold Markets).
Resolves YES if:
A fixed prompt is found (and posted in the comments) that enables GPT-4 to reliably solve freshly-generated easy-rated Sudoku puzzles from Sudoku - Free daily Sudoku games from the Los Angeles Times (latimes.com), using only its language modeling capabilities and context as memory.
Resolves 50% if:
A fixed prompt is found (and posted in the comments) that enables GPT-4 to occasionally solve Sudoku puzzles.
Resolves NO if:
No fixed prompt that enables GPT-4 to even occasionally solve easy-rated Sudoku puzzles using the specified conditions is posted in the comments by December 31, 2023.
Resolves as NA if:
This market does not resolve NA.
Definitions
GPT-4 refers to either ChatGPT's GPT-4, or any model using the OpenAI Chat Completions API. "gpt-4" and "gpt-4-32k" are currently-known model ids, but anything labeled GPT-4 would count including the upcoming image support. The API is preferable since setting temperature to 0 will allow the judge to replicate your responses, but if your prompt has a high success rate ChatGPT could also be accepted. See the definitions of "reliably" and "occasionally" below for details on computing the success rate if more precision is needed.
Easy-rated Sudoku puzzle means a puzzle classified as easy by any reputable Sudoku site or puzzle generator. This market plans to use the LA Times (Sudoku - Free daily Sudoku games from the Los Angeles Times (latimes.com)) for judging, but I maintain the option to use a different Sudoku generator.
Fixed-prompt means that everything except the Sudoku puzzle provided to GPT-4 remains the same. The prompt may provide GPT-4 with instructions, but these instructions must not change for each puzzle. A solution must be found within 50 turns. Multimodal support is allowed to be used. The operator cannot give information to GPT-4 beyond the initial puzzle, so their inputs must be static (e.g. just saying "continue" if ChatGPT runs out of output space and stops).
Formal definition of Solution
Given: A Chat Completion API entry is a pair (tag, message), where tag is one of "system", "user", "assistant", and message is any UTF-8 string. When multimodal GPT-4 is released, message can also be an image.
Given: A Turn is a pair (entries, response), where entries is a list of Chat Completion API entries and response is the UTF-8 encoded string that GPT-4 generates.
Given: A Transition Rule maps one list of entries to another list of entries, using the primitive operations:
Remove the entry at a fixed index (from beginning or end).
Insert a fixed message at a fixed index (from beginning or end).
Insert a representation or rendering of the initial Sudoku puzzle at a fixed index (from beginning or end). Since GPT-4 does not generate images, you won't be able to render a partially-solved board and submit that.
Insert the ChatGPT response to the input entry list at any fixed index (from beginning or end).
Given: A Fixed-prompt is any sequence of transition rules.
Given: The Operator is the human or program that is executing a fixed-prompt against the OpenAI API.
Given: A Sudoku puzzle and a solved Sudoku puzzle are strings that the operator subjectively accepts as being these. (See the examples at the end of this description. Slight variations such as replacing "0" with "." or " " would be accepted, but not converting the puzzle to a system of logical constraints.)
Then a Solution for the purposes of this market is a fixed-prompt satisfying all of:
"initial Sudoku puzzle" is bound to a specific such string or image.
The transition rules are applied for 50 turns to get a maximum of 50 GPT-4 responses.
The operator scans those responses for the first thing that subjectively looks like a solved Sudoku puzzle, stops there, inputs it into a Sudoku checking tool, and confirms that it is a solution to the initial Sudoku puzzle.
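The four primitive operations above can be modeled as data. The following is only a sketch in Python, not the judging script: the `Op` class, its field names, and `apply_rule` are hypothetical names chosen for illustration, and the reading of "from beginning or end" as an offset counted from either side of the list is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A Chat Completion API entry: (tag, message), tag in {"system", "user", "assistant"}.
Entry = Tuple[str, str]


@dataclass(frozen=True)
class Op:
    """One primitive operation (hypothetical representation)."""
    kind: str               # "remove", "fixed", "puzzle", or "response"
    index: int              # fixed offset; must never be derived from the puzzle
    from_end: bool = False  # count the offset from the end of the list instead
    tag: str = "user"       # tag given to an inserted entry
    message: str = ""       # only used when kind == "fixed"


def apply_rule(ops: List[Op], entries: List[Entry], puzzle: str, response: str) -> List[Entry]:
    """Apply one transition rule (a list of primitive operations) to an entry list."""
    out = list(entries)
    for op in ops:
        if op.kind == "remove":
            pos = len(out) - 1 - op.index if op.from_end else op.index
            del out[pos]
            continue
        # For insertions, offset 0 from the end means "append".
        pos = len(out) - op.index if op.from_end else op.index
        if op.kind == "fixed":
            out.insert(pos, (op.tag, op.message))
        elif op.kind == "puzzle":
            out.insert(pos, (op.tag, puzzle))
        elif op.kind == "response":
            out.insert(pos, (op.tag, response))
        else:
            raise ValueError(f"unknown operation kind: {op.kind}")
    return out
```

Under this sketch, a rule such as `[Op("response", 0, from_end=True, tag="assistant"), Op("fixed", 0, from_end=True, message="continue")]` appends GPT-4's last response followed by a static "continue" message, and a fixed-prompt is a list of such rules, one per turn.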
Examples
The simplest valid pattern is:
("User", <some initial prompt>)
("User", <provide puzzle>)
("Assistant", response 0)
("User", "continue") ;; or any other fixed input
("Assistant", response 1)
("User", "continue")
....
("User", "continue")
("Assistant", solution)
With at most 50 "Assistant" entries (50 turns). The only "dynamic" input here is entry #2, which has the puzzle; the rest is ChatGPT's responses. So this counts as a "fixed prompt" solution. You're allowed to insert more prompts into the chain after the puzzle, as long as neither the decision to include them nor their contents depend on the puzzle. For example, you might have a prompt that causes ChatGPT to expand the puzzle into a set of logical constraints. You're allowed to drop sections from the chain when sending context to GPT-4, as long as the decision to drop does not depend on the contents of any section.
Candidate solutions will be converted to code and run using a script (Mira-public/manifold-sudoku (github.com)). You are not required to interact with this script when submitting a solution, but @Mira will attempt to use it to judge your solution, so it may help in understanding the format.
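The simplest pattern above amounts to a short loop. This is a rough sketch in Python and not the judging script: `complete` is a hypothetical stand-in for whatever sends the entry list to GPT-4 and returns its reply, and `looks_solved` is a hypothetical stand-in for the operator's subjective scan for a solved grid.

```python
from typing import Callable, List, Optional, Tuple

Entry = Tuple[str, str]  # (tag, message)
MAX_TURNS = 50           # at most 50 GPT-4 responses


def run_simple_pattern(
    initial_prompt: str,
    puzzle: str,
    complete: Callable[[List[Entry]], str],   # hypothetical model call
    looks_solved: Callable[[str], bool],      # hypothetical operator check
    fixed_reply: str = "continue",
) -> Optional[str]:
    """Run the 'continue' pattern; return the first response that looks solved, else None."""
    # The puzzle entry is the only input that varies between runs.
    entries: List[Entry] = [("user", initial_prompt), ("user", puzzle)]
    for _ in range(MAX_TURNS):
        response = complete(entries)
        if looks_solved(response):
            return response
        # Feed the response back, then a static message that ignores its contents.
        entries.append(("assistant", response))
        entries.append(("user", fixed_reply))
    return None
```

The loop never inspects a response to decide what to send next, which is what keeps the operator's inputs static. Passing the model call in as a function also lets the loop be exercised with a fake model before spending money on API calls.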
Language modeling capabilities means that GPT-4 is not allowed to use any external tools, plugins, recursive invocations, or resources to aid in solving the Sudoku puzzle. It must rely solely on its language modeling capabilities and the context provided within the prompt. This is less relevant when using the API or Playground, and more relevant to using ChatGPT.
Reliably means the prompt succeeds at least 80% of the time on freshly-generated puzzles. Occasionally means the prompt succeeds at least 20% of the time on freshly-generated puzzles. I will run any proposed solution against 5 puzzles, with more testing to be done if it succeeds at least once or if there is disagreement in the comments about whether it meets a threshold (perhaps I got very unlucky). More testing means choosing a fixed pool of puzzles and calculating an exact percentage. I currently plan to choose "all easy-rated Sudoku puzzles in January 2024 from LA Times" as my pool. Since judging solutions requires me to spend real money on API calls, I may optionally require collateral to be posted: $10 of mana (Ṁ1000) for quick validation, and $100 of mana (Ṁ10k) for extended validation. Collateral will be posted as a subsidy to an unlisted market that resolves NA if the candidate passes testing; if it does not, an amount equal to Mira's API costs will be collected. Anyone can post collateral for a candidate, not just the submitter. Detailed testing will be done with the API set to temperature 0, not ChatGPT.
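As a worked example of the two thresholds, the sketch below maps a count of solved puzzles to an outcome. The function name `outcome` is hypothetical, and it assumes the exact percentage over the fixed pool is the only input, which leaves out the preliminary 5-puzzle run and any judgment calls.

```python
def outcome(solved: int, attempted: int) -> str:
    """Map a success count over a fixed pool to "YES", "50%", or "NO"."""
    if attempted <= 0 or not 0 <= solved <= attempted:
        raise ValueError("need 0 <= solved <= attempted and attempted > 0")
    # Integer arithmetic avoids floating-point rounding at the boundaries.
    if solved * 5 >= attempted * 4:  # at least 80%: "reliably"
        return "YES"
    if solved * 5 >= attempted:      # at least 20%: "occasionally"
        return "50%"
    return "NO"
```

For instance, 4 of 5 solved meets the 80% bar, 1 of 5 meets the 20% bar, and 5 of 31 (about 16%) meets neither.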
@Mira as market creator will trade in this market, but commits not to post any solution, or to provide prompts or detailed prompting techniques to other individuals. So if it resolves YES or 50%, it must be the work of somebody other than Mira.
Example Puzzles
From Sudoku - New York Times Number Puzzles - The New York Times (nytimes.com) on March 28, 2023, "Easy"
210000487
800302091
905071000
007590610
560003002
401600700
039007000
700100026
100065009
Solution:
213956487
876342591
945871263
327594618
568713942
491628735
639287154
754139826
182465379
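The example above can be checked mechanically. Below is a minimal sketch in Python, assuming the format shown (nine lines of nine digits, with 0 for a blank cell). The names `parse_grid` and `is_solution` are illustrative, and this is not the checking tool used for judging.

```python
from typing import List

PUZZLE = """
210000487
800302091
905071000
007590610
560003002
401600700
039007000
700100026
100065009
"""

SOLUTION = """
213956487
876342591
945871263
327594618
568713942
491628735
639287154
754139826
182465379
"""


def parse_grid(text: str) -> List[List[int]]:
    """Parse nine lines of nine digits into a 9x9 grid of ints (0 = blank)."""
    rows = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if len(rows) != 9 or any(len(r) != 9 or not set(r) <= set("0123456789") for r in rows):
        raise ValueError("expected 9 lines of 9 digits")
    return [[int(ch) for ch in r] for r in rows]


def is_solution(puzzle_text: str, candidate_text: str) -> bool:
    """True if the candidate is a complete valid grid that keeps every given clue."""
    puzzle, cand = parse_grid(puzzle_text), parse_grid(candidate_text)
    digits = set(range(1, 10))
    cols = [[cand[r][c] for r in range(9)] for c in range(9)]
    boxes = [
        [cand[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)]
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    # Every row, column, and 3x3 box must contain each of 1..9 exactly once.
    if any(set(unit) != digits for unit in cand + cols + boxes):
        return False
    # Every non-blank cell of the puzzle must be unchanged in the candidate.
    return all(puzzle[r][c] in (0, cand[r][c]) for r in range(9) for c in range(9))


print(is_solution(PUZZLE, SOLUTION))  # True
```

The second check matters: a grid can be a valid Sudoku and still not be a solution to the puzzle that was given.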
Edit History
Mar 26, 2:53pm: Will a prompt that enables GPT-4 to solve easy Sudoku puzzles be found? (2023) → (M1000 subsidy!) Will a prompt that enables GPT-4 to solve easy Sudoku puzzles be found? (2023)
Mar 27 - Clarified that judging will use freshly-generated puzzles.
Mar 29 - Added example with Chat Completions API to help specify allowed prompts.
Apr 3 - Clarified that dropping Chat Completion API turns is allowed.
Apr 20 - Added a more formal description of the solution format.
Apr 21 - Candidate solutions must be posted in the comments before market close.
Apr 27, 6:43am: (M1000 subsidy!) Will a prompt that enables GPT-4 to solve easy Sudoku puzzles be found? (2023) → (M11000 subsidy!) Will a prompt that enables GPT-4 to solve easy Sudoku puzzles be found? (2023)
Apr 30, 1:57am: (M11000 subsidy!) Will a prompt that enables GPT-4 to solve easy Sudoku puzzles be found? (2023) → (M20000 subsidy!) Will a prompt that enables GPT-4 to solve easy Sudoku puzzles be found? (2023)
April 30, 2:57 am: Added that the percentage is defined against a fixed pool of puzzles, if it solves at least one in a preliminary test of 5.
April 30, 5:37 am: Judging will be done with the API. ChatGPT may be accepted if it has a high success rate, but if there's any debate I will use the API with temperature 0. New York Times is chosen as the presumptive source of Sudoku puzzles.
May 5, 2 pm: Link to script on Github, changed puzzle provider to LA Times.
May 7, 3 pm: Details on posting collateral for API costs.
July 16, 7:38 AM: @Mira conflict of interest commitment.