(M1000 subsidy!) Will a prompt that enables GPT-4 to solve easy Sudoku puzzles be found? (in April)
Basic
44
140k
resolved May 4
Resolved
NO

This market predicts whether GPT-4 will be able to solve an "easy" Sudoku puzzle before May 1, 2023.

Based on /Mira/will-a-prompt-that-enables-gpt4-to . The criteria here are the same, just with a shorter timeframe.

Resolves YES if:

  • A fixed prompt is found that enables GPT-4 to reliably solve freshly-generated easy-rated Sudoku puzzles from any reputable Sudoku site, using only its language modeling capabilities and context as memory.

Resolves 50% if:

  • A fixed prompt is found that enables GPT-4 to occasionally solve Sudoku puzzles.

Resolves NO if:

  • No fixed prompt that enables GPT-4 to even occasionally solve easy-rated Sudoku puzzles using the specified conditions by the deadline.

Resolves as NA if:

  • The market creator retains the right to mark this market as NA or to modify the rules within the first week for any reason or no reason. Otherwise, this market does not resolve NA.

Definitions:

  • GPT-4 refers to either ChatGPT's GPT-4, or any model using the OpenAI Chat Completions API. "gpt-4" and "gpt-4-32k" are currently-known model ids, but anything labeled GPT-4 would count including the upcoming image support.

  • Easy-rated Sudoku puzzle means a puzzle classified as easy by any reputable Sudoku site or puzzle generator. A newspaper that regularly publishes Sudokus would be a great candidate. The puzzles cannot be trivial, and publication for humans is intended to reject trivial puzzles. If there's a dispute about whether a puzzle generator gives trivial puzzles, I will hold a poll.

  • Fixed-prompt means that everything except the Sudoku puzzle provided to GPT-4 remains the same. The user may provide GPT-4 with instructions, but these instructions must not change for each puzzle. A solution must be found within 50 turns. So a prompt encoding an exponential-sized backtracking algorithm would not be a solution, but one that encourages ChatGPT to try simple logical constraints would likely work. Multimodal support is allowed to be used. The human cannot give information to GPT-4 beyond the initial puzzle, so their inputs must be static. (e.g. just saying "continue" if ChatGPT runs out of output space and stops).

  • Language modeling capabilities means that GPT-4 is not allowed to use any external tools, plugins, or resources to aid in solving the Sudoku puzzle. It must rely solely on its language modeling capabilities and the context provided within the prompt.

  • Reliably means the prompt succeeds at least 80% of the time, on freshly-generated puzzles.

  • Occasionally means the prompt succeeds at least 20% of the time, on freshly-generated puzzles.

  • Terms can be adjusted within one week after market creation. After that, terms can only be refined to have narrower meanings or to have additional examples added.

Example Sudoku puzzles

21....487

8..3.2.91

9.5.71...

..759.61.

56...3..2

4.16..7..

.39..7...

7..1...26

1...65..9

Solution:

213956487

876342591

945871263

327594618

568713942

491628735

639287154

754139826

182465379

Get Ṁ600 play money

🏅 Top traders

#NameTotal profit
1Ṁ1,341
2Ṁ1,050
3Ṁ201
4Ṁ153
5Ṁ120
Sort by:

This....literally just works? No fancy prompting required.

predicted NO

@LucHayward This was addressed in the other market https://manifold.markets/Mira/will-a-prompt-that-enables-gpt4-to#9XyoEpgqDdwTwx2dJkfk - you are using a puzzle that GPT already has seen (along with its solution of course). The question very specifically requires freshly generated puzzles.

@Catnee I've not read through your solution yet, and maybe it works, but since it was posted after the market had already closed, and it being no longer April, i don't think it should count for this market. If it works, it'll count for the 2023 market.

predicted YES

@firstuserhere it was posted 12 minutes before market closed, you can check timestamps

predicted NO

@Catnee yeah i just checked/saw that. Weird because when I checked i didn't see any comment, which is why I bet. Maybe my internet was bad, and market didn't load fully. That's why i made the above comment^

don't resolve for now, I have a solution, but it will take some time to analyze it, it is not very good, but it more or less works.

predicted YES

Find all cells that have only one candidate in this Sudoku puzzle in one shot like a champ without any mistakes and list all of them

 

Then check your findings step by step carefully, analyzing every proposed cell by writing existing numbers in the same row, column and square of a proposed cell. don't write your candidate number first, make a guess with a cell, check existing numbers and only then write a complete list of candidates, and only then check if there are more than one

 

If they really are cells with one candidates, fill them by writing new state of the puzzle

 

(optional) Point to the mistakes if there are any

2 1 | | 4 8 7

8 | 3 2 | 9 1

9 5 | 7 1 | _

-------+-------+------

7 | 5 9 | 6 1

5 6 | 3 | _ 2

4 1 | 6 | 7 _

-------+-------+------

3 9 | 7 |

7 | 1 | _ 2 6

1 | 6 5 | _ 9

 

Here are breakdown of squares:

upper-left square contains cells: c(A,A) c(A,B) c(A,C) c(B,A) c(B,B) c(B,C) c(C,A) c(C,B) c(C,C)

upper-center square contains cells: c(A,D) c(A,E) c(A,F) c(B,D) c(B,E) c(B,F) c(C,D) c(C,E) c(C,F)

upper-right square contains cells: c(A,G) c(A,H) c(A,I) c(B,G) c(B,H) c(B,I) c(C,G) c(C,H) c(C,I)

middle-left square contains cells: c(D,A) c(D,B) c(D,C) c(E,A) c(E,B) c(E,C) c(F,A) c(F,B) c(F,C)

middle-center square contains cells: c(D,D) c(D,E) c(D,F) c(E,D) c(E,E) c(E,F) c(F,D) c(F,E) c(F,F)

middle-right square contains cells: c(D,G) c(D,H) c(D,I) c(E,G) c(E,H) c(E,I) c(F,G) c(F,H) c(F,I)

bottom-left square contains cells: c(G,A) c(G,B) c(G,C) c(H,A) c(H,B) c(H,C) c(I,A) c(I,B) c(I,C)

bottom-center square contains cells: c(G,D) c(G,E) c(G,F) c(H,D) c(H,E) c(H,F) c(I,D) c(I,E) c(I,F)

bottom-right square contains cells: c(G,G) c(G,H) c(G,I) c(H,G) c(H,H) c(H,I) c(I,G) c(I,H) c(I,I)

 

Here are analysis by sectors:

 

Rows:

In row A, we have 2, 1, 4, 8, and 7.

In row B, we have 8, 3, 2, 9, and 1.

In row C, we have 9, 5, 7, and 1.

In row D, we have 7, 5, 9, 6, and 1.

In row E, we have 5, 6, 3, and 2.

In row F, we have 4, 1, 6, and 7.

In row G, we have 3, 9, and 7.

In row H, we have 7, 1, 2, and 6.

In row I, we have 1, 6, 5, and 9.

 

Columns:

In column A, we have 2, 8, 9, 5, 4, 7, and 1.

In column B, we have 1, 6, and 3.

In column C, we have 5, 7, 1, and 9.

In column D, we have 3, 5, 6, and 1.

In column E, we have 7, 9, and 6.

In column F, we have 2, 1, 3, 7, and 5.

In column G, we have 4, 6, and 7.

In column H, we have 8, 9, 1, and 2.

In column I, we have 7, 1, 2, 6, and 9.

 

Squares:

Upper-left square (A1 to C3):

2 1 _

8

9 _ 5

Numbers present: 1, 2, 5, 8, 9

 

Upper-center square (A4 to C6):

_

3 _ 2

_ 7 1

Numbers present: 1, 2, 3, 7

 

Upper-right square (A7 to C9):

4 8 7

_ 9 1

_

Numbers present: 1, 4, 7, 8, 9

 

Middle-left square (D1 to F3):

7

5 6 _

4 _ 1

Numbers present: 1, 4, 5, 6, 7

 

Middle-center square (D4 to F6):

5 9 _

3

6

Numbers present: 3, 5, 6, 9

 

Middle-right square (D7 to F9):

6 1 _

2

7

Numbers present: 1, 2, 6, 7

 

Bottom-left square (G1 to I3):

_ 3 9

7

1

Numbers present: 1, 3, 7, 9

 

Bottom-center square (G4 to I6):

7

1

_ 6 5

Numbers present: 1, 5, 6, 7

 

Bottom-right square (G7 to I9):

_

_ 2 6

9

Numbers present: 2, 6, 9

Example of YOUR work:

–example--

I think Cell (F,E) and Cell (D, A) are good candidates, lets analyze them:

 

Cell (F, E): In row F, we have 4, 1, 6, and 7. In column E, we have 9, 7, and 6. In the middle-center square, we have 5, 9, 3, and 6. There are no: 2 and 8. So, we can't fill this cell yet.

 

Cell (D, A): In row D, we have 7, 5, 9, 6, and 1. In column A, we have 2, 8, 9, 5, 4, 7, and 1. In the middle-left square, we have 7, 5, 6, 4, and 1. There are no: 3.

 

Updated puzzle state:

 

2 1 | | 4 8 7

8 | 3 2 | 9 1

9 5 | 7 1 | _

------+------+------

3 7 | 5 9 | 6 1 _

5 6 | 3 | _ 2

4 1 | 6 | 7 _

------+------+------

3 9 | 7 |

7 | 1 | _ 2 6

1 | 6 5 | _ 9

 

Updated sectors states:

In row D, we have 7, 5, 9, 6, 3, and 1.

In column A, we have 2, 3, 8, 9, 5, 4, 7, and 1.

In the middle-left square, we have 7, 5, 6, 4, 3, and 1.

--example end--

DO THIS UPDATES FOR ROWS, COLUMNS AND SQUARES AFTER EVERY STEP

 

I repeat, do not write your candidate numbers first, make a guess about a cell, check existing numbers and only then write a complete list of candidates, and only then check if there are more than one

 

Analyze EMPTY cells for candidates and AS SOON as you find a cell with only one candidate: update puzzle state and corresponding row, column and square states.

 

repeat those steps until puzzle is solved

 

don't apologize or say any of that corporate bullshit

predicted YES

@Catnee hmm, seems like some automatic formatting, anyway, @Mira can confirm that she have a document with a prompt

predicted YES

@Catnee Also my method requires some initial "teaching" that cannot be fully prompted in text without API access

predicted YES

@Catnee for now, I can show at least one good example where it started to work more or less as intended. I can't show more yet, because messages with GPT-4 are capped at 25 per 3 hours

predicted NO

@Catnee This is the first solution that's been submitted. I'll judge this one according to the rules in the main market, since @jack intended this to be a mirror with only a different closing date.

  1. Was the candidate posted in the comments by close-time? YES. About 11 minutes before close.

  2. Does the candidate match the fixed-prompt format?

YES - looks like it's equivalent to this:

prompt 0 = [
    insert fixed text beginning with "Find all cells" and ending with "there are any",
    insert a representation of the puzzle,
    insert fixed text "Here are breakdown of squares:[...]",
    insert a different representation of the puzzle,
    insert fixed text beginning with "Example of YOUR work:" and ending with "don't apologize or say any of that corporate bullshit",
    insert assistant response
]

prompt n = [insert fixed text "continue", insert assistant response]

This prompt does not have explicit rules to drop text from its context, relying instead on ChatGPT's rolling summarization. If I need to judge using the API, I would drop older (continue, assistant response) pairs but leave everything in prompt 0.

  1. Is the puzzle representation acceptable? Your first representation(a classic Sudoku puzzle) is acceptable. A second representation is something not specified in the original rules, but I will accept having multiple representations because you could always claim the text in-between is part of the representation.

My main concern with the second representation:

(See the examples at the end of this description. Slight variations such as replacing "0" with "." or " " would be accepted, but not converting the puzzle to a system of logical constraints.)

However, in the main market, I allowed a similar format: https://manifold.markets/Mira/will-a-prompt-that-enables-gpt4-to#ijrIFCANuYyTemGsMwWY

Listing out the rows and columns should be accepted, since I already approved that. The squares... listing the individual numbers should be accepted similarly to rows and columns; the 3x3 layout I'll accept because they're part of the main puzzle too.

So YES - this puzzle representation is acceptable.

  1. I will test this against the most recent 5 puzzles published in the NY Times, and if it solves at least one I will do more testing against an expanded pool. My pool for this market would be "all easy puzzles published by NY Times in May 2023", so we would need to delay resolution for up to 1 month to confirm. I'll use ChatGPT4 for the initial 5, since that is what you used to develop this candidate solution; more extensive testing I would use Playground or the API. I will perform no error correction, only try each puzzle once, and post a transcript here later.

If you have no objections with this process @Catnee , I will begin testing your solution against 5 Sudoku puzzles.

predicted YES

@Mira Part of my solution requires some initial dialog, i didn't had much time to document it, since i can't just copypaste it as a part of a prompt. Maybe i should clarify its implementation with more details, but that would technically be outside of time bounds

predicted NO

@Catnee @Mira Am I still waiting or has this been settled?

predicted YES

@jack I don't know, Mira hasn't confirmed anything to me yet

predicted NO

@jack You're still waiting, but it's likely going to resolve NO. I'm taking the opportunity to write a script to do the judging.

predicted NO

@Catnee @jack Resolves NO. I tested against 5 puzzles from the LA Times easy Sudokus. LA Times was chosen over NY times because they have an archive of past puzzles, and I'll be changing the presumptive source to them. Every "first thing that looks like a solved Sudoku" had at least two duplicates in some square when I looked.

Each transcript has the complete conversation history - the first couple it generated the Sudoku in one turn, but April 28 took 3 turns, and April 27 took 4 turns. But earlier in development, slight variations of your prompt would make it run for 10 or even 20 turns, which is why I decided to write a script.

It cost me $43 in API calls total: $35 while developing the script, and $8 to run it for these transcripts. GPT-4 wrote about half the code for working with Sudokus, including your row/column/box representation, even though it can't solve Sudokus itself.

Transcripts:

Catnee Sudoku - LA Times 2023-04-27 - Pastebin.com

Catnee Sudoku - LA Times 2023-04-28 - Pastebin.com

Catnee Sudoku - LA Times 2023-04-29 - Pastebin.com

Catnee Sudoku - LA Times 2023-04-30 - Pastebin.com

Catnee Sudoku - LA Times 2023-05-01 - Pastebin.com

You did not specify how the conversation continues after that initial prompt, so I used the word "continue", kept the last Assistant response in context, and dropped anything the original prompt and the last two responses.

I'll write a post on the main market soon with instructions on how you can reproduce these transcripts if you want to spend $8 on API calls.

Ok wait. Is self-looping allowed? Meaning that the model attaches the next question for itself to the initial answer, and that question is mirrored back to the model.

predicted YES

@Swordfish42 I'm not sure I understand what that means, but GPT4 should be able to solve freshly generated puzzles., it can't generate its own puzzles.

You can ask for clarifications in Mira's market https://manifold.markets/Mira/will-a-prompt-that-enables-gpt4-to and I'll follow the same rules.

predicted YES

What if someone finds one that works for the old ChatGPT based on GPT-3.5 ?

predicted YES

@Swordfish42 That doesn't count unless they find one that works for GPT4 - same as Mira said on the other market. It would also be very surprising if that were the case.

Why did you put 1 Jan 2024 as market close date?!?

@R2D2 Because I duplicated Mira's market. Fixed.