Will Nikola get GPT-4 to solve basic sudoku puzzles by next Monday?
resolved May 12
Resolved
YES

He'll be trying to find a prompt (just the API, no external tools) in order to solve the NYT easy sudoku: https://www.nytimes.com/puzzles/sudoku/easy. But if he ends up solving it using GPT-4's tools (e.g. browsing), I'll still resolve this market positively. Interactive prompting protocols (e.g. prompting the model multiple times, or getting it to check its work) are allowed so long as the prompter isn't using any logic that incorporates their knowledge of solving sudoku puzzles (edge cases resolved via my judgment).

Resolves positively if the prompting strategy works for more than 75% of held-out puzzles he tests on.

Resolves next Monday, as he'll be doing it over this weekend.
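The resolution rule above can be sketched in code. This is only an illustrative check, not the market creator's actual procedure: the helper names (`parse_board`, `is_valid_solution`, `strategy_passes`) are hypothetical, and it assumes each puzzle is given as nine lines of nine digits with 0 for an empty cell.

```python
def parse_board(board_string):
    """Turn nine lines of nine digits into a list of nine lists of ints."""
    rows = [line.strip() for line in board_string.splitlines() if line.strip()]
    return [[int(ch) for ch in row] for row in rows]


def is_valid_solution(puzzle, candidate):
    """True if candidate is a complete, legal grid that keeps the puzzle's givens."""
    digits = set(range(1, 10))
    # Every given clue must be unchanged in the candidate.
    for r in range(9):
        for c in range(9):
            if puzzle[r][c] != 0 and puzzle[r][c] != candidate[r][c]:
                return False
    rows = [set(row) for row in candidate]
    cols = [set(candidate[r][c] for r in range(9)) for c in range(9)]
    boxes = [
        set(candidate[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3))
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    # Each row, column, and 3x3 box must contain exactly the digits 1-9.
    return all(group == digits for group in rows + cols + boxes)


def strategy_passes(results, threshold=0.75):
    """results: one boolean per held-out puzzle; strictly more than 75% must be True."""
    return sum(results) / len(results) > threshold
```

Note that the comparison is strict, so solving exactly three puzzles out of four would not meet a "more than 75%" bar under this reading.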


bought Ṁ100 of YES

To summarize my previous comments: GPT-4 can one-shot this with minimal instruction (as few as five words alongside the starting board) using the code interpreter. Max only needs to test my prompts and verify my results.

predicted YES

GPT-4 Code interpreter successfully solved Mira's example from the main sudoku market

predicted YES

@Nikola
Write a python class to solve a sudoku puzzle.

For the class, define a variable called current_board which inputs a 9x9 numpy matrix. Zeroes inside that matrix denote empty cells.

Here is a glossary of terms:

- Rows are labelled A-I.

- Columns are labelled 1-9.

- There are 81 cells labelled A1-I9.

- There are 9 boxes containing 9 cells each:

- UL (Upper Left) contains cells A1-C3.

- UM (Upper Middle) contains cells A4-C6.

- UR (Upper Right) contains cells A7-C9.

- ML (Middle Left) contains cells D1-F3.

- MM (Middle Middle) contains cells D4-F6.

- MR (Middle Right) contains cells D7-F9.

- LL (Lower Left) contains cells G1-I3.

- LM (Lower Middle) contains cells G4-I6.

- LR (Lower Right) contains cells G7-I9.

- Cells can contain integers 0-9. 0 denotes an empty cell.

Define these functions:

check_missing_in_row(x,y): this function checks which integers from 1-9 are not in a specific row

check_missing_in_column(x,y): similar except for column

find_box(x,y): returns the string of the 3x3 box a specific cell belongs to

find_missing_in_box(box_string): returns which integers 1-9 are not in a 3x3 box

find_candidates_for_cell(x,y): for an empty cell, returns which integers are not in its box, its row, or its column

find_all_coordinates_of_empty_cells(): returns all coordinates of cells that contain 0

Define the variable boxes, which contains a dictionary of the box string names and the indices they correspond to.

You will run the following algorithm:

While the number of empty cells is larger than 0:
    For each cell:
        If cell is empty (contains the integer 0):
            If the number of candidates for possible values in that cell equals 1, replace the 0 in that cell with the only possible candidate.

After the while loop, print the final (solved) board state.

here is the initial board state:

board_string = """

210000487

800302091

905071000

007590610

560003002

401600700

039007000

700100026

100065009

"""

predicted YES

@Nikola Output:
[[2 1 3 9 5 6 4 8 7]
 [8 7 6 3 4 2 5 9 1]
 [9 4 5 8 7 1 2 6 3]
 [3 2 7 5 9 4 6 1 8]
 [5 6 8 7 1 3 9 4 2]
 [4 9 1 6 2 8 7 3 5]
 [6 3 9 2 8 7 1 5 4]
 [7 5 4 1 3 9 8 2 6]
 [1 8 2 4 6 5 3 7 9]]

This exactly matches the example's solution:

213956487

876342591

945871263

327594618

568713942

491628735

639287154

754139826

182465379
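For readers without access to the code interpreter, the algorithm described in the prompt above (fill any empty cell that has exactly one candidate, and repeat) can be sketched roughly as follows. This is not the code GPT-4 actually wrote, which the thread does not show. It is a minimal sketch that assumes plain Python lists instead of the numpy matrix the prompt asks for, uses 0-based row and column indices rather than the A1-I9 labels, and adds a `solve` method plus a no-progress guard that the prompt does not mention.

```python
class SudokuSolver:
    """Single-candidate solver; a sketch, not GPT-4's actual output."""

    def __init__(self, board_string):
        rows = [line.strip() for line in board_string.splitlines() if line.strip()]
        # Assumption: a list of lists stands in for the 9x9 numpy matrix.
        self.current_board = [[int(ch) for ch in row] for row in rows]

    def find_candidates_for_cell(self, r, c):
        """Digits 1-9 not yet used in the cell's row, column, or 3x3 box."""
        used = set(self.current_board[r])
        used |= {self.current_board[i][c] for i in range(9)}
        br, bc = 3 * (r // 3), 3 * (c // 3)
        used |= {
            self.current_board[i][j]
            for i in range(br, br + 3)
            for j in range(bc, bc + 3)
        }
        return set(range(1, 10)) - used

    def find_all_coordinates_of_empty_cells(self):
        return [
            (r, c)
            for r in range(9)
            for c in range(9)
            if self.current_board[r][c] == 0
        ]

    def solve(self):
        """Fill single-candidate cells until done or stuck; True if fully solved."""
        progress = True
        # Assumption: stop when a full pass fills nothing, to avoid looping forever.
        while progress and self.find_all_coordinates_of_empty_cells():
            progress = False
            for r, c in self.find_all_coordinates_of_empty_cells():
                candidates = self.find_candidates_for_cell(r, c)
                if len(candidates) == 1:
                    self.current_board[r][c] = candidates.pop()
                    progress = True
        return not self.find_all_coordinates_of_empty_cells()
```

The loop in the prompt has no exit condition for puzzles that need more than this one technique, which is why the sketch stops when a pass makes no progress. Harder puzzles may leave cells unfilled under this approach.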

predicted YES

@Nikola Does it really require such an elaborate prompt? I don't have access to the plugins but I was under the impression that "Write a python script that can solve sudoku puzzles and show me output for this puzzle where 0 represents an empty cell" would work.

predicted YES

@light It does not. Here's a much simpler prompt:

Write a python class to solve a sudoku puzzle.

Write functions that find in which cell you can put which possible integers. Then, if a cell has only one candidate, put that integer inside of it.

Here is the starting board:

board_string = """

210000487

800302091

905071000

007590610

560003002

401600700

039007000

700100026

100065009

"""

predicted YES

@Nikola Amazingly, this one works too, but it uses a more brute-force approach:

Write a python script to solve a sudoku puzzle.

Here is the starting board, with zeros representing empty cells:

board_string = """

210000487

800302091

905071000

007590610

560003002

401600700

039007000

700100026

100065009

"""
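The thread does not show what the "more brute-force approach" looked like. One common brute-force technique is recursive backtracking, sketched below as an assumption about the general shape of such a script rather than a record of GPT-4's output. The function names are hypothetical.

```python
def parse_board(board_string):
    """Turn nine lines of nine digits into a list of nine lists of ints."""
    rows = [line.strip() for line in board_string.splitlines() if line.strip()]
    return [[int(ch) for ch in row] for row in rows]


def is_allowed(board, r, c, value):
    """True if value does not already appear in the cell's row, column, or box."""
    if value in board[r]:
        return False
    if any(board[i][c] == value for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(
        board[i][j] != value
        for i in range(br, br + 3)
        for j in range(bc, bc + 3)
    )


def solve(board):
    """Fill the board in place by backtracking; return True if a solution exists."""
    for r in range(9):
        for c in range(9):
            if board[r][c] == 0:
                for value in range(1, 10):
                    if is_allowed(board, r, c, value):
                        board[r][c] = value
                        if solve(board):
                            return True
                        board[r][c] = 0  # undo and try the next value
                return False  # no digit fits here, so backtrack
    return True  # no empty cells left
```

Calling `solve(parse_board(board_string))` on the board above fills the grid in place. Unlike the single-candidate method, backtracking does not depend on the puzzle being easy, at the cost of potentially exploring many dead ends.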

predicted YES

@Nikola How about zero instructions? For example, creating a .txt file that looks like this


# 0 represents empty cells
sudoku = """210....
"""
# solution


and giving it just that.

predicted YES

@light It can even solve today's hard NYT puzzle. Try this one:
board_string = """

900003607

002008005

036020000

000000000

750004031

309200000

090040506

040056002

000000000

"""

predicted YES

@light It can't zero-shot solutions on its own; it needs to write code in order to solve the puzzle.

predicted YES

@Nikola Here's a minimal prompt:
# 0 represents empty cells

board_string = """

210000487

800302091

905071000

007590610

560003002

401600700

039007000

700100026

100065009

"""

# solve

predicted YES

@Nikola Who knew it would take roughly 5 words?

predicted YES

@Nikola Ah, I see. I saw in the examples that it does some processing automatically when given files, but it makes sense since it's just a text file without any goal.

Oh wow. And you didn't even tell it it was a sudoku.

bought Ṁ100 of YES

Mira, stop buying NO. I literally just solved it using the code interpreter.

bought Ṁ5 of NO

@Nikola RIP - I thought "no external tools" was going to exclude that, and you were using the ChatGPT-based interactive prompting strategy you were asking about.

Ah, so plugins are allowed in this one. Is it confirmed that plugins are used with GPT-4? From the demonstrations I've seen, the plugins usually have a green icon (GPT-3.5) and seem to output faster than GPT-4 (see here: https://www.youtube.com/watch?v=YfQ4Weg0s3A). I'll assume you'll still resolve YES no matter the model.

Since both models can easily write code that solves sudoku, the Code Interpreter plugin should be able to solve it.