Will GPT-4 solve any freshly-generated Sudoku puzzle? (2023)

407

3.9kṀ460k

resolved Dec 12

Resolved

YES

ALL

Resolves YES if someone finds a fixed prompt as defined in the main market that succeeds at solving any Sudoku puzzle listed at Sudoku - Free daily Sudoku games from the Los Angeles Times (latimes.com ) that was generated after the comment was posted.

You are allowed to experiment with ChatGPT, but judging will be done with the API with temperature set to 0 for reproducibility.
Any puzzle - easy, medium, or hard - will qualify. No other puzzle provider is allowed for this market.
Solution must be posted in the comments in 2023, and later confirmation of solution must also be posted in the comments. Market creator will not proactively check solutions against every new puzzle, but will check solutions that are found and posted.

Related markets

Main market: /Mira/will-a-prompt-that-enables-gpt4-to
GPT-3.5 no finetuning: /Mira/will-gpt35-solve-any-freshlygenerat
GPT-3.5 finetuning allowed: /Mira/will-finetuned-gpt35-solve-any-fres
GPT-4: /Mira/m100-subsidy-will-gpt4-solve-any-fr-c5b090d547d1

Technical AI Timelines

GPT-4 speculation

GPT-4

Derivative Markets

GPT-4 Sudoku Challenge (2023)

Contests

Get

1,000

to start trading!

🏅 Top traders

#	Name	Total profit
1		Ṁ25,956
2		Ṁ14,016
3		Ṁ7,296
4		Ṁ5,307
5		Ṁ2,851

People are also trading

Will GPT-5 be able to solve A::B system puzzles consistently

15% chance

Will GPT-4 escape?

5% chance

Video generation model solves Sudoku puzzles by EOY 2026?

Sort by:

predictedYES

The resolving solve was the December 7 puzzle which took $8 and 2.4 hours. Transcript is here.

I also have a solve on October 1 which doesn't qualify but shows that the solving is repeatable. And 2 failures on some more December puzzles that I'm still looking at.

But it has solved a single puzzle, as this market requires.

predictedYES

I believe I have a solve on the October 1 puzzle I was using to test my script. I have closed the 3 "Will GPT-4 solve any puzzle?" markets so that people don't trade on seeing my Github push. I still need to test a recent December puzzle before I can resolve them. I'll reopen the markets if it fails December.

predictedYES

@Mira I have left a comment on the main market with a transcript for the December 7 puzzle, which was solved using @EmilyThomas ' technique.

https://manifold.markets/Mira/will-a-prompt-that-enables-gpt4-to#o5AV2qSKPLxKvjlR4bmd

I will wait on resolving this market to people a chance to review it for mistakes, and so that I can test more puzzles.

predictedNO

Why it close

predictedNO

Why is this market going slightly down? Is the verification not going well?

predictedYES

@TamasSzelei Could just be timer running out combined with people freeing up mana for other things

predictedNO

@Tumbles I guess. I accepted my loss and I’m holding until the end. I’m honestly very surprised that it seems to work (and also slightly annoyed that it got a big uprade way after many of us bet NO, though I admit I still thought it won’t work with the extra context).

predictedYES

Is this just waiting for validation of the solve to resolve?

predictedYES

A link here to the recent post on the main market (comment link), for anyone not on it and confused about the price jump.

predictedYES

@EmilyThomas This was supposed to be a much easier challenge - a single solve sounds a lot easier than 20% or 80% in the main market. But it seems like it was nearly as difficult!

I will work on testing it on the next few December puzzles, but it seems promising.

predictedYES

@Mira In terms of getting a solve for this market, there's a function included in the submission that checks if a sudoku can be solved at all using this prompt/method, to avoid wasted attempts.

I wonder what this market will be at on Dec 1 if the puzzle is not yet solved. (Anyone up for a meta market?)

predictedNO

@JoshuaHedlund

predictedNO

Why did this market jump so much today?

predictedYES

@eccentricity openAI made gpt4 better, cheaper, more accessible, which should make this challenge easier & cheaper to try

predictedNO

@Gen I suppose that makes sense, but I think the market has overreacted if the model isn't fundamentally much smarter.

what % of software projects finish on time again? I wonder if we'll have reproducible MVP by 'projected' delivery deadline of jan 1st...

week later "boss I'm taking my vacation this month to work on something...special"

What happened? Did @firstuserhere try to replicate Emily's solution and failed?

predictedYES

@Shump no firstuserhere just likes making irrational bets for some reason /shrug

@Shump It looks like @Mira tried to and it failed to solve the Sudoku https://github.com/Mira-public/manifold-sudoku/blob/main/transcripts/emily-1.000394650060000003008150000039007000457002060800900014000000080900061000015280046.sudoku_log.txt At the bottom of this file is this sudoku:

First of all there are blanks. Secondly the top row far right should be 825 rather than 52_

@JoshuaB Thanks, I was just typing this out

huh, what are we thinking as the cause of this:

1) intentional fraud
2) repairable mistake by emily
3) unrepairable mistake by emily
4) method is flaky/inconsistent with p(solve) ~ .5
5) method is flaky/inconsistent with p(solve) ~ .05

etc

@jacksonpolack i think its still solvable and Emily (who knows this solution the best) can fix it. my bet is on #2

but only 66% on #2 :)

@jacksonpolack How's this for confidence?

My guess is somewhere between 4 and 5. Emily said she intentionally tested her method on a puzzle that can only be solved by elimination. It's possible that the puzzle above does not fit that description

predictedYES

@Shump The problem is that the puzzle above is the same puzzle that Emily put forward as her solve puzzle.

predictedYES

@jacksonpolack Emily's prompt seems like a very strong prompt. The solve only had one mistake (putting a 5 in the 1st row, 7th column instead of an 8) and it managed to keep going with filling in the puzzle in a logically consistent manner until it had nothing left that it could fill in without putting in a number that would not be allowed.

@JoshuaB Incredible! Though fixing with that level of granularity...

People are also trading

Will GPT-5 be able to solve A::B system puzzles consistently

15% chance

Will GPT-4 escape?

5% chance

Video generation model solves Sudoku puzzles by EOY 2026?

24% chance

Related markets

🏅 Top traders

People are also trading

People are also trading

Related questions