Resolves YES if someone finds a fixed prompt as defined in the main market that succeeds at solving any Sudoku puzzle listed at Sudoku - Free daily Sudoku games from the Los Angeles Times (latimes.com) that was generated after the comment was posted.
You are allowed to experiment with ChatGPT, but judging will be done with the API with temperature set to 0 for reproducibility.
Any puzzle - easy, medium, or hard - will qualify. No other puzzle provider is allowed for this market.
Solution must be posted in the comments in 2023, and later confirmation of solution must also be posted in the comments. Market creator will not proactively check solutions against every new puzzle, but will check solutions that are found and posted.
Related markets
Main market: /Mira/will-a-prompt-that-enables-gpt4-to
GPT-3.5 no finetuning: /Mira/will-gpt35-solve-any-freshlygenerat
GPT-3.5 finetuning allowed: /Mira/will-finetuned-gpt35-solve-any-fres
GPT-4: /Mira/m100-subsidy-will-gpt4-solve-any-fr-c5b090d547d1
The resolving solve was the December 7 puzzle which took $8 and 2.4 hours. Transcript is here.
I also have a solve on October 1 which doesn't qualify but shows that the solving is repeatable. And 2 failures on some more December puzzles that I'm still looking at.
But it has solved a single puzzle, as this market requires.
I believe I have a solve on the October 1 puzzle I was using to test my script. I have closed the 3 "Will GPT-4 solve any puzzle?" markets so that people don't trade on seeing my Github push. I still need to test a recent December puzzle before I can resolve them. I'll reopen the markets if it fails December.
@Mira I have left a comment on the main market with a transcript for the December 7 puzzle, which was solved using @EmilyThomas ' technique.
https://manifold.markets/Mira/will-a-prompt-that-enables-gpt4-to#o5AV2qSKPLxKvjlR4bmd
I will wait on resolving this market to people a chance to review it for mistakes, and so that I can test more puzzles.
@TamasSzelei Could just be timer running out combined with people freeing up mana for other things
@Tumbles I guess. I accepted my loss and I’m holding until the end. I’m honestly very surprised that it seems to work (and also slightly annoyed that it got a big uprade way after many of us bet NO, though I admit I still thought it won’t work with the extra context).
A link here to the recent post on the main market (comment link), for anyone not on it and confused about the price jump.
@EmilyThomas This was supposed to be a much easier challenge - a single solve sounds a lot easier than 20% or 80% in the main market. But it seems like it was nearly as difficult!
I will work on testing it on the next few December puzzles, but it seems promising.
@Mira In terms of getting a solve for this market, there's a function included in the submission that checks if a sudoku can be solved at all using this prompt/method, to avoid wasted attempts.
@eccentricity openAI made gpt4 better, cheaper, more accessible, which should make this challenge easier & cheaper to try
@Gen I suppose that makes sense, but I think the market has overreacted if the model isn't fundamentally much smarter.
what % of software projects finish on time again? I wonder if we'll have reproducible MVP by 'projected' delivery deadline of jan 1st...
week later "boss I'm taking my vacation this month to work on something...special"
@Shump It looks like @Mira tried to and it failed to solve the Sudoku https://github.com/Mira-public/manifold-sudoku/blob/main/transcripts/emily-1.000394650060000003008150000039007000457002060800900014000000080900061000015280046.sudoku_log.txt At the bottom of this file is this sudoku:
First of all there are blanks. Secondly the top row far right should be 825 rather than 52_
@jacksonpolack i think its still solvable and Emily (who knows this solution the best) can fix it. my bet is on #2
@Shump The problem is that the puzzle above is the same puzzle that Emily put forward as her solve puzzle.
@jacksonpolack Emily's prompt seems like a very strong prompt. The solve only had one mistake (putting a 5 in the 1st row, 7th column instead of an 8) and it managed to keep going with filling in the puzzle in a logically consistent manner until it had nothing left that it could fill in without putting in a number that would not be allowed.