[M5000 subsidy] Will finetuned GPT-3.5 solve any freshly-generated Sudoku puzzle? (2023)
Resolved NO on Jan 1

Resolves YES if someone finds a fixed prompt, as defined in the main market, that succeeds at solving any Sudoku puzzle listed at "Sudoku - Free daily Sudoku games from the Los Angeles Times" (latimes.com), provided the puzzle was generated after the comment was posted.

  • You are allowed to experiment with ChatGPT, but judging will be done with the API with temperature set to 0 for reproducibility.

  • Any puzzle - easy, medium, or hard - will qualify. No other puzzle provider is allowed for this market.

  • Solution must be posted in the comments of any Manifold market in the "GPT-4 Sudoku Challenge" group in 2023, and later confirmation of solution must also be posted in the comments. Market creator will not proactively check solutions against every new puzzle, but will check solutions that are found and posted.

  • Any variant of GPT-3.5 is allowed: ChatGPT (using the green icon), gpt-3.5-turbo, or gpt-3.5-turbo-instruct.

  • Finetuning GPT-3.5 is allowed, but the puzzle must be published after the model's creation.

  • The number of allowed turns is increased to 200, so the 4k context is equivalent in total token count to the 32k-context GPT-4.
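For anyone testing a candidate prompt locally, here is a minimal sketch of how a model's final grid could be checked against the puzzle. It is not the market creator's judging script. The names `parse_grid` and `is_valid_solution`, and the 81-character input format with `0` or `.` for blanks, are assumptions made for illustration.

```python
from typing import List

Grid = List[List[int]]  # 9x9 grid; 0 marks an empty cell in the puzzle


def parse_grid(text: str) -> Grid:
    """Parse 81 cells (digits, with '0' or '.' for blanks) into a 9x9 grid.

    Any other characters (spaces, newlines, separators) are ignored.
    """
    cells = [ch for ch in text if ch in "0123456789."]
    if len(cells) != 81:
        raise ValueError(f"expected 81 cells, got {len(cells)}")
    values = [0 if ch == "." else int(ch) for ch in cells]
    return [values[r * 9:(r + 1) * 9] for r in range(9)]


def is_valid_solution(puzzle: Grid, solution: Grid) -> bool:
    """Return True if `solution` is a complete, legal fill of `puzzle`."""
    digits = set(range(1, 10))

    # Every given clue in the puzzle must be unchanged in the solution.
    for r in range(9):
        for c in range(9):
            if puzzle[r][c] != 0 and puzzle[r][c] != solution[r][c]:
                return False

    # Every row, column, and 3x3 box must contain exactly the digits 1..9.
    rows = [set(row) for row in solution]
    cols = [{solution[r][c] for r in range(9)} for c in range(9)]
    boxes = [
        {solution[br + dr][bc + dc] for dr in range(3) for dc in range(3)}
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    return all(unit == digits for unit in rows + cols + boxes)
```

A check like this only confirms that a grid is a legal solution. Whether the run itself qualifies (fixed prompt, temperature 0, turn limit, puzzle date) still depends on the rules above.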


Resolves NO by default because no candidate was given for testing.

bought Ṁ100 of YES

@MrLuke255 I'm willing to validate going slightly (~5 days) into January, as long as the prompt and model are finished and posted in December and you believe you solved a fresh puzzle in December.

bought Ṁ100 of YES

Relevant paper about fine-tuning GPT-2 for solving puzzles, including sudoku: https://arxiv.org/pdf/2109.02797.pdf

predicted YES

Is it just me, or does the sudoku not work? 😐

bought Ṁ10 of YES

@Mira Could you check? The Sudoku doesn't seem to work for me on the site you linked.

predicted NO
predicted YES

@Mira That's weird. Could it possibly work only in the US?

predicted NO

@MrLuke255 Try a different browser or VPN maybe. Or join the Discord: https://discord.gg/Y6qvtB5xPD and if you have a solution but are limited on eligible puzzles I'm sure somebody would get you a feed.

predicted YES

@Mira I don't have one yet, but I plan to try the fine-tuning approach. If neural nets can be trained to solve sudokus, why not transformers? But that probably also depends on how fine-tuning works in OpenAI's version.

@MrLuke255 fine-tuning ≠ training.

Fine-tuning is much closer to prompt-engineering, for what it lets you achieve.

predicted YES

@Benx In this case you might be right. But in ML in general, fine-tuning is a common way of adapting an existing model to new domains.

@MrLuke255 the link does not work for me (located in EU), either.

predicted NO

@Zozo001CoN @MrLuke255 If anyone needs puzzles from the LA Times, my judging script has a puzzle bank:

manifold-sudoku/main.py at main · Mira-public/manifold-sudoku (github.com)

If you solve any of them, I could run your prompt on the remainder of December.

If fine-tuning is allowed... Can it be fine-tuned on the puzzle it then solves? 🙄

bought Ṁ40 of YES

Ah, I didn't read this thoroughly enough. When using a fine-tuned model, only puzzles that become available the day after the model's creation count.

bought Ṁ3 of NO

This question should have a much lower chance than the main market for GPT-4.

bought Ṁ100 of YES

@DanielParker This one includes gpt-3.5-turbo-instruct, I'd expect it to trade at a moderate premium to the GPT-4 market.

predicted NO

@CameronHolmes Do keep in mind you have to actually solve it though. The strategy of thinking "Probably this model can solve it" and then not actually making it solve it, is unlikely to work.

bought Ṁ50 of NO

@DanielParker It does say "any puzzle", not "easy Sudoku puzzles".

predicted YES

@DanielParker It should be at least as high as the market for GPT-4... wdym? gpt-3.5-turbo-instruct looks to be much better at abstract logical tasks like sudoku than GPT-4.
