MANIFOLD
Will @Mira train a transformer network to solve Sudoku puzzles?
14
Ṁ230Ṁ288
Resolved NO on Jan 1

Tokens: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ;, END}, where 0 is a placeholder, ; is a state separator, and END signals the runtime to commit the last trace as the solution.

I may add other control tokens (continuations would be a really good one for backtracking).
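To make the token scheme above concrete, here is a minimal sketch of how a trace might be serialized and how a runtime might commit the last state on END. The function names (`encode_trace`, `committed_solution`) and the row-major cell order are assumptions for illustration, not something the market description specifies.

```python
# Hypothetical serialization of the 12-token vocabulary described above.
VOCAB = [str(d) for d in range(10)] + [";", "END"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}


def encode_trace(states):
    """Flatten a list of 9x9 grids into tokens: state ; state ; ... END.

    Cells are emitted in row-major order (an assumption); 0 marks an empty cell.
    """
    tokens = []
    for k, grid in enumerate(states):
        if k > 0:
            tokens.append(";")  # state separator between consecutive grids
        for row in grid:
            tokens.extend(str(v) for v in row)
    tokens.append("END")
    return tokens


def committed_solution(tokens):
    """Return the last state before END as a 9x9 grid of ints."""
    body = tokens[: tokens.index("END")]
    # Split the trace into states on ';' and keep only the final one.
    states, current = [], []
    for tok in body:
        if tok == ";":
            states.append(current)
            current = []
        else:
            current.append(tok)
    states.append(current)
    cells = states[-1]
    if len(cells) != 81:
        raise ValueError("last state does not contain exactly 81 cells")
    return [[int(t) for t in cells[r * 9:(r + 1) * 9]] for r in range(9)]
```

Under this reading, intermediate states are scratch work and only the state immediately before END counts as the answer.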

Resolves to the percentage of puzzles solved in a fixed pool held back from training for testing. Possibly the same pool used for calculating final scores in /Mira/will-a-prompt-that-enables-gpt4-to
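As a rough sketch of that resolution rule, the score could be computed like this. The `solve` callable stands in for the trained model plus runtime and is hypothetical, as is the convention that 0 marks an empty cell in the puzzle grid.

```python
def is_valid_solution(puzzle, candidate):
    """Check that candidate is a complete, legal grid that keeps the puzzle's givens."""
    if len(candidate) != 9 or any(len(row) != 9 for row in candidate):
        return False
    # Every given (non-zero) cell of the puzzle must be unchanged.
    for r in range(9):
        for c in range(9):
            if puzzle[r][c] != 0 and puzzle[r][c] != candidate[r][c]:
                return False
    digits = set(range(1, 10))
    rows = [set(row) for row in candidate]
    cols = [set(candidate[r][c] for r in range(9)) for c in range(9)]
    boxes = [
        set(candidate[br + i][bc + j] for i in range(3) for j in range(3))
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    return all(unit == digits for unit in rows + cols + boxes)


def solved_percentage(pool, solve):
    """Percentage of held-out puzzles for which solve(puzzle) returns a valid solution."""
    if not pool:
        raise ValueError("the test pool is empty")
    solved = sum(1 for puzzle in pool if is_valid_solution(puzzle, solve(puzzle)))
    return 100.0 * solved / len(pool)
```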


🏅 Top traders

#  Trader  Total profit
1          Ṁ66
2          Ṁ26
3          Ṁ13
4          Ṁ10
5          Ṁ8

People in proximity to me have trained Sudoku-solving and chess models. But I didn't.

Wouldn't a CNN make more sense in terms of inductive bias, given the 2D structure of Sudoku? I also think modeling the numbers as tokens is weird, since there's no point in learning embeddings: no number is closer to another when it comes to Sudoku. It would make much more sense to just one-hot encode the numbers. You can encode a Sudoku as a 9x9x10 tensor, where each number is a channel.
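A minimal sketch of that 9x9x10 one-hot encoding, using plain nested lists so it runs without any tensor library. Treating channel 0 as the "empty cell" channel is an assumption; the comment only says each number gets a channel.

```python
def one_hot_grid(grid):
    """Encode a 9x9 grid of ints in 0..9 as a 9x9x10 nested list.

    encoded[r][c][k] is 1 when cell (r, c) holds value k, else 0.
    Channel 0 is assumed to mean "empty".
    """
    return [[[1 if value == k else 0 for k in range(10)] for value in row] for row in grid]


def decode_one_hot(encoded):
    """Invert one_hot_grid by taking the index of the active channel in each cell."""
    return [[cell.index(1) for cell in row] for row in encoded]
```

Feeding this to a CNN would typically mean moving the channel axis to wherever the framework expects it.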

@Shump Thinking about it, embedding the numbers in Sudoku would probably just result in a weird one-hot encoding anyways, as ideally each number's vector should be linearly independent of the others. It would end up being the same but with extra steps and extra noise. Either that, or you have fewer embedding dimensions than numbers, and the model will just end up with some really weird biases, like thinking a 5 is more similar to a 9 than to an 8.
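The linear-independence point can be checked directly: ten vectors in fewer than ten dimensions can never all be independent, while a one-hot table always is. The sketch below uses exact rational arithmetic; the 4-dimensional random table is a made-up stand-in for a learned embedding, not anything from an actual model.

```python
from fractions import Fraction
import random


def matrix_rank(rows):
    """Rank of a matrix via Gauss-Jordan elimination with exact Fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# One row per token 0..9.
one_hot_table = [[1 if i == j else 0 for j in range(10)] for i in range(10)]

rng = random.Random(0)
# Hypothetical 4-dimensional embedding table with arbitrary integer entries.
small_table = [[rng.randint(-5, 5) for _ in range(4)] for _ in range(10)]

one_hot_rank = matrix_rank(one_hot_table)  # all ten rows are independent
small_rank = matrix_rank(small_table)      # can never exceed the embedding width, 4
```

With rank at most 4, at least six of the ten digit vectors are linear combinations of the others, which is where the spurious similarities come from.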

predicted YES

@Shump See also: /Mira/if-mira-trains-a-transformer-model which is similar to this but I'm presently working on it.

If you're training it from scratch, something like the 3d indices representation used by @PeterBuyukliev would be very useful. Even if it's just a single marker on each element for which column it's in. (relevant section of relevant paper for anyone interested)
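I don't know the exact form of the representation referenced here, so the following is only one plausible reading of "index hints": tag every cell with its row, column, and box index alongside its value. The function names and the box numbering (left to right, top to bottom) are assumptions.

```python
def cell_indices(n=3):
    """Return (row, column, box) for every cell of an (n*n) x (n*n) grid, row-major."""
    size = n * n
    return [
        (r, c, (r // n) * n + c // n)  # boxes numbered left to right, top to bottom
        for r in range(size)
        for c in range(size)
    ]


def with_index_hints(grid, n=3):
    """Pair each cell value with its (row, column, box) indices."""
    flat = [value for row in grid for value in row]
    return [(value, r, c, b) for value, (r, c, b) in zip(flat, cell_indices(n))]
```

Each index could then feed its own learned positional embedding, and because `n` is a parameter the same scheme extends to 16x16 grids with `n=4`.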

@EmilyThomas Index hints make sense! Probably not necessary for 9x9 Sudoku, but the learned algorithm would be much more likely to generalize to larger Sudokus.

I was planning to train it from scratch: a simple architecture, but with a lot of synthetic data, curriculum learning (first learn to solve puzzles with 1 cell, 1 column, 1 row, or 1 box missing), and, if I extend the set of control tokens, reinforcement learning so it learns to emit traces that are useful to its variant selves.
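The early curriculum stages mentioned above could be generated roughly as follows. This is a sketch under assumptions: `example_solved_grid` is a fixed pattern standing in for a real random-grid generator, and the stage names and helper functions are hypothetical.

```python
import random


def example_solved_grid():
    """A fixed valid solved grid (shifted-row pattern), a stand-in for a real generator."""
    return [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]


def blank_cells(solved, cells):
    """Copy the grid and set the listed (row, column) cells to 0, the placeholder."""
    puzzle = [row[:] for row in solved]
    for r, c in cells:
        puzzle[r][c] = 0
    return puzzle


def curriculum_stages(solved, rng):
    """Build one easy puzzle per stage: 1 cell, 1 column, 1 row, 1 box missing."""
    r, c = rng.randrange(9), rng.randrange(9)
    box_r, box_c = 3 * (r // 3), 3 * (c // 3)  # top-left corner of the box containing (r, c)
    stages = {
        "cell": [(r, c)],
        "column": [(i, c) for i in range(9)],
        "row": [(r, j) for j in range(9)],
        "box": [(box_r + i, box_c + j) for i in range(3) for j in range(3)],
    }
    return [(name, blank_cells(solved, cells)) for name, cells in stages.items()]


stages = curriculum_stages(example_solved_grid(), random.Random(0))
```

Every puzzle produced this way has a unique solution, since each missing value is forced by the rest of its row, column, or box.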