Will @Mira make a better Sudoku prompt than @EmilyThomas?
Basic · 16 · Ṁ694 · resolved Mar 11 · Resolved NO

The metric for any given puzzle is a binary "Did it solve it?", with no consideration for how elegant the process is/isn't, or how close it got.

Results will use:

  • My already-submitted version 3 solution, currently being tested by Mira on the January puzzles.

  • Mira's solution, currently being developed.

I don't know exactly how Mira's solution will be tested (how many puzzles), so:

  • Resolves YES if Mira's prompt is better than mine (higher solve percentage, plus common-sense judgement of intent, so that e.g. a successful solve on only a single attempted puzzle doesn't count as 1/1 = 100% and win outright).

  • Resolves NO if my prompt is better than Mira's (higher solve percentage, plus common-sense judgement of intent), or if Mira does not finish a solution or test it on any puzzles. Also resolves NO if the prompts objectively tie.

  • Resolves N/A if neither YES nor NO makes sense.

May resolve early if all testing is finished early. May extend if testing needs longer, for reasons agreed to be sensible.

Main Sudoku Market

Market on Mira's Attempt

@traders Market is reopened, as Mira has taken to the Sudoku Madness and is not yet done.

@Mira Any sudoku updates? I'm trying to figure out if I should reopen, resolve, or wait for results.

@EmilyThomas You can probably reopen this one. My first algorithm ended up being exactly equivalent to yours (my simulation solved a puzzle iff your simulator solved it), and I'm not going to submit a copycat even if it might get a higher solve %.

So there are two ways to get an improvement: (1) add a technique that requires tracking per-cell candidates and not just per-set information (like hidden singles), or (2) keep only the grid as state, but fork the state to support a backtracking algorithm.

This is what a Fibonacci calculator looks like:

<program>
g1192848: IF(.N, .B := (.A + .B) , .A := B, .N := (.N - .1) ; CALL(g1192848))
MAIN: .N := 3 , .A := 0 , .B := 1 ; CALL(g1192848)
</program>

It keeps a state record of every variable and an "active program" that is a semicolon-separated list of primitive instructions; general recursion is implemented by lazily unfolding words. Comma-separated variable sets are done "simultaneously" (the state record is read and the right side of every "x := y" is replaced with definitions), while semicolon-separated instructions are done sequentially. Values can be integers, vectors, or symbols (unfolding a symbol stored in state is similar to calling a function that's passed as an argument).
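To make those rules concrete, here's a rough Python sketch of the evaluation model, for illustration only (the real thing is a GPT-4 prompt, not Python; here words are lists of instruction tuples and expressions are plain Python callables over the state record):

```python
# Rough sketch of the machine described above. Words map a label to a list of
# instructions; an instruction is one of:
#   ("assign", [(var, expr), ...])  -- comma-separated set, applied "simultaneously"
#   ("call", label)                 -- lazily unfolds the word's definition
#   ("if", cond, instructions)      -- runs the body only when cond is non-zero
def run(words, entry, state):
    program = list(words[entry])          # the "active program"
    while program:
        op = program.pop(0)
        if op[0] == "assign":
            # Read the old state first so every right-hand side sees the same
            # snapshot, then commit all of the updates at once.
            snapshot = dict(state)
            for var, expr in op[1]:
                state[var] = expr(snapshot)
        elif op[0] == "call":
            # Unfold the word's definition in front of the remaining program.
            program = list(words[op[1]]) + program
        elif op[0] == "if":
            if op[1](state):
                program = list(op[2]) + program
    return state


# The Fibonacci trace above, re-expressed in this encoding:
words = {
    "g1192848": [
        ("if", lambda s: s["N"], [
            ("assign", [("B", lambda s: s["A"] + s["B"]),
                        ("A", lambda s: s["B"]),
                        ("N", lambda s: s["N"] - 1)]),
            ("call", "g1192848"),
        ]),
    ],
    "MAIN": [
        ("assign", [("N", lambda s: 3), ("A", lambda s: 0), ("B", lambda s: 1)]),
        ("call", "g1192848"),
    ],
}

print(run(words, "MAIN", {}))   # -> {'N': 0, 'A': 2, 'B': 3}
```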

GPT-4 can't reliably index into lists, but you can map over a list by repeatedly taking the front and back while incrementing a counter in the state record. So my Sudoku algorithm needs to do as much as possible in a single pass. And since the active program is copied every time an instruction is executed, it's important to create a lot of labels (the "g1192848") whose definitions are unfolded as needed, more so than you would in a regular programming language. (In the Fibonacci example above, the ".r0 := (.A + .B) , .A := B , .B := r0 , .N := (.N - .1)" should probably be its own word, just because there are 2 steps with it active and "CALL(word)" would pay for itself in tokens by being shorter.) I might change my symbol generator to be token-efficient, since "g1192848" is just whatever (gensym) returns. There's also an explicit "HALT" instruction.
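In plain Python, that "take the front and back while incrementing a counter" pattern looks roughly like this (a hypothetical helper for illustration, not part of the prompt):

```python
# Instead of indexing into a list (which GPT-4 gets wrong), the state keeps the
# unprocessed tail and a running position, and each step peels off the front.
def single_pass(cells, step):
    state = {"rest": list(cells), "pos": 0, "out": []}
    while state["rest"]:
        front, *rest = state["rest"]                     # take the front...
        state["out"].append(step(state["pos"], front))   # ...do the per-cell work...
        state["rest"] = rest                             # ...keep the back...
        state["pos"] += 1                                # ...and increment the counter
    return state["out"]

# e.g. numbering the cells of a row:
# single_pass([5, 0, 3], lambda i, v: (i, v))  ->  [(0, 5), (1, 0), (2, 3)]
```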

The traces for these programs don't look at all like natural-language LLM prompts. The average person is just going to see pages of incomprehensible noise: the internal state of a machine. And you don't write the executed programs directly; you "compile" to this format from a source language that has loop constructs, lexical scoping, and function definitions.

I'm currently trying to do a "streamable markup-based algorithm" that uses 3 techniques in a single loop (2 of which together are equivalent to your algorithm), repeating while progress is made. The main risk is that the state record or active program gets too large for GPT-4 to understand, even though I'm trying to design around that. If that doesn't work, I'll add nondeterminism to the virtual machine (state is a list of (state record, active program)) and then keep only the grid as state and not the full markup.
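If it comes to the nondeterministic fallback, the shape would be roughly this (a sketch only; the "choose" and "fail" instruction names are placeholders for illustration, and "if" is omitted for brevity):

```python
# The machine state becomes a list of (state record, active program) pairs, so
# a branch can be forked for each guess and dropped on a contradiction.
def run_nondet(words, entry, initial):
    worklist = [(dict(initial), list(words[entry]))]   # pending (state, program) pairs
    while worklist:
        state, program = worklist.pop()                # depth-first: try the latest fork
        abandoned = False
        while program:
            op = program.pop(0)
            if op[0] == "assign":                      # simultaneous assignments, as before
                snapshot = dict(state)
                for var, expr in op[1]:
                    state[var] = expr(snapshot)
            elif op[0] == "call":                      # lazily unfold a word
                program = list(words[op[1]]) + program
            elif op[0] == "choose":                    # fork once per alternative binding
                for alt in op[1]:
                    forked = dict(state)
                    forked.update(alt)
                    worklist.append((forked, list(program)))
                abandoned = True                       # the parent branch is replaced by its forks
                break
            elif op[0] == "fail":                      # contradiction: drop this branch
                abandoned = True
                break
        if not abandoned:
            return state                               # first branch that runs to completion
    return None
```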

If I can't get either markup-based or nondeterminism-based algorithms running, then your grid-only approach will stand as the best algorithm that I could come up with.

I was debating whether I even wanted to attempt this since nobody took my 1% order on my market, but I decided on January 28th to start trying anyway.

@Mira "Ah yes, Sudoku Madness, I'd recognize it anywhere. Worst case I've seen in months!"

  • Hidden singles by itself had an 86% solve rate over 77 puzzles.

  • Obvious singles (the method I implemented) had a 79% solve rate over the same puzzles (the successes and failures didn't fully overlap between the two runs).

  • Trying to get the full cell candidates without mistakes is where madness lies. I won't tell you not to try, because I also tried (and succeeded/abandoned it with an abysmally high token/turn count). But it is... possible.

When I planned out how to implement hidden singles, it was the same difficulty/tokens/turns/complexity/appropriate-word as obvious singles, but I didn't implement it. Instead I'd pinned my hopes on an even shinier method with a 99% solve rate, which was literally just doing both hidden and obvious singles in one go.
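For anyone following along, this is roughly what the two techniques mean, written out in plain Python (standard Sudoku definitions for reference, nothing to do with either of the actual GPT-4 prompts):

```python
# The grid is a 9x9 list of ints, 0 = empty.
def units_of(r, c):
    row = [(r, j) for j in range(9)]
    col = [(i, c) for i in range(9)]
    br, bc = 3 * (r // 3), 3 * (c // 3)
    box = [(br + i, bc + j) for i in range(3) for j in range(3)]
    return [row, col, box]

def candidates(grid, r, c):
    seen = {grid[i][j] for unit in units_of(r, c) for (i, j) in unit}
    return {v for v in range(1, 10) if v not in seen}

def apply_singles(grid):
    progress = True
    while progress:
        progress = False
        for r in range(9):
            for c in range(9):
                if grid[r][c]:
                    continue
                cands = candidates(grid, r, c)
                # Obvious (naked) single: only one candidate remains for the cell.
                if len(cands) == 1:
                    grid[r][c] = cands.pop()
                    progress = True
                    continue
                # Hidden single: a candidate that no other empty cell in the
                # same row, column, or box can take.
                for unit in units_of(r, c):
                    peers = [(i, j) for (i, j) in unit
                             if (i, j) != (r, c) and grid[i][j] == 0]
                    for v in cands:
                        if all(v not in candidates(grid, i, j) for (i, j) in peers):
                            grid[r][c] = v
                            progress = True
                            break
                    if grid[r][c]:
                        break
    return grid
```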

If you can do a working equivalent to hidden singles (or better), then you win (or if you add an improvement to the current one, etc., etc.).

In my final method/attempt (the one that failed right at the end), I was trying to do too much per turn. The one that ended up as my main submission (obvious singles, version 3) was resistant to mistakes by being redundant and using more tokens for the same work; in my final attempt I tried too much finesse, and paid for it. I think I could have gotten the 99% method to work, but I would have needed to start it over completely, going for less shiny and more... more of the bulky mistake-proofing.

Good Luck!!!

I can test it on the January pool so the numbers are comparable.

@Mira Well! This just got more interesting.

Also, GPT-4 originally gave me the market title "Battle of the Sudoku Titans: @Mira vs. @EmilyThomas – Who Will Reign Supreme in Puzzle Mastery?", but that seemed a bit much 😅.
