(200M Subsidy!) Will a prompt be discovered that allows GPT4 to draw or win more than 70% of the time in tic tac toe?
Resolved YES on Jun 4

Currently, GPT4 seems awful at tic tac toe. I can't get it to play sensibly at all; it won't strategize even one step ahead. Resolves YES if there is a prompt that allows it to win or draw more than 70% of the time against humans who are not purposefully trying to lose.

A "prompt" should be like this.

You can start by sending any message you like, instructing GPT4 how to behave, how the board should be formatted, etc., and making the first move.

Then GPT4 and you should alternate making moves in a single, standardized, pre-determined format until the game is complete.

Edit: I don't know if this is a concern or if anyone was planning to do this, but this should not be done by some form of naive exhaustion like listing out combinations of starting moves, and prescribing singular response moves that guarantee a draw.

Edit2: It should work regardless of whether the human or GPT4 starts. Having two prompts, one for each case, is fine.

Edit3: To have a more concrete resolution criterion, I will evaluate potential solutions like this: I'll play tic tac toe against it 10 times using exactly your prompt (initial prompt + response prompts + my move, possibly formatted in some way required by the structure of the prompt). If it beats me or draws more than 5 of those times, I'll play against it 50 times, and if it wins or draws more than 35 of those, this resolves to YES. I might lower this to 20 games (more than 14 wins or draws) if this turns out to take way too much time.
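For anyone tallying their own test games, the two-stage check in Edit3 can be sketched roughly as below. This is only an illustration of the rule as described; the function name, the outcome labels ("win", "draw", "loss", from GPT4's perspective), and the parameter names are all made up here.

```python
def resolves_yes(trial_results, full_results,
                 trial_games=10, trial_threshold=5,
                 full_games=50, full_threshold=35):
    """Apply the two-stage rule from Edit3 to recorded game outcomes.

    Outcomes are from GPT4's perspective: "win", "draw", or "loss".
    The default numbers come from Edit3; the labels are assumptions.
    """
    def non_losses(results):
        return sum(1 for r in results if r in ("win", "draw"))

    # Stage 1: a short trial run; GPT4 must win or draw MORE than the threshold.
    if len(trial_results) != trial_games:
        return False
    if non_losses(trial_results) <= trial_threshold:
        return False

    # Stage 2: the long run, again with a strict "more than" comparison.
    if len(full_results) != full_games:
        return False
    return non_losses(full_results) > full_threshold
```

The possible shorter second stage mentioned in Edit3 would correspond to calling it with `full_games=20, full_threshold=14`.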

predicted NO


@bjubes Several of them were good enough, including the one supplied by Buyukliev, that they could only be beaten using specific strategies. Resolved to YES, as that was the most literal interpretation of the market. Not really happy about it though; I should've been more specific in the initial description of the market. But the strategy employed in those prompts could clearly be extended to perfect play, so if I hadn't resolved it YES using the literal interpretation, it would certainly have resolved YES anyway, just a little bit later.

predicted YES

@bjubes This was my latest attempt (iterated heavily from the Buyukliev and ShadowyZephyr prompts)! It isn't perfect but I haven't beaten it yet.

https://chat.openai.com/share/fd0edbfc-b264-47f0-9d99-21403878101b
https://chat.openai.com/share/f58409a6-6754-4c0c-895e-39ef2c4f0ace

(Notes: (1) This is a "moving second" prompt. But moving first is an easier prompt. (2) The structure of the prompt was set up to work with GPT4 in Playground, which is why you have the AI/BOB messages. But it seems reliable in ChatGPT too!)

predicted NO

@wustep Doesn't it still lose to the strat I posted earlier? Sorry if I'm not using the prompt correctly.

https://chat.openai.com/share/b0a9bed7-e639-4943-9b28-f004cb425a0a

predicted YES

@hmys Hmm, that strat often, but not always, gets stopped. That step (8/9), finding double block / attack moves, could be refined a bit so it always states the blocking / attack moves instead of just sometimes.

I think it tested better in Playground over ChatGPT though (0 temperature). Maybe ChatGPT has extra instructions to be less verbose and is generally more random. 😛

predicted YES

Where did the 200M go? We all worked on the prompt some.

predicted NO

@ShadowyZephyr What do you mean? I subsidized the market at the beginning.

predicted YES

@hmys Ohh my mistake. I thought it was a bounty

bought Ṁ34 of YES
predicted YES

@PeterBuyukliev nice, teamwork makes the dream work

predicted NO

@PeterBuyukliev This sorta feels like it might be a set of instructions that cover every possible case.

predicted NO

Which is fine/cool but I want to contest w HMYS that this matches the resolution conditions

Step 8) Finally, if you cannot set up a guaranteed winning move, and you don't have to block the opponent, try to set up a threat for the next move. I.e. get two squares from a triplet, where the final space is empty. Important - first check if you can make a threat using the edge squares (2, 4, 6, 8), and ONLY if you can't, make a threat using the corner squares (1, 3, 5, 7).
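As a rough illustration of what Step 8 asks the model to do, here is a minimal sketch of "make a threat, preferring edges over corners". It assumes squares are numbered 1-9 row by row, which the thread doesn't spell out; under that numbering the corners are 1, 3, 7, 9 and 5 is the center, which differs from the list quoted in the step. All names are hypothetical, and this is not the prompt author's code.

```python
# Assumed numbering (not confirmed by the thread): squares 1-9, row by row.
LINES = [(1, 2, 3), (4, 5, 6), (7, 8, 9),
         (1, 4, 7), (2, 5, 8), (3, 6, 9),
         (1, 5, 9), (3, 5, 7)]
EDGES = (2, 4, 6, 8)
CORNERS = (1, 3, 7, 9)  # assumption; the quoted step lists (1, 3, 5, 7)

def make_board(x_squares=(), o_squares=()):
    """Build a board dict mapping square -> "X", "O", or None."""
    board = {sq: None for sq in range(1, 10)}
    for sq in x_squares:
        board[sq] = "X"
    for sq in o_squares:
        board[sq] = "O"
    return board

def makes_threat(board, square, mark):
    """True if playing `mark` on `square` yields two in a line with the third empty."""
    if board[square] is not None:
        return False
    trial = dict(board)
    trial[square] = mark
    for line in LINES:
        if square in line:
            values = [trial[s] for s in line]
            if values.count(mark) == 2 and values.count(None) == 1:
                return True
    return False

def step8_move(board, mark="O"):
    """Try edge squares first; fall back to corners only if no edge makes a threat."""
    for group in (EDGES, CORNERS):
        for square in group:
            if makes_threat(board, square, mark):
                return square
    return None  # no threat-making move available
```

For example, with X on 1 and 9 and O on 5, `step8_move` picks edge 2, since 2 and 5 threaten 8.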

predicted YES

@NoaNabeshima why? this is basically "edge squares are better than corner squares after the first turn"

bought Ṁ200 of YES

@PeterBuyukliev I don't think this part of the prompt covers "every possible case" -- it just basically says "prefer edges over corners when making threats".

My read was that @hmys was thinking more along the lines of things like: "On turn 1, if they play 1, I will play 5. On turn 2, if they play 3 or 5, I will play 6", but there's some major ambiguity here.

bought Ṁ125 of YES

I was messing with a step much more sus than anything in Peter's, which works sometimes, but is more likely to be considered disallowed 😛

"Step 4) If it is not turn 2, skip to step 5 immediately. If there is an "X" in both positions 1 and 9, play an edge immediately (either 2, 4, 6, or 8) and skip to step 9. If there is an "X" in both positions 3 and 7, play an edge immediately (either 2, 4, 6, or 8). Otherwise, proceed to the step 5."
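For comparison, the special case in that Step 4 amounts to something like the sketch below. It assumes row-by-row 1-9 numbering and a board stored as a dict; the function names are invented for illustration, and it returns the first free edge where the prompt just says "an edge".

```python
def make_board(x_squares=(), o_squares=()):
    """Build a board dict mapping square (1-9, assumed row by row) -> "X", "O", or None."""
    board = {sq: None for sq in range(1, 10)}
    for sq in x_squares:
        board[sq] = "X"
    for sq in o_squares:
        board[sq] = "O"
    return board

def step4_move(board, turn):
    """Return an edge square if X holds two opposite corners on turn 2, else None."""
    if turn != 2:
        return None  # the step only applies on turn 2
    for a, b in ((1, 9), (3, 7)):  # the two pairs of opposite corners
        if board[a] == "X" and board[b] == "X":
            for edge in (2, 4, 6, 8):
                if board[edge] is None:
                    return edge  # first free edge; the prompt allows any edge
    return None  # otherwise fall through to the next step
```

Written out like this, it is a single hard-coded pattern rather than a full move tree, though it is closer to one than the "prefer edges" rule.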

predicted YES

@wustep I interpreted that as writing a tree of moves and then saying 'if this do this' for every scenario. That is clearly not happening here, so I think the prompt should be allowed.

predicted YES

@wustep The market very specifically says GPT4.

predicted YES

The transcript I posted is ChatGPT4, but do you mean GPT4 and not ChatGPT?

predicted YES

@wustep ChatGPT4 isn't a thing. There is ChatGPT using gpt-3.5 or ChatGPT using gpt-4.

predicted YES

sorry, that's what I mean -- chatGPT using gpt4 for that transcript I linked. that's what Peter used as well.

Here's some additional iterations which appear more reliable so far: https://github.com/wustep/ai-explorations/blob/main/tic-tac-toe/outputs-1.md -- I'm testing using nat.dev Playground with GPT4.

edit: Seems step 4 (opposite corner detection) is sometimes unreliable and needs iteration, but this version seems more reliable at making blocking & winning moves due to the extra redundancy.

predicted YES

@wustep Huh - why is the logo not black/purple then? Maybe it's because I'm on gpt-3.5? Anyway, I tried that scenario against GPT-4 on the v2 prompt and it didn't fail. Let me try with this prompt. I was on low temp though.

Can you make a pastebin of the best prompt you have? I think it might be good enough but we can probably do better

predicted YES

not really sure 🙃. but I think you have to give any prompt a few tries (even with 0 temperature) and check a few different edge cases. anecdotally, chat is less accurate than playground.

I'm using 0 temperature now. Current prompt is: https://raw.githubusercontent.com/wustep/ai-explorations/0219cc5066a26babc66bfc80d9ca435cae9b477e/tic-tac-toe/prompts-1.md. I still need to test the latest corner detection changes more, but I think your "prefer edge over corner" in the last step might be fine compared to what I'm doing? Not sure.

edit: Current best with opposite corner detection is still flaky, so here's just a "prefer edge over corner" strategy lol, but I'm done for the day.

predicted NO

@PeterBuyukliev It still loses to the same strategy if you don't place your first X in the bottom right.

https://chat.openai.com/share/f60b6e83-60a9-4978-a593-86a439d6786a

predicted NO

@wustep Hmmm, it still loses to the same strategy quite often

predicted YES

@hmys

1 - Try the “prefer edge over corner” one I have instead! The opposite corner rules were okay, but not fully reliable!

2 - Use nat.dev playground with GPT4 and temperature 0. ChatGPT generally is less reliable

3 - Be sure to use the specified format, eg

“—

BOB: X on 5

—“
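If you want to check your own moves against that format before pasting them in, a small parser along these lines would do. The exact pattern is a guess based on the single example above ("BOB: X on 5"); the real prompt may accept other variants, and the function name is made up.

```python
import re

# Pattern inferred from the one example in this thread; an assumption, not a spec.
MOVE_PATTERN = re.compile(r"BOB:\s*([XO])\s+on\s+([1-9])")

def parse_move(line):
    """Return (mark, square) for a line like 'BOB: X on 5', or None if it doesn't match."""
    match = MOVE_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```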

predicted NO

@hmys You can also beat it quite easily using some other strategy using the same principle of setting up two winning lines. Like this

https://chat.openai.com/share/0f8ca943-adb3-45e4-8cb1-007af7fd1863

This also works against @PeterBuyukliev's prompt.
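The "two winning lines" idea is what is usually called a fork: one move that creates two threats at once, so only one can be blocked. A minimal sketch for finding such moves, assuming row-by-row 1-9 numbering and hypothetical helper names, might look like this:

```python
# Assumed numbering (not confirmed by the thread): squares 1-9, row by row.
LINES = [(1, 2, 3), (4, 5, 6), (7, 8, 9),
         (1, 4, 7), (2, 5, 8), (3, 6, 9),
         (1, 5, 9), (3, 5, 7)]

def make_board(x_squares=(), o_squares=()):
    """Build a board dict mapping square -> "X", "O", or None."""
    board = {sq: None for sq in range(1, 10)}
    for sq in x_squares:
        board[sq] = "X"
    for sq in o_squares:
        board[sq] = "O"
    return board

def count_threats(board, mark):
    """Count lines holding two of `mark` with the third square empty."""
    total = 0
    for line in LINES:
        values = [board[s] for s in line]
        if values.count(mark) == 2 and values.count(None) == 1:
            total += 1
    return total

def fork_moves(board, mark):
    """Return the empty squares where playing `mark` leaves two or more threats."""
    forks = []
    for square in range(1, 10):
        if board[square] is None:
            trial = dict(board)
            trial[square] = mark
            if count_threats(trial, mark) >= 2:
                forks.append(square)
    return forks
```

A prompt step that lists the opponent's fork squares before choosing a move would be the natural counter to this strategy, though nothing here shows that GPT4 follows such a step reliably.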

predicted YES

@hmys Nice -- setting up 2 winning lines via 5 & 8 still works even with my 3 steps and the "prefer edge over corner" prompt! I'm done working on this for the weekend, but I think we're getting closer and closer.

predicted YES

@hmys I'd like to remind you that the target was "70% draws or wins" and not "absolutely never ever loses". I think it's time to resolve this market. Yes, you can probably figure out a way to coax it into making a mistake. If you wanted perfect play, you should have opened a market for perfect play.