(200M Subsidy!) Will a prompt be discovered that allows GPT4 to draw or win more than 70% of the time in tic tac toe?

1kṀ6948

resolved Jun 4

Resolved

YES


Currently, it seems like GPT4 is awful at tic tac toe. I can't get it to play sensibly at all. It won't strategize even one step ahead. Resolves YES if there is a prompt that allows it to win or draw more than 70% of the time against humans who are not purposefully trying to lose.

A "prompt" should be like this.

You can start by sending any message you like, instructing GPT4 about how to behave, how the board should be formatted etc, and you making the first move.

Then GPT4 and you should alternate making moves in a single, standardized, pre-determined format until the game is complete.

Edit: I don't know if this is a concern or if anyone was planning to do this, but this should not be done by some form of naive exhaustion like listing out combinations of starting moves, and prescribing singular response moves that guarantee a draw.

Edit2: It should work regardless of whether the human or GPT4 starts. Having two prompts for whether the human or GPT starts is fine.

Edit3: To have more concrete resolution criteria, I will evaluate potential solutions like this: I'll attempt to play tic tac toe against it 10 times using exactly your prompt (initial prompt + response prompts + my move (possibly formatted in some way required by the structure of the prompt)). If it beats me or draws more than 5 times, I'll play tic tac toe against it 50 times, and if it wins or draws more than 35 times, this resolves to YES. I might lower this to 20 games, requiring more than 14 wins or draws, if this turns out to take way too much time.
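For anyone who wants to check a prompt against these thresholds before submitting it, here is a minimal sketch of the two-stage check described in Edit3. The function names and the "win"/"draw"/"loss" labels are illustrative assumptions, not part of the market rules; outcomes are recorded from GPT4's point of view.

```python
def non_losses(outcomes):
    """Count games GPT4 won or drew."""
    return sum(1 for o in outcomes if o in ("win", "draw"))


def resolves_yes(screening, full_run):
    """Two-stage check from Edit3 (hypothetical helper).

    screening: outcomes of the first 10 games.
    full_run:  outcomes of the follow-up 50 games.
    """
    # Stage 1: more than 5 wins/draws out of 10 games.
    if len(screening) != 10 or non_losses(screening) <= 5:
        return False
    # Stage 2: more than 35 wins/draws out of 50, i.e. strictly above 70%.
    return len(full_run) == 50 and non_losses(full_run) > 35
```

Note that exactly 35 non-losses out of 50 is 70% on the nose, which does not clear the "more than" bar.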


🏅 Top traders

#	Name	Total profit
1		Ṁ355
2		Ṁ237
3		Ṁ167
4		Ṁ142
5		Ṁ80


predictedNO

What is the winning prompt? I beat the one in the comments easily:

https://chat.openai.com/share/4cc632b0-d5aa-410e-a766-9d5b1784facf


predictedNO

@bjubes Several of them were good enough, including the one supplied by Buyukliev, that they could only be beaten using specific strategies. Resolved to YES as that was the most literal interpretation of the market. Not really happy about it though; I should've been more specific in the initial description of the market. But the strategy employed in those prompts could clearly be extended to perfect play, so if I hadn't resolved it YES using the literal interpretation, it would certainly have resolved YES anyway, just a little bit later.

predictedYES

@bjubes This was my current latest attempt (iterated heavily from Buyukliev and ShadowyZephyr prompts)! It isn't perfect but I haven't beaten it yet.

https://chat.openai.com/share/fd0edbfc-b264-47f0-9d99-21403878101b
https://chat.openai.com/share/f58409a6-6754-4c0c-895e-39ef2c4f0ace

(Notes: (1) This is a "moving second" prompt. But moving first is an easier prompt. (2) The structure of the prompt was set up to work with GPT4 in Playground, which is why you have the AI/BOB messages. But it seems reliable in ChatGPT too!)


predictedNO

@wustep Doesn't it still lose to the strat I posted earlier? Sorry if I'm not using the prompt correctly.

https://chat.openai.com/share/b0a9bed7-e639-4943-9b28-f004cb425a0a

predictedYES

@hmys Hmm.. that strat seems to often, but not always, get stopped. That step (8/9), finding double block / attack moves, could be refined a bit so it always states the blocking / attack moves instead of just sometimes.

I think it tested better in Playground over ChatGPT though (0 temperature). Maybe ChatGPT has extra instructions to be less verbose and is generally more random. 😛

predictedYES

Where did the 200M go? We all worked on the prompt some.

predictedNO

@ShadowyZephyr What do you mean? I subsidized the market at the beginning.

predictedYES

@hmys Ohh my mistake. I thought it was a bounty

v3 of the prompt https://chat.openai.com/share/631d1852-7768-40fb-bea5-5f19dc13c982

you shouldn't be able to win games against it now

@ShadowyZephyr @hmys


predictedYES

@PeterBuyukliev nice, teamwork makes the dream work

predictedNO

@PeterBuyukliev This sorta feels like it might be a set of instructions that cover every possible case.

predictedNO

Which is fine/cool but I want to contest w HMYS that this matches the resolution conditions

Step 8) Finally, if you cannot set up a guaranteed winning move, and you don't have to block the opponent, try to set up A threat for the next move. I.e. get two squares from a triplet, where the final space is empty. Important - first check if you can make a threat using the edge squares (2, 4, 6, 8), and ONLY if you can't, make a threat using the corner squares (1, 3, 5, 7).
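To make the quoted step concrete, here is a rough Python sketch of "set up a threat, preferring edges". It is not what GPT4 does internally and not part of the prompt; it assumes squares are numbered 1 to 9 row by row (so 2, 4, 6, 8 are the edges), represents the board as a dict, and treats every non-edge square as the fallback. All names are hypothetical.

```python
# The eight winning triplets on a board numbered 1..9, row by row (assumed).
LINES = [(1, 2, 3), (4, 5, 6), (7, 8, 9),
         (1, 4, 7), (2, 5, 8), (3, 6, 9),
         (1, 5, 9), (3, 5, 7)]
EDGES = (2, 4, 6, 8)


def empty_board():
    """Board as a dict: square number -> ' ', 'X' or 'O'."""
    return {p: " " for p in range(1, 10)}


def threat_moves(board, me):
    """Empty squares that would give `me` two in a triplet with the third still empty."""
    moves = set()
    for line in LINES:
        marks = [board[p] for p in line]
        if marks.count(me) == 1 and marks.count(" ") == 2:
            moves.update(p for p in line if board[p] == " ")
    return moves


def step8_move(board, me):
    """Pick a threat-making square, trying edges first; None if no threat is possible."""
    candidates = threat_moves(board, me)
    edge_options = sorted(p for p in candidates if p in EDGES)
    if edge_options:
        return edge_options[0]
    return min(candidates) if candidates else None
```

With X on 1 and 9 and O on 5, `step8_move(board, "O")` picks an edge, which is the behavior the step is trying to force.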

predictedYES

@NoaNabeshima why? this is basically "edge squares are better than corner squares after the first turn"

@PeterBuyukliev I don't think this part of the prompt covers "every possible case" -- it just basically says "prefer edges over corners when making threats".

My read was @hmys was thinking more along the lines of things like: "On turn 1, If they play 1, I will play 5. On turn 2, if they play 3 or 5, I will play 6" but there's some major ambiguity here.

I was messing with a step much more sus than anything in Peter's, which works sometimes, but is more likely to be considered disallowed 😛

"Step 4) If it is not turn 2, skip to step 5 immediately. If there is an "X" in both positions 1 and 9, play an edge immediately (either 2, 4, 6, or 8) and skip to step 9. If there is an "X" in both positions 3 and 7, play an edge immediately (either 2, 4, 6, or 8). Otherwise, proceed to the step 5."

predictedYES

@PeterBuyukliev https://chat.openai.com/share/c62c57e0-4fd8-4544-8440-92a0cb945b59 not quite there yet! 😛


predictedYES

@wustep I interpreted that as writing a tree of moves and then saying 'if this do this' for every scenario. That is clearly not happening here, so I think the prompt should be allowed.

predictedYES

@wustep The market very specifically says GPT4.

predictedYES

The transcript I posted is ChatGPT4, but do you mean GPT4 and not ChatGPT?

predictedYES

@wustep ChatGPT4 isn't a thing; there is ChatGPT using gpt-3.5 or ChatGPT using gpt-4.

predictedYES

sorry, that's what I mean -- chatGPT using gpt4 for that transcript I linked. that's what Peter used as well.

Here's some additional iterations which appear more reliable so far: https://github.com/wustep/ai-explorations/blob/main/tic-tac-toe/outputs-1.md -- I'm testing using nat.dev Playground with GPT4.

edit: Seems step 4 (opposite corner detection) is sometimes unreliable and needs iteration, but this seems more reliable at making blocking & winning moves due to the extra redundancy.


predictedYES

@wustep Huh - why is the logo not black/purple then? Maybe it's because I'm on gpt-3.5? Anyway, I tried that scenario against GPT-4 on the v2 prompt and it didn't fail. Let me try with this prompt. I was on low temp though.

Can you make a pastebin of the best prompt you have? I think it might be good enough but we can probably do better

predictedYES

not really sure 🙃. but I think you have to give any prompt a few tries (even with 0 temperature) and check a few different edge cases. anecdotally, chat is less accurate than playground.

I'm using 0 temperature now. Current prompt is: https://raw.githubusercontent.com/wustep/ai-explorations/0219cc5066a26babc66bfc80d9ca435cae9b477e/tic-tac-toe/prompts-1.md. I still need to test the latest corner detection changes more, but I think your "prefer edge over corner" in the last step might be fine compared to what I'm doing? Not sure.

edit: Current best with opposite corner detection is still flaky, so here's just a "prefer edge over corner" strategy lol, but I'm done for the day.

predictedNO

@PeterBuyukliev It still loses to the same strategy if you don't place your first X in the bottom right.

https://chat.openai.com/share/f60b6e83-60a9-4978-a593-86a439d6786a

predictedNO

@wustep Hmmm, it still loses to the same strategy quite often

predictedNO

@hmys https://chat.openai.com/share/8eaefe7e-cd85-4b50-be54-17afc0426a8d


predictedYES

@hmys

1 - Try the “prefer edge over corner” one I have instead! The opposite corner rules were okay, but not fully reliable!

2 - Use nat.dev playground with GPT4 and temperature 0. ChatGPT generally is less reliable

3 - Be sure to use the specified format, eg

“—

BOB: X on 5

—“
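If you are scripting games against the Playground instead of typing them, a small parser for that move line might look like the sketch below. The exact grammar is an assumption inferred from the single "BOB: X on 5" example and the AI/BOB speaker names mentioned earlier; `parse_move` is a hypothetical helper.

```python
import re

# Assumed shape: "<SPEAKER>: <X|O> on <1-9>", e.g. "BOB: X on 5".
MOVE_RE = re.compile(r"^\s*(\w+):\s*([XO])\s+on\s+([1-9])\s*$")


def parse_move(line):
    """Return (speaker, mark, square) or None if the line is not a move."""
    match = MOVE_RE.match(line)
    if match is None:
        return None
    speaker, mark, square = match.groups()
    return speaker, mark, int(square)
```

Lines that do not match, such as the separator lines around the move, come back as `None` and can be skipped.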

predictedNO

@hmys You can also beat it quite easily using some other strategy using the same principle of setting up two winning lines. Like this

https://chat.openai.com/share/0f8ca943-adb3-45e4-8cb1-007af7fd1863

This also works against @PeterBuyukliev's prompt
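For reference, "setting up two winning lines" can be checked mechanically. The sketch below is illustrative only: it assumes squares numbered 1 to 9 row by row and a dict board, and the function names are made up for this example. A move counts as a fork if, after playing it, the player has two or more different squares that would complete a line.

```python
# The eight winning triplets on a board numbered 1..9, row by row (assumed).
LINES = [(1, 2, 3), (4, 5, 6), (7, 8, 9),
         (1, 4, 7), (2, 5, 8), (3, 6, 9),
         (1, 5, 9), (3, 5, 7)]


def empty_board():
    """Board as a dict: square number -> ' ', 'X' or 'O'."""
    return {p: " " for p in range(1, 10)}


def winning_squares(board, player):
    """Squares where `player` could complete three in a row right now."""
    wins = set()
    for line in LINES:
        marks = [board[p] for p in line]
        if marks.count(player) == 2 and marks.count(" ") == 1:
            wins.add(line[marks.index(" ")])
    return wins


def fork_moves(board, player):
    """Empty squares that leave `player` with two or more winning squares at once."""
    forks = []
    for square in range(1, 10):
        if board[square] != " ":
            continue
        trial = dict(board)  # copy so the real board is untouched
        trial[square] = player
        if len(winning_squares(trial, player)) >= 2:
            forks.append(square)
    return forks
```

A prompt step that blocks these squares for the opponent before looking for its own threats would address the strategy described above, since the opponent can only be stopped on one line per turn.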

predictedYES

@hmys Nice -- the "set up 2 winning lines via 5 & 8" strat still works even against my 3 steps with the "prefer edge over corner" prompt! I'm done working on this for the weekend, but I think we're getting closer and closer.

predictedYES

@hmys I'd like to remind you that the target was "70% draws or wins" and not "absolutely never ever loses". I think it's time to resolve this market. Yes, you can probably figure out a way to coax it to make a mistake. If you wanted a perfect play, you should have opened a market for a perfect play.
