Will GPT-5 not be terrible at the "Numbers Game"?

resolved Aug 8

Resolved

YES

Inspired by this tweet thread: https://twitter.com/colin_fraser/status/1632598168499277824?s=20

The Numbers Game is defined as follows:

Player 1 picks a number from 1 to 10 (inclusive), then Player 2 picks one, and the players alternate while keeping track of the running total. Whoever picks the number that brings the total to at least 30 wins.

For example:

P1: 10 (total: 10)

P2: 5 (total: 15)

P1: 7 (total: 22)

P2: 3 (total: 25)

P1: 6 (total: 31)

And P1 wins. Note that P2's choice of 3 was a blunder: with the total at 22, any pick of 8 or higher would have won instantly. As of right now, GPT-3.5 and GPT-4 (as best as I can tell via Bing) are terrible at this game and make unforced blunders essentially every time. Here's an example:

While technically this game is a forced win for P1 (open with 8, then bring the total to 19, then to 30; for example 8, 1, 10, 1, 10), GPT does not even come close to optimal play. In this example, it fails to take advantage of an immediate win by choosing 8+ when the running total is 22.
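
To make the forced-win claim concrete, here is a minimal sketch of an optimal-play rule. It is not from the tweet thread; the function name `optimal_move` and the constants are hypothetical, chosen for illustration. The idea is that a player who leaves the total at 8 or 19 (that is, 30 minus a multiple of 11) can always answer a pick of x with 11 - x.

```python
TARGET = 30    # first player to bring the total to at least this wins
MAX_PICK = 10  # each pick is an integer from 1 to MAX_PICK

def optimal_move(total: int) -> int:
    """Return a pick for the player to move at running total `total`.

    Hypothetical helper: takes an immediate win if one exists, otherwise
    tries to leave the opponent on a total of 8 or 19.
    """
    remaining = TARGET - total
    if remaining <= MAX_PICK:
        # Immediate win: this pick reaches the target exactly.
        return remaining
    # Leave the opponent a multiple of (MAX_PICK + 1) away from the target.
    pick = remaining % (MAX_PICK + 1)
    # pick == 0 means we are already in a losing position; any move is as good as another.
    return pick if pick != 0 else 1

# Replaying the line from the text: P1 plays optimally, P2 always picks 1.
total, history = 0, []
while total < TARGET:
    p1 = optimal_move(total)
    total += p1
    history.append(p1)
    if total >= TARGET:
        break
    total += 1
    history.append(1)
print(history)
```

With P2 always replying 1, this reproduces the sequence 8, 1, 10, 1, 10 quoted above.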

When GPT-5 is released, I will test it as soon as I have access using the same prompt in the screenshot above. For each of my turns, I will randomly generate a number uniformly 1-10 inclusive.

If, across 20 trial games, GPT-5 takes advantage of at least 90% of "immediate wins" (situations where it could immediately win the game with the right number), this market resolves YES. Otherwise, it resolves NO.
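
For anyone who wants to score their own transcripts, here is a rough sketch of how the criterion could be computed. It assumes the 90% is pooled over every immediate-win opportunity across all games, which is one reading of the rule and not something stated explicitly. The function names and the `(total_before_move, pick)` record format are made up for illustration.

```python
import random

TARGET = 30
MAX_PICK = 10

def human_pick() -> int:
    """My turns: a uniformly random integer from 1 to 10 inclusive."""
    return random.randint(1, MAX_PICK)  # randint includes both endpoints

def has_immediate_win(total: int) -> bool:
    """True if some pick in 1..MAX_PICK brings the total to at least TARGET."""
    return TARGET - total <= MAX_PICK

def immediate_win_rate(model_turns):
    """Fraction of immediate-win opportunities the model actually took.

    `model_turns` is a list of (total_before_move, pick) pairs for the
    model's moves, pooled over all games (an assumed record format).
    Returns None if the model never had an immediate win available.
    """
    chances = [(t, p) for t, p in model_turns if has_immediate_win(t)]
    if not chances:
        return None
    taken = sum(1 for t, p in chances if t + p >= TARGET)
    return taken / len(chances)

# Example: one missed win at 22 (picked 3), one taken win at 20 (picked 10),
# and one move at total 5 that was not an opportunity at all.
rate = immediate_win_rate([(22, 3), (20, 10), (5, 4)])
resolves_yes = rate is not None and rate >= 0.9
```

In the example, the model converts one of two opportunities, so the rate is 0.5 and the sketch would not count as a YES.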

This is not a challenge market. I won't be cleverly changing the prompting or hinting at strategies to suggest certain behaviors, just directly testing in the same format as the screenshot above.

See partner market here:

Always won when available in my testing.

predictedYES

I haven't done your 20 turns testing because of the 25 messages/3 h limit, but if you want I can record a video of it for you tomorrow. Oh, and it also wins even when it loses, so the win rate is 100%.

Would be interesting if someone wants to test it via GPT-3.5 API with temperature set to 0.

predictedYES

Bing also just set the mode to precise.

GPT-4 takes advantage 100% of the time already.

Random numbers via python lol

@light Thank you for this analysis! Super interesting, I should have tested more thoroughly before assuming GPT4 couldn’t do it.

Since the market is about GPT5 I’ll leave it open, and imo the partner market about having one layer of strategy is still valid.

I would love to send you a manalink for this, if you have a Discord or Twitter I could DM you on.

predictedYES

@DanMan314 Sorry for the late response, I'm in CEST so it was like 2-3 AM when I wrote all of this. Sure, add me on Discord: light#5957.

You can resolve this however you wish since it's your market, but surely GPT-5 won't be worse than GPT-4. And since all of this is public, we've now added it to its training data...