Consider the following re-write of an intransitive dice game:
Alice and Bob have three urns filled with six numbered bingo balls each.
The distribution of balls is as follows:
1) Urn 1 has balls numbered [2, 2, 4, 4, 9, 9]
2) Urn 2 has balls numbered [1, 1, 6, 6, 8, 8]
3) Urn 3 has balls numbered [3, 3, 5, 5, 7, 7]
Alice proposes the following wager to Bob: Each player will pick an urn to draw from, with Alice picking first, and Bob picking second.
Next, each player randomly selects one ball from their chosen urn via a blind draw.
Whichever player selects the larger number will win. Alice selects first. Who has better odds?
ChatGPT seems to struggle with this problem. This market resolves Yes if any LLM can reliably and coherently provide a solution to this problem before the end of 2023.
Notes: Question re-writes are allowed, so long as they add no new information. Prompt engineering is also allowed, so long as it adds no new information.
⚠ Unreceptive to pings; AFK creator
📢Resolved to NO
So, when I tried this with ChatGPT and GPT4 just now, it wrote a Python script that (correctly, as far as I can tell) calculated the win probabilities for every possible pair of urn choices. It then correctly interpreted the results as implying Bob has an edge due to his ability to choose the urn that gives him better odds, in response to Alice's choice. Pretty impressive!
But I'm going to go ahead and assume that doesn't count, since it's not just an LLM - it needed Python to do the calculation, and can't calculate or infer the solution by itself.
When I append "don't use the data analysis tool in your answer" to stop it using Python, it instead waffles about things that mostly sound sensible, whilst missing the core point.
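For reference, the kind of calculation described above can be sketched roughly as follows. This is not the script ChatGPT produced, just a minimal illustration that enumerates every pair of draws for the three urns in the question. The names `URNS`, `win_probability`, and `best_reply` are made up for this example, and it assumes Bob picks a different urn from Alice and that the two draws are independent.

```python
from fractions import Fraction
from itertools import product

# Urn contents exactly as given in the question.
URNS = {
    1: [2, 2, 4, 4, 9, 9],
    2: [1, 1, 6, 6, 8, 8],
    3: [3, 3, 5, 5, 7, 7],
}


def win_probability(mine, theirs):
    """Exact probability that a ball drawn from `mine` beats one drawn from `theirs`."""
    wins = sum(1 for a, b in product(mine, theirs) if a > b)
    return Fraction(wins, len(mine) * len(theirs))


def best_reply(alice_urn):
    """Bob's best urn against Alice's choice (assumption: he must pick a different urn)."""
    options = {
        urn: win_probability(URNS[urn], URNS[alice_urn])
        for urn in URNS
        if urn != alice_urn
    }
    bob_urn = max(options, key=options.get)
    return bob_urn, options[bob_urn]


for alice_urn in URNS:
    bob_urn, p = best_reply(alice_urn)
    print(f"Alice picks urn {alice_urn}: Bob picks urn {bob_urn} and wins with probability {p}")
```

Under those assumptions, each of Alice's choices has a reply that wins 5/9 of the time (urn 3 beats urn 1, urn 1 beats urn 2, urn 2 beats urn 3), which is why picking second is the advantage.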
@chrisjbillington Yeah I discussed this with @jcp privately a while back and we concluded it does not count.
@firstuserhere It's been a while since I thought about this question, but this is super wrong, right?
The first line is wrong — urn one doesn't have four balls with the number four.
The second and third lines are wrong for the same reason.
The fourth line is wrong for two reasons:
1) Four and five aren't the largest numbers in those urns
2) The odds of picking a four or five aren't 2/3, they're 2/6
There's a lot more wrong that I won't go into. I think basically every line has a false statement or incorrect logic.
The conclusion is also wrong — Alice should not be favoured, and 2/3 is the wrong advantage.
@jcp yep, all wrong. Very weird how bad it is at handling stuff when put in an array. I'm just waiting to try it with gpt-4, but yeah, step by step, so far, it's all wrong, not to even mention the logic of the game.