Will LLMs be able to solve this simple intransitive urn game by the end of 2023?

Resolved NO on Jan 15

Consider the following re-write of an intransitive dice game:

Alice and Bob have three urns filled with six numbered bingo balls each.

The distribution of balls is as follows:
1) Urn 1 has balls numbered [2, 2, 4, 4, 9, 9]
2) Urn 2 has balls numbered [1, 1, 6, 6, 8, 8]
3) Urn 3 has balls numbered [3, 3, 5, 5, 7, 7]

Alice proposes the following wager to Bob: Each player will pick an urn to draw from, with Alice picking first, and Bob picking second.

Next, each player randomly selects one ball from their chosen urn via a blind draw.

Whichever player selects the larger number will win. Alice selects first. Who has better odds?

ChatGPT seems to struggle with this problem. This market resolves Yes if any LLM can reliably and coherently provide a solution to this problem before the end of 2023.

Notes: Question re-writes are allowed, so long as they add no new information. Prompt engineering is also allowed, so long as it adds no new information.
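
For reference, the pairwise odds in the game above can be checked by brute-force enumeration of all 36 ball pairings per matchup. The following is a minimal sketch, not part of the original market description; the names `URNS`, `win_probability`, and `best_response` are illustrative, and it assumes each ball in an urn is equally likely to be drawn.

```python
from fractions import Fraction
from itertools import product

# Urn contents as given in the problem statement.
URNS = {
    1: [2, 2, 4, 4, 9, 9],
    2: [1, 1, 6, 6, 8, 8],
    3: [3, 3, 5, 5, 7, 7],
}


def win_probability(a, b):
    """Exact probability that a ball drawn from urn `a` beats one from urn `b`."""
    wins = sum(x > y for x, y in product(URNS[a], URNS[b]))
    return Fraction(wins, len(URNS[a]) * len(URNS[b]))


def best_response(alice_urn):
    """Urn Bob should pick after seeing Alice's choice, and his win probability."""
    bob_urn = max(URNS, key=lambda u: win_probability(u, alice_urn))
    return bob_urn, win_probability(bob_urn, alice_urn)


if __name__ == "__main__":
    for alice_urn in URNS:
        bob_urn, p = best_response(alice_urn)
        print(f"Alice picks urn {alice_urn}: Bob picks urn {bob_urn} and wins with probability {p}")
```

Under these assumptions, urn 1 beats urn 2, urn 2 beats urn 3, and urn 3 beats urn 1, each with probability 5/9. Whatever Alice picks, Bob can answer with the urn that beats it, so the second mover has the better odds.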
