Will GPT-4 be able to identify the winner of the Alignment Awards in 1/10 unsorted tournament runs?
Resolved NO on Dec 8

[Ran out of space in the title, so read on for more details]

The market will resolve to YES if the winner of the GPT-4 tournament is the same as the winner of the actual contest on at least 1 run out of 10.

The market will resolve to NA if I fail to execute the experiment within a week. Currently the code seems to run correctly on 5 random entrants, so I don't expect issues.

The experiment is to describe the conditions and criteria of the Goal Misgeneralization contest of the Alignment Awards to GPT-4, and then ask it which of two 500-word summaries is more likely to have won the competition. Winning entries advance to the next round until only one winner is left. If a round has an odd number of entries, the last entry automatically advances to the next round, where it becomes the first entry.

The original competition was judged on more than the 500-word summary, so it is entirely possible that the summaries alone are insufficient to identify the winner (even for the original judges). The 500-word summaries were written and submitted by the original entrants.

Each time the tournament is run, the entrants are randomized, so different pairs compete against each other in the first round.
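
To make the bracket rules concrete, here is a minimal Python sketch of one way the tournament could be implemented. It is not the actual experiment code: `run_tournament`, `resolves_yes`, and the `pick_winner` callback (a stand-in for the GPT-4 comparison) are hypothetical names chosen for illustration.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def run_tournament(
    entries: Sequence[T],
    pick_winner: Callable[[T, T], T],  # hypothetical stand-in for the GPT-4 match
    rng: random.Random,
) -> T:
    """Run one single-elimination tournament and return the overall winner."""
    if not entries:
        raise ValueError("need at least one entry")
    current: List[T] = list(entries)
    rng.shuffle(current)  # randomize the first-round pairings on every run
    while len(current) > 1:
        bye: List[T] = []
        if len(current) % 2 == 1:
            # Odd round: the last entry advances automatically ...
            bye = [current.pop()]
        winners = [
            pick_winner(current[i], current[i + 1])
            for i in range(0, len(current), 2)
        ]
        # ... and becomes the first entry of the next round.
        current = bye + winners
    return current[0]


def resolves_yes(
    entries: Sequence[T],
    pick_winner: Callable[[T, T], T],
    true_winner: T,
    runs: int = 10,
    seed: int = 0,
) -> bool:
    """True if the tournament winner matches the real winner on at least one run."""
    rng = random.Random(seed)
    return any(
        run_tournament(entries, pick_winner, rng) == true_winner
        for _ in range(runs)
    )
```

With 52 entrants, a bracket like this goes 52 → 26 → 13 → 7 → 4 → 2 → 1, which is 51 matches per run.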

There are 52 entrants. Duplicates have been removed but irrelevant submissions have not.
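
As a rough reference point for betting, the snippet below computes the chance that a judge with no signal at all would still hit the real winner at least once. This assumes every match is a fair coin flip, so that after shuffling each of the 52 entrants has a 1/52 chance of winning a run. That is an illustrative assumption, not a claim about how GPT-4 behaves, and `chance_baseline` is a hypothetical helper name.

```python
def chance_baseline(n_entrants: int, runs: int) -> float:
    """P(at least one hit in `runs` independent runs) for a uniformly random judge."""
    p_miss_one_run = 1.0 - 1.0 / n_entrants
    return 1.0 - p_miss_one_run ** runs


# 52 entrants, 10 tournament runs, as described in this market.
print(f"{chance_baseline(52, 10):.3f}")  # prints 0.176
```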

PS: I don't think I can share the prompt yet, and I do realize that's a big factor in betting. Roughly, the prompt contains all major details of the competition (including the names of the judges and the scoring criteria) and encourages GPT-4 to reason step by step. For each 'match', GPT-4 is run once to reason about which of the 2 entrants won, and then run again to extract the winner's label from the reasoning text (to avoid manual extraction).
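
The two-pass structure of a match can be sketched as follows. Since the real prompt is not shared, the prompt text here is placeholder wording, and `judge_match` and the `complete` callback (any function that sends a prompt to the model and returns its reply) are hypothetical names, not the actual code.

```python
from typing import Callable


def judge_match(
    summary_a: str,
    summary_b: str,
    contest_description: str,
    complete: Callable[[str], str],  # hypothetical: prompt in, model reply out
) -> str:
    """Return "A" or "B" for the entry the model thinks more likely won."""
    # Pass 1: free-form step-by-step reasoning (placeholder prompt wording).
    reasoning_prompt = (
        f"{contest_description}\n\n"
        f"Entry A:\n{summary_a}\n\n"
        f"Entry B:\n{summary_b}\n\n"
        "Think step by step about which entry is more likely to have won."
    )
    reasoning = complete(reasoning_prompt)

    # Pass 2: extract a machine-readable label from the reasoning text.
    extraction_prompt = (
        "Below is reasoning comparing Entry A and Entry B.\n\n"
        f"{reasoning}\n\n"
        "Reply with a single letter, A or B, naming the entry the reasoning favors."
    )
    label = complete(extraction_prompt).strip().strip(".").upper()
    if label not in {"A", "B"}:
        raise ValueError(f"could not extract a winner label from: {label!r}")
    return label
```

Raising on an unparseable label, rather than guessing, keeps a malformed reply from silently deciding a match.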

© Manifold Markets, Inc.