Will Biden do better in the ABC interview than in the debate, according to an LLM judging the transcripts?
Resolved YES on Jul 11

After the interview airs, I will use the prompt below to ask Claude in which of the two transcripts Biden did better. If that doesn't work (any refusal or no clear answer), I'll try other prompts and/or ask ChatGPT or Gemini. Resolves YES if it says Biden did better in the ABC interview, NO if it says Biden did better in the debate.

Claude prompt (Claude 3.5 Sonnet)

In which of the two attached transcripts did Biden do better?

Both transcripts will be added as attachments.

I will not bet on this market.


Well, I got an interesting result so far. Because LLMs are known to be very sensitive to the order of text in the prompt, I prompted Claude with the transcripts attached in one order and then in the other, and I got opposite results. I tested 3 times with each order, and Claude chose the second transcript as better in all 6 attempts.
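A minimal Python sketch of the order-swap check described above. It assumes each run has already been reduced to which position the model picked ("first" or "second"); the function name `tally_by_label` and the data layout are hypothetical, and no LLM API is called. Mapping positional picks back to transcript labels shows why always choosing the second transcript produces a 3-3 tie rather than a verdict.

```python
from collections import Counter

def tally_by_label(runs):
    """Map positional picks back to transcript labels and count both.

    `runs` is a list of (order, pick) pairs: `order` is a tuple of the two
    transcript labels in the order they were attached, and `pick` is
    "first" or "second" (the position the model said Biden did better in).
    Hypothetical helper for illustration only.
    """
    label_counts = Counter()
    position_counts = Counter()
    for order, pick in runs:
        index = 0 if pick == "first" else 1
        label_counts[order[index]] += 1
        position_counts[pick] += 1
    return label_counts, position_counts

# Observed in the comment above: 3 runs per order, and the second
# transcript was chosen every time.
debate_first = ("debate", "interview")
interview_first = ("interview", "debate")
runs = [(debate_first, "second")] * 3 + [(interview_first, "second")] * 3

labels, positions = tally_by_label(runs)
print(dict(labels))     # {'interview': 3, 'debate': 3}
print(dict(positions))  # {'second': 6}
```

An even split by label alongside a 6-0 split by position is the signature of position bias: the answer tracks where a transcript was attached, not which transcript it is.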

Next, I would like to ask ChatGPT and/or Gemini the same question, but the length limit on free accounts is too small for the transcripts. Can someone with a paid account test it?

First output with debate attached first:

First output with interview attached first:

You should be able to access long-context Gemini for free via Google AI Studio, I think?

I suspect this is incredibly dependent on the quality of the transcripts. To what extent has each transcript been 'cleaned up'?

I used the official transcripts from CNN and ABC.

I'm out this weekend, I'll run some more prompts Monday.

Ok, I tried some other prompts for Claude and got the same result as above.

I also tried Gemini, and it consistently refuses to answer prompts like "Please assess the strengths and weaknesses of Biden in the following debate transcript:" or "In which of the two attached transcripts did Biden do better?", citing "unsafe content", even when I change the settings to supposedly not block unsafe content.

@Joshua, you helped me test ChatGPT on the "who won the debate" market. Could you try prompting ChatGPT with something like this: https://docs.google.com/document/d/1VRCjoNRs292yFX_tr8oT1WjWNgDNvmR0lAmOtBGpEBM/edit

Ok, I used the following prompts for Claude:

  1. Please assess the strengths and weaknesses of Biden in this interview transcript

  2. Next, please assess the strengths and weaknesses of Biden in this debate transcript

  3. In which of the two did Biden do better?

I then repeated this with the order of the interview and debate prompts switched.

I regenerated the output 3 times for each order.

The results were:

  • With debate first then interview, interview won 3/3

  • With interview first then debate, debate won 2/3

Therefore, after 3 previous rounds of ties, we finally have a tiebreaker with Claude saying Biden did better in the interview. This is the output that broke the tie:

So I'm resolving YES, unless there are any objections to this methodology. (It sucks that LLMs are so easily influenced by prompt ordering!)
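The tiebreak arithmetic from the three-prompt round can be written out as a short Python sketch that pools the six regenerations across both orders. The per-order counts come from the results listed above; `pooled_winner` is an illustrative, hypothetical name, not part of any tool used for this market.

```python
from collections import Counter

def pooled_winner(results):
    """Pool per-order win counts and return (winner, totals).

    `results` maps an order description to a Counter of which transcript
    the model said Biden did better in. The winner is None on a tie or
    when there are no results. Hypothetical helper for illustration only.
    """
    totals = Counter()
    for counts in results.values():
        totals.update(counts)
    if not totals:
        return None, totals
    (top, top_n), *rest = totals.most_common()
    if rest and rest[0][1] == top_n:
        return None, totals  # tie between the top two labels
    return top, totals

results = {
    "debate first, then interview": Counter(interview=3),
    "interview first, then debate": Counter(debate=2, interview=1),
}
winner, totals = pooled_winner(results)
print(winner)        # interview
print(dict(totals))  # {'interview': 4, 'debate': 2}
```

Pooled, the interview wins 4 of 6 runs. The second-attached transcript still won 5 of the 6 runs.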


What happens if Claude responds that Biden did equally well, or otherwise doesn't give a clear answer that one was better than the other?

That falls under the "if that doesn't work" clause. I will keep trying until it gives an answer.