Will Biden do better in the ABC interview than in the debate, according to an LLM given the transcripts?


resolved Jul 11

Resolved

YES


After the interview airs, I will ask Claude which one Biden did better on based on the transcripts, using the prompt below. If that doesn't work (any refusal or no clear answer), I'll try other prompts and/or asking ChatGPT or Gemini. Resolves YES if it says Biden did better on the ABC interview, NO if it says Biden did better in the debate.

Claude prompt (Claude 3.5 Sonnet)

In which of the two attached transcripts did Biden do better?

Both transcripts will be added as attachments.

I will not bet on this market.




8 Comments


Well, I've got an interesting result so far. Because LLMs are known to give different answers depending on the order of text in the prompt, I prompted Claude with the transcripts attached in one order and then in the other, and I got opposite results. I tested 3 times with each order, and Claude chose the 2nd transcript as the better one in all 6 attempts.
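For anyone who wants to repeat this kind of order-swap check, here is a minimal sketch in Python. The `second_position_judge` function is a hypothetical stand-in for a real model call (it is not Claude's API and simply mimics the behavior described above by always preferring the second transcript), and `run_order_swap` is an illustrative helper name, not something used in the original test.

```python
from collections import Counter
from typing import Callable, Dict


def second_position_judge(first: str, second: str) -> str:
    """Hypothetical toy judge: always prefers whichever transcript is attached second.

    A real test would replace this with a call to an actual LLM.
    """
    return second


def run_order_swap(
    judge: Callable[[str, str], str], a: str, b: str, trials: int = 3
) -> Dict[str, Counter]:
    """Ask the judge `trials` times in each attachment order and count its picks."""
    results: Dict[str, Counter] = {}
    for first, second in ((a, b), (b, a)):
        picks = Counter(judge(first, second) for _ in range(trials))
        results[f"{first} first"] = picks
    return results


results = run_order_swap(second_position_judge, "debate", "interview")
for order, picks in results.items():
    print(order, dict(picks))
```

If the picks flip completely when the order flips, as they do with this toy judge, the verdict reflects attachment position rather than transcript content.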

Next, I would like to ask ChatGPT and/or Gemini the same question, but the length limit on free accounts is too small for the transcripts - can someone with a paid account test it?

First output with debate attached first:

First output with interview attached first:

You should be able to access long-context Gemini for free via the developer studio, I think?

I suspect this is incredibly dependent on the quality of the transcripts. To what extent has each transcript been 'cleaned up'?

I used the official transcripts from CNN and ABC.

I'm out this weekend, I'll run some more prompts Monday.

Ok, I tried some other prompts for Claude and got the same result as above.

I also tried Gemini, and it consistently refuses to answer prompts like "Please assess the strengths and weaknesses of Biden in the following debate transcript:" or "In which of the two attached transcripts did Biden do better?" because of "unsafe content" (even when I change the settings to supposedly not block unsafe content).

@Joshua, you helped me test ChatGPT on the "who won the debate" market. Could you try prompting ChatGPT with something like this: https://docs.google.com/document/d/1VRCjoNRs292yFX_tr8oT1WjWNgDNvmR0lAmOtBGpEBM/edit

Ok, I used the following prompts for Claude:

Please assess the strengths and weaknesses of Biden in this interview transcript
Next, please assess the strengths and weaknesses of Biden in this debate transcript
In which of the two did Biden do better?

And the same thing with the order of the interview/debate prompts switched.

I regenerated the output 3 times for each order.

The results were:

With debate first then interview, interview won 3/3
With interview first then debate, debate won 2/3

Therefore, after 3 previous rounds of ties, we finally have a tiebreaker with Claude saying Biden did better in the interview. This is the output that broke the tie:

Therefore, resolving YES, unless there are any objections to this methodology. (Sucks that LLMs are so easily influenced by prompt ordering!)
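The arithmetic behind this tiebreak can be sketched in Python as follows. The counts come from the results reported above, with one assumption: the single interview-first run that the debate did not win is counted as a win for the interview, which matches the description of it as the output that broke the tie. The variable names are illustrative only.

```python
from collections import Counter

# Verdicts over 3 regenerations per attachment order, as reported above.
# Assumption: the one interview-first run not won by the debate went to the interview.
runs = {
    "debate first, then interview": Counter(interview=3, debate=0),
    "interview first, then debate": Counter(interview=1, debate=2),
}

# Pool the verdicts across both orders so position effects partly cancel out.
total = Counter()
for picks in runs.values():
    total.update(picks)

winner = total.most_common(1)[0][0]
print(dict(total), "->", winner)
```

Pooled across both orders, the interview is preferred in 4 of 6 runs, so the margin is exactly the one run that went against the second-position pattern.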


What happens if Claude responds that Biden did equally well, or otherwise doesn't give a clear answer that one was better than the other?

That falls under the "if that doesn't work" clause. I will keep trying until it gives an answer.
