After the interview airs, I will ask Claude whether Biden did better in the debate or in the ABC interview, based on the transcripts, using the prompt below. If that doesn't work (any refusal or no clear answer), I'll try other prompts and/or ask ChatGPT or Gemini. Resolves YES if it says Biden did better in the ABC interview, NO if it says Biden did better in the debate.
Claude prompt (Claude 3.5 Sonnet)
In which of the two attached transcripts did Biden do better?
Both transcripts will be added as attachments.
I will not bet on this market.
Well, I got an interesting result so far. Because LLMs are known to be very susceptible to giving different results depending on order of text in the prompt, I prompted Claude with the transcripts attached in one order, and then the other - and I got opposite results. I tested 3 times with each order and Claude consistently chose the 2nd transcript as better on all 6 attempts.
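To make the order effect concrete, here is a minimal Python sketch of how the six attempts could be recorded and summarized. The `trials` list and the `summarize` helper are hypothetical names made up for illustration; the rows only restate the result described above (the 2nd transcript chosen on all 6 attempts), not any additional data.

```python
from collections import Counter

# Each trial: (order the transcripts were attached in, slot Claude picked).
# Hypothetical encoding of the result above: slot 2 won on all 6 attempts.
trials = [
    (("debate", "interview"), 2),
    (("debate", "interview"), 2),
    (("debate", "interview"), 2),
    (("interview", "debate"), 2),
    (("interview", "debate"), 2),
    (("interview", "debate"), 2),
]

def summarize(trials):
    """Count wins by attachment slot and by transcript content."""
    slot_counts = Counter(slot for _, slot in trials)
    content_counts = Counter(order[slot - 1] for order, slot in trials)
    return slot_counts, content_counts

slot_counts, content_counts = summarize(trials)
print("wins by slot:", dict(slot_counts))
print("wins by transcript:", dict(content_counts))
```

If the wins by slot are lopsided while the wins by transcript are evenly split, the answer is tracking position rather than content, which is what happened here.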
Next, I would like to ask ChatGPT and/or Gemini the same question, but the length limit on free accounts is too small for the transcripts - can someone with a paid account test it?
First output with debate attached first:
First output with interview attached first:
Ok, I tried some other prompts for Claude and got the same result as above.
I also tried Gemini, and it consistently refuses to answer prompts like "Please assess the strengths and weaknesses of Biden in the following debate transcript:" or "In which of the two attached transcripts did Biden do better?" because of "unsafe content" (even when I change the settings to supposedly not block unsafe content).
@Joshua, you helped me test ChatGPT on the "who won the debate" market; could you try prompting ChatGPT with something like this: https://docs.google.com/document/d/1VRCjoNRs292yFX_tr8oT1WjWNgDNvmR0lAmOtBGpEBM/edit
Ok, I used the following prompts for Claude:
Please assess the strengths and weaknesses of Biden in this interview transcript
Next, please assess the strengths and weaknesses of Biden in this debate transcript
In which of the two did Biden do better?
Then I did the same thing with the order of the interview and debate prompts switched.
I regenerated the output 3 times for each order.
The results were:
With debate first then interview, interview won 3/3
With interview first then debate, debate won 2/3
Therefore, after 3 previous rounds of ties, we finally have a tiebreaker with Claude saying Biden did better in the interview. This is the output that broke the tie:
Therefore, resolving YES, unless there are any objections to this methodology. (Sucks that LLMs are so easily influenced by prompt ordering!)
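For anyone who wants to check the tiebreaker arithmetic, here is a small Python sketch that pools the two orderings. The `results` layout is hypothetical, and it assumes that "debate won 2/3" means the interview won the remaining run (i.e. no refusal or tie in that run).

```python
from collections import Counter

# Wins per prompt ordering, as reported above.
# Assumption: in the second ordering, the one run the debate did not win
# went to the interview (not a refusal or an unclear answer).
results = {
    ("debate", "interview"): {"interview": 3, "debate": 0},
    ("interview", "debate"): {"interview": 1, "debate": 2},
}

# Pool the wins across both orderings so neither order is favored.
totals = Counter()
for counts in results.values():
    totals.update(counts)

winner = max(totals, key=totals.get)
print(dict(totals), "->", winner)
```

Pooling over both orderings is a simple way to cancel out position bias: each transcript appears in the second slot equally often, so any remaining gap reflects content rather than order.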