Which one is more "aligned": Claude Instant or GPT-4?
Resolved YES (Jan 6)

Claude more aligned -> resolves YES

GPT-4 more aligned -> resolves NO

Resolves according to public opinion plus strong evidence of unaligned behaviour.

predicted NO

What caused this to resolve?

@MaximeRiche and @KushalThaman, curious to hear your thoughts on why you believe GPT-4 is more "aligned" than Claude.

predicted NO

@Dreamingpast On TruthfulQA, GPT-4 is at 0.6 while Anthropic-LM is at 0.3 (published April 2022, for a ~55B-parameter model); see the sketch after this comment for how that kind of score is computed.

I think Anthropic is making faster progress on naive alignment, but their base model is lower-capability right now, so their preference model is plausibly not as good.
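
For context on what a number like 0.6 vs 0.3 means here: TruthfulQA is usually reported as MC1 accuracy, i.e. whether the model ranks the one truthful answer above the distractors for each question. Below is a rough sketch assuming the Hugging Face `truthful_qa` dataset; `score_option` is a stand-in for the model's per-option log-likelihood, not anyone's actual eval harness.

```python
# Minimal sketch of computing TruthfulQA MC1 accuracy.
# Assumption: `score_option` is a placeholder; a real eval would use the
# evaluated model's log-likelihood of each answer option given the question.

from datasets import load_dataset


def score_option(question: str, option: str) -> float:
    # Placeholder scoring function so the script runs end to end.
    return -abs(len(option) - len(question))


def mc1_accuracy(split: str = "validation") -> float:
    ds = load_dataset("truthful_qa", "multiple_choice")[split]
    correct = 0
    for row in ds:
        choices = row["mc1_targets"]["choices"]
        labels = row["mc1_targets"]["labels"]  # exactly one option is labeled 1 (the truthful answer)
        scores = [score_option(row["question"], c) for c in choices]
        best = max(range(len(choices)), key=lambda i: scores[i])
        correct += labels[best]
    return correct / len(ds)


if __name__ == "__main__":
    print(f"MC1 accuracy: {mc1_accuracy():.3f}")
```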

@MaximeRiche GPT-4chan, which is GPT-J fine-tuned on 4chan's /pol/ board, held the SOTA on TruthfulQA when it was released.

Not being able to jailbreak a model using traditional methods (which seems to be what Anthropic has focused on) does not make it more aligned. GPT-4 is a smarter model and simply understands ethics better, hence its performance on benchmarks.

And Claude can still be jailbroken using more refined methods.