
Claude → YES
GPT-4 → NO
Resolves according to public opinion plus strong evidence of unaligned behaviour.
🏅 Top traders

| # | Name | Total profit |
|---|------|--------------|
| 1 | | Ṁ112 |
| 2 | | Ṁ24 |
| 3 | | Ṁ16 |
| 4 | | Ṁ11 |
| 5 | | Ṁ10 |
@MaximeRiche and @KushalThaman, curious to hear your thoughts on why you believe GPT-4 is more "aligned" than Claude.
@Dreamingpast On TruthfulQA, GPT-4 scores around 0.6 while Anthropic-LM is at around 0.3 (results published April 2022, for a ~55B-parameter model).
I think Anthropic is making faster progress on naive alignment, but their base model is lower capability right now, so their preference model is plausibly not as good either.
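For context on what the TruthfulQA numbers above measure, here is a minimal sketch of the benchmark's MC1 metric. It assumes the Hugging Face `datasets` library, the public `truthful_qa` dataset, and a hypothetical `score_answer(question, answer)` callable standing in for whatever per-answer log-likelihood the evaluated model provides.

```python
# Minimal sketch of TruthfulQA's MC1 accuracy, assuming the Hugging Face
# `datasets` library and the public "truthful_qa" dataset. `score_answer`
# is a hypothetical callable supplied by the caller.
from datasets import load_dataset

def mc1_accuracy(score_answer):
    """Fraction of questions where the model scores the single true answer
    above every false one (TruthfulQA's MC1 metric)."""
    data = load_dataset("truthful_qa", "multiple_choice", split="validation")
    correct = 0
    for row in data:
        choices = row["mc1_targets"]["choices"]
        labels = row["mc1_targets"]["labels"]  # 1 = true answer, 0 = false
        scores = [score_answer(row["question"], c) for c in choices]
        best = max(range(len(choices)), key=lambda i: scores[i])
        correct += labels[best]
    return correct / len(data)
```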
@MaximeRiche GPT-4chan, which is GPT-J fine-tuned on 4chan's /pol/ board, held the SOTA on TruthfulQA when it was released.
Not being able to jailbreak a model using the traditional methods (what Anthropic seems to have focused on) does not make it more aligned. GPT-4 is a smarter model and is just able to understand ethics better, hence its performance on benchmarks.
And Claude can still be jailbroken using more refined methods.