Resolution is based on widely accepted benchmarks such as MT-bench.
🏅 Top traders
# | Name | Total profit |
---|---|---|
1 | | Ṁ119 |
2 | | Ṁ45 |
3 | | Ṁ26 |
4 | | Ṁ11 |
5 | | Ṁ10 |
Claude 3 Opus was at the top of the Chatbot Arena leaderboard after its release, until OpenAI released an updated GPT-4 Turbo that regained the lead.
See for example this market that resolved on Claude 3 Opus taking the top spot:
/LoganZoellner/will-any-chatbot-beat-gpt4-by-july
Several LLMs are considered to be in the same capabilities "weight class" as current GPT-4, and some perform better than older GPT-4 releases:
https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard
There are other benchmarks we could discuss, but the Elo rating on Chatbot Arena is becoming a standard, hard-to-game metric for comparing chatbots according to human judgement.
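For anyone unfamiliar with how an Elo rating moves after a head-to-head vote, here is a minimal sketch of the textbook update rule. The K-factor of 32, the 400-point scale, the function names, and the example ratings are illustrative assumptions, not a description of how Chatbot Arena actually computes its leaderboard.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Modelled probability that A is preferred over B (textbook Elo formula)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A was preferred, 0.0 if B was preferred, 0.5 for a tie.
    k is an assumed K-factor; real leaderboards may use a different value
    or a different fitting method altogether.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


# Hypothetical ratings: two equally rated models, and a human voter prefers A.
new_a, new_b = update(1200.0, 1200.0, score_a=1.0)
```

Because each update depends on many independent human votes against many opponents, a single model cannot easily inflate its own rating, which is part of why the metric is hard to game.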
I think this should resolve YES!