As of market creation, there are a few, but not 20. Off the top of my head, we have:
- Mistral Mixtral
- Inflection-2
- Anthropic Claude 2
- Google Gemini Pro
- Grok
- GPT-4
@mods gpt-3.5 is 80th on lmsys, can you resolve? https://huggingface.co/spaces/lmarena-ai/chatbot-arena-leaderboard
Would the following models count:
- Anthropic Claude 1 (outranks GPT-3.5 here)
- Anthropic Claude 2.1
- GPT-4-Turbo
In general, if you define GPT-3.5 as "GPT-3.5-Turbo-1106", then there are already 17 models outranking it on this leaderboard: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard
@JonasVollmer, by GPT-3.5 I meant the original version, gpt-3.5-turbo-0613, which will be deprecated soon. I'll operationalize the market better.
@JuJumper If it's accessible to people not affiliated with the creator of the model, that's a public release
@firstuserhere My suggestion is that separate versions of similar models should not count separately, and you should only count one per series of models (e.g. only the best OpenAI GPT, only the best Gemini model, only the best PaLM model, etc.).
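A minimal sketch of that "one per series" counting rule (not how the market actually resolves), assuming a hypothetical local CSV export of the Arena Elo table with columns `model` and `elo`; the family prefixes and file name below are placeholders, not the leaderboard's real naming:

```python
# Sketch: count model families whose best entry outranks a reference GPT-3.5 snapshot.
# Assumes a hypothetical CSV "arena_leaderboard.csv" with columns "model" and "elo".
import pandas as pd

# Hypothetical family prefixes; the real leaderboard names may differ.
FAMILIES = ["gpt-4", "gpt-3.5", "claude", "gemini", "palm", "mixtral", "grok", "inflection"]

def family_of(model_name: str) -> str:
    """Map a model name to its series; unknown names form their own family."""
    name = model_name.lower()
    for fam in FAMILIES:
        if name.startswith(fam):
            return fam
    return name

def count_outranking(df: pd.DataFrame, reference: str = "gpt-3.5-turbo-0613") -> int:
    """Count families whose best model has a higher Elo than the reference model."""
    ref_elo = df.loc[df["model"] == reference, "elo"].iloc[0]
    best_per_family = (
        df.assign(family=df["model"].map(family_of))
          .groupby("family")["elo"]
          .max()
    )
    # Exclude the reference model's own family from the count.
    return int((best_per_family.drop(family_of(reference), errors="ignore") > ref_elo).sum())

if __name__ == "__main__":
    leaderboard = pd.read_csv("arena_leaderboard.csv")  # hypothetical export
    print(count_outranking(leaderboard))
```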
Mixtral does not currently outperform GPT-3.5-Turbo on most benchmarks: https://arxiv.org/abs/2312.11444
(Although I'm unsure whether that paper was using the instruction fine-tuned version of Mixtral, which could make a big difference.)