Mistral Large 2 outperforms Llama 3.1 405b Instruct on Chatbot Arena on August 12th?
Resolved NO on Aug 13

Mistral Large 2 reportedly outperforms Llama 3.1 405B Instruct on Arena Hard as well as MT Bench.

Resolves NO.

Has anyone had good experiences with Mistral Large 2? I ask because there have been other great benchmark results too.

Mistral Large 2 is now on the leaderboard with a score of 1248, 15 points short of Llama 3.1 405B.

https://chat.lmsys.org/?leaderboard

Looks like this will likely resolve N/A. Strange how few people seem to care about this model.

Nope! I haven't been following it, but I'm surprised by how low it is.

It catches up quite a bit on math and coding, but still doesn't pass.

If Mistral Large 2 is not on the leaderboard, does that resolve NO or N/A?

Good question! I'd say N/A. The point of the market is to aggregate information on whether Mistral's model will do well with human raters, not to speculate on whether it will be included on the leaderboard.

If lots of traders in this market disagree I’m very willing to change my mind.

I agree with N/A.

No no, N/A is fair.

Does this measure the difference at a single point in time, once per day, or continuously? I.e. "ever outperforms before" vs. "outperforms on".

I think the right choice for markets like this is "outperform on". Otherwise they'll move up and down with noise, and that biases the market towards YES compared with what you actually want to ask, which is "is it better?"

Good points. Let's say outperform on August 12th, midnight Pacific Time.
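
For anyone who wants the resolution rule from this thread spelled out, here is a minimal sketch. The function name `resolve_market` and its parameters are hypothetical, not anything Manifold or the leaderboard provides. It assumes each score is read once at the agreed snapshot (August 12th, midnight Pacific Time), that a model missing from the leaderboard is passed as `None`, and that a tie counts as not outperforming, which the thread never actually settled.

```python
from typing import Optional


def resolve_market(mistral_score: Optional[float],
                   llama_score: Optional[float]) -> str:
    """Return "YES", "NO", or "N/A" from scores read at a single snapshot.

    Hypothetical helper: the scores are whatever the leaderboard shows
    at the agreed snapshot time; None means the model is not listed.
    """
    # A missing model resolves N/A, as agreed in the thread.
    if mistral_score is None or llama_score is None:
        return "N/A"
    # "Outperform on": one strict comparison at the snapshot.
    # Treating a tie as NO is an assumption, not something the thread decided.
    return "YES" if mistral_score > llama_score else "NO"


# Scores as quoted in the thread: 1248 for Mistral Large 2, and
# 1248 + 15 = 1263 for Llama 3.1 405B.
print(resolve_market(1248, 1263))
print(resolve_market(None, 1263))
```

A single comparison at the snapshot is the "outperform on" reading. An "ever outperforms before" rule would instead resolve YES as soon as any one reading crossed over, which is the noise-driven bias towards YES raised above.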