Will Mistral-Large be considered on par or better than GPT-4? (text)
Resolved NO on Mar 6

Introduction:

This market assesses the potential of Mistral-Large to match or exceed the performance of OpenAI's GPT-4 in the realm of Large Language Models (LLMs).

Resolution Criterion:

Resolution will be based on Mistral-Large's score compared to the highest-scoring version of GPT-4 (currently the 1106 version, with a score of 1249) on the 🏆 LMSYS Chatbot Arena Leaderboard. The market will resolve once Mistral-Large's score is officially recorded on the leaderboard: YES if it scores higher than GPT-4’s peak score, NO otherwise.

Discussion of Edge Cases:

1. Different Versions of GPT-4: If newer versions of GPT-4 are released before the resolution, the comparison will be made with the highest-scoring version available at the time of resolution.

2. Naming Variations of Mistral-Large: If the model anticipated as Mistral-Large is released under a different name, the market will focus on this successor, regardless of its name, as long as it represents the next significant iteration of the Mistral series. The comparison will be made with the latest version of this model at the time of resolution, ensuring that the market accurately reflects the advancements and capabilities of Mistral's next major release in LLM technology.

3. Changes in Benchmarking Standards: If there are significant changes to the benchmarking criteria or scoring method on the LMSYS Chatbot Arena Leaderboard, the market will use the consensus of expert opinion and published reviews in leading AI research forums and publications.

4. Non-Publication or Inaccessibility of Scores: If Mistral-Large’s score is not published or made accessible on the LMSYS Chatbot Arena Leaderboard for any reason, the market will use the consensus of expert opinion and published reviews in leading AI research forums and publications.

5. Delayed Releases: If Mistral-Large is not released, or its performance is not recorded on the leaderboard, a resolution deadline of Jan 1 2027 can be set, after which the market may resolve as 'NO'.

These edge cases ensure a comprehensive and fair assessment, accounting for possible variations and developments in the AI field prior to the market's resolution.
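
For illustration only, the leaderboard comparison described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the market creator's actual procedure: the function name `resolve_market`, the dictionary keys, and the Mistral-Large score used in the example are hypothetical. Only the GPT-4 1106 score of 1249 comes from the criterion above. The sketch reads "scores higher" as a strict comparison, and it returns `None` whenever the leaderboard alone cannot settle the question (edge cases 3 to 5), since those cases rely on expert consensus or a deadline rather than a number.

```python
from typing import Mapping, Optional


def resolve_market(
    gpt4_scores: Mapping[str, float],
    mistral_large_score: Optional[float],
) -> Optional[str]:
    """Return 'YES' or 'NO' from leaderboard scores, or None if undecidable.

    gpt4_scores maps each listed GPT-4 version to its arena score.
    mistral_large_score is None when no score has been published.
    """
    # Edge cases 4 and 5: no published score, so the leaderboard
    # cannot decide; fall back to expert consensus or the deadline.
    if mistral_large_score is None or not gpt4_scores:
        return None

    # Edge case 1: compare against the highest-scoring GPT-4 version.
    best_gpt4 = max(gpt4_scores.values())

    # Assumption: "scores higher" is a strict comparison, so a tie is NO.
    return "YES" if mistral_large_score > best_gpt4 else "NO"


# 1249 is the GPT-4 1106 score quoted in the criterion; the key name and
# the Mistral-Large score below are hypothetical placeholders.
gpt4 = {"gpt-4-1106": 1249.0}
print(resolve_market(gpt4, 1200.0))
print(resolve_market(gpt4, None))
```

Treating a tie as NO follows the wording of the resolution criterion; a reading based on the title's "on par or better" would instead use `>=`.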


Resolving NO.

@Sss19971997 that's surprisingly close to the original gpt-4 huh

@RemNi "Original" would be 0314, right?

Curious as to what evidence people are looking at who are voting yes here

guys, 81.2 on MMLU does not mean better or worse on lmsys. let's wait for the result
