Why does Grok 3 only have language ratings in English, Chinese, Russian, and Korean in the LM Arena?
Mar 20
75%
Other.
53%
It's purely coincidental.
19%
Because Musk is cooperating with the Chinese, Russians, and/or North Koreans.
16%
Because it's based on a Chinese model primarily catering to those markets.

As of the latest update from LM Arena (lmarena.ai) on 2025-02-16, chocolate (Early Grok-3) only has ratings in English, Chinese, Russian, and Korean, while ratings for French, German, Spanish, and Japanese are absent. I take this as evidence that Grok-3 is based on a Chinese model.


Resolution

  • "Because it's based on a Chinese model..." resolves as YES if no credible alternative explanation emerges within 30 days.

  • "Because Musk is cooperating with the Chinese..." resolves as YES if no credible alternative explanation emerges within 30 days. Grok-3 seems too strong to be merely a fine-tuned version of, for example, DeepSeek v3 / r1, which suggests either an advanced, unreleased Chinese model or some level of cooperation, such as shared training data.

  • "It's purely coincidental" and "Other" will resolve as NO unless supporting evidence appears within 30 days.

I may add additional explanations if they seem warranted.


I will not bet in this market.

  • Update 2025-02-21 (PST) (AI summary of creator comment): Clarification to Resolution Criteria:

    • The new update on LM Arena now includes a Japanese rating that has slightly over 300 votes, which confirms the threshold theory.

    • This update disproves the earlier implied evidence for Grok 3 being based on a Chinese model.

    • Consequently, unless new evidence emerges before March 20:

      • The options "Because it's based on a Chinese model...", "Because Musk is cooperating with the Chinese...", and "It's purely coincidental" will resolve as NO.

      • The option "Other." will resolve as YES.

i made a custom leaderboard viewer and oddly enough it has rankings in those languages there, implying that the rankings exist but lmarena.ai just doesn't show them

@traders It looks like there might be a good statistical explanation for English, Chinese, Russian, and Korean: LMarena seems to only show a model in a given category when it reaches 300 votes. Korean barely surpasses that threshold with 387 votes. French, German, Spanish, and Japanese have either a lower or only slightly higher share than Korean, so it is possible that they simply haven't passed the threshold yet. In that case, the next LMarena update would be expected to include some additional languages like German and Japanese for "chocolate (Early Grok-3)."
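The threshold idea above can be sketched in a few lines of Python. This is only an illustration of the proposed explanation, not LM Arena's actual code: the 300-vote cutoff is the value inferred in the comment, the function name `visible_categories` is hypothetical, and only the Korean count (387) comes from the thread. The other vote counts are made-up placeholders.

```python
# Display cutoff inferred in the comment above; not confirmed by LM Arena.
VOTE_THRESHOLD = 300


def visible_categories(votes_by_language, threshold=VOTE_THRESHOLD):
    """Return the languages whose vote count reaches the threshold, sorted by name."""
    return sorted(
        lang for lang, votes in votes_by_language.items() if votes >= threshold
    )


# Only the Korean count is taken from the thread; the rest are hypothetical.
example_votes = {
    "Korean": 387,
    "Japanese": 280,  # hypothetical placeholder
    "German": 250,    # hypothetical placeholder
}

print(visible_categories(example_votes))
```

Under this sketch, a language appears in the list as soon as its count crosses the cutoff, which is why a later update would be expected to add languages that were just below it.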

@traders The arena has a new update, which now includes a Japanese rating with slightly over 300 votes, just as expected. This clearly undercuts the supposed evidence that Grok 3 is based on a Chinese model. Unless new evidence emerges before March 20, the options "Because it's based on a Chinese model...", "Because Musk is cooperating with the Chinese...", and "It's purely coincidental" will resolve as NO, while "Other." will resolve as YES.

@ChaosIsALadder I don't understand how this isn't coincidental - it just so happens that the arena doesn't show models without enough data, which counts as a coincidence

@KTibow It's coincidental, but not purely coincidental. The original list of languages with ratings might just as well have contained Japanese instead of Korean, but Spanish and French would have been much less likely because there are far fewer Spanish and French prompts. Since statistical rules govern this phenomenon and make it predictable, it can't be called purely coincidental.

@traders I clarified the resolution criteria in the market description while maintaining the original meaning.
