Meta cheated at LM Arena to pump Llama-4's score?

Question

Currently Llama-4-Maverick is sitting at #2 in Arena (https://lmarena.ai/?leaderboard). However, overall feedback about Maverick does not reflect this, and in some of the released examples (see: https://x.com/lmarena_ai/status/1909397817434816562) it is baffling that Maverick's response was the one chosen. There are accusations of cheating.

So, did they outright cheat, rather than merely Goodhart?

Resolves to YES if at any point I am 95%+ confident that Meta cheated, or someone cheated on their behalf, in a way that impacted their ordinal ranking.

Resolves to NO if at any point I am 95%+ confident that Meta did NOT cheat, and no one else cheated on their behalf, in a way that impacted their ordinal ranking.

If neither occurs within a year, this resolves to my probability that this was the result of cheating, with a strong prior towards fair market price. If this market gets big enough to care and a better resolution mechanism with the same goal is suggested, I might switch to a different rule here prior to 7/1/25.

WARNING: SUBJECTIVE EVALUATION MARKET if evidence is not definitive. I don't know any other way to offer this market, and I WILL NOT BE ARGUING ABOUT THAT unless someone wants to pay my hourly (hint: don't do that).

Update 2025-04-08 (PST) (AI summary of creator comment): Using a different model version is not, by itself, sufficient to count as cheating.

If Meta used a different version of the model solely for the arena purposes, that does not meet the bar for cheating.

There must be additional evidence of misconduct beyond using a different model version that affected their ranking.

The resolution will require that more than just a version change be evident before concluding that cheating occurred.

Update 2026-04-08 (PST) (AI summary of creator comment): The creator indicates this market is likely to resolve NO. Meta has confessed to using different model versions for Arena, but per the resolution criteria, using a different model version alone does not constitute cheating. Resolution will remain NO unless a strong argument is made for why it should be otherwise.

Manifold Markets · Accepted Answer

No — resolved on Apr 9, 2026 by Manifold Markets prediction market.

🏅 Top traders

#	Trader	Total profit
1		Ṁ783
2		Ṁ80
3		Ṁ75
4		Ṁ68
5		Ṁ65
