On December 31, 2025, will the LMSys code arena's best closed-source LLM outperform the best open-weights LLM by less than 50 points?
As of July 27, 2024, the gap is 58 Elo points.
If LMSys ceases to exist or to evaluate models, I will resolve to 50%.
If a model is open-weights but the LMSys eval uses an API (e.g. deepseekv2-API), it still qualifies as open-weights, unless I get evidence that the API version was different enough to affect this question; in such a case I would resolve to 50%.
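The resolution rule above can be sketched in code. This is only an illustration under stated assumptions, not an official resolver: the function name `resolve`, its parameters, and the example ratings are all hypothetical, and it assumes the gap is measured as best closed-source rating minus best open-weights rating, so an open-weights lead would also count as "less than 50".

```python
from typing import Optional


def resolve(
    best_closed_elo: Optional[float],
    best_open_elo: Optional[float],
    api_version_differs: bool = False,
) -> float:
    """Return the resolution value: 1.0 for YES, 0.0 for NO, 0.5 for 50%.

    Hypothetical helper. Pass None for a rating when LMSys no longer
    exists or no longer evaluates models.
    """
    # LMSys ceased to exist or to evaluate models: resolve to 50%.
    if best_closed_elo is None or best_open_elo is None:
        return 0.5
    # Evidence that the API-served model differs from the open weights
    # enough to affect the question: resolve to 50%.
    if api_version_differs:
        return 0.5
    # Assumption: gap = closed minus open, compared strictly against 50.
    gap = best_closed_elo - best_open_elo
    return 1.0 if gap < 50 else 0.0


# Made-up ratings chosen only to reproduce the 58-point gap from July 27, 2024.
print(resolve(1258.0, 1200.0))  # 0.0, i.e. NO at a 58-point gap
```

A gap of exactly 50 resolves NO in this sketch, since the question asks for less than 50 points.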
Chart from https://x.com/maximelabonne/status/1779801605702836454 This shows all-question Elo, whereas this market resolves by coding-only Elo, but the trend is similar.
https://x.com/amebagpt/status/1836875571906666836
The LMSYS main arena gap over time (1st vs. 2nd place, not necessarily open-source)