EOY 2025: Will open LLMs perform at least as well as 50 Elo below closed-source LLMs on coding?

30% chance

On December 31, 2025, will the LMSys code arena's best closed-source LLM outperform the best open-weights LLM by less than 50 points?

As of July 27, 2024, the gap is 58 Elo points.
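To give a feel for what a gap of this size means, here is a minimal sketch that converts an Elo gap into an expected head-to-head score. It assumes the leaderboard ratings follow the conventional logistic Elo scale (base 10, 400 points), which the market description does not state; `expected_score` is a hypothetical helper name, and ties are ignored.

```python
def expected_score(gap: float, scale: float = 400.0) -> float:
    """Expected score of the higher-rated model under the standard Elo formula.

    Assumption: ratings use the conventional base-10, 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-gap / scale))


if __name__ == "__main__":
    # 50 is the market threshold; 58 is the gap quoted for July 27, 2024.
    for gap in (50, 58):
        print(f"{gap}-point gap: expected score {expected_score(gap):.3f}")
```

Under that assumption, both gaps correspond to the stronger model being preferred in somewhat under 60% of head-to-head comparisons.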

If LMSys ceases to exist or to evaluate models, I will resolve to 50%.

If a model is open-weights but the LMSys eval uses an API (e.g. deepseekv2-API), it still qualifies as open-weights, unless I get evidence that the API version was different enough to affect this question; in that case I would resolve to 50%.
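The resolution rules above can be sketched roughly as follows. This is an illustration only, not the creator's procedure: `resolve`, its parameters, and the return strings are hypothetical names. It follows the description's strict "less than 50 points" wording; the title's "at least as well as 50 Elo below" could be read as also including a gap of exactly 50.

```python
from typing import Optional

THRESHOLD = 50.0  # Elo points, taken from the market description


def resolve(
    best_closed: Optional[float],
    best_open: Optional[float],
    api_version_differs: bool = False,
) -> str:
    """Return a hypothetical resolution label: "YES", "NO", or "50%".

    best_closed / best_open: top coding-arena ratings on December 31, 2025,
    or None if LMSys no longer exists or no longer evaluates models.
    api_version_differs: True if there is evidence that the API-served
    version of the open-weights model differed enough to matter.
    """
    if best_closed is None or best_open is None or api_version_differs:
        return "50%"
    # A negative gap means the open-weights model is ahead, which still
    # satisfies "less than 50 points".
    gap = best_closed - best_open
    return "YES" if gap < THRESHOLD else "NO"


if __name__ == "__main__":
    print(resolve(1058, 1000))  # the 58-point gap quoted for July 2024
    print(resolve(1000, 1100))  # open-weights model ahead by 100 points
    print(resolve(None, 1000))  # leaderboard unavailable
```

A negative gap is treated the same as a small positive one, so an open-weights model that overtakes the best closed-source model resolves YES.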

Chart from https://x.com/maximelabonne/status/1779801605702836454. It shows all-question Elo, whereas this market resolves on coding-only Elo; the trend is similar.

Update 2025-05-28 (PST) (AI summary of creator comment): The creator has indicated that the market title has been updated to provide further clarity on the resolution criteria. This action was taken in response to a user's question about how the market resolves, particularly in scenarios involving the Elo difference between open-source and closed-source models. Please refer to the updated market title for the most precise definition of the resolution condition.




What if open-source models beat closed-source models by more than 50 points? For example, o5 is at 1000 Elo and DeepSeek R2 is at 1100 Elo. What will it resolve to?

@JamesJohnson Updated the title to make this clear.

https://x.com/amebagpt/status/1836875571906666836

The LMSYS main arena gap over time (1st vs 2nd, not necessarily OS)


If no one objects, I'll update the question to read: "We'll go along with any LMSys evaluation updates: e.g. if there's a code-hard or code-style control etc., we'll use whatever the fanciest LMSys eval ends up being, as long as it's code-only."

For clarification: if an open-source LLM overtakes the best closed-source one, will the market resolve as "Yes"?

Yes


Thanks for the clarification. I would buy "Yes". I expect that even in the worst case, open source will advance at a similar speed to closed source. I also think the Arena will eventually saturate, which will artificially shrink the gap between the top tiers.