Prediction #4 from:
My Takeaways From AI 2027 - by Scott Alexander
A big problem with this market is that its premise (that there is an "at least one year gap" between frontier proprietary and open models) is demonstrably untrue, and has been untrue throughout most of 2024 and all of 2025. I'd argue it was already untrue as early as mid-2023.
Llama 2 70B (Jul '23) was roughly on par with GPT-3.5 (Nov '22); the benchmarks used at the time weren't great, but it was firmly in the same league. Llama 3.1 405B (Jul '24) was fully on par with at least early GPT-4 Turbo (Nov '23), and certainly ahead of the vanilla GPT-4 releases. That's a gap of ~8 months sustained over a year. DeepSeek R1 (Jan '25; feels forever ago, but it hasn't even been a year!) was miles ahead of any model released prior to September 2024; that's a gap of ~4.5 months, the smallest it's ever been. Even if we squint really hard and argue that the likes of GLM-4.7, DS v3.2 Finale, MiniMax M2.1, Ernie 5.0, and Kimi K2 are merely on the level of o3-high (Apr '25) or early Gemini 2.5 Pro (Mar '25) and not in any way ahead of them (also untrue, but let's entertain the thought), we're talking an overall capability gap of at most 7–8 months. Since mid-2023, we've never seen the gap grow bigger than 8 months.
All of this tracks across both objective capability indexes like AAII and subjective communal preference aggregators like LM Arena. So even if we treat R1 as a one-off blip of a relative capability spurt, the real gap has never actually exceeded a year, and nothing indicates it's growing. If anything, we see more and more Chinese players catch up and fit within that 8-month gap despite being severely constrained by available compute; if they weren't so constrained, we'd likely see it shrink to something like 1–3 months in no time.
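The month arithmetic above can be sanity-checked with a quick sketch. Release dates are approximated to the first of the stated month (the comment only gives month precision), so the computed gaps are rough estimates, not exact figures:

```python
from datetime import date

def month_gap(frontier: date, open_model: date) -> float:
    """Approximate gap in months between a frontier release and the open
    model that matched it. 30.44 is the average Gregorian month length."""
    return (open_model - frontier).days / 30.44

# Release-date pairs as claimed above, rounded to the first of the month.
pairs = {
    "Llama 2 70B vs GPT-3.5":        (date(2022, 11, 1), date(2023, 7, 1)),
    "Llama 3.1 405B vs GPT-4 Turbo": (date(2023, 11, 1), date(2024, 7, 1)),
    "DeepSeek R1 vs Sep '24 SotA":   (date(2024, 9, 1), date(2025, 1, 1)),
}

for name, (frontier, open_release) in pairs.items():
    print(f"{name}: ~{month_gap(frontier, open_release):.1f} months")
```

With month-start placeholders the first two pairs come out at ~8 months and the R1 pair at ~4 months; using the exact release days instead would nudge these by a few tenths, consistent with the figures cited above.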
So I really don't know what Scott Alexander's numbers are based on (surely not anything demonstrable, otherwise we'd clearly see it), but it's gonna be hard to resolve this market on that premise alone unless we just defer to whatever Scott himself thinks is the case. As it stands, it's simply unfalsifiable. (This is not a criticism of the arguments in the post itself, just of this particular premise it, and hence this market, goes with.)
@moozooh I think people generally refer to this graph
@mitchellreynolds I don't know if they "generally" refer to that graph, but even the graph itself mostly shows exactly what I was talking about: that the gap was never bigger than 8 months and is actively shrinking over time (even putting aside how weird some of the points of comparison on it are; if e.g. Llama 2 7B, Solar Mini, or Grok 3 Mini are the best points of comparison at any point, the problem is clearly with the methodology).
@moozooh Ah, I see. I interpret the ACX 1–2 year gap as a directional stand-in timeframe rather than a precise one. My assumption is that both OP and ACX are saying that OSS and frontier labs aren't trading places every other month or so.
The other part implicitly not captured in the graph is that frontier labs have internal access for a handful of months before release (e.g. Mythos is in a private release state right now and for the near term; also, IIRC, Anthropic said they've had internal access for a handful of months).
Either way, I think the question should be a little more precise: "Will the best open-source LLM surpass the best closed-source LLM?" I'd want to say "frontier labs" instead of "closed-source", but you could argue that Meta and Alibaba are frontier labs too. Also, by default, the lead of an OSS LLM will be short-lived.