Prediction #4 from:
My Takeaways From AI 2027 - by Scott Alexander
A big problem with this market is that its premise (that there is an "at least one year gap" between frontier proprietary and open models) is demonstrably untrue, and has been untrue throughout most of 2024 and all of 2025. I'd argue it was already untrue as early as mid-2023.
Llama 2 70B (Jul '23) was roughly on par with GPT-3.5 (Nov '22); the benchmarks used at the time weren't great, but it was firmly in the same league. Llama 3.1 405B (Jul '24) was fully on par with at least early GPT-4 Turbo (Nov '23), and certainly ahead of the vanilla GPT-4 releases. That's a gap of ~8 months sustained over a year. DeepSeek R1 (Jan '25; feels forever ago, but it hasn't even been a year!) was miles ahead of any model released prior to September 2024; that's a gap of ~4.5 months, the smallest it's ever been. Even if we squint really hard and argue that the likes of GLM-4.7, DS v3.2 Finale, MiniMax M2.1, Ernie 5.0, and Kimi K2 are merely on the level of o3-high (Apr '25) or early Gemini 2.5 Pro (Mar '25) and not in any way ahead of them (also untrue, but let's entertain the thought), we're talking an overall capability gap of at most 7–8 months. Since mid-2023, we've never seen the gap grow bigger than 8 months.
All of this tracks across both objective capability indexes like AAII and subjective communal preference aggregators like LM Arena. So even if we treat R1 as a one-off blip of a relative capability spurt, the real gap has never actually exceeded a year, and nothing indicates it's growing. If anything, we see more and more Chinese players catch up and fit within that 8-month gap despite being severely constrained by available compute; if they weren't so constrained, we'd likely see it shrink to something like 1–3 months in no time.
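The month arithmetic above can be sanity-checked with a quick sketch. Release dates are approximated to the first of the stated month (the comment only gives month precision), so the computed gaps are rough estimates, not exact figures:

```python
from datetime import date

def month_gap(frontier: date, open_model: date) -> float:
    """Approximate gap in months between a frontier release and the open
    model that matched it. 30.44 is the average Gregorian month length."""
    return (open_model - frontier).days / 30.44

# Release-date pairs as claimed above, rounded to the first of the month.
pairs = {
    "Llama 2 70B vs GPT-3.5":        (date(2022, 11, 1), date(2023, 7, 1)),
    "Llama 3.1 405B vs GPT-4 Turbo": (date(2023, 11, 1), date(2024, 7, 1)),
    "DeepSeek R1 vs Sep '24 SotA":   (date(2024, 9, 1), date(2025, 1, 1)),
}

for name, (frontier, open_release) in pairs.items():
    print(f"{name}: ~{month_gap(frontier, open_release):.1f} months")
```

With month-start placeholders the first two pairs come out at ~8 months and the R1 pair at ~4 months; using the exact release days instead would nudge these by a few tenths, consistent with the figures cited above.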
So I really don't know what Scott Alexander's numbers are based on (surely not anything demonstrable, otherwise we'd clearly see it), but it's gonna be hard to resolve this market on that premise alone unless we just defer to whatever Scott himself thinks is the case. As it stands, it's simply unfalsifiable. (This is not a criticism of the arguments in the post itself, just of this particular premise it, and hence this market, goes with.)
@moozooh I think people generally refer to this graph
@mitchellreynolds I don't know if they "generally" refer to that graph, but even the graph itself mostly shows exactly what I was talking about: that the gap was never bigger than 8 months and is actively shrinking over time (even putting aside how weird some of the points of comparison on it are; if e.g. Llama 2 7B, Solar Mini, or Grok 3 Mini are the best points of comparison at any point, the problem is clearly with the methodology).
@moozooh Ah, I see. I interpret the ACX 1–2 year gap as a directional stand-in timeframe rather than a precise one. My assumption is that both OP and ACX are saying that OSS and frontier labs aren't trading places every other month or so.
The other part implicitly not captured in the graph is that frontier labs have internal access for a handful of months before release (e.g. Mythos is in a private release state right now and for the near term; also, IIRC, Anthropic said they've had internal access for a handful of months).
Either way, I think the question should be a little more precise: "Will the best open-source LLM surpass the best closed-source LLM?" I'd want to say "frontier labs" instead of "closed-source", but you could argue that Meta and Alibaba are frontier labs too. Also, by default, the lead of an OSS LLM will be short-lived.