Will a GPT-4 level efficient HRM based language model be released before Feb 2026? [Details in description]
24% chance

Inspired by: https://www.lesswrong.com/posts/tEZa7PouYatK78bbb/i-am-worried-about-near-term-non-llm-ai-developments

Hierarchical Reasoning Model (HRM) paper: https://arxiv.org/pdf/2506.21734

The model must:

  • Be primarily based on the HRM architecture, rather than a large transformer (though may include transformer-like components, e.g. attention)

  • Have <100 billion parameters (GPT-4 is estimated at 1.76 trillion; GPT-3 has 175 billion)

  • Outperform GPT-4 on at least half of the benchmarks it is tested on that GPT-4 was also tested on (specifically those covered by Tables 1 & 2, and TruthfulQA, in the GPT-4 technical report: https://arxiv.org/abs/2303.08774)

  • Update 2025-08-01 (PST) (AI summary of creator comment): In response to a question, the creator has specified the following:

    • To resolve YES, there must be solid evidence that the model's parameter count is under 100 billion.

    • If a candidate model is released but its parameter count is unclear when the market closes, resolution may be delayed to wait for more information to become available.

  • Update 2025-08-01 (PST) (AI summary of creator comment): If a candidate model's parameter count is unclear at the market's close date:

    • The creator may delay resolution for a maximum of 1 month to wait for more information.

    • If clearer information is unlikely to emerge, the creator will consider the best available estimate at that time as sufficient evidence for resolution.
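
As a rough illustration only, the three criteria above could be sketched as a check like the following. This is one reading of the description, not the creator's official procedure. The names `Candidate`, `beats_gpt4_on_half`, and `meets_criteria` are hypothetical, the sketch assumes every benchmark is scored higher-is-better, and it assumes that a model with no overlapping benchmarks does not qualify. Whether a model is "primarily based on the HRM architecture" is a judgment call, so it appears here as a plain boolean input.

```python
from dataclasses import dataclass
from typing import Dict, Optional

PARAM_LIMIT = 100e9  # "<100 billion parameters", from the market description


@dataclass
class Candidate:
    hrm_based: bool                # judgment call by the creator, not computable
    param_count: Optional[float]   # None when there is no solid evidence
    scores: Dict[str, float]       # benchmark name -> candidate's score


def beats_gpt4_on_half(scores: Dict[str, float],
                       gpt4_scores: Dict[str, float]) -> bool:
    """True if the candidate is better on at least half of the shared benchmarks."""
    overlap = [name for name in scores if name in gpt4_scores]
    if not overlap:
        return False  # assumption: no shared benchmarks means no evidence
    # Assumption: higher score is better on every benchmark.
    wins = sum(scores[name] > gpt4_scores[name] for name in overlap)
    return 2 * wins >= len(overlap)


def meets_criteria(c: Candidate, gpt4_scores: Dict[str, float]) -> bool:
    """Combine the architecture, parameter-count, and benchmark criteria."""
    return (
        c.hrm_based
        and c.param_count is not None
        and c.param_count < PARAM_LIMIT
        and beats_gpt4_on_half(c.scores, gpt4_scores)
    )
```

Here `gpt4_scores` would hold the figures from Tables 1 & 2 and the TruthfulQA result in the GPT-4 technical report; benchmarks the candidate was not evaluated on are simply ignored. An unclear parameter count (`None`) fails the check, which corresponds to the case where resolution may be delayed rather than resolved YES.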


What counts as a release? If the weights aren't public, but the model is estimated to have <100B parameters, how would this market resolve?

@ShankarSivarajan To resolve YES, there would need to be solid evidence that the parameter count was <100 billion. However, if a model was released and it was unclear whether it was under the threshold when the end date passed, I would delay resolving if there was a good chance things would become clearer.

@Jasonb Would a low-confidence Epoch estimate count as "solid evidence"?

@CalebParikh I think this depends on how likely we are to get clearer information and in what timeframe. I'd count something like that if that was as good an estimate as we were going to get. Hard to pin down the exact tradeoff that should be made. I could commit to a fixed maximum delay time if that helps make things more concrete to base predictions off? Maybe 1 month?

How correlated do you guys think this market is with p(doom) and is that correlation positive or negative?

@JoshSnider Probably weakly negatively correlated in my opinion, but would depend on how easy it is to interpret / align these models vs transformers and I could see evidence on that swinging it either way (though as noted in the LW post, prior on interp difficulty is that it's harder as you might not have a plaintext chain of thought). I agree with the LW poster that if we think this could reasonably happen, we ought to start investigating some of these properties as soon as we can. This somewhat motivated posting the question here.
