Will a GPT-4-level, efficient HRM-based language model be released before Feb 2026? [Details in description]


resolved Feb 12

Inspired by: https://www.lesswrong.com/posts/tEZa7PouYatK78bbb/i-am-worried-about-near-term-non-llm-ai-developments

Hierarchical Reasoning Model (HRM) paper: https://arxiv.org/pdf/2506.21734

The model must:

Be primarily based on the HRM architecture, rather than a large transformer (though may include transformer-like components, e.g. attention)

Have <100 billion parameters (GPT-4 is estimated at 1.76 trillion; GPT-3 has 175 billion)

Outperform GPT-4 on at least half of the benchmarks it is tested on that overlap with those GPT-4 was tested on (specifically, those covered by Tables 1 & 2 and TruthfulQA in the GPT-4 technical report: https://arxiv.org/abs/2303.08774)
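The size and benchmark conditions above can be read as a simple pass/fail check. What follows is a minimal Python sketch, not the creator's official procedure. It assumes higher scores are better on every benchmark, treats "better" as strictly higher, and treats a model with no overlapping benchmarks as failing. The function name `meets_size_and_benchmark_criteria` and the benchmark names and scores in the usage are hypothetical; the architecture requirement and the "solid evidence" judgment on parameter count are left to the creator and are not modelled.

```python
# Threshold from the market description: fewer than 100 billion parameters.
MAX_PARAMS = 100e9


def meets_size_and_benchmark_criteria(
    param_count: float,
    model_scores: dict[str, float],
    gpt4_scores: dict[str, float],
) -> bool:
    """Hypothetical check of the size and benchmark conditions only.

    Assumes higher is better on every benchmark and that "better" means
    strictly higher than GPT-4's reported score.
    """
    # Size condition: strictly under 100 billion parameters.
    if param_count >= MAX_PARAMS:
        return False

    # Only benchmarks reported for both the candidate and GPT-4 count.
    overlap = set(model_scores) & set(gpt4_scores)
    if not overlap:
        # Assumption: no overlapping benchmarks means the condition is not met.
        return False

    wins = sum(1 for name in overlap if model_scores[name] > gpt4_scores[name])
    # "Better in at least half": wins / len(overlap) >= 1/2, kept in integers.
    return 2 * wins >= len(overlap)


# Usage with made-up benchmark names and scores (not real results):
print(meets_size_and_benchmark_criteria(
    50e9, {"bench_a": 0.9, "bench_b": 0.5}, {"bench_a": 0.8, "bench_b": 0.6}
))
```

Comparing `2 * wins` against the overlap size avoids floating-point division, so an exact tie (e.g. 1 win out of 2) counts as "at least half".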

Interested in architectural advancements more broadly? See here: https://manifold.markets/Jasonb/significant-advancement-in-frontier

Update 2025-08-01 (PST) (AI summary of creator comment): In response to a question, the creator has specified the following:
- To resolve YES, there must be solid evidence that the model's parameter count is under 100 billion.
- If a candidate model is released but its parameter count is unclear when the market closes, resolution may be delayed to wait for more information to become available.

Update 2025-08-01 (PST) (AI summary of creator comment): If a candidate model's parameter count is unclear at the market's close date:
- The creator may delay resolution for a maximum of 1 month to wait for more information.
- If clearer information is unlikely to emerge, the creator will consider the best available estimate at that time as sufficient evidence for resolution.


What's the closest HRM got?

@JoshSnider I'm not aware of any HRM-based LLM having been trained.

bought Ṁ15 NO

bought Ṁ250 NO

Chollet claims that the paper's performance is not a result of HRM, but of data augmentation. https://x.com/fchollet/status/1956442449922138336

Wait, hang on, I was not intending to be the biggest "YES holder"; I just put a limit order at 10%.

What counts as a release? If the weights aren't public, but the model is estimated to have <100B parameters, how would this market resolve?

@ShankarSivarajan To resolve YES, there would need to be solid evidence that the parameter count was <100 billion. However, if a model was released, it was unclear whether it was under the threshold, and the end date passed, I would delay resolving if there was a good chance things would become clearer.

@Jasonb Would a low-confidence Epoch estimate count as "solid evidence"?

@CalebParikh I think this depends on how likely we are to get clearer information and in what timeframe. I'd count something like that if it was as good an estimate as we were going to get. It's hard to pin down the exact tradeoff that should be made. I could commit to a fixed maximum delay if that helps make things more concrete to base predictions on? Maybe 1 month?

How correlated do you guys think this market is with p(doom) and is that correlation positive or negative?

@JoshSnider Probably weakly positively correlated in my opinion, but it would depend on how easy it is to interpret / align these models vs transformers, and I could see evidence on that swinging it either way (though, as noted in the LW post, the prior on interp difficulty is that it's harder, as you might not have a plaintext chain of thought). I agree with the LW poster that if we think this could reasonably happen, we ought to start investigating some of these properties as soon as we can. This partly motivated posting the question here.

(edited negative to positive to fix mistake)
