GPT-5 plus scaffolding and inference-compute ~= training compute will achieve capabilities advance >= (GPT-4 to GPT-5).

2026

79% chance

Question written out without the abbreviations for clarity:

GPT-5, if given scaffolding and inference compute approximately equal to its training compute, will achieve a capabilities advance of similar or greater magnitude than the capabilities advance from GPT-4 to GPT-5.

Important! This question is assuming that the capabilities increase from GPT-4 to GPT-5 is at least as large as the increase from GPT-3 to GPT-4. If it is widely agreed that the capabilities increase from GPT-4 to GPT-5 is significantly smaller (e.g. because LLM scaling hits a ceiling), then the question will resolve N/A.
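
To make the resolution rule above concrete, here is a minimal sketch of one way to read it. Everything in it is an assumption for illustration: the function name, the idea that each system's capability can be summarized as a single score, and the `tolerance` parameter standing in for "similar" and "significantly smaller" are all hypothetical. The question itself does not specify a benchmark or a threshold.

```python
def resolve_market(gpt3: float, gpt4: float, gpt5_base: float,
                   gpt5_scaffolded: float, tolerance: float = 0.1) -> str:
    """Hypothetical resolution logic; all scores and the tolerance are assumed."""
    prev_jump = gpt4 - gpt3                      # GPT-3 -> GPT-4 advance
    base_jump = gpt5_base - gpt4                 # GPT-4 -> GPT-5 advance
    scaffold_jump = gpt5_scaffolded - gpt5_base  # advance from scaffolding + inference compute

    # Precondition: the GPT-4 -> GPT-5 jump must not be significantly
    # smaller than the GPT-3 -> GPT-4 jump, otherwise the market is void.
    if base_jump < (1 - tolerance) * prev_jump:
        return "N/A"

    # "Similar or greater magnitude": allow the scaffolded advance to fall
    # slightly short of the base advance, within the assumed tolerance.
    if scaffold_jump >= (1 - tolerance) * base_jump:
        return "YES"
    return "NO"


if __name__ == "__main__":
    # Made-up scores, purely to exercise the three branches.
    print(resolve_market(10, 30, 50, 70))
    print(resolve_market(10, 30, 50, 55))
    print(resolve_market(10, 30, 35, 60))
```

In practice the judgment would be qualitative ("widely agreed"), so the numeric scores here are only a stand-in for whatever comparison the market creator ends up using.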

This question is related to, but different from, my other question here: https://manifold.markets/NathanHelmBurger/will-gpt5-be-capable-of-recursive-s

The discussion in the comments section on that question will give you more insight into my thinking, if that's something you want.

Should the capability advance be measured relative to GPT-5, relative to public SOTA, or something else? Does this resolve to your credence at the time of resolution, or...?

https://x.com/adonis_singh/status/1918934825223794888

Another perspective related to this: https://youtube.com/clip/UgkxFgl8Zw2bFKBtS8BPrhuHjtODMNCN5E7H?si=JBw5ZUylexeR43DT

Very difficult to read, but I think it means:

base_delta = "base gpt-5" - "base gpt-4"

scaffold_bonus = "gpt-5 scaffolded" - "base gpt-4"

bool market_outcome = (scaffold_bonus > base_delta)

So I guess this is trying to compare intelligence improvements vs. tool use? (Though a smarter model should recognize whenever a tool is the more effective option.)

Also, tool use should be fully integrated.

GPT-5 may be natively multimodal and have Python interpreter access and reference material access at all times in training. I assume that if there is no way to benchmark the model without scaffolding, the market resolves N/A?

@GeraldMonroe That's not quite what I mean. Maybe this comment will make it more clear: https://www.lesswrong.com/posts/NXcm2zWx2MG4sbQio/deliberative-cognitive-algorithms-as-scaffolding?commentId=3vKkasBqFjDSrCpKv

@NathanHelmBurger Interesting. Note that if this works as well as the paper claims, you can bake it into the model itself during the RL phase. The scaffolding is then all internal, with the model weights adjusted to use this tool effectively.

This would show up as zero improvement from applying this method to GPT-5, since the model is already doing something similar.

@GeraldMonroe Yes, I agree that the more powerful way to use this scaffolding is to apply it in the RL phase. I expect that things like this (probably including the ideas in the discussed paper) will be included in the RL phase for GPT-5, which would mean that this exact scaffolding might not turn out to be of much help on top of GPT-5.

Nevertheless, I think that NEW scaffolding will be devised in the future that will be of use on top of GPT-5. Thus, my heavy betting on YES.

For instance, consider an API through which an LLM could run relatively complicated ML experiments and receive nicely formatted data back once each experiment completed. I don't think anyone has published about this yet, but I do expect it will be tried by at least one of the frontier labs.