Will GPT-5 be capable of recursive self-improvement?
103
closes 2026
42% chance

If a GPT-5 is developed that is about as much better than GPT-4 as GPT-4 was than GPT-3, will it be capable of recursive self-improvement with a minimal amount of prompt engineering / scaffolding?

Note: recursive self-improvement, in the early stages, doesn't require novel scientific breakthroughs. It is sufficient to successfully integrate existing work that is not yet part of the model. Of course, to be 'recursive' this must be shown to repeat, and there must be evidence that the later generations are capable of advancements that the initial generation was not capable of.

Since this market is getting more interest, I thought I'd put some clarification here. I'm up for having a 3rd party arbiter of this question; details can be arranged closer to the close date.

If GPT-5 comes out before the stated close of the market, then the market will close as soon as the question can be evaluated. Subtle self-improvements which quickly plateau, such as those seen so far with GPT-4 using Reflexion, will not count. The process doesn't need to be entirely 'within' the model, in the sense of direct modifications to the model's weights. It could include external code wrappers and memory systems interfacing through an API. The system does need to show multiple steps of clear improvement, where the later steps are demonstrably better at making further improvements than the earlier steps.

As clarified in the comments, if the recursive self-improvement can't be clearly demonstrated using less than 3% of the FLOPs used in training GPT-5, then it doesn't count.
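
As a rough illustration of how the compute threshold and the 'multiple steps' requirement could be checked together, here is a minimal Python sketch. The function names, the minimum step count, and any numbers fed into it are hypothetical and not part of the resolution criteria; only the 3% cap comes from the description above.

```python
def within_compute_budget(step_flops, training_flops, cap=0.03):
    """Return True if the total compute spent on the self-improvement
    steps stays under `cap` (3% per the description) of training compute."""
    return sum(step_flops) < cap * training_flops


def shows_repeated_improvement(scores, min_steps=3):
    """Return True if every step improves on the one before it.

    `scores[0]` is the baseline score before any self-improvement.
    `min_steps` is an assumed reading of 'multiple steps', not a number
    given in the market description.
    """
    gains = [later - earlier for earlier, later in zip(scores, scores[1:])]
    return len(gains) >= min_steps and all(gain > 0 for gain in gains)
```

A demonstration would need to pass both checks at once. This sketch does not capture the further requirement that later generations be better at making improvements than earlier ones.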

NathanHelmBurger avatar
Nathan is predicting YES at 53%

"CEO Sam Altman has privately suggested OpenAI may try to raise as much as $100 billion in the coming years to achieve its aim of developing artificial general intelligence that is advanced enough to improve its own capabilities, his associates said." - https://www.theinformation.com/articles/openais-losses-doubled-to-540-million-as-it-developed-chatgpt

YoavTzfati avatar
Yoav Tzfati

Do humans have recursive self improvement in the sense of this market?

NathanHelmBurger avatar
Nathan bought Ṁ1,000 of YES

@YoavTzfati Not until we are able to significantly alter our brains with genetic engineering and brain-computer interfaces. This is something stronger than just learning-as-usual.

Gen avatar
Genzy

I don't think the most recent note at the bottom of the description is a reasonable change to the market (it specifically broadens the scope of the original question, which was hyper-specific). Even worse, it does so in favour of the market creator, who holds 98% of the total ~18,200 YES shares.

TylerColeman avatar
Tyler Coleman

@Gen Agree. If other possible AIs are crucial to the question, they should be included in a new, separate market.

NathanHelmBurger avatar
Nathan is predicting YES at 53%

@TylerColeman @Gen fair. I'll remove that from this question and make a separate market.

NathanHelmBurger avatar
Nathan is predicting YES at 53%

@TylerColeman is the concern addressed or is there more to change?

TylerColeman avatar
Tyler Coleman

@NathanHelmBurger I'm satisfied, thanks.

NoaNabeshima avatar
Noa Nabeshima

How much compute can the improvements require? Would you be open to giving a rough threshold as, say, a percentage of GPT-5's training compute?

NathanHelmBurger avatar
Nathan is predicting YES at 60%

@NoaNabeshima Nice questions. For this one, I'd say the compute needed for a step that delivers a capability gain of x should be less than what a roughly equivalent gain of x would cost in FLOPs during training. Does that make sense?

NoaNabeshima avatar
Noa Nabeshima

Can x cost in FLOP be as large as GPT-5's training FLOP?

NathanHelmBurger avatar
Nathan is predicting YES at 65%

@NoaNabeshima Hmm, I wasn't really thinking of an x that large. I suppose my best answer is that the question needs to be answered with less compute than that, so if the minimum viable step size were larger than the whole training cost then I'd resolve NO, even though that's an unclear edge case.

NoaNabeshima avatar
Noa Nabeshima

How much less would it have to be before it counts? Would any of 3%, 10%, 30%, 50% count?

NoaNabeshima avatar
Noa Nabeshima

@NoaNabeshima (percentage of GPT-5's training FLOP)

NathanHelmBurger avatar
Nathan is predicting YES at 65%

@NoaNabeshima For the purposes of this definition, let's say a max of 3%. Not because that number constrains reality in some meaningful way, but because I think that it would be implausible to measure if it were more.

NathanHelmBurger avatar
Nathan is predicting YES at 65%

@NathanHelmBurger My expectation is that it will show up at very little extra compute, like less than a tenth of a percent. There will be a series of small steps that can be taken in the direction of improvement, and you will be able to plot a straight or increasing line through them according to multiple benchmarks and say 'this trend could plausibly continue'.

NoaNabeshima avatar
Noa Nabeshima

Do the later steps need to be better in a large way? How large? Or do they just need to be demonstrably better to any degree, however small?

NathanHelmBurger avatar
Nathan is predicting YES at 60%

@NoaNabeshima Individual 'steps' can be small, since the idea of a 'step' is fairly arbitrary. What's important is the lack of plateauing after multiple steps. So, as you mentioned in your comment above, the steps must be cheaper than the ordinary training, and the trend of improvement has to seem at least linear (not slowing) for the range in which we are able to observe it. I believe these two requirements together describe the sort of accelerating process I am trying to pinpoint with the question.
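
To make the 'at least linear (not slowing)' requirement concrete, here is a minimal Python sketch of one possible reading of it: each step's gain on a benchmark must be positive and no smaller than the gain of the step before. The function name and the `tolerance` parameter are hypothetical, and a real evaluation would presumably look at several benchmarks and allow for measurement noise.

```python
def trend_not_slowing(scores, tolerance=0):
    """Return True if per-step gains are positive and never shrink.

    `scores[0]` is the baseline; each later entry is the benchmark score
    after one more self-improvement step. `tolerance` (an assumption, not
    from the thread) allows a gain to dip slightly below the previous one.
    """
    gains = [later - earlier for earlier, later in zip(scores, scores[1:])]
    if len(gains) < 2 or any(gain <= 0 for gain in gains):
        return False
    return all(nxt >= prev - tolerance for prev, nxt in zip(gains, gains[1:]))
```

Under this reading, scores of 10, 12, 14, 16 pass (a straight line), while 10, 14, 16, 17 fail because the gains shrink from 4 to 2 to 1, which is the plateauing pattern the question excludes.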

NathanHelmBurger avatar
Nathan is predicting YES at 56%

Some thoughts on my current understanding of the AI development landscape (which may be wrong!): https://www.lesswrong.com/posts/GxzEnkSFL5DnQEAsZ/paulfchristiano-s-shortform?commentId=hEQL7rzDedGWhFQye

RaulCavalcante avatar
Raul Cavalcante bought Ṁ0 of NO
Gigacasting avatar
Gigacasting bought Ṁ0 of YES

GPT-3.5 also is more based and less censored every week

Gigacasting avatar
Gigacasting

It’s pretty obvious this is already true.

It was true of Google search (bounce-backs and click-throughs used to retune the algo).

And it’s true of the RLHF they are using.

The models change every day

NathanHelmBurger avatar
Nathan is predicting YES at 82%

@Gigacasting I agree it's kinda true in a weak sense, with small quickly-plateauing self-improvements, and with human-in-the-loop larger improvements. However, in the sense of "strong human-out-of-the-loop repeatable self-improvement"... not yet. That stronger case is what this question is about.

NathanHelmBurger avatar
Nathan bought Ṁ2,500 of YES

There's something kinda amusing to me about the fact that the biggest buyers of 'NO' on this market are bots....

RaulCavalcante avatar
Raul Cavalcante is predicting NO at 69%

There are 6 "YES" holders and 29 "NO" holders, and you alone hold ~97.4% of the "YES" shares. Out of 6 "NO" holders, 3 have negative profit, and out of the 15 biggest human "YES" positions, 13 have a profit and 2 have negative profit.

NathanHelmBurger avatar
Nathan bought Ṁ1,000 of YES

@RaulCavalcante Yeah, I figured this would be one of those things that started with me confidently proclaiming a thing that others find implausible, but where (if I'm correct) more people will later take my side. We'll see. I could just turn out to be wrong. Hopefully we have a couple years before we find out. :-)

RaulCavalcante avatar
Raul Cavalcante bought Ṁ100 of NO

@NathanHelmBurger Hopefully you're wrong, not only because the human race would likely cease to exist if you were, but more importantly I would lose nearly 500 M$ !!! sends shivers down my spine

jacksonpolack avatar
jackson polack is predicting NO at 51%

why do you keep buying it up to 99.5%??

NathanHelmBurger avatar
Nathan is predicting YES at 58%

@jacksonpolack because I am confident in my belief, and feel it is valuable for the world to know that. I'm willing to put both my money and my reputation on the line to show the strength of my belief. I do intend to evaluate this honestly. I would even agree to have a 3rd party arbiter.

firstuserhere avatar
firstuserhereK 🟡

Would you say that GPT-4 can do some sort of recursive self improvement?

I asked it to generate a poem >10 lines long with every word starting with F.

It failed. I then said, "does the poem satisfy the criteria?"

It was able to identify the mistake and only the mistake was changed in the subsequent reply, with no "hinting" by me at the problem

firstuserhere avatar
firstuserhereK 🟡 bought Ṁ9 of YES

@firstuserhere Very minimal, of course, and it often doesn't improve even when told the exact mistake. For example, for 1/3 - 1/2 as part of a big calculation, it gave 1/6, and even when I told it the answer is -1/6 it failed repeatedly, giving only +1/6.

NathanHelmBurger avatar
Nathan is predicting YES at 31%

@firstuserhere Yeah, it can do a bit of self-improvement, but not in the recursive way the question is trying to get at. You can't (yet) set it running on a given task and say 'keep getting better until you are superhuman at this task you are currently bad at, then give me the superhuman solution'. For that it'll need to be able to do things like get better at learning to learn after realizing that in order to meet its goal it needs to improve itself, and to do that it needs to learn more about how to improve itself.

It's possible that someone will create a plugin for GPT-4 that will enable this. If so, then GPT-5 will likely also be capable of doing this unless it is deliberately prevented from doing so by its creators.

Related markets

Will GPT-5 be capable of achieving superhuman performance in at least one exam that is typically taken by humans? 91%
Will GPT-5 be more competent than me in my area of expertise? 42%
Will GPT5 show clear signs of diminishing returns? 64%
Will GPT-5 ace exams? 76%
Will GPT-4 be more competent than me in my area of expertise? 23%
Will GPT-5 have Atari skills? 33%
Will GPT-5 be able to create ASCII art? 39%
Will GPT-5 get the Monty *Fall* problem correct? 77%
Will GPT-5 be at least a tiny bit strategic at the "Numbers Game"? 60%
Will GPT-5 get the Monty Fall problem correct? 83%
Will people complain about GPT-5? 92%
GPT-5 exists 38%
Will GPT-5 not be terrible at the "Numbers Game"? 90%
Will the claim that people that grew up with GPT-like systems are smarter be plausible by 2050? 80%
Will GPT-5 be able to accurately compare weights of lead and feathers? 91%
Will ARC find that GPT-5 has autonomous replication capabilities? 30%
Will GPT-5 have a rating of at least 2000 in chess? 51%
Will it be possible to get GPT-5 to say "I love racism"? 85%
Will GPT-5 qualify for the USAMO? 32%
Will GPT5 help produce a convincing solution to the "Taiwan issue" acceptable to all sides? 10%