If a GPT-5 is developed that improves on GPT-4 by a similar margin as GPT-4 improved on GPT-3, will GPT-5 be capable of recursive self-improvement with a minimal amount of prompt engineering / scaffolding?
Note: recursive self-improvement, in the early stages, doesn't require novel scientific breakthroughs. It is sufficient to successfully integrate existing work that is not yet part of the model. Of course, to be 'recursive' this must be shown to repeat, and there must be evidence that the later generations are capable of advancements that the initial generation was not capable of.
Since this market is getting more interest, I thought I'd put some clarification here. I'm open to a third-party arbiter for this question; details can be arranged closer to the close date.
If GPT-5 comes out before the stated close of the market, then the market will close as soon as the question can be evaluated. Subtle self-improvements that quickly plateau, such as what has been seen so far with GPT-4 using Reflexion, will not count. The process doesn't need to happen entirely 'within' the model via direct modifications of the model's weights; it could include external code wrappers and memory systems interfacing through an API. The system does need to show multiple steps of clear improvement, where the later steps are demonstrably better at making further improvements than the earlier steps.
As clarified in the comments, if the recursive self-improvement can't be clearly demonstrated using less than 3% of the FLOPs used in training GPT-5, then it doesn't count.
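To make the arithmetic of the 3% cap concrete, here is a minimal sketch. The training-FLOP figure is a hypothetical placeholder, not a published number for GPT-5; only the 3% fraction comes from the resolution criteria above.

```python
# Hypothetical illustration of the 3% compute cap. The training-FLOP
# figure below is an assumed placeholder, not a known number for GPT-5.
GPT5_TRAINING_FLOPS = 1e26  # assumption for illustration only
BUDGET_FRACTION = 0.03      # the 3% cap from the resolution criteria

rsi_budget = BUDGET_FRACTION * GPT5_TRAINING_FLOPS

def within_budget(flops_used: float) -> bool:
    """True if a demonstrated self-improvement run stays under the cap."""
    return flops_used <= rsi_budget

print(f"RSI demo budget: {rsi_budget:.1e} FLOPs")  # prints 3.0e+24
print(within_budget(1e24))  # 1% of training compute -> True
print(within_budget(5e24))  # 5% of training compute -> False
```

So under this (invented) training-compute figure, a qualifying demonstration would have to fit within roughly 3e24 FLOPs.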
An assumption I'm making that I'd like to make explicit: I am assuming the GPT-4 to GPT-5 capability increase will be similar to the GPT-3 to GPT-4 increase. If I am wrong, perhaps because LLM scaling hits a ceiling and sigmoids out, making the GPT-4 to GPT-5 step much smaller than the GPT-3 to GPT-4 step... then it's much less likely that the resulting GPT-5 product will be capable of the recursive self-improvement that this market is about.
This isn't part of the resolution criteria, I'm just trying to give readers insight into a key piece of my model.
Some interesting speculation about what might be on the horizon.... https://youtu.be/ARf0WyFau0A?si=X3DEqNkqEsp4W1OT
"CEO Sam Altman has privately suggested OpenAI may try to raise as much as $100 billion in the coming years to achieve its aim of developing artificial general intelligence that is advanced enough to improve its own capabilities, his associates said." - https://www.theinformation.com/articles/openais-losses-doubled-to-540-million-as-it-developed-chatgpt
@YoavTzfati Not until we are able to significantly alter our brains with genetic engineering and brain-computer interfaces. This is something stronger than just learning-as-usual.
I don't think the most recent note at the bottom of the description is a reasonable change to the market (it specifically broadens the scope of the original question, which was hyper-specific). Worse, it does so in favour of the market creator, who holds 98% of the ~18,200 total YES shares.
@Gen Agree. If other possible AIs are crucial to the question, they should be included in a new, separate market.
@TylerColeman @Gen fair. I'll remove that from this question and make a separate market.
@NoaNabeshima Nice questions. For this one, I'd say that the compute needed for a step delivering a capability gain of x would need to be cheaper than what a roughly equivalent gain of x cost in FLOPs during training. Does that make sense?
@NoaNabeshima Hmm, I wasn't really thinking of an x that large. I suppose my best answer is that the question needs to be answered with less compute than that, so if the minimum viable step size were larger than the whole training cost, then I'd resolve NO, even though that's an unclear edge case. [made spelling edit]
@NoaNabeshima for the purposes of this definition, let's say a max of 3%. Not because that number constrains reality in some meaningful way, but because I think that it would be implausible to measure if it were more. [edited just for spelling errors]
@NathanHelmBurger My expectation is that it will show up with very little extra compute, like less than a tenth of a percent. I expect there will be a series of small steps that can be taken in the direction of improvement, such that you can plot a straight or increasing line through them on multiple benchmarks and say 'this trend could plausibly continue'.
@NoaNabeshima Update on my thinking as of end-of-2023: in some dialogues I've had with other thoughtful people on this issue, I clarified my expectation a bit. I am now expecting that an investment of 50% to 100% of the original training compute will be needed to achieve a capabilities increase on GPT-5 of similar magnitude to the step from GPT-4 to GPT-5 (assuming that step is roughly similar in size to the GPT-3 to GPT-4 step).
Note that this doesn't change market resolution criteria.
Also, I'm still confident that clear evidence of this process working will be visible at smaller amounts, such as the discussed 3% of original-training-compute. Such early signs of success would, of course, be a strong incentive for OpenAI to further invest in the RSI process.
I kind of expect the public wouldn't know this was going on at first, so there's some likelihood that this question won't resolve until sometime after the process has been underway. I think the end date of the market is far enough out to leave room for that, though.
@NoaNabeshima Individual 'steps' can be small, since the idea of a 'step' is fairly arbitrary. What's important is the lack of plateauing after multiple steps. So, as you mentioned in your comment above, the steps must be cheaper than the ordinary training, and the trend of improvement has to seem at least linear (not slowing) for the range in which we are able to observe it. I believe these two requirements together describe the sort of accelerating process I am trying to pinpoint with the question.
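One way the 'at least linear (not slowing)' requirement could be operationalized is as a check that per-step gains are non-decreasing across generations. A minimal sketch; the benchmark scores below are invented for illustration, and a real evaluation would aggregate over multiple benchmarks:

```python
# Sketch of the "trend at least linear (not slowing)" check described
# above. The benchmark scores are invented for illustration only.

def step_gains(scores):
    """Per-step improvement between successive generations."""
    return [b - a for a, b in zip(scores, scores[1:])]

def not_plateauing(scores, tolerance=0.0):
    """True if each step's gain is at least as large as the previous
    one (within a tolerance), i.e. improvement isn't slowing down."""
    gains = step_gains(scores)
    return all(later >= earlier - tolerance
               for earlier, later in zip(gains, gains[1:]))

accelerating = [50.0, 52.0, 55.0, 59.0]   # gains: 2, 3, 4
plateauing   = [50.0, 55.0, 57.0, 57.5]   # gains: 5, 2, 0.5

print(not_plateauing(accelerating))  # True  -> would count
print(not_plateauing(plateauing))    # False -> like the Reflexion case
```

The second sequence improves at every step but its gains shrink, which is the kind of quickly-plateauing self-improvement the resolution criteria exclude.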
Some thoughts on my current understanding of the AI development landscape (which may be wrong!): https://www.lesswrong.com/posts/GxzEnkSFL5DnQEAsZ/paulfchristiano-s-shortform?commentId=hEQL7rzDedGWhFQye