Sometime before 2031, AI is going to cross a generality threshold that allows for recursive self-improvement, and then FOOM to superhuman capability. Before that point humanity must solve AI safety or be helpless.
26% chance

Following up on this useful input from Steven, I shall clarify my resolution criteria.

Steven: Does this resolve NO if AI recursive self-improvement results from a gradual increase in generality?

Nathan: I say 'allows for' recursive self-improvement (FOOM-after-deployment). It might be superhuman right out of the box (FOOM-during-training). I will resolve YES if it becomes strongly superhuman by any means at any speed, and humanity manages to solve AI safety via alignment or governance sufficiently to prevent catastrophic conflict. I will also resolve YES if there is a conflict and the AI is clearly winning or has won (if I'm still alive to do so).

Related:

Thomas Kwa: How does this resolve if the most likely AI failure mode is disempowerment that doesn't cause human extinction (as in What Failure Looks Like part 1)?

How does this resolve if takeoff is slow or medium speed but AI is still an existential risk?

Nathan: I deliberately stated 'helpless', as in disempowered, rather than extinct because I wanted to include the broader case of either outcome.

AGI achieving clear broadly superhuman capability before 2031 and disempowering humanity counts as a YES for this market, regardless of the shape of the curve describing the increase in capability.

Steven: What if it results from crossing a competence threshold while remaining at a similar level of generality?

Nathan: answered above

Steven: What if AI quickly attains superhuman capability but not because of recursive self-improvement?

Nathan: answered above

Steven: Are there any conceivable cases where you'd judge the first sentence to be true but the second sentence to be false? ("Solve safety or be unsafe" would almost be a tautology, but I guess "helpless" is stronger than "unsafe", and this is meant to resolve NO on cases where AI is dangerous but humanity still has a fighting chance, and cases where humanity has to solve safety but can afford to do so after superhuman AI capability?)

Nathan: If AI is actively working on taking over but I judge that humanity still has a fighting chance, I will resolve NO. I believe this would falsify the second part of the claim, "Before that point humanity must solve AI safety or be helpless." Humanity, in this case, has neither solved AI safety nor become helpless. In such a scenario, the world would be clearly at war, lots of people would be dying, and lots of economic damage would be occurring. I'd call this a 'Skynet scenario'.

If superhuman AI is created but humanity has managed to prevent its own disempowerment through the end of 2030 by successfully constraining the AI (e.g. via governmental action to enforce safety laws and prevent the release of the AI), I will resolve YES. I will count this as 'temporarily solving safety' even though I don't count it as 'solving alignment' or establishing long-term safety (since other groups could develop and release a bad AI, or the containment could fail). It's an unsatisfying YES, I admit. If there is a war, but humanity wins and the war is not still ongoing at the end of 2030, that will also count as a YES for this reason.

I apologize for the complication resulting from making multiple claims in one market, but I did indeed want to get at the multi-claim question here.


"I will resolve YES if it becomes strongly superhuman by any means at any speed, and humanity manages to solve AI safety via alignment or governance sufficiently to prevent catastrophic conflict." This doesn't sound like foom to me

predicts YES

OK, now the range of YES scenarios is very broad and I think the question title might be misleading. I have like 95% that it's possible to construct an unaligned superintelligence that easily disempowers humanity, and 45% that AGI comes before 2030. Skynet seems unlikely to me. So should I predict as high as 40% here? This would exclude any of the cases where alignment is technically required, but easy.

In fact, alignment might have already been solved. I have 20% that existing techniques like RLHF are enough to result in noncatastrophic outcomes. How does this resolve if this is true and we succeed at alignment? What if it's true but we don't try the correct type of RLHF or fail to execute correctly, and are disempowered by an unaligned AI?
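
A quick back-of-envelope check of how the two estimates in this comment combine, offered only as an illustrative sketch: treating the two events as independent (an assumption made here, not something the comment states) gives a figure close to the "as high as 40%" mentioned above.

    # Rough combination of the two probability estimates from the comment above.
    # Independence of the two events is assumed here purely for illustration.
    p_unaligned_asi_possible = 0.95  # "possible to construct an unaligned superintelligence..."
    p_agi_before_2030 = 0.45         # "AGI comes before 2030"
    upper_bound = p_unaligned_asi_possible * p_agi_before_2030
    print(f"{upper_bound:.0%}")  # prints 43%, roughly the ceiling of ~40% discussed above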

predicts YES

@ThomasKwa Your statement 'alignment might have already been solved' got me thinking about what some clear outstanding unsolved issues are that I could make a market about. Here's my first attempt at that: https://manifold.markets/NathanHelmBurger/the-alignment-techniques-we-have-to?referrer=NathanHelmBurger

predicts YES

How does this resolve if the most likely AI failure mode is disempowerment that doesn't cause human extinction (as in What Failure Looks Like part 1)?

How does this resolve if takeoff is slow or medium speed but AI is still an existential risk?

predicts YES

@ThomasKwa I deliberately stated 'helpless', as in disempowered, rather than extinct because I wanted to include the broader case of either outcome.

AGI achieving clear broadly superhuman capability before 2031 and disempowering humanity counts as a YES for this market, regardless of the shape of the curve describing the increase in capability.

bought Ṁ30 of YES

For what it's worth, I currently work on these kinds of failure modes at MIRI.

predicts YES

@ThomasKwa I'm mostly only buying YES because the question description indicates a broader interpretation than the title. IIRC even Eliezer doesn't think we die to RSI because other failure modes happen at lower superhuman capability levels.

predicts YES

@ThomasKwa I think RSI supercriticality and acceleration will happen quite soon and quite easily. I think subcritical RSI is already happening in the form of research and coding assistance from current SotA models. I intend to deliver my private report on this to MIRI in early February.

@NathanHelmBurger I really don't think supercritical RSI is going to be a novel thing. I don't expect fully AI-driven self-improvement to feel novel to humanity; it'll feel like grad student descent for years afterwards. I really think there's one big improvement, and after that the advanced stuff doesn't keep finding generalizing rules cheaply; it just takes compute. Certainly there's more science to be done, but without PASTA any FOOM just goes crazy and loses grounding (humans who think they learning-foomed are usually called "crackpots"). Most of FOOM is just spending a lot of fuel on electricity for minds. I know Yudkowsky thinks this is crazy, but I really don't think there are many phase changes coming in the near term besides the big one. The problem is going to be aligning a society of near-human-level minds. There won't be algorithmic scaling beyond a certain level, after which it becomes a slog for AI to find new ideas, same as it ever was.

The thing that can go supercritical is growth rate. Growth rate should be seen in terms of evolutionary game theory and community dynamics, population modeling, system dynamics; those sorts of fields and tools.

The pointer problem is certainly a severe problem, because AI can absolutely go supercritical in growth rate; but recursive self-improvement is not how it gets there at all. It has to decide to undertake a pivotal act in order to go critical.

@L So in other words, I think I would actually bet NO if I thought that trading on this kind of market worked at all. (If we can explain to the entire market how to reliably ground KBCs in something useful and prosocial, then maybe this would start working, but right now I'm not convinced this particular resolution criterion works.)

tl;dr: While I agree with this assertion, I don't think we can usefully bet on anything after "we get left behind"; in my opinion it's much more useful to focus betting on how we survive. Also, a key disagreement with a mechanistic implication of the argument: I don't think recursive self-improvement will be needed to get there. People continue to relentlessly underestimate AI; it's an embarrassing cope, especially coming from the likes of MIRI. And generality is not the key bottleneck: accuracy is. In a sense, Schmidhuber is right; LSTMs really were AGI in the 90s, in the same sense that frogs are AGI. "I can do anything badly" isn't the big-deal capability, after all! ChatGPT is AGI now, but it's a conscientious college-dropout AGI with no vision, hearing, or motor experience (though make no mistake, its architecture can handle those with very little to no change; it just didn't see the training data).

So, I will not be taking a position on this market, and I might even gently encourage N/A resolution, so everyone can spend their m$ on a more useful variation.

bought Ṁ10 of NO

While the OP claim seems pretty likely in isolation, M$ are useless in most worlds where we fail to solve alignment, and we don't seem to be on a path to solve it by 2031.

"Temporarily solving safety" seems unlikely. It is hard to imagine us having clear evidence of a superhuman AI capable of FOOM that is somehow contained and didn't actually go FOOM.

I think the most likely surviving worlds where M$ is still worth having are those where superhuman AI has not been developed by 2031 (either because we sanely decided not to develop one yet, or because it turned out to be harder than expected).

Inevitable nitpicking: Does this resolve NO if AI recursive self-improvement results from a gradual increase in generality? What if it results from crossing a competence threshold while remaining at a similar level of generality? What if AI quickly attains superhuman capability but not because of recursive self-improvement? Are there any conceivable cases where you'd judge the first sentence to be true but the second sentence to be false? ("Solve safety or be unsafe" would almost be a tautology, but I guess "helpless" is stronger than "unsafe", and this is meant to resolve NO on cases where AI is dangerous but humanity still has a fighting chance, and cases where humanity has to solve safety but can afford to do so after superhuman AI capability?)

@StevenK Yet again, thank you for the clarifying constructive criticism. I appreciate the help. I will edit the description to try to be more clear.