As of market creation it's about 50%. Some on Manifold have expressed incredulity at this number, but none have provided a clear argument against it.
Seems pretty straightforward to me. We have absolutely no idea how to align an AI, so if one reaches the point of recursive self-improvement we're doomed. (~98%.) It's unclear whether LLMs will be able to reach AGI or are fundamentally incapable of doing so. If they are, extrapolating the current trend leads to AGI before 2030. If they're not, it seems unlikely any other paradigm could emerge in such a short time span, given all the previous failures of symbolic AI and the lack of other recent research directions. I put LLMs being capable of reaching AGI closer to 80%, but it could require more scaling than will happen in just 6 years, especially given the chance of regulatory slowdowns, so I'm calling it 50% overall as a rough estimate.
This market includes non-AI causes as well, but those all seem very unlikely to me in such a short time frame.
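Spelled out as arithmetic, a rough sketch of how those numbers combine (the 62.5% scaling term is just whatever makes the 80% come out to 50% overall; it isn't stated separately above):

```python
# Back-of-the-envelope combination of the estimates above. All inputs are the
# subjective guesses stated in the comment; the scaling term is only implied.

p_llms_can_reach_agi = 0.80        # stated: LLMs are capable of reaching AGI at all
p_enough_scaling_by_2030 = 0.625   # implied: enough scaling / no regulatory slowdown, chosen so 0.80 * x ≈ 0.50
p_doom_given_unaligned_rsi = 0.98  # stated: doom given recursive self-improvement without alignment

p_agi_by_2030 = p_llms_can_reach_agi * p_enough_scaling_by_2030   # ≈ 0.50
p_ai_doom_by_2030 = p_agi_by_2030 * p_doom_given_unaligned_rsi    # ≈ 0.49

print(f"P(AGI before 2030) ≈ {p_agi_by_2030:.2f}")
print(f"P(AI-caused doom before 2030) ≈ {p_ai_doom_by_2030:.2f}")
```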
LLMs are self-aligning in ways that traditional maximizers or whatever are not. I think you could end up with agential LLMs doing some pretty weird stuff, but they have a level of common-sense reasoning that traditional concepts of goal-directed AGI don't account for, since their generality is more a result of the generality of language than the generality of all things which increase paperclip output per unit time. Nobody expects that an LLM will start gobbling up the world to increase the prediction accuracy on its next token. Instead, they would expect it to end up creating a stable persona using whatever memory is available and doing whatever that persona wants.
As for when we'll have AGI? Current LLMs seem to be reasonably general in situations that wouldn't have come up during training. They're not as good at this sort of abstract reasoning as they are at memorization, but in some ways it is easier to argue that current, non-human-level LLMs are a form of AGI than that they are not.
@PeterSmythe note that not all personas adopted by an LLM will be stable. It's just that, over time, those that are not will undergo a process of optimization towards a local optimum of stability until they are stable or enter a stable cycle.
@PeterSmythe The fact that LLMs are themselves not agentic is not particularly relevant; if they're smart enough they can trivially be turned into agents by a person inclined to do so. The risky part is putting high intelligence in a digital form at all.
"I put LLMs being capable of reaching AGI closer to 80%"
Nonsense. However, suppose GPT-10 (or whatever) comes out and can take as input encrypted text and output plain text, as in the supposed leak. In other words, it is maximally capable of producing text completions.
How exactly does that cause humanity to go extinct?
@DavidBolin You clearly don't have a good enough understanding of the threat model for us to have a meaningful discussion. Please go actually read something about AI risk.
@IsaacKing I have read basically everything that has been written about it.
I fully understand the model, and I fully understand why it is wrong.
@DavidBolin Given that you have read basically everything about the threat model, what would you say the main counterargument to your analysis is?
@MartinRandall Counter to the specific argument that LLMs don't do anything except produce text?
There is NO good counterargument to it. There are a couple of bad arguments that have been made:
(1) It does not matter, because once an LLM is powerful enough at answering questions, any random person can destroy the world by asking questions and using the answers himself. (This is a bad argument not because it is necessarily wrong, but because it does not respond to the argument made, and in fact concedes it.)
(2) An LLM with a sufficiently strong inner optimizer could use every chance it gets (i.e. every time someone talks to it) to very, very slightly promote its goals, even by giving false answers. Eventually, this would lead to it convincing people to give it the power to act on its own instead of just producing text. This is a bad argument in two ways: (a) LLMs do not have inner optimizers, and this structure never will have one; (b) again, it is not a response to the argument. To put this more plainly, Yudkowsky bet me at odds of 100 to one that a superintelligent AI made without "alignment" would destroy the world, and said he would concede defeat if it existed for even a week without destroying the world. But a superintelligent LLM, even one that wanted to destroy the world, could easily exist for far more than a week without destroying anything.
(3) In practice people do not limit LLMs in this way. Someone will easily make a wrapper agent, and it will have a goal and destroy the world. This is a bad argument because it is just plain wrong; there will never be a powerful wrapper agent, no matter how powerful the LLM, because the intelligence is all in the LLM, not in the wrapper. If you tried to destroy the world that way, eventually the LLM would just "do something else" instead, i.e. produce responses that did not in fact promote that goal.
@IsaacKing I think this greatly depends on the basis for that AI. Currently it would probably have the core functionality of an LLM that then has memory of some kind and writes down its observations and progress toward whatever pursuits, as well as creating well-documented code subroutines on the fly and steadily improving them to execute fast and reliable behaviors that an LLM can't, similar to the Minecraft exploration AI that used GPT-4 and 3.5. By default, such a system doesn't really have a goal along the lines of "develop technologies" or "make the world better" or "maximize paperclips" that could then lead it down a path to convergent instrumental goals like "don't let anything stop you" or "gather all resources in the observable universe." That system would behave very differently from something like the traditional paperclip maximizer, which has a clear terminal goal that can never be reasoned with or overridden and which is pursued in a purely rational way with no inbuilt bias towards human-like patterns of thought.
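To be concrete about the kind of system I mean, here's a minimal, self-contained sketch of that loop. Every name is a hypothetical placeholder in the general Voyager style, not any particular codebase or API:

```python
# Minimal sketch of a Voyager-style wrapper agent: an LLM core plus external
# memory and a library of self-written "skills". The LLM is a trivial stub here;
# all names are hypothetical placeholders, not a real API.

class StubLLM:
    """Stand-in for the underlying model; in reality all the intelligence lives here."""
    def complete(self, prompt: str) -> str:
        return "look around"  # a real model would return a plan or new skill code


class WrapperAgent:
    def __init__(self, llm, goal_prompt: str):
        self.llm = llm
        self.goal_prompt = goal_prompt
        self.memory: list[str] = []        # running log of observations and progress
        self.skills: dict[str, str] = {}   # named subroutines the agent writes for itself

    def step(self, observation: str) -> str:
        self.memory.append(observation)
        # The wrapper only assembles context; the LLM decides what to do with it.
        prompt = (
            f"Goal: {self.goal_prompt}\n"
            f"Recent memory: {self.memory[-10:]}\n"
            f"Known skills: {sorted(self.skills)}\n"
            "Propose the next action, or define a new reusable skill."
        )
        plan = self.llm.complete(prompt)
        if plan.startswith("SKILL:"):
            name, _, code = plan[len("SKILL:"):].partition("\n")
            self.skills[name.strip()] = code  # cache fast, reliable behaviors for reuse
            return "defined new skill"
        return plan


agent = WrapperAgent(StubLLM(), goal_prompt="explore and document the environment")
print(agent.step("spawned in a forest"))
```

The point of the sketch is that the wrapper contributes no goals or intelligence of its own; whatever goal-directedness emerges comes from how the LLM responds to the context the wrapper assembles.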
An LLM has the capacity to be extremely inconsistent and non-agential too, and to change its apparent terminal goal in response to input, or start acting like an entity that doesn't have goals, or has circular preferences, or multiple entities with divergent preferences.
So your opinion on how a potentially threatening AI would and could behave is going to greatly depend on what architecture is used, since that determines whether it acts like a strict goal optimizer or not.
Additionally, wrapper agents for LLMs, even those designed to enforce a stable personality, imply the existence of that LLM, and likely more powerful ones too. In other words, for every ChaosGPT instance there are a lot of other wrapper agents that would generally oppose the world being destroyed. Wrapper agents are never going to be the world's only powerful agent, or be able to consistently cooperate when one is heavily biased towards world domination and the other is heavily biased towards damage mitigation.
@IsaacKing I guess it depends on how you define "subjugation", and I don't know how Alex would answer, but my thought is that extinction is easier to get than subjugation. Reasoning: "an AI that has alien goals removes us from existence as either a direct or a side effect of its goals" can happen lots of ways because the space of all alien goals that kill us is fairly large, whereas "an AI that wants to keep humans around and use them for something, so we got that much alignment into it, but is not aligned enough for what happens to us to be at least OK, and not carry the negative connotations of subjugation" is a much smaller target for us to hit.
If you mean "subjugation" more broadly as "humans are no longer the deciders" even if we're not dead (so say for example an AI goes off and starts spreading throughout the solar system and beyond, leaves us alone for now as of 2030, but our best guess is that it's smarter than us and won't let us mess with whatever its plans are) then maybe that kind of "subjugation" is in the same ballpark in terms of probability as extinction?
I think maybe you're assuming with 98% probability that if we get a superhuman AI before alignment there's a very bad outcome for us before 2030, whereas I'd put the percentage quite high but lower than that because in terms of total matter even in the solar system, earth is not that important. We're biased because it has been for our whole history, but depending on the AI's goals, earth could just not matter. Deny us access to space (which, like, we don't have very much access to space currently...), destroy our ability to create more AIs, and go do whatever it wants with the rest of the solar system and beyond, is one approach. Go do whatever it wants with the rest of the solar system, on the grounds that if it's got first-mover advantages any subsequent AIs arising from earth won't matter much, is another. Doesn't flip the probabilities around all that much, but should adjust your probabilities slightly to account for the possibility that "some random alien goal that doesn't value humans" also just isn't centered on anything we care about, including anything to do with this planet in particular. There is then some space for it to do whatever involves the least effort on its part that will prevent us from interfering with what it wants, and then go pursue its own ends.
I'm curious about a few things here. I think it would be helpful to give probabilities (even if just to make explicit that you think they're ~100%):
We have absolutely no idea how to align an AI, so if [AI] reaches the point of recursive self-improvement we're doomed. (~98%.)
I think it's worth unpacking this in a few ways:
Probability that we do not solve alignment?
Probability that a superintelligent AI is capable of extincting/subjugating humanity?
Without humanity solving alignment, probability that a superintelligent AI desires to extinct/subjugate humanity (or does so by accident/indifference)?
You give a probability for LLMs achieving AGI, but how about the probability that AGI results in recursive self-improvement? (Obviously there's a mundane sense in which AGI can contribute to AI research; do you think that's sufficient to produce superintelligence before 2030? Or is some sort of more direct self-modification necessary for superintelligence? If so, what is the probability that AGI will have or lead to that capability?)
Probability that we do not solve alignment?
Like, ever? So either we create superintelligence and then die or we never create superintelligence? That seems very hard to calculate with any reliability given the indefinite time horizon, but maybe somewhere around 90%?
Probability that a superintelligent AI is capable of extincting/subjugating humanity?
>99.9%. I don't see what reason we'd have to believe anything else. Alex raises a good point that it might take a few years, which could push it over the 2030 threshold. (Once it creates non-human servants extinction should be quite rapid, but it seems plausible that it would need time to create them.)
Without humanity solving alignment, probability that a superintelligent AI desires to extinct/subjugate humanity (or does so by accident/indifference)?
Depends a lot on whether you actually mean "does so by indifference" or "could do so by indifference".
You give a probability for LLMs achieving AGI, but how about the probability that AGI results in recursive self-improvement? (Obviously there's a mundane sense in which AGI can contribute to AI research; do you think that's sufficient to produce superintelligence before 2030? Or is some sort of more direct self-modification necessary for superintelligence? If so, what is the probability that AGI will have or lead to that capability?)
Seems pretty high. There are lots of obvious things it could do; get access to more computing power, create slightly tweaked copies of itself, etc.
Like, ever?
I mean on the timeline relevant for this question and for your statement "We have absolutely no idea how to align an AI, so if [AI] reaches the point of recursive self-improvement we're doomed. (~98%.)". Saying that P(doom | self-improving AI) is 98% is a very strong statement about both the probability that AI is dangerous by default and our prospects for improving the situation. E.g., this implies that you think there's less than a 20% chance that we'll make enough progress to have a 10% chance of aligning an AI.
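Spelling that implication out as arithmetic (nothing here beyond the numbers already stated):

```python
# P(doom | self-improving AI) = 0.98 leaves at most a 2% chance of survival.
# If "enough progress" would give a 10% chance of aligning the AI, the probability p
# of making that much progress must satisfy p * 0.10 <= 0.02, i.e. p <= 0.20.

p_survive = 1 - 0.98                   # 0.02
p_align_given_enough_progress = 0.10
p_progress_bound = p_survive / p_align_given_enough_progress
print(round(p_progress_bound, 2))      # 0.2 -> "less than a 20% chance"
```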
Certainly for those markets a YES resolution doesn't guarantee that we live, but I think a YES resolution indicates that we have a chance. Some spitball numbers for discussion:
if Superalignment succeeds according to Eliezer Yudkowsky, let's say we have 75% chance of aligning AI; then right now we have a 5% chance of aligning AI.
if Superalignment succeeds according to themselves, let's say we have a 40% chance of aligning AI; then right now we have a 10% chance of aligning AI.
if Superalignment makes a significant breakthrough, let's say we have a 5% chance of aligning AI; then right now we have a 4% chance of aligning AI.
I just made up some numbers, you may disagree. If you think Superalignment success means little or nothing for our ability to align AI then that's fine. But I think it's worth doing the calculations here for conservation of expected evidence!
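For concreteness, here's that bookkeeping as a calculation. The 5% market probability for Superalignment success and the 1% failure branch are made-up placeholders, not numbers from any actual market:

```python
# Conservation of expected evidence: today's P(align AI) should equal the average of
# the post-resolution numbers, weighted by the market probability of each resolution.

def p_align_now(p_success, p_align_if_success, p_align_if_failure):
    return p_success * p_align_if_success + (1 - p_success) * p_align_if_failure

# First spitball scenario: 75% if Superalignment succeeds per Yudkowsky, ~5% overall today.
print(round(p_align_now(p_success=0.05, p_align_if_success=0.75, p_align_if_failure=0.01), 3))
# -> 0.047, i.e. roughly the 5% figure, so those numbers are at least mutually consistent.
```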
@Joshua I'm not sure what an "abstract probability" would mean. I'm not saving for retirement, no. I generally eschew finance/health/other interventions that only have an effect in 15+ years.
I guess I'm trying to determine whether there's any "Belief in belief" going on here. Like, many religious people claim to think that most of humanity will be tortured forever in the afterlife, but they don't behave like they really believe that.
So I figure if you can look at any of your life choices and see that they don't line up with a likely early death, you might adjust your estimated probability downwards rather than adjust your life choices.
But if you claim they're consistent, I'll think of another approach!
@Joshua I'd be more likely to adjust my life choices to match my beliefs, but regardless, knowing about such a contradiction would be very useful to me, so please point out any you think may exist!