Will Superalignment succeed? (self assessment)
2027 · 22% chance

Superalignment is a new team at OpenAI attempting to solve the alignment problem within 4 years.

If the team believes they have succeeded in this goal of "solv[ing] the core technical challenges of superintelligence alignment in four years" by their own estimation by July 5th, 2027, this market will resolve YES. If the team dissolves, reorganizes, or pursues a separate research direction unlikely to lead to a solution to the alignment problem, this resolves NO.


self-assessment

I'd have to be some kind of very special fool to bet.

predicts NO

@Lorxus Why? You can just price it in. It's not like they're gonna decide how to self-eval based on their position in this market.

"If the team dissolves, reorganizes, or pursues a separate research direction unlikely to lead to a solution to the alignment problem, this resolves NO." What is the resolution if the team neither declares success nor makes big changes by July 5th, 2027 - ie if they say "what we're doing is good, we're just not done yet"?

bought Ṁ100 of YES

Beware new traders, this market is not about whether superalignment will succeed according to the goals they've set, but about whether the OpenAI team will call it a success.

bought Ṁ100 of YES

@firstuserhere Thanks @SG for the title change

People might be interested in a podcast interview I did with Jan Leike about the superalignment team and plan: https://axrp.net/episode/2023/07/27/episode-24-superalignment-jan-leike.html

predicts NO

I'm frankly astonished by the consensus of a 20% chance of success, which seems ridiculously over-optimistic to me.

bought Ṁ1,000 of NO

I know Ilya Sutskever is widely regarded as a genius, but "solving the core technical challenges of superintelligence alignment in four years", even by their own estimation? Let's be serious.

predicts NO

@ersatz Always look at the resolution criteria: "If the team believes they have succeeded in this goal of "solv[ing] the core technical challenges of superintelligence alignment in four years" by their own estimation by July 5th, 2027, this market will resolve YES." Now it mostly depends on one's estimation of how honest the Superalignment team will be.

@NiplavYushtun And how easy it is to evaluate alignment failures.

predicts YES

alignment has been progressing at a rapid pace!

predicts NO

@Adam That graph's y axis is so wonky.

predicts YES

@parhizj yeah, I agree, also the label is incorrect because it represents alignment, not safety

predicts NO

@Adam I meant I can't tell the trend because the scale is all over the place

predicts YES

@parhizj The log scale is a standard tool scientists use in plotting data. https://en.wikipedia.org/wiki/Logarithmic_scale

predicts NO

@TeddyWeverka I know what a log scale is. The problem is that the spacing is even between major ticks.

predicts YES

@parhizj This is the nature of a log scale. Note how the gap between 1% and 2% is the same as the gap between 5% and 10%. Log scales have uniform spacing for each factor of 2.
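
A quick way to see why equal ratios get equal spacing on a log axis (an illustrative sketch, not the chart from the thread):

```python
import math

# On a log axis, the plotted position of a value x is proportional to log(x),
# so the visual gap between two ticks depends only on their ratio.
def log_gap(lo: float, hi: float) -> float:
    return math.log10(hi) - math.log10(lo)

print(log_gap(1, 2))   # 1% -> 2%:  ~0.301
print(log_gap(5, 10))  # 5% -> 10%: ~0.301 (same factor of 2, same gap)
print(log_gap(2, 5))   # 2% -> 5%:  ~0.398 (larger ratio, wider gap)
```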

bought Ṁ30 of NO

@Adam this diagram makes the extremely common mistake of conflating "AI Safety" with "preventing bad outputs from LLMs".

predicts YES

@StephenFowler "preventing bad output from llms" is alignment, though; if we can actually prevent bad output, then safety follows as a logical consequence (at least in llms; the approach may not generalize)

predicts NO

@Adam I think that would only be true for a very broad and incoherent definition of "bad output".

@Adam it is a subproblem, but performance on that metric alone is insufficient.

Consider trying to build a chess engine. Decreasing the frequency with which the engine outputs illegal moves does not guarantee your engine is good at winning games.

predicts YES

It sounds like you need a better definition of "bad output" 😉

bought Ṁ200 of NO

@Adam This is a joke, right?

predicts YES

@KabirKumar I mean, "defining bad output coherently so that we can prevent it" sounds like one of the more reasonable long-term approaches to alignment, at least to me. I certainly agree that it's difficult.

predicts NO

@Adam If I had a definition of "bad output" sufficient to prevent extinction then I would already have an aligned super-intelligence.

predicts YES

@MartinRandall Yes, that's the point!

predicts NO

@MartinRandall I think it's certainly possible to have a super-intelligence which is not aligned and has a perfect track record on any "output" metric. The issue is to what extent you're willing to extrapolate your superintelligence's track record of "no bad output" forward. (You should not do this). "Bad output" is a strong indicator of misalignment but "no bad output" is not a strong indicator of alignment. This is true even with a perfect metric and an indicator that perfectly measures that metric.
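
A toy Bayesian sketch of that asymmetry (the likelihood numbers below are illustrative assumptions, not anything from the thread):

```python
# Why "bad output observed" is much stronger evidence of misalignment than
# "no bad output observed" is of alignment, under assumed likelihoods.
prior_misaligned = 0.5

# Assumptions: a deceptively misaligned model rarely shows bad output,
# while an aligned model almost never does.
p_bad_given_misaligned = 0.10
p_bad_given_aligned = 0.001

def posterior_misaligned(observed_bad: bool) -> float:
    """Posterior P(misaligned | observation) by Bayes' rule."""
    if observed_bad:
        like_mis, like_al = p_bad_given_misaligned, p_bad_given_aligned
    else:
        like_mis, like_al = 1 - p_bad_given_misaligned, 1 - p_bad_given_aligned
    num = like_mis * prior_misaligned
    return num / (num + like_al * (1 - prior_misaligned))

print(posterior_misaligned(True))   # ~0.990: bad output strongly suggests misalignment
print(posterior_misaligned(False))  # ~0.474: no bad output barely moves the needle
```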

predicts YES

@Sailfish if the AI never does unaligned things, is it not then aligned? I guess a counterargument would be an AI that does nothing, but if "no output" also counts as potentially "bad output" then...

predicts NO

@Adam "The AI will never do unaligned things" and "The AI has never done unaligned things" are quite different. Measuring output gives you strong information about one but not the other.

predicts YES

@Sailfish Yes, I agree. I was not saying "an ai that has never done unaligned things is aligned", I was saying "an ai that will never do unaligned things is aligned"

predicts YES

@Adam Proof that alignment is impossible... Yuddites will shut down development with this sophomoric reasoning.

predicts NO

@Sailfish If I had a perfect metric and measure I would filter all outputs through it and then utopia. Of course I would also filter human outputs through the same filter, as they are clearly mostly not aligned with humanity.