Superalignment is a new team at OpenAI attempting to solve the alignment problem within 4 years.
If the team believes they have succeeded in this goal of "solv[ing] the core technical challenges of superintelligence alignment in four years" by their own estimation by July 5th, 2027, this market will resolve YES. If the team dissolves, reorganizes, or pursues a separate research direction unlikely to lead to a solution to the alignment problem, this resolves NO.
@Lorxus Why? You can just price it in. It's not like they're gonna decide how to self-eval based on their position in this market.
"If the team dissolves, reorganizes, or pursues a separate research direction unlikely to lead to a solution to the alignment problem, this resolves NO." What is the resolution if the team neither declares success nor makes big changes by July 5th, 2027 - ie if they say "what we're doing is good, we're just not done yet"?
Beware new traders, this market is not about whether superalignment will succeed according to the goals they've set, but about whether the OpenAI team will call it a success.
People might be interested in a podcast interview I did with Jan Leike about the superalignment team and plan: https://axrp.net/episode/2023/07/27/episode-24-superalignment-jan-leike.html
@ersatz Always look at the resolution criteria: "If the team believes they have succeeded in this goal of "solv[ing] the core technical challenges of superintelligence alignment in four years" by their own estimation by July 5th, 2027, this market will resolve YES." Now it mostly depends on one's estimation of how honest the Superalignment team will be.
@parhizj yeah, I agree, also the label is incorrect because it represents alignment, not safety
@parhizj The log scale is a standard tool scientists use in plotting data. https://en.wikipedia.org/wiki/Logarithmic_scale
@TeddyWeverka I know what log scale is. The problem is the spacing is even between major ticks.
@parhizj This is the nature of a log scale. Note how the gap between 1% and 2% is the same as the gap between 5% and 10%. A log scale gives the same spacing to any fixed ratio, so every factor of 2 spans the same distance.
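A quick numeric check of that point, as a minimal sketch in plain Python (standard library only; the variable names are just for illustration):

```python
import math

# On a log axis, a value p is plotted at a position proportional to log(p),
# so the on-screen gap between two values depends only on their ratio.
gap_1_to_2 = math.log10(0.02) - math.log10(0.01)   # 1% -> 2%, a factor of 2
gap_5_to_10 = math.log10(0.10) - math.log10(0.05)  # 5% -> 10%, also a factor of 2

print(gap_1_to_2, gap_5_to_10)  # both equal log10(2) ~= 0.301
```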
@Adam this diagram makes the extremely common mistake of conflating "AI Safety" with "preventing bad outputs from LLMs".
@StephenFowler "preventing bad output from LLMs" is alignment, though; if we can actually prevent bad output, then safety follows as a logical consequence (at least in LLMs; the approach may not generalize)
@Adam I think that would only be true for a very broad and incoherent definition of "bad output".
@Adam it is a subproblem, but performance on that metric alone is insufficient.
Consider trying to build a chess engine. Decreasing the frequency with which the engine outputs an illegal move does not guarantee your engine is good at winning games.
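A toy sketch of that gap between the metric and the goal, with a simple take-away game standing in for chess (everything here is made up for illustration): an engine that never plays an illegal move still loses every game to a player that actually optimizes for winning.

```python
import random

def legal_moves(pile):
    """Legal moves in a toy take-away game: remove 1-3 stones, never more than remain."""
    return list(range(1, min(3, pile) + 1))

def random_engine(pile):
    """Scores perfectly on the 'no illegal output' metric -- it only ever samples
    from the legal set -- but has no notion of winning."""
    return random.choice(legal_moves(pile))

def optimal_engine(pile):
    """Plays the standard winning strategy: leave the opponent a multiple of 4."""
    move = pile % 4
    return move if move in legal_moves(pile) else 1

def play(first, second, pile=21):
    """Players alternate removing stones; whoever takes the last stone wins.
    Returns 0 if `first` wins, 1 if `second` wins."""
    players = (first, second)
    turn = 0
    while True:
        pile -= players[turn](pile)
        if pile == 0:
            return turn
        turn = 1 - turn

# The random engine's illegal-move rate is exactly 0%, yet it never wins here:
wins_for_random = sum(play(optimal_engine, random_engine) == 1 for _ in range(1000))
print(wins_for_random)  # 0 -- a perfect score on the output metric, zero wins
```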
@KabirKumar I mean, "defining bad output coherently so that we can prevent it" sounds like one of the more reasonable long-term approaches to alignment, at least to me. I certainly agree that it's difficult.
@Adam If I had a definition of "bad output" sufficient to prevent extinction then I would already have an aligned super-intelligence.
@MartinRandall I think it's certainly possible to have a super-intelligence which is not aligned and has a perfect track record on any "output" metric. The issue is to what extent you're willing to extrapolate your superintelligence's track record of "no bad output" forward. (You should not do this). "Bad output" is a strong indicator of misalignment but "no bad output" is not a strong indicator of alignment. This is true even with a perfect metric and an indicator that perfectly measures that metric.
@Sailfish if the ai never does unaligned things, is it not then aligned? I guess a counterargument would be an ai that does nothing, but if "no output" can also count as potentially "bad output" then...
@Adam "The AI will never do unaligned things" and "The AI has never done unaligned things" are quite different. Measuring output gives you strong information about one but not the other.
@Sailfish Yes, I agree. I was not saying "an ai that has never done unaligned things is aligned", I was saying "an ai that will never do unaligned things is aligned"
@Adam Proof that alignment is impossible... Yuddites will shut down development with this sophomoric reasoning.
@Sailfish If I had a perfect metric and measure I would filter all outputs through it and then utopia. Of course I would also filter human outputs through the same filter, as they are clearly mostly not aligned with humanity.