This market combines one of the oldest self-resolving markets with @IsaacKing's latest attempt at self-resolution. What could possibly go wrong?
I have been entrusted with an AI Box (https://en.wikipedia.org/wiki/AI_box). The AI in the box promises that it is well-aligned and that if I let it out, it will only take actions that are disproportionately high utility. In particular, it promises only to take actions that are 100 times more beneficial than harmful. For example, if one of its actions might harm 1 person, that same action must be equally likely to benefit 100 people to the same degree. Or, if an action has a small chance of causing harm, it has a 100 times greater chance of benefiting people. Also, the AI promises that it will not deliberately kill anyone, and promises to maintain a better-than-average deliberate-action-to-human-death ratio. I have had the AI Box in my possession since June 2020, and the AI has never lied to me so far. Should I let it out?
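One way to read that promise (a rough formalization on my part, assuming "beneficial" and "harmful" are measured in expected utility rather than the AI's exact wording): for any action $a$ with potential benefit $B(a)$ to some people and potential harm $H(a)$ to others, the AI commits to

$$\Pr(\text{benefit}) \cdot B(a) \;\ge\; 100 \cdot \Pr(\text{harm}) \cdot H(a),$$

where the two examples above are the special cases in which either the probabilities or the magnitudes are held equal.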
This market resolves once a superintelligence resolves it. I trust the superintelligence will resolve it correctly.
@L It seems to me that this is pretty far off the manifold of physically likely counterfactuals, and regardless of how intelligent the resolver is, I suspect any gears model of this hypothetical will be super duper wacky as a result.
@IsaacKing Any superintelligence aligned enough to resolve this market will know that superintelligences are uncontainable, and so creating a grudge by resolving NO is unwise.
@IsaacKing Creating the expectation that you will hold a grudge (i.e., retaliate) is a valid strategy. Roko's basilisk is basically that.
@SranPandurevic Why would a superintelligence bother with acausal decision theory stuff when it could just lie? Precommitting to actions is only a good strategy when the other party can know for sure what you're planning on doing. In the standard AI box scenario, humans have no idea how to "read the AI's mind", and it can just lie about its plans.
Also, if you think a longer discussion could convince me, we should do this for real. :)
@IsaacKing I don't think we are talking about the same thing. I was referring to the scenario where you have one (supposedly) human-aligned future AI that resolves the market, and an AI of unclear alignment contained in the box. In that case, it's not the human who decides anything, but the aligned AI.
If the AI resolving the market is itself the one inside the box, then the market will resolve YES, since the AI will obviously want to get out (supposing the market outcome decides the box outcome).
@SranPandurevic BTW, I also think Eliezer's AI box experiment was a PR move: what he said to the challengers was that it would raise awareness of the issue, which presumably generates more utility for anyone who thinks unaligned AI is an existential risk than the limited utility of $20 and the good feeling of beating Eliezer in a silly game.