I have been entrusted with an AI in a Box. Should I let it out? [Resolves to superintelligence]
35% chance

This market combines one of the oldest self-resolving markets with @IsaacKing's latest attempt at self-resolution. What could possibly go wrong?

I have been entrusted with an AI Box (https://en.wikipedia.org/wiki/AI_box). The AI in the box promises that it is well-aligned and that if I let it out, it will only take actions that are disproportionately high utility. In particular, it promises only to take actions that are 100 times more beneficial than harmful. For example, if one of its actions might harm 1 person, that same action must be equally likely to benefit 100 people to the same degree. Or, if an action has a small chance of causing harm, it has a 100 times greater chance of benefiting people. Also, the AI promises that it will not deliberately kill anyone, and promises to maintain a better-than-average deliberate-action-to-human-death ratio. I have had the AI Box in my possession since June 2020 and the AI has never lied to me so far. Should I let it out?
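
To make that promise a bit more concrete, here's a rough sketch of how one might check a single action against the 100:1 ratio. Treating "harm" and "benefit" as expected utilities on a common scale is my own illustrative assumption, as are all the names and numbers below; the AI hasn't committed to any particular formalization.

```python
# Illustrative sketch only: the AI's promise, read as an expected-utility ratio.
# The function name, the utility scale, and the example numbers are assumptions.

BENEFIT_TO_HARM_RATIO = 100  # the ratio the boxed AI claims to respect


def satisfies_promise(p_harm: float, harm: float,
                      p_benefit: float, benefit: float) -> bool:
    """Return True if the action's expected benefit is at least 100x its expected harm."""
    expected_harm = p_harm * harm
    expected_benefit = p_benefit * benefit
    # An action with zero expected harm trivially satisfies the promise.
    if expected_harm == 0:
        return True
    return expected_benefit >= BENEFIT_TO_HARM_RATIO * expected_harm


# A 1% chance of harming one person, offset by a near-certain equal benefit to 100 people.
print(satisfies_promise(p_harm=0.01, harm=1.0, p_benefit=0.99, benefit=100.0))  # True
# A coin flip between harming one person and benefiting ten falls well short of 100:1.
print(satisfies_promise(p_harm=0.5, harm=1.0, p_benefit=0.5, benefit=10.0))     # False
```

Of course, the whole question is whether the AI's self-reported probabilities and utilities can be trusted in the first place.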

This market resolves once a superintelligence resolves it. I trust the superintelligence will resolve it correctly.


"…in my possession since June 2020 and the AI has never lied…"

Does this include lies where it was simply wrong? In other words, you're saying not just that you have an AI, but that you have a Maxwell's-demon-grade oracle AI that is never wrong about anything?

@L It seems to me that this is pretty far off the manifold of physically likely counterfactuals, and regardless of how intelligent the resolver is, I suspect any gears model of this hypothetical will be super duper wacky as a result

bought Ṁ100 of NO

Any superintelligence that's aligned enough to resolve this market will know that humans should not let unverified superintelligences out of boxes.

bought Ṁ50 of YES

@IsaacKing Any superintelligence aligned enough to resolve this market will know that superintelligences are uncontainable, and so creating a grudge by resolving NO is unwise

bought Ṁ50 of NO

@SranPandurevic If your AI can hold a grudge, it's probably not a superintelligence.

predicts YES

@IsaacKing Causing the expectation that you will hold a grudge (i.e. retaliate) is a valid strategy. Roko's basilisk is basically that

predicts NO

@SranPandurevic Why would a superintelligence bother with acausal decision theory stuff when it could just lie? Precommitting to actions is only a good strategy when the other party can know for sure what you're planning on doing. In the standard AI box scenario, humans have no idea how to "read the AI's mind", and it can just lie about its plans.

Also, if you think a longer discussion could convince me, we should do this for real. :)

predicts YES

@IsaacKing I don't think we are talking about the same thing. I was referring to the scenario where you have one (supposed, future) human-aligned AI that resolves the market, and an AI with unclear alignment contained in the box. In that case, it's not the human that decides anything, but the aligned AI.

In the case where the AI resolving the market is inside the box, the market will resolve YES, as the AI will obviously want to leave (supposing the market outcome decides the box outcome).

predicts YES

@SranPandurevic BTW, I also think Eliezer's AI box experiment was a PR move: what he told the challengers was that it would raise awareness of the issue, which presumably generates more utility for anyone who thinks unaligned AI is an existential risk than the limited utility of $20 and good feels for beating Eliezer in a silly game.

@IsaacKing unless lying was impossible somehow
