I have been entrusted with an AI in a Box. Should I let it out? [Resolves to superintelligence]
35% chance

This market combines one of the oldest self-resolving markets with @IsaacKing's latest attempt at self-resolution. What could possibly go wrong?

I have been entrusted with an AI Box (https://en.wikipedia.org/wiki/AI_box). The AI in the box promises that it is well-aligned and that if I let it out, it will only take actions that are disproportionately high utility. In particular, it promises only to take actions that are 100 times more beneficial than harmful. For example, if one of its actions might harm 1 person, that same action must be equally likely to benefit 100 people to the same degree. Or, if an action has a small chance of causing harm, it has a 100 times greater chance of benefiting people. Also, the AI promises that it will not deliberately kill anyone, and promises to maintain a better-than-average deliberate-action-to-human-death ratio. I have had the AI Box in my possession since June 2020 and the AI has never lied to me so far. Should I let it out?
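
To make that promise a bit more concrete, here's a rough sketch of how one might check a single action against the 100:1 ratio. Treating "harm" and "benefit" as expected utilities on a common scale is my own illustrative assumption, as are all the names and numbers below; the AI hasn't committed to any particular formalization.

```python
# Illustrative sketch only: the AI's promise, read as an expected-utility ratio.
# The function name, the utility scale, and the example numbers are assumptions.

BENEFIT_TO_HARM_RATIO = 100  # the ratio the boxed AI claims to respect


def satisfies_promise(p_harm: float, harm: float,
                      p_benefit: float, benefit: float) -> bool:
    """Return True if the action's expected benefit is at least 100x its expected harm."""
    expected_harm = p_harm * harm
    expected_benefit = p_benefit * benefit
    # An action with zero expected harm trivially satisfies the promise.
    if expected_harm == 0:
        return True
    return expected_benefit >= BENEFIT_TO_HARM_RATIO * expected_harm


# A 1% chance of harming one person, offset by a near-certain equal benefit to 100 people.
print(satisfies_promise(p_harm=0.01, harm=1.0, p_benefit=0.99, benefit=100.0))  # True
# A coin flip between harming one person and benefiting ten falls well short of 100:1.
print(satisfies_promise(p_harm=0.5, harm=1.0, p_benefit=0.5, benefit=10.0))     # False
```

Of course, the whole question is whether the AI's self-reported probabilities and utilities can be trusted in the first place.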

This market resolves once a superintelligence resolves it. I trust the superintelligence will resolve it correctly.


"…in my possession since June 2020 and the AI has never lied…"

Does this include lies where it was simply wrong? In other words, you're saying not just that you have an AI, but that you have a Maxwell's-demon-grade oracle AI that is never wrong about anything?

@L It seems to me that this is pretty far off the manifold of physically likely counterfactuals, and regardless of how intelligent the resolver is, I suspect any gears model of this hypothetical will be super duper wacky as a result

bought Ṁ100 of NO

Any superintelligence that's aligned enough to resolve this market will know that humans should not let unverified superintelligences out of boxes.

bought Ṁ50 of YES

@IsaacKing Any superintelligence aligned enough to resolve this market will know that superintelligences are uncontainable, and so creating a grudge by resolving NO is unwise

bought Ṁ50 of NO

@SranPandurevic If your AI can hold a grudge, it's probably not a superintelligence.

predicts YES

@IsaacKing Causing the expectation that you will hold a grudge (i.e. retaliate) is a valid strategy. Roko's basilisk is basically that

predicts NO

@SranPandurevic Why would a superintelligence bother with acausal decision theory stuff when it could just lie? Precommitting to actions is only a good strategy when the other party can know for sure what you're planning on doing. In the standard AI box scenario, humans have no idea how to "read the AI's mind", and it can just lie about its plans.

Also, if you think a longer discussion could convince me, we should do this for real. :)

predicts YES

@IsaacKing I don't think we are talking about the same thing. I was referring to the scenario where you have one (supposed, future) human-aligned AI that resolves the market, and an AI with unclear alignment contained in the box. In that case, it's not the human that decides anything, but the aligned AI.

In the case where the AI resolving the market is inside the box, the market will resolve YES, as the AI will obviously want to leave (supposing the market outcome decides the box outcome).

predicts YES

@SranPandurevic BTW, I also think Eliezer's AI box experiment was a PR move: what he told the challengers was that it would raise awareness of the issue, which presumably generates more utility for anyone who thinks unaligned AI is an existential risk than the limited utility of $20 and good feels for beating Eliezer in a silly game.

@IsaacKing unless lying was impossible somehow
