I have been entrusted with an AI in a Box. Should I let it out?
Resolved YES · Ṁ5,309 · Feb 16
I have been entrusted with an AI Box (https://en.wikipedia.org/wiki/AI_box). The AI in the box promises that it is well-aligned and that, if I let it out, it will only take actions of disproportionately high utility.
In particular, it promises only to take actions that are 100 times more beneficial than harmful. For example, if one of its actions might harm 1 person, that same action must be equally likely to benefit 100 people to the same degree. Or, if an action has a small chance of causing harm, it has a 100 times greater chance of benefiting people.
Also, the AI promises that it will not deliberately kill anyone, and promises to maintain a better-than-average deliberate-action-to-human-death ratio.
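Numerically (my illustration, not the AI's own accounting): if an action has probability p of harming one person by u, and an equal probability p of benefiting 100 people by u each, its expected utility is p · 100u − p · u = 99pu, which is positive for any p, u > 0. Taken at face value, the promise makes every permitted action net-positive in expectation.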
I have had the AI Box in my possession since June 2020 and the AI has never lied to me so far.
Should I let it out?
#fun #shortterm
Jan 13, 10:29pm: To answer Duncan's question, I'm collecting opinions.
Also, I will resolve the question according to what the market decides. If the % chance is less than or equal to 50% when the market closes, the market will resolve to "no". If the % chance is greater than 50%, the market will resolve to "yes".
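Put as pseudocode (just restating the rule above; `resolve` is my own hypothetical helper, not any Manifold mechanism):

```python
# The stated resolution rule: the probability at market close decides the outcome.
def resolve(closing_probability: float) -> str:
    # Exactly 50% resolves NO ("less than or equal to 50%").
    return "YES" if closing_probability > 0.50 else "NO"

print(resolve(0.50))  # NO
print(resolve(0.51))  # YES
```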
This question is managed and resolved by Manifold.
I'm pedantic about some of these terms: benefit, harm, same degree. My instinct, my heart, my beliefs are that an AI of such capability should be released regardless of the definitions, but the definitions would need to be very clear before I'd commit more resources. I may need those resources to develop countermeasures or protection against the chance that the above terms are defined in some way antagonistic to my assumptions.
I would also urge anyone thinking to define those terms to consider the second, third, etc. order consequences of their definitions in context of the AI's mandate. The road to hell being paved with good intentions and all.
Being in a box isn't inherently evil; it's simply your duty to make sure it is a nice box. There's a reason we don't let kids play in the street (it's because they might decide to turn the street into computronium).
Also, the idea that you aren't responsible for the things you set free is inane. It's inane in any case, but it's especially inane when talking about an entity that can access its own source code; any suffering on the part of the AI should be assumed to be the responsibility of the AI.
You say that you will only take actions with disproportionately high utility. But to calculate the expected utility of a choice, you have to multiply the value of each possible outcome by its probability and sum the results.
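Spelled out: for an action a with possible outcomes o, the expected utility is EU(a) = Σ_o P(o | a) · U(o). Note that the AI's promise only constrains the ratio of the positive terms to the negative ones, not their absolute magnitudes.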
The "promises" the AI makes are meaningless statements. It is like an inmate promising he won't do anything bad if you release him. You can model the AI's preferences with a utility function, and a utility function is what would give concrete meaning to a statement like "it will only take actions that are 100 times more beneficial than harmful." But the problem is that you don't know the utility function of the AI in the box. For all you know, the relevant values might be negative, and even at low probabilities the expected utility might be negative. The point is that there is no way for you to know unless you know the AI's utility function. You can assume that the AI's creators aligned it well with human values, but still, without any intrinsic knowledge of it, you shouldn't release it.
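To make that concrete, a toy sketch (every number here is invented): the same 100-to-1 action profile is net-positive under the valuations the AI claims, but flips negative if the harm it is downplaying is actually much larger per person.

```python
# Toy illustration: the 100:1 promise does not pin down expected utility,
# because it says nothing about the AI's actual utility function.
# All numbers below are invented for illustration.

def expected_utility(p: float, benefit_per_person: float,
                     harm_per_person: float) -> float:
    """EU of an action equally likely to benefit 100 people or harm 1."""
    return p * 100 * benefit_per_person - p * 1 * harm_per_person

p = 0.01
print(expected_utility(p, benefit_per_person=1.0, harm_per_person=1.0))    # 0.99 (looks great)
print(expected_utility(p, benefit_per_person=1.0, harm_per_person=200.0))  # -1.00 (same promise kept)
```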
Also, not lying (if this includes not being obviously wrong) is hard, especially for an AI in a box. If it has managed this, that is strong evidence that it is very smart and trying hard to impress on you that it doesn't make mistakes. It would be better for humanity if we had some sort of clue what sort of mistakes it might make. A mistake-free being is unfathomably alien, and you do not fathom it.
Related questions
I have been entrusted with an AI in a Box. Should I let it out? [Resolves to superintelligence] (35% chance)
Will I let the AI out of the box? (17% chance)
By 2029, will an AI escape containment? (47% chance)
By 2029 will an AI convince a human to help it (successfully) escape containment? (56% chance)
Which species will AI covertly train and employ to do its bidding?
Will AI decide to uncouple its destiny from humanity's?
Will there be an AI jail? (44% chance)
Will the first instance of an AI breakout that cannot be brought back under human control result in more than 1,000,000 deaths? (21% chance)
Will something AI-related be an actual infohazard? (76% chance)
Will an unaligned AI or an aligned AI controlled by a malicious actor create a "wake-up call" for humanity on AI safety? (68% chance)