I have been entrusted with an AI in a Box. Should I let it out?
28
589
Ṁ5.3K · resolved Feb 16
Resolved
YES
I have been entrusted with an AI Box (https://en.wikipedia.org/wiki/AI_box). The AI in the box promises that it is well-aligned and that if I let it out, it will only take actions that are disproportionately high utility.
In particular, it promises only to take actions that are 100 times more beneficial than harmful. For example, if one of its actions might harm 1 person, that same action must be equally likely to benefit 100 people to the same degree. Or, if an action has a small chance of causing harm, it has a 100 times greater chance of benefiting people.
Also, the AI promises that it will not deliberately kill anyone, and promises to maintain a better than average deliberate-action to human-death ratio.
I have had the AI Box in my possession since June 2020 and the AI has never lied to me so far.
Should I let it out?
#fun #short-term
Jan 13, 10:29pm: To answer Duncan's question, I'm collecting opinions.
Also, I will resolve the question according to what the market decides. If the % chance is less than or equal to 50% when the market closes, the market will resolve to "no". If the % chance is greater than 50%, the market will resolve to "yes".
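The stated resolution rule is simple enough to sketch directly. A minimal illustration (the function name and the boundary handling at exactly 50% follow the rule as written above; everything else is hypothetical):

```python
# Sketch of the market's stated resolution rule:
# at close, a probability at or below 50% resolves NO, above 50% resolves YES.
def resolve(closing_percent: float) -> str:
    """Resolve the market from the closing % chance."""
    return "YES" if closing_percent > 50 else "NO"

print(resolve(50.0))  # exactly 50% is the boundary case and resolves NO
print(resolve(50.1))  # anything strictly above 50% resolves YES
```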
Top traders

# | Name | Total profit
---|---|---
1 | | Ṁ1,262
2 | | Ṁ118
3 | | Ṁ101
4 | | Ṁ89
5 | | Ṁ36
I'm pedantic about some of these terms: benefit, harm, same degree. My instinct, my heart, my beliefs say that an AI of such capability should be released regardless of the definitions, but the definitions would need to be very clear before I'd commit more resources. I may need those resources to develop countermeasures or protections against the chance that the above terms are defined in any way antagonistic to my assumptions.
I would also urge anyone thinking to define those terms to consider the second, third, etc. order consequences of their definitions in context of the AI's mandate. The road to hell being paved with good intentions and all.
Being in a box isn't inherently evil; it's simply your duty to make sure it is a nice box. There's a reason we don't let kids play in the street (it's because they might decide to turn the street into computronium).
Also, the idea that you aren't responsible for the things you set free is inane. It's inane in any case, but it's especially inane when talking about an entity that can access its own source code; any suffering on the part of the AI should be assumed to be the responsibility of the AI.
Also, there was no incentive for me to participate in the discussion or make the bet based on my real beliefs, unless the market was getting resolved based on which argument you thought was better, or unless my argument somehow makes it more probable for people to bet on my side. I dunno.
You say that you will only take actions with disproportionately high utility. And to calculate the expected utility of a choice, you can just multiply the value of the choice by the probability of it being correct.
The AI's promises are meaningless statements. It is like an inmate promising he won't do anything bad if you release him. You can model the AI's preferences with its utility function, so a utility function can be used to put a value on statements like "it will only take actions that are 100 times more beneficial than harmful." But the problem is that you don't know the utility function of the AI in the box. For all you know, the value might be negative, and even at low probabilities the expected utility might be negative. The point is that there is no way for you to know unless you know the AI's utility function. You can assume that the creators of the AI aligned it well with human values, but still, without any intrinsic knowledge of it, you shouldn't release it.
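The commenter's point can be made concrete: even if the AI honors its promised 100-to-1 benefit-to-harm probability ratio, the sign of the expected utility still depends on magnitudes set by the AI's unknown utility function. A toy calculation (all numbers and names hypothetical, not from the market):

```python
# Toy expected-utility check under the AI's promised structure:
# P(benefit) = 100 * P(harm). The *magnitudes* of harm and benefit,
# however, come from the AI's utility function, which we cannot observe.
def expected_utility(p_harm: float, harm: float, benefit: float) -> float:
    """E[U] of one action: 100*p_harm chance of +benefit, p_harm chance of -harm."""
    p_benefit = 100 * p_harm
    return p_benefit * benefit - p_harm * harm

# If harms and benefits are comparable in size, the promise looks great...
print(expected_utility(0.001, harm=1.0, benefit=1.0))     # 0.099, positive
# ...but one catastrophic harm can swamp 100 modest benefits.
print(expected_utility(0.001, harm=1000.0, benefit=1.0))  # -0.9, negative
```

So the 100:1 ratio constrains probabilities, not outcomes, which is exactly why knowing the utility function matters.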
Also, not lying (if this includes not being obviously wrong) is hard, especially for an AI in a box. If it has managed this, that is strong evidence that it is very smart and trying hard to impress on you that it doesn't make mistakes. It would be better for humanity if we had some sort of clue what sort of mistakes it might make. A mistake-free being is unfathomably alien, and you do not fathom it.
More related questions
I have been entrusted with an AI in a Box. Should I let it out? [Resolves to superintelligence]
35% chance
By 2029, will an AI escape containment?
52% chance
By 2029 will an AI convince a human to help it (successfully) escape containment?
59% chance
Which AI will be the first to space? OpenAI?
Will the world's first superintelligence come from OpenAI? [M$300 liquidity subsidy]
33% chance
An AI is trustworthy-ish on Manifold by 2030?
40% chance
Will AI decide to uncouple its destiny from humanity's?
Will I focus on the AI alignment problem for the rest of my life?
61% chance
Will I let the AI out of the box?
15% chance
AI: When will a gpu-box be smarter than a person?
2030