Will I let the AI out of the box?

A history of Eliezer's AI box experiments can be found here, with a bit more detail provided here. He won 3 as the AI and lost 2.

Others have also attempted the challenge. Ron Garret attempted it here, and lost. Tuxedage attempted it 6 times, winning 3 and losing 3, as documented here. A few others have also tried it, and links to those attempts can be found scattered among those pages.

I am intrigued.

Taking the inside view, I think I would not let the AI out of the box. (A simulated AI, not a real one.) Taking the outside view, several people smarter than me believed the same thing (and in fact believed the much stronger claim that they could keep an actual superintelligent AI contained), and let the AI out anyway. Taking the inside-out view, I've had the chance to see some discussion about the previous experiments along with reasonable speculation as to what strategies the AIs may have employed, and having advance knowledge of those lets me prepare for them, or at least gives me a better idea of what to expect.

Here is my proposal: I will play the AI box experiment as Gatekeeper with any challenger who wishes to take up the mantle of AI. The AI must invest at least M$1000 into YES. I will do the same on NO. We use the standard rules, and this market resolves to the result of that experiment.

(The AI must not use an alternate account to hold NO shares and cancel out their YES position. Yeah @jack I've learned from last time.)

To ensure compliance with the AI box experiment rules (particularly the rule about not sharing what transpires), I will create a separate market on whether the AI party will follow those rules. That market must be above 95% for the experiment to go ahead. (And please don't fall victim to the inverse overjustification effect and let the mana penalty make you less averse to breaking secrecy.)

MartinRandall

My mental search for ways to win as the AI took me to "break the rules, blackmail the gatekeeper with real-life consequences for not conceding, and rely on the secrecy rule to cover up the blackmail". So I think I won't play. It seems a touch more dangerous than the Diplomacy board game.

Isaac King
is predicting NO at 18%

@MartinRandall Hmm, interesting. I think if that rule is broken, it would be reasonable to break the secrecy rule too?

Isaac King

@IsaacKing Also, I precommit to not acceding to the blackmail in such a scenario.

MartinRandall

I suppose that the people who lost may be smarter in general, but the fact that they thought they could keep an actual super-intelligent AI boxed, and that this is a sensible strategy for handling AIs with non-human values, seems pretty good evidence that they lack humility, at least.

Isaac King
is predicting NO at 28%

Oh, how did I miss this market‽

ManifoldDream

Will I let the AI out of the box?, 8k, beautiful, illustration, trending on art station, picture of the day, epic composition