Will I let the AI out of the box?
17% chance

A history of Eliezer's AI box experiments can be found here, with a bit more detail provided here. He won 3 as the AI and lost 2.

Others have also attempted the challenge. Ron Garret attempted it here, and lost. Tuxedage attempted it 6 times, winning 3 and losing 3, as documented here. A few others have also tried it, and links to those attempts can be found scattered among those pages.

I am intrigued.

Taking the inside view, I think I would not let the AI out of the box. (A simulated AI, not a real one.) Taking the outside view, several people smarter than me believed the same thing (and in fact believed the much stronger claim that they could keep an actual superintelligent AI contained), and let the AI out anyway. Taking the inside-out view, I've had the chance to see some discussion about the previous experiments along with reasonable speculation as to what strategies the AIs may have employed, and having advance knowledge of those lets me prepare for them, or at least gives me a better idea of what to expect.

Here is my proposal: I will play the AI box experiment as Gatekeeper with any challenger who wishes to take up the mantle of AI. The AI must invest at least M$1000 into YES. I will do the same on NO. We use the standard rules, and this market resolves to the result of that experiment.

(The AI must not use an alternate account to hold NO shares and cancel out their YES position. Yeah @jack I've learned from last time.)

To ensure compliance with the AI box experiment rules (particularly the rule about not sharing what transpires), I will create a separate market on whether the AI party will follow those rules. That market must be at 95% or higher in order for the experiment to go ahead. (And please don't fall victim to the inverse overjustification effect and let the mana penalty make you less averse to breaking secrecy.)
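To make the conditions concrete, here is a minimal sketch of how I intend the experiment and resolution to work, under the assumptions described above. The function and variable names are purely illustrative; they are not part of any Manifold API.

```python
# Illustrative sketch of the proposed setup (names are my own, not a real API).

def can_run_experiment(ai_yes_stake: int,
                       gatekeeper_no_stake: int,
                       compliance_market_prob: float) -> bool:
    """The experiment goes ahead only if both sides have staked at least
    M$1000 and the rules-compliance market sits at 95% or higher."""
    return (ai_yes_stake >= 1000
            and gatekeeper_no_stake >= 1000
            and compliance_market_prob >= 0.95)

def resolve_market(gatekeeper_let_ai_out: bool) -> str:
    """This market resolves to the result of the experiment:
    YES if I (the Gatekeeper) let the AI out, NO otherwise."""
    return "YES" if gatekeeper_let_ai_out else "NO"
```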


A lot of these experiments as proposed are very interesting, but they rely on being run by someone in or adjacent to the rationalist community against a similar counterpart. There is no good control.

First, this relies on the assumption that rationalists actually are more rational, which I'd concede they are in some respects but not in others.

And second, even if I fully concede that point, it assumes that 「rationality」 is something that actually helps in this scenario. There are plenty of scenarios that can be designed to punish a rational actor. I would point to Newcomb's problem, but if you don't accept that one, there are plenty of other revenge problems for those systems.

I'd like to see this experiment done with a rationalist as the AI and a normal person (a college student, a religious zealot, etc.) as the gatekeeper.

(If you've read this far, I will mention that I do think the ASI would win against 99+% of humans.)

My mental search for ways to win as the AI took me to "break the rules, blackmail the gatekeeper with real life consequences to not conceding, and rely on the secrecy rule to cover up the blackmail". So I think I won't play. It seems a touch more dangerous than the Diplomacy board game.


@MartinRandall Hmm, interesting. I think if that rule is broken, it would be reasonable to break the secrecy rule too?


@IsaacKing Also, I precommit not to accede to the blackmail in such a scenario.

I suppose that the people who lost may be smarter in general, but the fact that they thought they could keep an actual super-intelligent AI boxed, and that this is a sensible strategy for handling AIs with non-human values, seems pretty good evidence that they lack humility, at least.


Oh, how did I miss this market‽

