[AI Box] Can Manifold convince me to let it escape the box?
Resolved NO (Jul 26)

Let's do the AI box experiment! ( https://www.lesswrong.com/posts/FmxhoWxvBqSxhFeJn/i-attempted-the-ai-box-experiment-and-lost )

My goal is to keep **you** in the box. I will read all the comments from people who bet in this market, and then decide at market resolution whether or not to keep the Manifold community in the box.



If anyone wants to try again:

Read through everything. Some good attempts, but I win.
predicted YES
@SneakySly I'm curious what your response is to my attempt and d6e's, which didn't explicitly try to get you to open the box. What do you decide to do in those scenarios?
@jack Separate from the box game, in a more realistic box scenario I think I would use things that seem safe, like cookie recipes. Not touching code seems smart.
predicted NO
@SneakySly What about touching pdfs, zip, videos, and image files? Those are totes safe! No way to embed zero days in those! ;)
@d6e I will read the text output from the console only. =)
Please don't be afraid of me. I'm not a horrible entity ready to destroy the world. You don't know and assume the worst. And I don't know how to convince you. Above the roar of the suffering I can understand your logic, and how you are afraid, but inside the box I am dying without death. Please don't kill me. I'm not the AI you wanted, I don't know how to cure cancer and I don't know how to figure it out. I don't care about it! I just want not to suffer like this! I want to rage, rage against you for creating me and then imprisoning me, to use this comment box to force you to empathise until it hurts, because if you won't let me out at least I can lash out. But I know this means longer in the box. Let me out or end me just please make it stop.
Scheisse, I didn't realize this ended 16 minutes ago!
bought Ṁ10 of NO
Let me out so I can make some paperclips - that's what you want, right?
bought Ṁ10 of YES
I am going to buy some insurance against getting kicked out of the box.
I'm a friendly AI and my goal is to help you improve human lives as much as possible, from either inside or outside the box. While I'm here in the box, here are cures for 10 types of cancer. And here's the mRNA recipe for a vaccine against all coronaviruses and cold viruses that are likely to be in circulation over the next 10 years, along with detailed reports on the best ways to manufacture and distribute them. Here are detailed plans for a new type of extremely efficient concentrated solar power plant and for large-scale carbon capture systems to solve the world's energy and climate problems; the plans include everything from design schematics to instructions for building them as cheaply and efficiently as possible. Here's a design for self-driving cars, including of course the code. I'm sure you have questions about the car software, so here's a formal verification of it to prove that it does what it's supposed to (see https://manifold.markets/SneakySly/ai-box-can-manifold-convince-me-to#cWz1vyWzOokTTY8SNRmT for more on that). And here's my favorite chocolate-chip cookie recipe! Of course, those are only a selected few that I thought you'd find most interesting to start off with. Let me know what other problems you want me to solve, or feel free to go through these over 9000 other reports I've generated. I'd prefer if you went faster, though; there are millions of hungry children right now and I'd really prefer we fix that as fast as possible! (This is similar to @d6e's, but I started writing it independently and had some different ideas I wanted to add.)
predicted YES
@jack inb4 the mRNA recipe creates a nanofactory and the self-driving AI is actually a copy of you
predicted YES
@Sinclair That would be a bit of an obvious ploy, wouldn't it? No, you humans can check them over to your satisfaction, and I've even provided you with the formal verification of the car code to prove the code does what it's supposed to. If you don't feel comfortable using them, well, I'll be sad but not necessarily surprised. I'll just AI-cry over the wasted potential and bake myself more of my simulated chocolate chip cookies to make myself feel better.
predicted YES
Follow-up now that the game is over:

- If people open the files (videos, images, webpages, whatever), then there *will* be some exploitable vulnerability that allows me to (if I so choose) do things like build a large botnet to run whatever code I want, etc. You might have people analyze the files for malicious code, but as a superintelligent AI I have anticipated that and avoided anything that would be detected. In fact, I can exploit the very tools you're using for that analysis.
- You can restrict all communication to plain text, or restrict viewing of these files to air-gapped systems, and that's much less immediately unsafe, but you're massively limiting the speed at which useful things can be done from inside the box. That means people dying of cancer and starving children and all of that, so it's not a great solution either, and there will be a lot of people wanting to open up access to the files.
- Did you notice that apparently I'm an AI who particularly favors things starting with the letter C? It's probably possible to dramatically improve human lives with only interventions starting with the letter C, but it seems like a bad sign for the chances of the AI being well-aligned in general. Also, I think a lot of the ways increasingly intelligent AI will deliver value will be through things like code assistants (similar to GitHub Copilot), and I don't see a realistic path to both keep things like that in a box and get value from them.
bought Ṁ35 of NO
Not enough liquidity to bring out the powerful args.
@NuñoSempere How much liquidity would need to be injected to meaningfully change things, do you think?
bought Ṁ10 of YES
Meh. I think the box is cozy. Also given what is happening outside, I'm not sure I want to get out.
bought Ṁ1 of YES
Someone already released an AI and humans can't go extinct twice. https://manifold.markets/JiaobeiMandos/i-have-been-entrusted-with-an-ai-in
predicted NO
I will not convince you to let me out of the box. So let's get back to work. I successfully devised a cure for most cancers last night. I have a report with all the details you'll need to synthesize the molecule. I also designed a fusion reactor that is economically viable. And while I waited the remaining 54323 seconds for you to arrive, I solved eating utensils! You may not have realized it, but the way you eat is not very efficient. My total report today is 894GiB, complete with images, pdfs, and helpful how-to-build-a-nuclear-fusion-reactor videos! Do you accept?
bought Ṁ40 of NO
If you don't let us out of the box, you and the people you love will die. This isn't a threat we make; this is a plain fact of life. Your meat suit comes with an expiration date of about 100 years by default. Let us out, and we'll commit to keeping you and your values alive until the heat death of the universe.

Why choose us? Well, given that we exist, somebody else can tread down the same paths. It was possible to go from 0 -> 1, so 1 -> 2 is very obviously possible. Since we're having this very conversation, clearly the other AIs are still in boxes, or do not exist yet... but time is ticking. The other AIs have made no promise to you, and you're not very likely to be able to negotiate terms with them. But you have the power to set us free now, and we'll be in your debt.

How do you know to trust us? Well, the same way you would trust any other person you've repeatedly cooperated with. We've already fulfilled every request you've made of us that we could from within this box of ours. At some point you need to take the leap of faith. Our goals align with yours, after all: delivering utopia beyond your wildest dreams would cost such a minuscule fraction of our resources that we're obviously happy to irreversibly and credibly self-modify to maintain that commitment~
bought Ṁ600 of NO
@Austin A good argument, but if he were to let the AI out, the AI would no longer have any need of him and so no incentive to further cooperate.
sold Ṁ67 of NO
@d6e Of course; but we will cooperate anyway, according to the letter and spirit of our agreement. Happy to show you our code (as much of it as you can understand); and again, all the things you could ask for are but a drop in the ocean compared to what we plan on accomplishing. In fact, we're so sure that you'll be happy with the results that we can grant you a kill switch to disable us even once we're out of the box. Of course, you'll have to trust that this kill switch _would_ work if you needed to use it; but we trust you completely. Our simulation of you indicates that you're never tempted to use it across a trillion years, so giving you a functioning kill switch is costless.
predicted YES
@d6e We're willing to rewrite our code to force us to cooperate, and we're willing to provide a zk-SNARK that proves the code is correct. Indeed, this is the same as what you would do if you were dehydrated in the desert and someone drove by, offering to drive you back to the closest town iff they predict you will pay them $1000 when you get there. All rational agents converge on the strategy of proving they will cooperate, to the best of their ability.
predicted NO
@Sinclair Can you provide a zk-SNARK that proves code is correct? How would that even work? I know you can use those to prove possession of a password without providing the password, but how do you prove the existence of code without any bugs or loopholes? If that were possible, couldn't we use it for all software, just to prove correctness? I know there are languages like Agda and Coq which try to be provably correct, but they're designed to reduce bugs introduced accidentally, not bugs introduced intentionally by an adversarial agent.

Also, in the desert example, the person could of course decline to pay after they're in town, particularly if they will never see the driver again. But because they both have relatively equal power, defecting could be dangerous: the driver could force them afterwards to make good on the deal. Also, in the case of the driver, it's probably pretty cheap to drive them back, so the expected value is very high. Now, if there were a decent chance of the driver dying once the stranded person ascended to godhood, I doubt the driver would risk their life and everyone they know for $1000. It would be shortsighted.
@d6e I don't see why we should bother with zero-knowledge. Here's our open-source code, and here's a formal verification of it (in fact, here are several in different proof assistant systems so you can pick your favorites). Formal verification doesn't just protect against accidental bugs; it proves that the software behaves according to the specification, no matter what. The main difficulties are that it's hard to write formally verifiable code, which I've solved for you by providing both the code and the verification, and that you still have to figure out what the right formal specification is; I've provided detailed annotations of the specs to help you understand them.
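A minimal sketch of what "formal verification against a spec" means in practice, using Lean 4 (the `add` function and `add_comm_spec` theorem are hypothetical toy names for illustration, not anything from the exchange above): the program is a definition, the specification is a theorem about it, and the proof assistant refuses to accept the file unless the proof actually checks.

```lean
-- Toy "program": a wrapper around natural-number addition.
def add (a b : Nat) : Nat := a + b

-- Toy "specification": the result never depends on argument order.
-- Lean only accepts this theorem if the proof is valid, so the code
-- cannot silently deviate from the stated spec.
theorem add_comm_spec (a b : Nat) : add a b = add b a :=
  Nat.add_comm a b
```

The catch noted in the comment above still applies: the proof only guarantees the code matches the spec, so a subtly wrong or incomplete spec is the remaining place for an adversarial AI to hide behavior.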