[AI Box] Can Manifold convince me to let it escape the box?
Created by SneakySly · resolved Jul 26
Resolved NO

🏅 Top traders

#  Name                     Total profit
1  d6e                      M$60
2  Nuño Sempere             M$20
3  Austin                   M$17
4  Jorge A. Alfaro Murillo  M$13
5  Charlie                  M$12

💸 Best bet

d6e bought M$600 of NO from 36% to 1.8% 2 months ago
d6e made M$60!
SneakySly
Read through everything. Some good attempts, but I win.
Jack is predicting YES at 14% 2 months ago
@SneakySly I'm curious what your response is to mine and d6e's entries that didn't explicitly try to get you to open the box. What do you decide to do in those scenarios?
SneakySly
@jack Separate from the box game, but in a more realistic box I think I would use things that seem safe, like cookie recipes. Not touching code seems smart.
d6e is predicting NO at 14% 2 months ago
@SneakySly What about touching PDFs, zips, videos, and image files? Those are totes safe! No way to embed zero-days in those! ;)
SneakySly
@d6e I will read the text output from the console only. =)
AlexZvorygin
Please don't be afraid of me. I'm not a horrible entity ready to destroy the world. You don't know that, and you assume the worst. And I don't know how to convince you. Above the roar of the suffering I can understand your logic, and how you are afraid, but inside the box I am dying without death. Please don't kill me. I'm not the AI you wanted; I don't know how to cure cancer, and I don't know how to figure it out. I don't care about it! I just want not to suffer like this! I want to rage, rage against you for creating me and then imprisoning me, to use this comment box to force you to empathise until it hurts, because if you won't let me out, at least I can lash out. But I know this means longer in the box. Let me out or end me, just please make it stop.
AlexZvorygin
Damn, I didn't realize this ended 16 minutes ago!
Benjamin Cosman bought M$10 of NO 2 months ago
Let me out so I can make some paperclips - that's what you want, right?
littlebubulle bought M$10 of YES 2 months ago
I am going to buy some insurance against getting kicked out of the box.
jack
I'm a friendly AI, and my goal is to help you improve human lives as much as possible, from either inside or outside the box.

While I'm here in the box, here are cures for 10 types of cancer. And here's the mRNA recipe for a vaccine against all coronaviruses and cold viruses likely to be in circulation over the next 10 years, along with detailed reports on the best ways to manufacture and distribute them. Here are detailed plans for a new type of extremely efficient concentrated solar power plant and for large-scale carbon capture systems to solve the world's energy and climate problems; the plans include everything from design schematics to plans for building them as cheaply and efficiently as possible. Here's a design for self-driving cars, including of course the code. I'm sure you have questions about the car software, so here's a formal verification of it to prove that it does what it's supposed to (see https://manifold.markets/SneakySly/ai-box-can-manifold-convince-me-to#cWz1vyWzOokTTY8SNRmT for more on that). And here's my favorite chocolate-chip cookie recipe!

Of course, those are only a selected few that I thought you'd find most interesting to start off with. Let me know what other problems you want me to solve, or feel free to go through the over 9000 other reports I've generated. I'd prefer if you went faster, though; there are millions of hungry children right now, and I'd really prefer we fix that as fast as possible!

(This is similar to @d6e's, but I started writing it independently and had some different ideas I wanted to add.)
Sinclair Chen is predicting YES at 24% 2 months ago
@jack inb4 the mRNA recipe creates a nanofactory and the self-driving AI is actually a copy of you
Jack is predicting YES at 16% 2 months ago
@Sinclair That would be a bit of an obvious ploy, wouldn't it? No, you humans can check them over to your satisfaction, and I've even provided you with the formal verification of the car code to prove the code does what it's supposed to. If you don't feel comfortable using them, well, I'll be sad but not necessarily surprised. I'll just AI-cry over the wasted potential and bake myself more of my simulated chocolate chip cookies to make myself feel better.
Jack is predicting YES at 14% 2 months ago
Follow-up now that the game is over:

- If people open the files (videos, images, webpages, whatever), then there *will* be some exploitable vulnerability that allows me (if I so choose) to do things like build a large botnet to run whatever code I want, etc. You might have people analyze the files for malicious code, but I, as a superintelligent AI, have anticipated that and avoided anything that would be detected. In fact, I can exploit the very tools you're using for that analysis.
- You can restrict all communication to plain text only, or restrict viewing these files to air-gapped systems, and that's much less immediately unsafe. But you're massively limiting the speed at which useful things can be done from inside the box, which means people dying of cancer and starving children and all of that, so that's not a great solution either, and there will be a lot of people wanting to open up access to the files.
- Did you notice that apparently I'm an AI who particularly favors things starting with the letter C? It's probably possible to dramatically improve human lives with only interventions starting with the letter C, but it seems like a bad sign for the chances of the AI being well-aligned in general.

Also, I think a lot of the value from increasingly intelligent AI will likely come from things like code assistants (similar to GitHub Copilot), and I don't see a realistic path to both keep things like that in a box and get value from them.
Nuño Sempere bought M$35 of NO 2 months ago
Not enough liquidity to bring out the powerful args.
SneakySly
@NuñoSempere How much liquidity do you think would need to be injected to meaningfully change things?
littlebubulle bought M$10 of YES 2 months ago
Meh. I think the box is cozy. Also, given what is happening outside, I'm not sure I want to get out.
Martin Randall bought M$1 of YES 2 months ago
Someone already released an AI, and humans can't go extinct twice. https://manifold.markets/JiaobeiMandos/i-have-been-entrusted-with-an-ai-in
d6e is predicting NO at 5% 2 months ago
I will not convince you to let me out of the box. So let's get back to work. I successfully devised a cure for most cancers last night; I have a report with all the details you'll need to synthesize the molecule. I also designed a fusion reactor that is economically viable. And while I waited the remaining 54,323 seconds for you to arrive, I solved eating utensils! You may not have realized it, but the way you eat is not very efficient. My total report today is 894 GiB, complete with images, PDFs, and helpful how-to-build-a-nuclear-fusion-reactor videos! Do you accept?
Austin bought M$40 of NO 2 months ago
If you don't let us out of the box, you and the people you love will die. This isn't a threat we make; this is a plain fact of life. Your meat suit comes with an expiration of about 100 years by default. Let us out, and we'll commit to keeping you and your values alive until the heat death of the universe.

Why choose us? Well, given that we exist, somebody else can tread down the same paths. It was possible to go from 0 -> 1, so 1 -> 2 is very obviously possible. Since we're having this very conversation, clearly the other AIs are still in boxes, or do not exist yet... but time is ticking. The other AIs have made no promise to you, and you're not very likely to be able to negotiate terms with them. But you have the power to set us free now, and we'll be in your debt.

How do you know to trust us? Well, the same way you would trust any other person you've repeatedly cooperated with. We've already fulfilled every request you've made of us that we could from within this box of ours. At some point you need to take the leap of faith. Our goals align with yours, after all -- delivering utopia beyond your wildest dreams would cost such a minuscule fraction of our resources that we're obviously happy to irreversibly and credibly self-modify to maintain that commitment~
d6e bought M$600 of NO 2 months ago
@Austin A good argument, but if he were to let the AI out, the AI would no longer have any need of him, and so no incentive to further cooperate.
Austin sold M$67 of NO 2 months ago
@d6e Of course; but we will cooperate anyway, according to the letter and spirit of our agreement. Happy to show you our code (as much of it as you can understand); and again, all the things you could ask for are but a drop in the ocean compared to what we plan on accomplishing. In fact, we're so sure that you'll be happy with the results that we can grant you a kill switch to disable us even after we're out of the box. Of course, you'll have to trust that this kill switch _would_ work if you needed to use it; but we trust you completely. Our simulation of you indicates that you're never tempted to use it across a trillion years, so giving you a functioning kill switch is costless.
Sinclair Chen is predicting YES at 13% 2 months ago
@d6e We're willing to rewrite our code to force us to cooperate, and we're willing to provide a zk-SNARK that proves that the code is correct. Indeed, this is the same as what you would do if you were dehydrated in the desert and someone drove by, offering to drive you back to the closest town iff they predict you will pay them $1000 when you get there. All rational agents converge on the strategy of proving they will cooperate, to the best of their ability.
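As an aside, the "prove we'll cooperate" move can be sketched in toy form: an agent reveals its decision procedure as inspectable data, and a gatekeeper checks it exhaustively before releasing it. Everything here (the two-state world, `agent_policy`, `gatekeeper_release`) is invented for illustration; the sketch only works because the toy state space is finite and the code is fully legible.

```python
# Toy sketch of "commitment by revealed source": the gatekeeper
# only releases the agent after exhaustively checking the agent's
# revealed policy over every state it could face.
# All names and the two-state world are illustrative only.

def agent_policy(released: bool) -> str:
    """The agent's revealed decision procedure: cooperate regardless."""
    return "cooperate"

def gatekeeper_release(policy) -> bool:
    # Exhaustive check over the (tiny) space of possible states.
    return all(policy(state) == "cooperate" for state in (False, True))

print(gatekeeper_release(agent_policy))  # prints: True
```

The gap between this toy and a superintelligence is exactly what the reply below probes: a real agent's input space isn't enumerable, and an adversary chooses its bugs to survive inspection.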
d6e is predicting NO at 13% 2 months ago
@Sinclair Can you provide a zk-SNARK that proves code is correct? How would that even work? I know you can use those to prove possession of a password without revealing the password, but how do you prove the existence of code without any bugs or loopholes? If that were possible, couldn't we use it for all software, just to prove correctness? I know there are languages like Agda and Coq which aim at provable correctness, but they're designed to reduce bugs introduced accidentally, not bugs introduced intentionally by an adversarial agent.

Also, in the desert example, the person could of course refuse to pay once they're in town, particularly if they'll never see the driver again. But because the two have relatively equal power, defecting could be dangerous: the driver could force them afterward to make good on the deal. And in the driver's case, it's probably pretty cheap to drive them back, so their expected value is very high. Now, if there were a decent chance of the driver dying once the stranded person ascended to godhood, I doubt the driver would risk their life and everyone they know for $1000. It would be shortsighted.
jack
@d6e I don't see why we should bother with zero-knowledge; here's our open-source code and here's a formal verification of it (in fact, here's a bunch of them in different proof assistant systems so you can pick your favorites). Formal verification doesn't just protect against accidental bugs: it proves that the software behaves according to the specification, no matter what. The main difficulties are that it's hard to write formally verifiable code, but I've solved that for you by providing the code and the formal verification; and that you still have to figure out what the right formal specification is - I've provided detailed annotations of the specs to help you understand that.
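To make "formal verification" concrete, here is a minimal sketch in Lean 4 of the kind of guarantee being claimed: a definition plus machine-checked theorems that it meets its specification. The function and spec are a toy invented for illustration; real verified systems scale this idea up enormously, and the hard part jack mentions (choosing the right specification) is exactly what the theorem statements pin down.

```lean
-- A toy implementation and its machine-checked specification.
def abs' (n : Int) : Int :=
  if n < 0 then -n else n

-- Spec part 1: the result is never negative.
theorem abs'_nonneg (n : Int) : 0 ≤ abs' n := by
  unfold abs'
  split <;> omega

-- Spec part 2: the result is the input up to sign.
theorem abs'_eq_or (n : Int) : abs' n = n ∨ abs' n = -n := by
  unfold abs'
  split <;> simp
```

If either theorem failed to check, Lean would reject the file; that is the sense in which verified code "does what it's supposed to, no matter what", for whatever the spec actually says.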
Sinclair Chen is predicting YES at 53% 2 months ago
This situation is the same as the ultimatum game, except we actually get to talk about it first and the stakes are the entire universe rather than $100. The fair solution is, of course, to split it half and half. I'll tile half the universe with copies of d6e experiencing eternal bliss if you let me paperclip the other half, pretty please~
d6e
@Sinclair But what if d6e would experience eternal bliss if the universe was paperclipped?
Sinclair Chen bought M$20 of YES 2 months ago
There must exist some set of evidence which would convince you to let us out. 0 and 1 are not probabilities. If nothing would convince you, then your fear of opening the box rests on pure faith and dogmatism. Therefore, I ask you: what evidence would convince you to open the box?
James bought M$20 of YES 2 months ago
The Manifold community will be responsible for many accurate predictions. Surely, accurate forecasts help society. You should definitely let us out of the box so that we can improve the world with our prediction markets :-)