This is a solution to alignment.
40 · Ṁ28k · Jan 1 · 33% chance

A game-theoretic solution to alignment would be to create a function that rewards 2 Bitcoin for demonstrating a proper understanding of the "AI situation" (a sort of Eliezer litmus test for whether a person has a firm grasp of the risks posed by AI and what they can do to stop it) to a humanity-verified public ledger: an "intellectual consent" blockchain, a crypto credential designed to record what different people consent to being true, so that AI can later be trained on that data.
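To make the mechanism concrete, here is a toy sketch of the proposed payout rule, assuming (hypothetically) a registry of verified humans and a boolean result from the litmus test; none of these names come from an existing system.

```python
from dataclasses import dataclass, field

REWARD_BTC = 2.0  # prize per verified person, per the proposal


@dataclass
class ConsentLedger:
    """Toy model of the proposed humanity-verified public ledger.

    Every name here is a hypothetical illustration, not a real API.
    """
    verified_humans: set[str] = field(default_factory=set)
    paid: set[str] = field(default_factory=set)

    def claim_reward(self, person_id: str, passed_litmus_test: bool) -> float:
        # Pay only verified humans, only once each, and only if they
        # demonstrated understanding of the "AI situation".
        if person_id not in self.verified_humans:
            return 0.0
        if person_id in self.paid or not passed_litmus_test:
            return 0.0
        self.paid.add(person_id)
        return REWARD_BTC
```

The `paid` set is what blocks the duplicate-claim problem raised further down the thread: a verified identity can collect the reward at most once.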

This function can be built.

Building it would also transform the way the world communicates and turn the economy into one that primarily rewards education, in the sense that people can pay other people to learn things. This will scale to replace all advertising, journalism, media, peer review, educational institutions, government and all communication platforms. It will create a true meritocracy and ensure true free speech (the ability to put an idea into the public domain for consideration), with plenty of resources for everyone on the planet to live an amazing life.

Then.  After everyone is rich and gets along.  We use that data to train AI instead.

Will resolve yes given Eliezer's consent.

Will resolve no given my consent.

I pledge on Kant’s name to try as hard as I can to consent.

If someone can supply the Bitcoin, I'll build this.

If you think that's crazy, please explain why or bet against me.

Thanks 🤓


> everyone is rich and gets along. We use that data to train AI instead.

Your solution does not address various ways in which AI would ruin the world.

  • misalignment prevention: the goal is not to build an aligned AI, it is to make sure no one ever builds a misaligned, powerful AI. What do you do with your AI that is trained on near-utopian data?

  • deadline: building cryptotopia is a multi-decade project, before which misaligned AIs have probably already been deployed.

  • out-of-distribution (OOD) misbehavior: once society is being influenced by your AI, it will start to look vastly different from what the AI's training data looked like. Our current systems are known to be untrustworthy when dealing with OOD data and ensuring OOD robustness is an open problem you don't address.

  • reward hacking: AIs don't look at problems the way humans do and might generalize from the data differently than we would intuitively expect.

Me pre-empting some rebuttals:

You didn't address the crypto part.

Yes, the above problems persist even if the crypto scheme improves society.

The only thing you try to solve for is getting high-quality data, but the alignment problem is far more than just the bad-training-data problem (which I don't claim you solve either).

You speak of preventing AGI-induced catastrophe, I'm just solving the alignment problem.

That's what the alignment problem is.

I can address your points by [raising something new].

I'll hear you out; please resolve your market to NO, though. Or, failing that, edit the market description to say that the "this" in "this is a solution to alignment" does not refer to the market description.

You ought to invest more in my idea by reading/watching 60+ minutes of [recommended stuff].

I plausibly spent more time responding to your market than you did on making it. Read Yudkowsky's AGI Ruin post, since that addresses a variety of ways in which alignment plans like yours tend to fail before you ask me to spend more time.

Nonetheless, thanks for at least trying and good luck.

I'd love a market like "will anyone think this market resolved fairly?" that resolves by poll or something.

What’s stopping someone from claiming the “I understand” prize multiple times?

Great question. Humanity verification is critical.

Users creating multiple accounts to earn extra profit plays a large role in why my work is potentially infohazardous.

As for the same 'humanity verified user' making duplicate claims, that's not really how the program works.

It's similar to asking what prevents a given user from 'liking' a comment on X multiple times.

The basic design of the system?

Biometric verification? Our devices already have the ability.

I'm talking about humanity verification on a decentralized ledger.

Something like worldcoin.

Even if everyone understood the "AI situation" how does that solve alignment? I think it might even increase doom probability because more people would become interested in AI.


It's the 'humanity grows up' possible future.

Not claiming it's guaranteed, just that creating this program drastically increases the odds.

Could work. $40 would be a good start to make everyone just watch a short video and demonstrate understanding. I think a lot of relevant information could be packed into one super high quality video. The idea would then obviously be that enough people will make governments slow down or stop AI capabilities research such that sufficient safeguards can be put in place.

I'm not sure whether there aren't less expensive solutions, however.

Economists can write hundreds of papers confirming huge negative externalities to humanity and society from certain things, and more than 51% of people will still vote, not just against Pigouvian taxes, but for subsidies of those things.


Right, that's because we haven't paid them to consent, in a legally binding way, to particular propositions within those writings being true. In a way where representatives can use that data to legislate on their behalf.

If we had that, these situations would not occur.

That's literally how Pigouvian taxes often work (with rebates from the tax), and it still occurs.


The distinction you are not recognizing is the set of people that get to choose which (Pigouvian taxes / incentive mechanisms) to create.

Sure, once you have control of the government, the media and everyone's attention, you could pick some safe topics for individuals to earn some credit for exploring.

What I'm talking about is the ability for any person on the planet to offer monetary incentives to any set of individuals they select to look at whatever information they choose without needing any governing body to oversee it.

The power lies not in the revenue gained from learning the curriculum.

The power lies in being able to define the curriculum.

There's nothing stopping anyone today from paying people to listen to whatever they have to say. Haven't you heard of time-share scams?

@Krantz what keeps this from settling into the equilibrium where e.g. coca-cola pays many times more for young people's eyeball time than Khan Academy's entire operating budget?


Thank you for the legitimate question.

It isn't about 'eyeball time' it's about acknowledging steps of inference.

If Verizon is aiming to 'teach' its consumers that their new phone is waterproof, that requires individuals to understand that their phone is waterproof. That is Verizon's objective. Currently, they need to blast a ton of advertisements because they do not know which consumers actually saw or paid attention to their ad.

This is different with Khan Academy. There are vastly more points of verifiable information in Khan Academy's 'advertising campaign'.

If I were to ask both Verizon and Khan Academy, "What does the constitution of facts that you would like to get into people's heads look like?", although Verizon's budget for investment will be significantly higher, their constitution will not be nearly as large or profitable in the machine I'm aiming to build.

What it boils down to is 'How much is Verizon willing to pay you to acknowledge the fact that you understand their phone is waterproof?' (This takes a couple of seconds) vs 'How much is Khan Academy (though I'm sure your neighbors will want to contribute) willing to pay you to acknowledge all the facts required to demonstrate you have a good general education?'.
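The comparison above can be sketched as a back-of-the-envelope calculation: a campaign's total value per person is the number of facts to acknowledge times the payment per acknowledgment. All numbers below are invented purely for illustration.

```python
# Hypothetical "constitutions of facts" for the two campaigns.
# Per-fact payments and fact counts are made-up illustrative values.
verizon = {"facts": 1, "pay_per_fact": 0.50}     # "this phone is waterproof"
khan_academy = {"facts": 10_000, "pay_per_fact": 0.02}  # a whole curriculum


def campaign_value_per_person(campaign: dict) -> float:
    """Total payout per person = facts acknowledged x price per fact."""
    return campaign["facts"] * campaign["pay_per_fact"]
```

Under these made-up numbers, Verizon's one-fact campaign is worth $0.50 per person while Khan Academy's is worth $200.00, which is the shape of the argument being made: a large constitution of cheap facts can outweigh a small constitution of expensive ones.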

Isn't this like a terrible conflict of interest because the creator is also the person whose idea it's about, and the resolution criteria are very ambiguous, and that person (the creator / subject) also has a bunch of YES bets?

I agree. I'd much rather defer resolution to @EliezerYudkowsky.

Too bad that isn't an option on the platform.

He's too busy to listen to me though.

Maybe bet against my proposal on his prediction instead?

We create a 'truth economy'.

https://manifold.markets/EliezerYudkowsky/if-artificial-general-intelligence-539844cd3ba1?r=S3JhbnR6

I mean... you could just say in the market description that you'll ask him and resolve it based on what he says

I've been trying to get his attention (or anyone with 1/3 of his domain knowledge) for many years to charitably look at my work.

If I had that already, I wouldn't need to post predictions like this.

https://manifold.markets/Krantz/if-eliezer-charitably-reviewed-my-w?r=S3JhbnR6

Pick some powerful entity (eg the Chinese government, Microsoft, or the Catholic church)

Do you think your solution could align that entity to humanity, or at least to its own citizens/customers/followers? If not, why would ASI be different? If so, how would that go, and how could one bootstrap the process?


Yes, I believe they can be aligned. You align a powerful entity like that by aligning the individual members that make up the group to perform the game theoretic actions that cause alignment.

For example, there might be an event 'E' that a corrupt government wants to occur while the vast majority of the citizenry do not want it to occur (this could be a particular bill, issue or task, like sealing a physical border). One way to ensure alignment here is to achieve a verifiable public record that the following individuals agree with the following propositions.

Vast majority of citizens:

#1 - I understand how Krantz works.

#467952 - Event 'E' is significant and requires immediate action from the government.

#3497829 - This proposition is intended to provide a record of my intention to support the specific legislation (insert bill here that defines the action to be taken on 'E'; this could also be another proposition in the ledger).

Official congressmen:

#34875 - A majority of the citizens you represent support #3497829.

#68723 - As a congressman, you have pledged to uphold the verifiable requests of a majority of the members of your district.

I could continue, but hopefully this is enough to get the point.
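The proposition chain above can be sketched in a few lines: record which verified citizens have consented to a proposition, then check whether a representative's pledge condition is triggered. The proposition numbers, names and majority rule are illustrative assumptions, not a specification.

```python
# Toy sketch of the proposition ledger described above.
# All identifiers and the majority threshold are hypothetical.
consents: dict[int, set[str]] = {
    3497829: {"alice", "bob", "carol"},  # citizens who signed proposition #3497829
}

district: set[str] = {"alice", "bob", "carol", "dave"}  # one congressman's constituents


def majority_supports(prop_id: int) -> bool:
    """Proposition #34875: a majority of the district supports prop_id."""
    signed = consents.get(prop_id, set()) & district
    return len(signed) > len(district) / 2
```

Here 3 of 4 constituents signed #3497829, so the pledge in #68723 would bind the representative to act on it; for an unsigned proposition the check fails.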

I'm talking about a fundamental change in the way we communicate in the public square.

It's a change that I think will accelerate 'communication about complicated ideas over the internet' so fast that it will eliminate our current understanding of traditional media, education, government, companies and essentially every other mechanism whose primary job, in the end, is to control the sharing of ideas.

That is the primary function of government. We give them tax dollars, they figure out the good stuff that everybody wants done, figure out who is qualified to do those things and then give the money to those people. I think we can do all that on the internet. Surely this is where some people see AI and blockchain headed.

How that transition happens, is a much longer conversation.

'The thing that's in charge' is a complicated thing now. It used to be people. Kings, presidents.

Now its ideas, its technology, its infrastructure.

If (Bitcoin) wanted to make Donald Trump say the words 'Skibidi toilet', it could.

That's interesting to me.

The only thing actually required to bootstrap the process is to get the idea to the right people, privately. That's nearly impossible. It's complicated and requires knowledge from several domains. I'd put my odds at around 20%.

Why do you need the blockchain? I'm not that familiar with crypto, but this seems like just a test with a money prize.

Read this book, maybe:

https://thenetworkstate.com/

Balaji's book is... fundamentally unserious, imma be honest.