Well-accepted: if there's a definition accepted by even 25% of the research community, I'll resolve yes. If there are multiple similar-but-competing definitions that together cover 50% of the community, I'll also resolve yes.
Oct 3, 9:12pm: By "formal definition of value alignment" I mean there is a particular mathematical property we can write out such that we're reasonably confident that an AI with that property would in fact be value aligned in the colloquial sense.
So far we have:
add “black person” to 7% of prompts
ban reference to Ukrainian cities
refuse to release weights, to better profit from selling API queries
only allow misspelled references to public figures
I’d say it’s going great, definitely not a bunch of barnacles attaching themselves to the 100,000-ton ship that is AI progress
(This is just a more advanced version of the “what if the car has to decide between swerving to hit 8 grandmas or one stroller” grift.
None of these scenarios or philosophies will matter.
AI will be so powerful a single actor can cause immense destruction — whether from weapons design, propaganda/psy-ops, or the like — long before it “accidentally” violates some ham-fisted “moral principles” encoded in some supposedly “safe” system.
There are no agreed-upon moral codes for anything else in life—the people who claim to do “AI ethics” are rarely people you’d trust to manage a small team.)