Will there be a well-accepted formal definition of value alignment for AI by 2030?
25% chance

Well-accepted: if there's a single definition accepted by even 25% of the research community, I'll resolve YES. If there are multiple similar-but-competing definitions that together cover 50% of the community, I'll also resolve YES.

Oct 3, 9:12pm: By "formal definition of value alignment" I mean there is a particular mathematical property we can write out such that we're reasonably confident that an AI with that property would in fact be value aligned in the colloquial sense.
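
Purely to illustrate the shape such a "mathematical property we can write out" might take (my own sketch, not part of the market's criteria): one hypothetical form is an expected-utility condition stated relative to the human's true utility function. The symbols below (U_H, the policy space Π, the slack ε) are illustrative assumptions, not anything this market commits to.

```latex
% Hypothetical sketch only: one possible form a "formal definition of
% value alignment" could take. U_H (the human's true utility function),
% \Pi (the space of candidate policies), and \epsilon (allowed slack)
% are illustrative assumptions, not anything this market commits to.
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
A policy $\pi$ is $\epsilon$-value-aligned iff
\[
  \mathbb{E}\left[ U_H \mid \pi \right]
  \;\geq\;
  \sup_{\pi' \in \Pi} \mathbb{E}\left[ U_H \mid \pi' \right] - \epsilon .
\]
\end{document}
```

The hard part, of course, is that nothing like U_H is actually available to write down, which is why the question asks only whether the community converges on some such property, not on this one.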

bought Ṁ10 of YES

I think there's a >1/3 chance that, if we're still alive in 2030, there's such a definition (e.g., perhaps some descendant of PreDCA)

Will there be a well-accepted formal definition of value alignment for people by 2030?

Will there be a well-accepted formal definition of value alignment for companies by 2030?

Will there be a well-accepted formal definition of value alignment for nations by 2030?

We have our answer.

So far we have:

  • add “black person” to 7% of prompts

  • ban reference to Ukrainian cities

  • refuse to release weights, to better profit from selling API queries

  • only allow misspelled references to public figures

I'd say it's going great, definitely not a bunch of barnacles attaching themselves to the 100,000-ton ship that is AI progress

(This is just a more advanced version of the “what if the car has to decide between swerving to hit 8 grandmas or one stroller” grift.

None of these scenarios or philosophies will matter.

AI will be so powerful a single actor can cause immense destruction — whether from weapons design, propaganda/psy-ops, or the like — long before it “accidentally” violates some ham-fisted “moral principles” encoded in some supposedly “safe” system.

There are no agreed-upon moral codes for anything else in life—the people who claim to do "AI ethics" are rarely people you'd trust to manage a small team.)
