
What alignment proposals and research directions will I be excited about by the end of 2023?
Ṁ353 · resolved Jan 1
Answer | Probability |
---|---|
Infra-bayesianism | 72% |
Other | 13% |
Outsourcing alignment of AI to other AI | 0.7% |
Reinforcement Learning from Human Feedback (RLHF) | 0.3% |
Transparency tools | 0.3% |
Imitative amplification | 0.3% |
Intermittent oversight | 0.3% |
Relaxed adversarial training | 0.3% |
Approval-based amplification | 0.3% |
Microscope AI | 0.3% |
STEM AI | 1.1% |
Narrow reward modeling | 0.3% |
Recursive reward modeling | 0.4% |
AI safety via debate with transparency tools | 0.4% |
Amplification with auxiliary RL objective | 0.4% |
Shard theory mechanistic interpretability | 0.4% |
 | 1.8% |
Hodgepodge alignment | 6% |
Cyborgism | 0.4% |
This question is managed and resolved by Manifold.
🏅 Top traders
# | Name | Total profit |
---|---|---|
1 | | Ṁ60 |
2 | | Ṁ14 |
Related questions
By the end of 2025, which piece of advice will I feel has had the most positive impact on me becoming an effective AI alignment researcher?
Will I think that alignment is no longer "preparadigmatic" by the start of 2026?
18% chance
Will some piece of AI capabilities research done in 2023 or after be net-positive for AI alignment research?
81% chance
Will there be more alignmentforum posts from 2025 than 2024?
55% chance
Will taking annual MRIs of the smartest alignment researchers turn out alignment-relevant by 2033?
7% chance
Will >= 1 alignment researcher/paper cite "maximum diffusion reinforcement learning" as alignment-relevant in 2025?
19% chance
Will we solve AI alignment by 2026?
4% chance
Will there exist a compelling demonstration of deceptive alignment by 2026?
70% chance
Will xAI significantly rework their alignment plan by the start of 2026?
32% chance
Will I have a research position at Anthropic (Research Engineer included) by the end of 2025?
13% chance