
My probability in 2026 that training transformer LMs will eventually lead to inner misalignment issues
59% chance
Resolves to my probability that the language modelling objective has substantial inner misalignment issues in transformers when scaled up by up to 50 OOM (orders of magnitude) more compute than Chinchilla.
I haven't thought much about what happens with that much more compute. I'm currently not very worried about inner misalignment risks from GPT models in the next 8 years, when 99% of the training compute is spent on the language modelling objective.
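For scale, here is a minimal back-of-the-envelope sketch in Python of what "50 OOM more compute than Chinchilla" means. It assumes the common C ≈ 6·N·D compute estimate and Chinchilla's published training setup (70B parameters, 1.4T tokens); these figures are my assumptions, not part of the question.

# Chinchilla's training compute, estimated with the C ≈ 6*N*D rule of thumb.
chinchilla_params = 70e9    # N: parameter count (assumed, from the Chinchilla paper)
chinchilla_tokens = 1.4e12  # D: training tokens (assumed, from the Chinchilla paper)
chinchilla_flops = 6 * chinchilla_params * chinchilla_tokens
print(f"Chinchilla compute: ~{chinchilla_flops:.1e} FLOPs")        # ~5.9e23

# "Up to 50 OOM more compute" multiplies this by up to 10**50.
upper_bound_flops = chinchilla_flops * 10**50
print(f"Upper bound in question: ~{upper_bound_flops:.1e} FLOPs")  # ~5.9e73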
This question is managed and resolved by Manifold.
Related questions
Conditional on there being no AI takeoff before 2050, will the majority of AI researchers believe that AI alignment is solved?
52% chance
Will deceptive misalignment occur in any AI system before 2030?
81% chance
Will Transformer-based architectures still be SOTA for language modelling by 2026?
79% chance
Will we solve AI alignment by 2026?
4% chance
End of pre-training era for language models: Will an LM fine-tune for more FLOPs than it is pre-trained for, before 2026
44% chance
Will >= 1 alignment researcher/paper cite "maximum diffusion reinforcement learning" as alignment-relevant in 2025?
19% chance
Will Inner or Outer AI alignment be considered "mostly solved" first?
Will superposition in transformers be mostly solved by 2026?
73% chance
[Carlini questions] Will we still use (slight modifications of) transformer-based LLMs we currently use
Will taking annual MRIs of the smartest alignment researchers turn out alignment-relevant by 2033?
7% chance