My probability in 2026 that training transformer LMs will eventually lead to inner misalignment issues


59% chance


Resolves to my probability that the language modelling objective has substantial inner misalignment issues in transformers when scaled up with up to 50 OOM more compute than Chinchilla.

I haven't thought much about what happens with that much more compute. I'm currently not very worried about inner misalignment risks from GPT models in the next 8 years when 99% of the training compute goes to the language modelling objective.


## Related questions


By the end of 2026, will we have transparency into any useful internal pattern within a Large Language Model whose semantics would have been unfamiliar to AI and cognitive science in 2006?

55% chance

Will a large language model beat a super grandmaster playing chess by 2028?

48% chance

Eliezer Yudkowsky is impressed by a machine learning model, and believes that the model may be very helpful for alignment research, by the end of 2026

29% chance

[Metaculus] Will an LLM at least on the scale of GPT-4 be widely available for download before January 1st, 2025?

61% chance

Will Transformer based architectures still be SOTA for language modelling by 2026?

67% chance

Will I think that alignment is no longer "preparadigmatic" by the start of 2026?

29% chance

Will language models be able to solve simple graphical mazes by the end of 2025?

65% chance

Will any language model trained without large number arithmetic be able to generalize to large number arithmetic by 2026?

77% chance

Will a Large Language Model be listed as an author on a peer-reviewed paper by the end of 2025?

48% chance

Will Tassilo think Singular Learning Theory isn't useful for alignment by the end of 2024?

36% chance