My probability in 2026 that training transformer LMs will eventually lead to inner misalignment issues


59% chance


Resolves to my probability that the language modelling objective has substantial inner misalignment issues in transformers when scaled up with up to 50 OOM more compute than Chinchilla.

I haven't thought much about what happens with that much more compute. I'm currently not very worried about inner misalignment risks from GPT models in the next 8 years when 99% of the training compute goes to the language modelling objective.


## Related questions


By the end of 2026, will we have transparency into any useful internal pattern within a Large Language Model whose semantics would have been unfamiliar to AI and cognitive science in 2006?

55% chance

Will a large language model beat a super grandmaster playing chess by 2028?

48% chance

Eliezer Yudkowsky is impressed by a machine learning model, and believes that the model may be very helpful for alignment research, by the end of 2026

29% chance

[Metaculus] Will an LLM at least on the scale of GPT-4 be widely available for download before January 1st, 2025?

61% chance

Will Transformer based architectures still be SOTA for language modelling by 2026?

67% chance

Will I think that alignment is no longer "preparadigmatic" by the start of 2026?

29% chance

Will language models be able to solve simple graphical mazes by the end of 2025?

65% chance

Will any language model trained without large number arithmetic be able to generalize to large number arithmetic by 2026?

77% chance

Will a Large Language Model be listed as an author on a peer-reviewed paper by the end of 2025?

48% chance

Will Tassilo think Singular Learning Theory isn't useful for alignment by the end of 2024?

36% chance