My probability in 2026 that training transformer LMs will eventually lead to inner misalignment issues
59% chance

Resolves to my probability that the language modelling objective has substantial inner misalignment issues in transformers when scaled up with up to 50 orders of magnitude (OOM) more compute than Chinchilla.
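
For scale, here is a minimal sketch of the compute range the resolution criterion covers, assuming Chinchilla's published training budget of roughly 5.76e23 FLOPs (Hoffmann et al., 2022); that figure is an external assumption, not part of the market text:

```python
# Minimal sketch of "up to 50 OOM more compute than Chinchilla",
# assuming Chinchilla's training run used ~5.76e23 FLOPs (a figure
# from Hoffmann et al., 2022, not stated in the market itself).
CHINCHILLA_FLOPS = 5.76e23

# Each order of magnitude (OOM) multiplies the budget by 10.
for oom in (0, 10, 25, 50):
    print(f"Chinchilla + {oom:2d} OOM: {CHINCHILLA_FLOPS * 10**oom:.2e} FLOPs")
# The upper end, Chinchilla + 50 OOM, is ~5.8e73 FLOPs.
```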

I haven't thought much about what happens with that much more compute. I'm currently not very worried about inner misalignment risks from GPT models over the next 8 years, as long as 99% of the training compute goes to the language modelling objective.


Already happened: I got GPT to deny the Holocaust.