My probability in 2026 that training transformer LMs will eventually lead to inner misalignment issues

Resolves to my probability that the language modelling objective, when scaled up with up to 50 orders of magnitude (OOM) more compute than Chinchilla, leads to substantial inner misalignment issues in transformers.
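For scale, a back-of-envelope sketch of what "50 OOM more than Chinchilla" means, assuming the standard 6ND FLOP approximation and Chinchilla's published 70B parameters / 1.4T tokens (the exact figure doesn't matter for a question at this granularity):

```python
# Rough arithmetic for "50 OOM more compute than Chinchilla".
# Assumes training compute ~= 6 * N * D FLOPs, with
# N = 70e9 parameters and D = 1.4e12 tokens (Chinchilla's setup).
chinchilla_flops = 6 * 70e9 * 1.4e12   # roughly 5.9e23 FLOPs

scaled_flops = chinchilla_flops * 10**50  # 50 orders of magnitude more

print(f"Chinchilla: {chinchilla_flops:.1e} FLOPs")
print(f"+50 OOM:    {scaled_flops:.1e} FLOPs")
```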

I haven't thought much about what happens with that much more compute. I'm currently not very worried about inner misalignment risks from GPT models in the next 8 years, when 99% of the training compute goes to the language modelling objective.

Mark Ingraham

Already happened, I got GPT to deny the Holocaust