Do scaling laws happen because models experience a ton of tiny phase changes which average out to a smooth curve?

Problem 5.31 from @NeelNanda's 200 COP.

"D* 5.31 -  Hypothesis: The reason scaling laws happen is that models experience a ton of tiny phase changes, which average out to a smooth curve because of the law of large numbers. Can you find evidence for or against that? Are phase changes everywhere?"

Resolves to the best evidence available by the end of 2030.
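
To make the hypothesis quoted above concrete, here is a toy sketch (not evidence about real models). It assumes each "phase change" is a sharp logistic drop in loss, with a uniformly random timing and a random size, and sums them. The names `simulate_loss` and `largest_drop`, the logistic shape, and every parameter value are illustrative assumptions, not taken from the problem statement.

```python
import math
import random


def sigmoid(x):
    # Numerically safe logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def simulate_loss(n_changes, n_steps=200, sharpness=2.0, seed=0):
    """Normalized loss curve built from n_changes sharp logistic drops.

    Assumption: each phase change has a uniformly random center step and a
    uniformly random size; the total loss is the sum of what has not yet
    been "learned", divided by the total size.
    """
    rng = random.Random(seed)
    centers = [rng.uniform(0.0, n_steps) for _ in range(n_changes)]
    sizes = [rng.random() for _ in range(n_changes)]
    total = sum(sizes)
    losses = []
    for t in range(n_steps + 1):
        remaining = sum(
            s * (1.0 - sigmoid(sharpness * (t - c)))
            for s, c in zip(sizes, centers)
        )
        losses.append(remaining / total)
    return losses


def largest_drop(losses):
    # Biggest single-step decrease: a crude measure of visible "jumps".
    return max(a - b for a, b in zip(losses, losses[1:]))


few = simulate_loss(n_changes=5)
many = simulate_loss(n_changes=2000)
print("largest one-step drop, 5 phase changes:   ", largest_drop(few))
print("largest one-step drop, 2000 phase changes:", largest_drop(many))
```

With only a few phase changes, single steps show visible jumps; with thousands, the largest one-step drop shrinks and the curve looks smooth. This only shows the averaging argument is internally consistent under these assumptions. It does not show that real training curves are built this way.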


Suppose there are lots of phase changes but also smooth changes: the model doesn't immediately get the exact skip-trigram frequencies, or even the best local circuit for encoding a single skip trigram; it gradually makes these more precise over time. If smooth changes (plus any other kinds of change) account for 20% of the loss change per step, and faster, almost-discrete-looking phase changes account for 80%, how will this resolve?

@NoaNabeshima How low can the contribution from discrete-looking phase changes go before this no longer resolves Yes?
