Do scaling laws happen because models experience a ton of tiny phase changes which average out to a smooth curve?

2030

61% chance

Problem 5.31 from @NeelNanda's 200 Concrete Open Problems in Mechanistic Interpretability (200 COP).

"D* 5.31 - Hypothesis: The reason scaling laws happen is that models experience a ton of tiny phase changes, which average out to a smooth curve because of the law of large numbers. Can you find evidence for or against that? Are phase changes everywhere?"

Resolves to the best evidence available by the end of 2030.
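To make the hypothesis concrete, here is a minimal toy sketch of how many sharp, individually step-like loss drops could sum to a curve that looks like a smooth power law. Everything in it is an assumption chosen for illustration and is not taken from the problem statement: the function names, the Zipf-like drop sizes `k ** -alpha`, the transition times `k ** alpha`, and the `sharpness` parameter are all hypothetical. It says nothing about whether real models behave this way.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def phase_change_loss(t, n_changes=20000, alpha=1.5, sharpness=50.0):
    """Toy loss at training time t, built from n_changes sharp drops.

    Assumptions (illustrative only):
      * drop k removes a share of loss proportional to k ** -alpha;
      * drop k happens around time t_k = k ** alpha (rarer pieces later);
      * each drop is a steep sigmoid in log-time, i.e. a near step.
    """
    log_t = math.log(t)
    total = 0.0
    norm = 0.0
    for k in range(1, n_changes + 1):
        weight = k ** (-alpha)
        log_t_k = alpha * math.log(k)
        # Fraction of drop k that has NOT happened yet at time t.
        remaining = 1.0 - sigmoid(sharpness * (log_t - log_t_k))
        total += weight * remaining
        norm += weight
    return total / norm  # normalised so the loss starts near 1


def loglog_slope(t1, t2, **kwargs):
    """Two-point slope of log(loss) against log(t)."""
    l1 = phase_change_loss(t1, **kwargs)
    l2 = phase_change_loss(t2, **kwargs)
    return (math.log(l2) - math.log(l1)) / (math.log(t2) - math.log(t1))


if __name__ == "__main__":
    for t in (1, 10, 100, 1000):
        print(t, round(phase_change_loss(float(t)), 4))
    print("log-log slope:", round(loglog_slope(10.0, 1000.0), 3))
```

Under these assumptions the loss that remains at time t is roughly the tail sum of `k ** -alpha` over drops with `k ** alpha > t`, which scales like `t ** (-(alpha - 1) / alpha)`, so with `alpha = 1.5` the measured slope should come out near -1/3. The early drops are widely spaced and visible as individual steps, while the later ones overlap and blur into a smooth curve. Evidence for or against the hypothesis would have to come from real training runs, not from a toy like this.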

Mechanistic interpretability

4 Comments

Relevant paper: https://arxiv.org/abs/2303.13506

The Quantization Model of Neural Scaling

We propose the $\textit{Quantization Model}$ of neural scaling laws, explaining both the observed power law dropoff of loss with model and data size, and also the sudden emergence of new capabilities with scale. We derive this model from what we call the $\textit{Quantization Hypothesis}$, where lea…

Suppose there are lots of phase changes but also smooth changes: the model doesn't immediately learn the exact skip-trigram frequencies, or even the best local circuit for encoding a single skip trigram; it gradually makes these more precise over time. If smooth changes (and possibly other kinds of change) account for 20% of the loss change per step, and faster, almost-discrete-looking phase changes account for 80%, how will this resolve?

@NoaNabeshima How low can the contribution from discrete-looking phase changes go before this no longer resolves Yes?
