Will the largest AI training run in 2025 utilize Sophia, Second-order Clipped Stochastic Optimization?
18% chance

Advances in AI training are driven primarily by algorithmic innovation, data availability, and the amount of compute used for training. The compute used to train a single model, rather than the capacity of an entire datacenter or the speed of a single GPU, is a crucial factor correlating with the power of our best AI models. On the algorithmic side, Sophia (Second-order Clipped Stochastic Optimization), a simple and scalable second-order optimizer, has been proposed. It uses a lightweight estimate of the diagonal Hessian as the preconditioner and achieves a 2x speed-up compared with Adam in the number of steps, total compute, and wall-clock time on language modeling with GPT-2 models.
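For context on what the question is asking about, here is a minimal sketch of a Sophia-style update step, not the authors' reference implementation. The hyperparameter names and default values (beta1, beta2, rho, eps) are illustrative, and the diagonal-Hessian estimate, which the paper obtains with lightweight estimators, is assumed to be supplied by the caller.

```python
import numpy as np

def sophia_step(theta, grad, hess_diag_est, state, lr=1e-4,
                beta1=0.96, beta2=0.99, rho=0.03, eps=1e-12):
    """One Sophia-style update (illustrative sketch, not the reference code).

    theta          -- current parameter vector
    grad           -- stochastic gradient at theta
    hess_diag_est  -- lightweight estimate of the Hessian diagonal
    state          -- dict holding the EMAs "m" and "h" (arrays like theta)
    """
    # Exponential moving averages of the gradient and of the Hessian diagonal.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["h"] = beta2 * state["h"] + (1 - beta2) * hess_diag_est
    # Precondition by the curvature estimate, then clip each coordinate to
    # [-1, 1] so a noisy or tiny Hessian entry cannot blow up the step size.
    update = np.clip(state["m"] / np.maximum(rho * state["h"], eps), -1.0, 1.0)
    return theta - lr * update, state
```

Roughly speaking, the element-wise clipping is the "Clipped" part of the name: it bounds how far any single coordinate can move per step, which keeps the second-order preconditioning stable when the Hessian estimate is noisy.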

Will the largest AI training run by compute announced in 2025 utilize Sophia, Second-order Clipped Stochastic Optimization in its training process?

Resolution Criteria:

This question will resolve positively if a credible source, such as a reputable AI research organization, AI company, or academic paper, confirms that the largest AI training run by compute announced in 2025 utilizes Sophia (Second-order Clipped Stochastic Optimization). The training run in question is the one that uses the most compute to train a single model among those announced in the year 2025.

The question will resolve negatively if, by the end of 2028, a credible source confirms that the largest AI training run by compute announced in 2025 does not utilize Sophia (Second-order Clipped Stochastic Optimization).

If no information about the largest AI training run by compute announced in 2025 is available from credible sources by the end of 2028, the question will resolve as N/A.


Probably works (Lion also worked.)

Bigger deal:

Better to award it to the largest disclosed model (otherwise the odds of disclosure are very low).

Academics shipped nothing for a decade, then as soon as language became the benchmark, they shipped a ton.

Both fine-tuning of LLaMA (almost free) and mini language models.

Lesson in there somewhere; e.g., ImageNet was saturated and compute-limited, and, like any academic field not held to a cost-scaled real-world benchmark, academic machine learning was completely fake.

Having perplexity as the benchmark (uncapped, it transfers, and it scales with compute) has pushed quantization, optimizers, and samplers easily 10x in a quarter.

Next: open source MoE, retro, etc.

predicts NO

what

ML academia has done a ton of useful stuff before GPT. Less useful than OpenAI, sure, but cmon.

Came here after seeing this and getting chills if this is correct

Boosting and subsidizing this market

bought Ṁ30 of NO

The question will resolve negatively if... [something you'd expect from title]... or if no credible source provides information about the optimization method used in the largest AI training run by compute announced in 2025.

I think the title is misleading because of this.

At a quick skim, the GPT-4 paper doesn't seem to specify what optimizer they use, although they cite Adafactor. I think there is a sizeable probability that no credible source provides info about the optimization method used by the largest AI training run.

@NoaNabeshima If we don't know by the end of 2026, then it resolves N/A. I just added some words to make that clearer.

@NoaNabeshima *I changed it to 2028 now to provide even more time.

sold Ṁ17 of NO

@MatthewBarnett Maybe there's a mistake in the description or I'm confused?

The question will resolve negatively... if no credible source provides information about the optimization method used in the largest AI training run by compute announced in 2025, by the end of 2028.

and

If no information about the largest AI training run by compute announced in 2025 is available from credible sources by the end of 2028, the question will resolve as N/A.

don't seem mutually exclusive to me

predicts NO

@NoaNabeshima and I was mostly worried about a world where a 2025 model is announced, no optimizer info is released, and the market resolves No
