
Advances in AI training are primarily driven by algorithmic innovation, data availability, and the amount of compute used for training. The compute used to train a single model, rather than the capacity of an entire datacenter or the speed of a single GPU, is a crucial factor correlating with the capability of our best AI models. On the algorithmic side, Sophia (Second-order Clipped Stochastic Optimization), a simple, scalable second-order optimizer, has been proposed. It uses a lightweight estimate of the diagonal Hessian as the pre-conditioner and achieves a 2x speed-up compared with Adam in the number of steps, total compute, and wall-clock time on language modeling with GPT-2 models.
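To make the mechanism concrete, below is a minimal sketch of a Sophia-style parameter update, assuming the diagonal Hessian estimate `h` is maintained elsewhere (e.g., refreshed every few steps by a stochastic estimator, as described in the paper). The function name and hyperparameter defaults (`lr`, `beta1`, `gamma`, `eps`) are illustrative assumptions, not the authors' reference implementation.

```python
import torch

def sophia_style_update(param, grad, m, h, lr=1e-4, beta1=0.96, gamma=0.01, eps=1e-12):
    """One hypothetical Sophia-style step: momentum divided by a scaled diagonal
    Hessian estimate, with the resulting update clipped element-wise."""
    m.mul_(beta1).add_(grad, alpha=1 - beta1)       # exponential moving average of gradients
    precond = m / torch.clamp(gamma * h, min=eps)   # pre-condition by the diagonal Hessian estimate
    update = torch.clamp(precond, -1.0, 1.0)        # per-coordinate clipping bounds the step size
    param.add_(update, alpha=-lr)                   # apply the clipped, pre-conditioned step
    return param, m
```

The clipping is what distinguishes this family of updates from a plain Newton-style step: coordinates where the Hessian estimate is small or unreliable cannot produce arbitrarily large moves.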
Will the largest AI training run by compute announced in 2025 use Sophia (Second-order Clipped Stochastic Optimization) in its training process?
Resolution Criteria:
This question will resolve positively if a credible source, such as a reputable AI research organization, AI company, or academic paper, confirms that the largest AI training run by compute announced in 2025 used Sophia (Second-order Clipped Stochastic Optimization). The relevant training run is the one that used the most compute to train a single model among all runs announced in 2025.
The question will resolve negatively if, by the end of 2028, a credible source confirms that the largest AI training run by compute announced in 2025 did not use Sophia (Second-order Clipped Stochastic Optimization).
If no information about the largest AI training run by compute announced in 2025 is available from credible sources by the end of 2028, the question will resolve as N/A.