Will any language model trained without large number arithmetic be able to generalize to large number arithmetic by 2026?
78
1.2kṀ18k
Jan 2
58% chance
21
Small number arithmetic in the training set is fine, as is non-arithmetic. "Small" and "large" are relative: if the training set contains arithmetic up to 20 digits and it generalizes to 100 digits, the question resolves yes. I'll accept a subset of arithmetic as well, e.g. if it can only do large number addition but not multiplication the question resolves yes.
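A minimal sketch of how the criterion above could be checked, assuming addition only and the 20-digit / 100-digit split used as the example in the description. All names here (`make_addition_problems`, `exact_match_accuracy`, `oracle`) are hypothetical and not something the market specifies; `oracle` is a stand-in for a real model's answer function.

```python
import random

def random_operand(n_digits, rng):
    # Build an n-digit integer with no leading zero.
    first = str(rng.randint(1, 9))
    rest = "".join(str(rng.randint(0, 9)) for _ in range(n_digits - 1))
    return int(first + rest)

def make_addition_problems(n, min_digits, max_digits, rng):
    # Each problem is a (prompt, expected_answer) pair of strings.
    problems = []
    for _ in range(n):
        a = random_operand(rng.randint(min_digits, max_digits), rng)
        b = random_operand(rng.randint(min_digits, max_digits), rng)
        problems.append((f"{a}+{b}=", str(a + b)))
    return problems

def exact_match_accuracy(model_fn, problems):
    # model_fn maps a prompt string to an answer string.
    correct = sum(model_fn(prompt) == answer for prompt, answer in problems)
    return correct / len(problems)

def oracle(prompt):
    # Placeholder for a real model: parses the prompt and adds exactly.
    a, b = prompt.rstrip("=").split("+")
    return str(int(a) + int(b))

rng = random.Random(0)
train_set = make_addition_problems(1000, 1, 20, rng)    # "small": up to 20 digits
test_set = make_addition_problems(100, 100, 100, rng)   # "large": exactly 100 digits
```

A model trained only on `train_set`-style data would then be scored with `exact_match_accuracy(model_fn, test_set)`. What accuracy counts as "generalizing" is left to the market creator.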

As more large-number arithmetic examples are added to training sets, and as the quality and diversity of those training sets improve over time, it seems likely that most language models being developed for larger-number computational tasks will generalize better from past learning in 2025.

With new LLM modelling approaches, it is possible to create models that are more advanced than those we know of today.

Advances in model architecture must also be considered, including the emergence of various kinds of hybrid models, such as double-headed models (models in which the outputs of two or more models are merged) that combine traditional, human-like reasoning with advanced machine learning techniques.

Technically it's YES, as modern models are trained on large data, but we can't know whether large-number arithmetic was used.

bought Ṁ150 YES

The reasoning behind answering "yes" is based on the idea that language models, especially those trained on large, diverse datasets, tend to generalize well to tasks they weren't explicitly trained for, especially when those tasks are closely related to the data they were trained on.

Even if a model isn't directly trained on large-number arithmetic, the combination of pattern recognition, generalization abilities, better training data, and advances in architecture lets models handle larger-number operations with a high level of accuracy.


I love markets like this where people are just betting on what interpretation the creator will accept

bought Ṁ350 YES

Doesn't this NeurIPS paper from a whole year ago just do this already? https://dl.acm.org/doi/10.5555/3737916.3741346

@spiderduckpig that's like a custom made architectural change that does the relevant part of the generalization for the LM (iiuc), doesn't seem to match the spirit of the market to me, but maybe creator disagrees, @vluzko

@Bayesian

I think we agree that a language model using the abacus encoding is clearly still a language model, so this is just about the spirit of the market, and the text of the market is a yes resolution.

I disagree that the abacus encoding is "doing" the generalization. Just because an architectural change is helpful for a task doesn't mean it's invalid, or that it's some cheat or party trick you stick onto the transformer that just autosolves it. The transformer's MLPs are still doing the generalization; all abacus encoding is doing is indexing the input data in a different way. Positional encodings have always been something that can be tweaked in a transformer, and nowhere does the market say we have to use an absolute encoding LM or FIRE encoding LM or something to get the result; that's just artificially tightening the standard.

Nowhere does the market forbid using a new architecture or even a task-specific architecture. In fact, I'd think the market creator assumed an architectural innovation might be needed for a yes. After all, it says "any" language model, implying an intentionally broad scope for qualifying LMs.

That being said, there are some papers achieving length generalization with FIRE encodings, which are completely digit index-agnostic. https://arxiv.org/abs/2402.09371
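For readers following the "indexing the input data in a different way" point above, here is a rough sketch of the idea as I understand it: each digit gets a position id based on where it sits inside its own number, rather than where it sits in the whole sequence. The function name, the character-level tokenization, the use of `0` for non-digit tokens, and the `offset` parameter are all assumptions made for illustration and are not taken from the linked paper's code.

```python
DIGITS = set("0123456789")

def abacus_position_ids(tokens, offset=0):
    # Assign each digit its 1-based index within its contiguous run of
    # digits (plus an optional offset); non-digit tokens get id 0.
    ids = []
    run = 0
    for tok in tokens:
        if tok in DIGITS:
            run += 1
            ids.append(run + offset)
        else:
            run = 0  # a non-digit token ends the current number
            ids.append(0)
    return ids

# Character-level tokens for "123+45="
print(abacus_position_ids(list("123+45=")))  # [1, 2, 3, 0, 1, 2, 0]
```

Under this scheme the first digit of every operand shares the same id, so digits of equal significance line up if numbers are written in a consistent digit order. The ids would then index a learned embedding table; everything after that is an ordinary transformer.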

sold Ṁ350 NO

Fairly sure ChatGPT 5.2 Thinking Extended can do this now, simply because they gave it more time to do chain of thought for longer workflows like Excel

@spiderduckpig wut, no, GPT 5.2 has extensive training on large number arithmetic

@Bayesian Surely it would only have seen a sparse subset of all large digit number arithmetic?

bought Ṁ250 YES

@spiderduckpig yeah? it has only seen a sparse subset of all sentences, but it can write sentences? that is not what the market is asking

opened a Ṁ1,000 YES at 28% order

@Bayesian I will put a limit order

bought Ṁ150 NO

@spiderduckpig i'll bet at 50%

opened a Ṁ750 YES at 40% order
opened a Ṁ1,000 NO at 50% order

@spiderduckpig ig uh we provide liquidity for onlookers

bought Ṁ150 NO

The art of the deal

someone could test this with nanochat or similar

bought Ṁ500 NO

but nobody will

what about post neural deep learning models

I am presuming you wouldn't accept language models trained, fine-tuned or prompted to work with post-processors (such as by emitting python expressions to be evaluated and replaced in the output before further continuations are generated) since those already exist today, but what about other types of hybrid systems?

For example, if something similar to Memorizing Transformers was used, except instead of memorized past context the system injected into an intermediate layer what it predicted to be the most salient numeric computation results based on the current context, would that still count as a language model for purposes of resolution?

Or is your intent to explore the ability of pure LLMs to generalize, and so you would consider something like the above a cheat?
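To make the post-processor setup mentioned above concrete, here is a small sketch in which the model's raw text contains arithmetic expressions that are evaluated and substituted before generation continues. The `<<...>>` marker syntax and the function names are hypothetical, chosen only for this example. The evaluator walks the parsed expression and allows only integer `+`, `-`, and `*`, instead of calling `eval` on model output.

```python
import ast
import operator
import re

# Only these binary operators are allowed in emitted expressions.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def _eval_node(node):
    # Recursively evaluate integer literals and allowed binary operations.
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    raise ValueError("unsupported expression")

def eval_arithmetic(expr):
    return _eval_node(ast.parse(expr, mode="eval").body)

def postprocess(text):
    # Replace every <<expr>> marker with the exact value of expr.
    return re.sub(r"<<(.+?)>>", lambda m: str(eval_arithmetic(m.group(1))), text)

print(postprocess("The sum is <<2+3*4>>."))  # The sum is 14.
```

In a setup like this the exact arithmetic is done by the host language and not by the model, which is why the question is whether such systems should count.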

@ML Moreover, would ‘add these numbers by extending the rule you have learned up to 5 digits’ count? Or would only an ‘add these one shot no additional instructions GO’ type of prompt be allowed?
