A pure binary LLM will exist by end of 2024
71% chance

A pure binary neural net is a neural network represented as pure combinational logic. Naively unrolling multi-bit floating-point or integer multiplication to binary does not count; the weights and activations themselves must be binary. I will arbitrarily declare that integer weights of 3 bits or fewer are permitted to be unrolled. But note that the whole model, end to end, must be reduced to logic gates.

For example, [Unrolling Ternary Neural Networks](https://arxiv.org/abs/1909.04509) almost satisfies this definition but uses patches and hence does not quite count. (Also, I am interested in language models, not image models.)
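
To make the intent concrete, here is a minimal sketch of mine (not part of the market's criteria) of how a single neuron with ±1 weights and activations reduces to XNOR, popcount, and a threshold compare, all of which are combinational logic. The fan-in and bit packing are illustrative.

```python
# Minimal sketch (illustrative, not from the market description): a binary
# neuron with {-1, +1} weights and activations reduces to XNOR + popcount +
# compare, i.e. pure combinational logic.

N = 8  # fan-in of the neuron (illustrative)

def binary_neuron(x_bits: int, w_bits: int, threshold: int) -> int:
    """x_bits/w_bits pack N inputs/weights as bits (1 = +1, 0 = -1).

    The +1/-1 dot product equals 2*popcount(XNOR(x, w)) - N, so the
    activation is just a popcount compared against a fixed threshold.
    """
    mask = (1 << N) - 1
    matches = (~(x_bits ^ w_bits)) & mask   # XNOR: bit set where signs agree
    dot = 2 * bin(matches).count("1") - N   # signed dot product
    return 1 if dot >= threshold else 0     # sign/threshold activation

# Example: alternating inputs against all-(+1) weights, fires if dot >= 0.
print(binary_neuron(0b10101010, 0b11111111, 0))
```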

It does not matter how the model was trained, only that it has adequate accuracy when in binarized form.

Resolves YES if a pure binary language model exists with bits-per-byte (BPB) on The Pile better than or equal to GPT-2's (1.225 BPB). It does not need to be publicly accessible as long as it is reported by a credible source (DeepMind, OpenAI, EleutherAI, etc.).
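
For reference, here is a rough sketch, in my own words, of how a bits-per-byte figure is typically computed from a model's per-token cross-entropy; the sample numbers below are hypothetical, and only the 1.225 threshold comes from the description above.

```python
import math

# Sketch of the resolution metric: convert mean per-token cross-entropy
# (in nats) to bits per byte of the evaluation text, then compare against
# the GPT-2 figure quoted in the market description.

GPT2_BPB = 1.225  # threshold from the market description

def bits_per_byte(nats_per_token: float, tokens: int, bytes_: int) -> float:
    """BPB = total nats / ln(2) / total UTF-8 bytes of the evaluation text."""
    total_bits = nats_per_token * tokens / math.log(2)
    return total_bits / bytes_

# Hypothetical numbers: 3.2 nats/token over a 1,000-token, 4,200-byte sample.
bpb = bits_per_byte(3.2, 1_000, 4_200)
print(f"{bpb:.3f} BPB -> {'meets' if bpb <= GPT2_BPB else 'does not meet'} the bar")
```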

Resolves NO if there is no credible report of such a model.

We have less than a year left. I have sold my stake in this market and will not bet further on it in case it ends up being subjective.

My personal efforts in this space have not been as successful as I had hoped. Personally, I think the market is ~well priced? I will be interested to see how this resolves.

Trit weights https://arxiv.org/abs/2402.17764

Still 8-bit activations, so it does not qualify, but a sparse weight matrix should compact down much more nicely.
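
A quick illustration (mine, not from the paper) of why trit weights compact down well: with weights in {-1, 0, +1}, every "multiply" against an 8-bit activation is an add, a subtract, or nothing at all, and zero-weight rows can be dropped from the netlist entirely.

```python
# Illustrative ternary dot product: no multipliers, only adds and subtracts.

def ternary_dot(activations: list[int], weights: list[int]) -> int:
    acc = 0
    for a, w in zip(activations, weights):
        if w == 1:
            acc += a        # +1 weight: add
        elif w == -1:
            acc -= a        # -1 weight: subtract
        # w == 0: contributes nothing, can be pruned from the logic
    return acc

print(ternary_dot([12, -3, 40, 7], [1, 0, -1, 1]))  # 12 - 40 + 7 = -21
```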

predicts YES

Looks like we will be getting 3-bit quantized LLaMA soon:
- https://arxiv.org/abs/2210.17323
- https://news.ycombinator.com/item?id=35107058

Now all that remains to resolve this market is to somehow quantize the softmaxes, and then unroll the whole thing to combinational logic.
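
One conceivable route for the softmax step, sketched below as my own guess rather than anything from the papers linked above: replace exp with a small fixed lookup table over quantized logits, so the whole operation is integer-only and could in principle be flattened into LUTs and adders. The table size and fixed-point format are made up for illustration.

```python
# Integer-only softmax sketch: exp via a small fixed lookup table, so every
# step is a table lookup, add, or integer divide (each synthesizable as
# combinational logic). Widths are illustrative.

FRAC_BITS = 8                       # Q8 fixed point (illustrative)
EXP_LUT = [round((2 ** FRAC_BITS) * (2 ** (-d))) for d in range(16)]
# EXP_LUT[d] ~ 2^(-d) in Q8, indexed by (max_logit - logit), clamped to 15.

def lut_softmax(logits_q: list[int]) -> list[int]:
    """Softmax over already-quantized integer logits; returns Q8 probabilities."""
    m = max(logits_q)
    nums = [EXP_LUT[min(m - z, 15)] for z in logits_q]   # table lookup per logit
    denom = sum(nums)
    return [(n << FRAC_BITS) // denom for n in nums]     # integer divide (could also be a LUT)

print(lut_softmax([5, 3, 3, 0]))   # roughly [167, 41, 41, 5] out of 256
```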

predicts YES

Ultra-low Precision Multiplication-free Training for Deep Neural Networks: https://arxiv.org/abs/2302.14458

1 sign bit, 4 exponent bits. It looks like it works on transformer language models. I am unclear, however, on how they handle the softmaxes. To resolve this market, the softmaxes would need to be fully transformed into combinational logic.
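
As I understand the general idea of a (sign, exponent) format, a multiply collapses into a sign XOR and a small exponent add; the sketch below uses illustrative field handling and is not taken from the paper.

```python
# Multiplication-free multiply in a log-domain (sign, exponent) format.
# Field widths and biasing are illustrative, not the paper's.

def make(sign: int, exp: int) -> tuple[int, int]:
    return (sign, exp)                 # value = (-1)**sign * 2**exp

def mulfree_mul(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Multiply two log-domain numbers: XOR the signs, add the exponents."""
    sign = a[0] ^ b[0]                 # sign logic is a single XOR gate
    exp = a[1] + b[1]                  # exponent add is a small integer adder
    return (sign, exp)

def to_float(x: tuple[int, int]) -> float:
    return (-1) ** x[0] * 2.0 ** x[1]

a, b = make(0, 3), make(1, -2)         # +8 and -0.25
print(to_float(mulfree_mul(a, b)))     # -2.0: no multiplier needed
```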

predicts YES

https://arxiv.org/abs/2212.09720

We are beginning to get down to 4-bit weights.

However, note that even if the weights were 3-bit, the model would need to be fully reduced to combinational logic, including any softmaxes, etc., to resolve YES.
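
To spell out what that reduction would look like for one layer, here is an illustrative sketch (mine, not from the linked paper): once small integer weights are frozen, each product w*x becomes a fixed set of shifts and adds, and the whole dot product becomes a multiplier-free adder tree.

```python
# Illustrative unrolling of a dot product with frozen 4-bit unsigned weights
# into shifts and adds only (no multipliers, no memory).

def shift_add_product(x: int, w: int) -> int:
    """Multiply activation x by a constant 4-bit weight using only shifts/adds."""
    acc = 0
    for bit in range(4):             # at most four partial products
        if (w >> bit) & 1:
            acc += x << bit          # each set weight bit is one wired shift
    return acc

def unrolled_dot(xs: list[int], ws: list[int]) -> int:
    # In hardware this sum would be a balanced adder tree, not a loop.
    return sum(shift_add_product(x, w) for x, w in zip(xs, ws))

print(unrolled_dot([3, 1, 7], [5, 2, 1]))   # 3*5 + 1*2 + 7*1 = 24
```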