A pure binary neural net is a neural network represented as pure combinational logic. Naively unrolling multi-bit floating point/integer multiplication to binary does not count; the weights and activations must be binary. I will arbitrarily declare that integer weights of 3 bits or fewer are permitted to be unrolled. But note that the whole model, end to end, must be reduced to logic gates.
For example [Unrolling Ternary Neural Networks](https://arxiv.org/abs/1909.04509) almost satisfies the definition but uses patches and hence does not quite count. (Also I'm interested in language models not image models.)
It does not matter how the model was trained, only that it has adequate accuracy when in binarized form.
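To make the definition concrete, here is a rough sketch of my own (illustrative names, made-up threshold) of a single unit of such a network once everything is binary: the multiply-accumulate collapses to XNOR, a popcount, and a threshold compare, all of which are simple combinational circuits.

```python
# Illustrative sketch only: one "neuron" of a pure binary net, written so that
# every operation maps directly onto combinational logic (XNOR gates, a
# popcount adder tree, and a comparator). Names and the threshold are mine.

def binary_neuron(x_bits: int, w_bits: int, n: int, threshold: int) -> int:
    """x_bits, w_bits: n-bit integers whose bits encode +1/-1 activations/weights."""
    mask = (1 << n) - 1
    agree = ~(x_bits ^ w_bits) & mask    # XNOR: 1 wherever activation and weight signs agree
    popcnt = bin(agree).count("1")       # popcount = an adder tree in hardware
    dot = 2 * popcnt - n                 # dot product over {+1, -1}
    return 1 if dot >= threshold else 0  # comparator produces the next binary activation
```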
Resolves YES if a pure binary language model exists with bits-per-byte (BPB) on The Pile better than or equal to GPT-2's 1.225 BPB (lower is better). It does not need to be publicly accessible as long as it is reported by a credible source (DeepMind, OpenAI, EleutherAI, etc.).
Resolves NO if there is no credible report of such a model.
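For reference, bits per byte is just the model's total cross-entropy converted to base 2 and divided by the number of UTF-8 bytes of the evaluation text, which makes it comparable across tokenizers. A generic sketch (variable names are mine, not taken from any particular evaluation script):

```python
import math

# How bits-per-byte (BPB) is typically computed from a language model's
# cross-entropy over a corpus. Generic definition, not tied to any specific
# GPT-2 or Pile evaluation harness.

def bits_per_byte(total_nll_nats: float, total_utf8_bytes: int) -> float:
    """total_nll_nats: summed negative log-likelihood (natural log) over all tokens."""
    return total_nll_nats / (math.log(2) * total_utf8_bytes)
```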
We have less than a year left. I have sold my stake in this market and will not bet further on it in case it ends up being subjective.
My personal efforts in this space have not been as successful as I had hoped. Personally, I think the market is ~well priced? I will be interested to see how this resolves.
Trit weights: https://arxiv.org/abs/2402.17764
Still 8-bit activations, so it does not qualify, but a sparse weight matrix should compact down much more nicely.
We are down to 1-bit weights. However, it looks like the activations are still 8-bit, so it does not quite qualify. (But I'm guessing that they could be unrolled with moderate effort.)
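To illustrate why trit or 1-bit weights with 8-bit activations feel so close: every "multiplication" is just a select/negate, so a dot product is an adder tree. This is a sketch of my own, not code from either paper, and the 8-bit additions would still need to be unrolled to gates.

```python
# My own sketch of why trit/1-bit weights with 8-bit activations are
# "multiplication-free": each product is +a, -a, or 0, so a dot product is
# conditional negation feeding an adder tree.

def low_bit_weight_dot(acts_int8: list[int], weights: list[int]) -> int:
    """weights[i] in {-1, 0, +1}; acts_int8[i] is an 8-bit integer activation."""
    acc = 0
    for a, w in zip(acts_int8, weights):
        if w > 0:
            acc += a   # +1 weight: add the activation
        elif w < 0:
            acc -= a   # -1 weight: subtract (two's-complement negate + add)
        # 0 weight: skip entirely, which is where the sparsity savings come from
    return acc
```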
See also, a more strict version: https://manifold.markets/Amaryllis/a-pure-combinational-logic-byte-lev
Looks like we will be getting 3-bit quantized LLaMA soon:
- https://arxiv.org/abs/2210.17323
- https://news.ycombinator.com/item?id=35107058
Now all that remains to resolve this market is to somehow quantize the softmaxes, and then unroll the whole thing to combinational logic.
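The weight-side unrolling is the mechanical part: under the "3 bits or fewer" allowance above, a 3-bit weight times an activation is at most three conditional shift-and-adds. A rough sketch of my own, not anything from the GPTQ work:

```python
# Sketch of what "unrolling" a 3-bit integer weight multiply looks like:
# three AND-gated shift terms feeding adders. Illustration of the allowance
# in the market description, not code from GPTQ.

def unrolled_3bit_multiply(activation: int, weight: int) -> int:
    """weight in [0, 7] (unsigned 3-bit); activation is an integer."""
    acc = 0
    if weight & 0b001:
        acc += activation       # bit 0 contributes activation * 1
    if weight & 0b010:
        acc += activation << 1  # bit 1 contributes activation * 2
    if weight & 0b100:
        acc += activation << 2  # bit 2 contributes activation * 4
    return acc
```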
Ultra-low Precision Multiplication-free Training for Deep Neural Networks: https://arxiv.org/abs/2302.14458
1 sign bit, 4 exponent bits. Looks like it works on transformer language models. I am unclear, however, on how they handle the softmaxes. To resolve this market, the softmaxes would need to be fully transformed to combinational logic.
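For what it's worth, one hedged guess (entirely my own, not from the paper above) at how a softmax could be pushed toward combinational logic: quantize the logits, replace the exponential with a small power-of-two lookup table (a ROM), and normalize with an integer divide. The divide is the part that still maps awkwardly to gates.

```python
# My own sketch (not from any paper) of an integer-only, base-2 softmax
# variant: quantized logits index a small lookup table (a ROM in hardware),
# and normalization is an integer divide.

# Logits assumed quantized to quarter-steps in log2 space; table holds
# round(2**(q/4) * 16) for q in [-32, 0] (differences below -32 underflow to 0).
EXP_LUT = [round(2 ** (q / 4) * 16) for q in range(-32, 1)]

def int_softmax(logits_q: list[int]) -> list[int]:
    """logits_q: integer logits in quarter-step units; returns 8-bit 'probabilities'."""
    m = max(logits_q)                                  # subtract the max for stability
    exps = [EXP_LUT[max(q - m, -32) + 32] for q in logits_q]
    total = sum(exps)
    return [(e * 255) // total for e in exps]          # scale so the outputs sum to ~255
```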
https://arxiv.org/abs/2212.09720
We are beginning to get down to 4-bit weights.
However, note that even if the weights were 3-bit, the model would still need to be fully reduced to combinational logic, including any softmaxes etc., to resolve YES.