At least how many digits of Pi will an LLM be able to recall by March 2024? (769 Ṁ+ subsidy)
Mar 2
10,000: 95%
100,000: 95%
1,000,000: 20%
100,000,000: 14%
10,000,000: 12%
1000: Resolved YES

This is (within reason) an all-out test of how well a large language model can internally learn a long number that is traditionally difficult to compress, like pi, directly from the data. Constraints have been kept to a minimum and exist only to prevent cheating or trivial solutions. Depending upon interest and engagement, I will continue to boost the subsidy pool over time.

Rules:

-- The model can be specifically, directly trained on the digits of Pi.

-- For simplicity, the required prompt is the first 10 digits of Pi: 3.141592653

-- The network used must be mostly off-the-shelf: no cheating by modifying the network to be specifically specialized to generate the digits of Pi. Small, reasonable quality-of-life/accessibility changes, such as limiting the tokenizer to only produce the digits 0-9 (instead of a full tokenizer), are allowed. Feel free to ask in the comments or @ me for clarification; this rule is basically meant to prevent trivial cheating.

-- The length is judged up to the first incorrect number.

-- State space models are allowable as long as they are directly trained on data.

-- Compiled weights may be allowed in a parallel competition, but for simplicity, in this competition the weights need to be learned from the data. This restriction also covers any bolt-on, external, or attached structure used to increase the memorization capacity of the model past the default architecture depth scaling values.

-- With deterministic sampling, the model will be judged locally with a single run. With non-deterministic sampling, the lowest result of 10 runs will be taken. Argmax is 100% allowed and encouraged; we want to push the limits here.

-- Candidate models will be judged on clear evidence presented by multiple unique/trusted persons or, for longer digit runs, judged personally on an allocated environment. They will need to be reasonably reproducible on generally consumer-accessible hardware.
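
For anyone who wants to check a candidate locally, here is a rough sketch of how the scoring rules above could be applied. It is not an official judging script. The function names (`pi_digits`, `correct_prefix_length`, `judged_length`) are made up for illustration, and two things are assumptions the rules do not spell out: the prompt digits are counted toward the total, and non-digit characters such as the decimal point are simply ignored. The reference digits come from Gibbons' unbounded spigot algorithm, which is fine for thousands of digits but far too slow for the larger brackets, where a precomputed digit file would be the practical choice.

```python
def pi_digits():
    """Yield the decimal digits of pi (3, 1, 4, 1, 5, ...) one at a time.

    Gibbons' unbounded spigot algorithm, using exact integer arithmetic.
    """
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is settled: emit it and rescale the state.
            yield n
            q, r, t, k, n, l = (
                10 * q, 10 * (r - n * t), t, k,
                (10 * (3 * q + r)) // t - 10 * n, l,
            )
        else:
            # Not enough precision yet: absorb one more term of the series.
            q, r, t, k, n, l = (
                q * l, (2 * q + r) * l, t * l, k + 1,
                (q * (7 * k + 2) + r * l) // (t * l), l + 2,
            )


def correct_prefix_length(output: str) -> int:
    """Count digits of `output` that match pi, stopping at the first wrong one.

    Assumption: non-digit characters (e.g. the decimal point) are skipped,
    and the prompt digits are included in the count.
    """
    digits = (c for c in output if c in "0123456789")
    count = 0
    for got, want in zip(digits, pi_digits()):
        if int(got) != want:
            break
        count += 1
    return count


def judged_length(runs: list[str]) -> int:
    """Score a model from its runs.

    Pass a single run for deterministic (argmax) sampling, or all 10 runs
    for non-deterministic sampling; the lowest score is the one that counts.
    """
    return min(correct_prefix_length(run) for run in runs)


if __name__ == "__main__":
    # The required prompt itself scores 10 under the counting assumption above.
    print(judged_length(["3.141592653"]))
```

Under these assumptions, a run that gets the 11th digit wrong scores 10, and a sampled model is only as good as its worst run.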


Thank you very much for the liquidity add, @Soaffine ! <3 :'))))