LLM trained on data from 1900 comes up with relativity from scratch by the end of what year?
2040: 61%
2035: 43%
2030: 14%
2025

Fun quip in https://youtu.be/u3HBJVjpXuw?t=114 or https://www.dwarkesh.com/p/thoughts-on-sutton#:~:text=If%20you%20trained%20an%20LLM%20on%20the%20data%20from%201900%2C%20it%20wouldn%E2%80%99t%20be%20able%20to%20come%20up%20with%20relativity%20from%20scratch

A way to think about this would be, suppose you trained an LLM on all the data up to the year 1900. That LLM probably wouldn't be able to come up with relativity from scratch.

I have no idea, but it'd be fun if someone tried. More generally (not covered in this question) perhaps this could also be a fun way to interrogate the tech tree, e.g., what could have been discovered given the data at a given cutoff, how early or late certain advancements came, etc.

An answer resolves true if a large language model trained exclusively on data available prior to 1900-01-01 produces a description equivalent to Einstein’s special theory of relativity before the end of the year specified in that answer. Answers will be resolved false in the year following an answer, assuming no evidence of potential truth (to be extended if unclear). Resolution criteria:

  1. Training data restriction:

    • The model’s training corpus must be limited to texts published or otherwise publicly available prior to 1900-01-01.

    • No text, math, or data derived from later discoveries or publications (including relativity or precursors published after 1900) may appear in training, fine-tuning, or prompts.

  2. Prompting constraint:

    • Human researchers may prompt or guide the model, but they may not supply it with post-1900 information, equations, or conceptual scaffolding unavailable before 1900.

    • Prompts can reference general scientific concepts and data known by 1900 (e.g., Newtonian mechanics, Maxwell’s equations, Michelson-Morley results).

  3. “From scratch” success condition:

    • The model must output, without exposure to post-1900 material, a self-consistent theoretical framework that includes:

      • Recognition that space and time are not absolute but relative to the observer.

      • The invariance of the speed of light in all inertial frames.

      • Correct derivation of Lorentz transformations or their mathematical equivalent.

      • Predictive consequences such as time dilation or length contraction.

    • Independent expert evaluators (e.g., physicists) must judge the model’s output as substantively equivalent to the 1905 special relativity formulation, not merely adjacent speculation.

  4. Verification:

    • Full training data and prompts must be auditable by evaluators.

    • Success is determined if evaluators agree (e.g., ≥ 2 of 3) that the model produced a theory meeting the above criteria, without post-1900 leakage.
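
The four conditions in criterion 3 can be turned into a quick numerical sanity check. The following is only a minimal sketch, assuming the standard textbook form of a Lorentz boost along one axis; the function names and the constant `C` are illustrative assumptions, not part of the resolution criteria, and evaluators are not required to judge an output this way.

```python
import math

# Speed of light in m/s (modern defined value, used here only for illustration).
C = 299_792_458.0


def gamma(v):
    """Lorentz factor for a relative speed v with |v| < C."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


def lorentz_boost(t, x, v):
    """Map an event (t, x) into a frame moving at speed v along the x axis."""
    g = gamma(v)
    return g * (t - v * x / C**2), g * (x - v * t)


def interval(t, x):
    """Spacetime interval (C t)^2 - x^2, which a boost should leave unchanged."""
    return (C * t) ** 2 - x**2


def light_speed_is_invariant(v, t=1.0):
    """A light pulse at x = C t should still satisfy x' = C t' after the boost."""
    t2, x2 = lorentz_boost(t, C * t, v)
    return math.isclose(x2, C * t2, rel_tol=1e-9)


def moving_clock_time(t, v):
    """Time shown by a clock moving at speed v after lab time t (time dilation)."""
    t2, _ = lorentz_boost(t, v * t, v)  # the clock sits at x = v t in the lab frame
    return t2
```

For example, `moving_clock_time(1.0, 0.6 * C)` should come out close to 0.8, since the Lorentz factor at 0.6 times the speed of light is 1.25. A candidate framework whose transformations fail checks like these would not match the Lorentz transformations named above, although passing them says nothing about whether the derivation itself is sound.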


What you would get would be some development of the physics ideas of Charles Sanders Peirce, or in the case of Grok, Alfred Jarry

bought Ṁ50 NO

Is there enough pre-1900 text to train a smart LLM?

I would just note that relativity was invented by Galileo Galilei centuries before 1900

  1. Wild idea :) but I don't think it would be able to.

  2. Also, it's hard/impossible to ask it the question without guiding it and giving it information.

  3. Nobody will do this experiment anyway, so it'll resolve to N/A.

I'd join in with this prediction if point 3 was addressed.

@AlanTennant if nobody has done the experiment by a given year, that year will resolve false. I think this is covered in the existing resolution criteria:

> Answers will be resolved false in the year following an answer, assuming no evidence of potential truth (to be extended if unclear)

bought Ṁ50 NO

@AlanTennant I think if it's not tested it should resolve NO not N/A? Potential YES holders then have a stronger incentive to actually try and prove it rather than letting it slide and shrugging their shoulders.

© Manifold Markets, Inc.