AI Safety Research Futarchy: Salient features of self-models
resolved Oct 16
Resolved N/A
(1) A LessWrong post is produced within 6 months and gains 50 upvotes or more within a month of posting. Resolved N/A
(2) If a LessWrong post is produced, it gains 150 upvotes or more within a month of posting. Resolved N/A
(3) A paper is produced and uploaded to arXiv within 9 months. Resolved N/A
(4) If a paper is produced, it is accepted to a top ML conference (ICLR, ICML, or NeurIPS) within 6 months of being uploaded to arXiv. Resolved N/A
(5) If a paper is produced, it receives 10 citations or more within one year of being uploaded to arXiv. Resolved N/A

If chosen, how successful will the research project "Salient features of self-models" be?

Project summary: Testing whether LLMs have genuine self-models or just recognize stylistic patterns by examining if self-recognition training generalizes across different types of content.

Detailed project overview.

Clarifications:

  • Unless otherwise stated, timeframes are given from when the research begins, i.e. the start of the MARS program, 1st December 2025

  • Updates to posts and papers will be treated as the same entity as the original for purposes of outcome resolution (e.g. if a paper is uploaded to arXiv within 9 months but is edited afterwards, before being accepted at a conference, (4) still resolves YES)

  • Some outcomes are conditional on others: outcome (2) will resolve N/A if (1) resolves NO, and outcomes (4)-(6) will resolve N/A if (3) resolves NO

  • All outcomes are conditioned on the project being selected and will resolve N/A if it is not (see main post below)

  • Provisionally, the market will close and decisions will be made on Monday the 12th of October
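
The conditional resolution rules in the clarifications can be sketched in a few lines of Python. This is only an illustration, not how Manifold resolves markets: the function name `resolve_outcomes`, the `PARENT` table, and the representation of outcomes as a dict keyed by outcome number are all assumptions made for the example.

```python
from typing import Dict

# Hypothetical encoding of the stated dependencies:
# (2) depends on (1); (4)-(6) depend on (3).
PARENT: Dict[int, int] = {2: 1, 4: 3, 5: 3, 6: 3}


def resolve_outcomes(selected: bool, met: Dict[int, bool]) -> Dict[int, str]:
    """Map raw results (criterion met or not) to YES / NO / N/A."""
    # Every outcome is conditioned on the project being selected.
    if not selected:
        return {k: "N/A" for k in met}

    resolved: Dict[int, str] = {}
    for k, ok in met.items():
        parent = PARENT.get(k)
        if parent is not None and not met.get(parent, True):
            # The outcome this one depends on resolved NO.
            resolved[k] = "N/A"
        else:
            resolved[k] = "YES" if ok else "NO"
    return resolved
```

For instance, calling `resolve_outcomes(False, ...)` returns N/A for every outcome, which matches what happens when a project is not selected.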

For more details on AI Safety Research Futarchy, see here.

  • Update 2025-10-16 (PST) (AI summary of creator comment): This project was not chosen. All outcomes will resolve N/A as stated in the original criteria: "All outcomes are conditioned on the project being selected and will resolve N/A if it is not."


This project was not chosen.
