If chosen, how successful will the research project "Post-training order and CoT Monitorability" be?
Project summary: Examining whether the order of post-training stages (e.g. applying RLHF after reasoning training rather than before) affects models' tendency to obfuscate deceptive reasoning in their chain-of-thought outputs.
Clarifications:
Unless otherwise stated, timeframes are measured from when the research begins, i.e. the start of the MARS program on 1st December 2025.
Updates to posts and papers will be considered the same entity as the original for the purposes of outcome resolution (e.g. if a paper is produced and uploaded to arXiv within 9 months, and is subsequently edited before being accepted at a conference, outcome (4) still resolves YES).
Some outcomes are conditional on others as follows: outcome (2) will resolve N/A if (1) resolves NO; outcomes (4)-(6) will resolve N/A if (3) resolves NO.
All outcomes are conditioned on the project being selected and will resolve N/A if it is not (see main post below).
Provisionally, the market will close and decisions will be made on Monday the 12th of October.
For more details on AI Safety Research Futarchy, see here.