Summary
Geodesic is going to use prediction markets to select its projects for MARS 4.0, and we need your help to make the markets run efficiently! Please read through the proposals, and then trade on the markets for the proposals you think might succeed or fail. We intend to choose the best proposals in two weeks!
Full proposals are in the Google Doc linked below; links to the markets are in the section "The Projects".
Google Doc (similar content to this post + full proposal overviews).
LessWrong post (similar content to this post).
Introduction
Geodesic is a new AI safety startup focused on research that is impactful for short AGI/ASI timelines. As part of this, we are committed to mentoring several projects in the Mentorship for Alignment Research Students (MARS) program, run by the Cambridge AI Safety Hub (CAISH).
We are also excited about new ways to choose and fund research that reflect the aggregated perspectives of our team and the broader community. One way of doing this is using conditional prediction markets, also known as Futarchy, where people bet on the outcomes of taking various actions so that the predicted-best action can be taken.
We believe a system similar to this might be really useful for deciding on future research proposals, agendas, and grants. Good rationalists test their beliefs, and as such, we are doing a live-fire test to see if the theory works in practice.
We are going to apply this to select research projects for MARS 4.0, an AI safety upskilling program like MATS or SPAR, based in Cambridge, UK. We have drafted a number of research proposals, and want the community to bet on how likely good outcomes are for each project (conditional on it being selected). We will then choose the projects that are predicted to do best.
To our knowledge, this is the first time Futarchy will be publicly used to decide on concrete research projects.
Futarchy
For those familiar with Futarchy / decision markets, feel free to skip this section. Otherwise, we will do our best to explain how it works.
When you want to make a decision with Futarchy, you first need a finite set of possible actions and a success metric whose true value will be known at some point in the future. Then, for each action, a prediction market is created to predict the future value of the success metric conditional on that action being taken. At some fixed time, the action with the highest predicted success is chosen, and all trades on the other markets are reverted. When the actual value of the success metric is finally known, the market for the chosen action is resolved, and those who predicted correctly are rewarded for their insight. This creates an incentive structure that rewards people with good information or insight for trading on the markets, improving the prediction for each action and, overall, leading you to take the decision that the pool of traders thinks will be best.
As a concrete example, consider a company deciding whether or not to fire a CEO, and using the stock price one year after the decision as the success metric. Two markets would be created, one predicting the stock price if they're fired, and one predicting the stock price if they're kept on. Then, whichever one is trading higher at decision time is used to make the decision.
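To make the mechanism concrete, here is a minimal sketch of the decision rule in Python, applied to the CEO example. The market names and prices are made up for illustration; this is not code we actually run.

```python
from dataclasses import dataclass

@dataclass
class ConditionalMarket:
    action: str               # the action this market is conditioned on
    predicted_metric: float   # current market estimate of the success metric (e.g. stock price)

def choose_action(markets: list[ConditionalMarket]) -> ConditionalMarket:
    """Pick the action whose conditional market predicts the highest success metric.

    Trades on all other markets are reverted (resolved N/A); only the chosen
    market later resolves against the realised value of the metric.
    """
    return max(markets, key=lambda m: m.predicted_metric)

# Hypothetical prices for the fire-the-CEO example:
markets = [
    ConditionalMarket(action="fire CEO", predicted_metric=112.0),
    ConditionalMarket(action="keep CEO", predicted_metric=104.5),
]

chosen = choose_action(markets)
print(f"Take action: {chosen.action!r} (predicted stock price {chosen.predicted_metric})")
# -> Take action: 'fire CEO' (predicted stock price 112.0)
```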
For those interested in further reading about Futarchy, Robin Hanson has written extensively about it. Some examples include its foundations and motivation, speculation about when and where it might be useful, and why it can be important to let the market decide.
The Metrics
Unlike a company's stock price, there is no single clear metric by which research can be judged. Because of this, we've decided on a small selection of binary outcomes that will each be predicted separately; we will then use their average to make the final decisions (a sketch of this scoring is included after the clarifications below). We're not claiming these are the best metrics to judge a research project by, but we think they will be appropriate for the MARS program and sufficient for this experiment. The outcomes are:
1. A LessWrong post is produced within 6 months and gains 50 upvotes or more within a month of posting.
2. If a LessWrong post is produced, it gains 150 upvotes or more within a month of posting.
3. A paper is produced and uploaded to arXiv within 9 months.
4. If a paper is produced, it is accepted to a top ML conference (ICLR, ICML, or NeurIPS) within 6 months of being uploaded to arXiv.
5. If a paper is produced, it receives 10 citations or more within one year of being uploaded to arXiv.
Clarifications:
Unless otherwise stated, timeframes are given from when the research begins, i.e. the start of the MARS program.
Updates to posts and papers will be considered the same entity as the original for the purposes of outcome resolution (e.g. if a paper is produced and uploaded to arXiv within 9 months but is edited afterwards before being accepted at a conference, outcome (4) still resolves YES).
Some outcomes are conditional on others: outcome (2) will resolve N/A if (1) resolves NO, and outcomes (4) and (5) will resolve N/A if (3) resolves NO.
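As a rough sketch of the scoring described above: each proposal's score is taken to be the mean of the five market-predicted probabilities, and the highest-scoring proposal in each group is selected. The unweighted mean is our reading of "use their average", and all prices below are placeholders, not real market data.

```python
from statistics import mean

# Placeholder market prices (predicted probabilities) for the five outcomes, per Group 1 proposal.
group_1_prices = {
    "Goal Crystallisation":                      [0.60, 0.20, 0.45, 0.10, 0.15],
    "Post-training order and CoT Monitorability": [0.55, 0.25, 0.50, 0.12, 0.20],
    "Online Learning for Sabotage Mitigation":    [0.50, 0.15, 0.40, 0.08, 0.10],
}

# Score each proposal by the unweighted mean of its five outcome probabilities,
# then pick the predicted-best proposal within the group.
scores = {name: mean(prices) for name, prices in group_1_prices.items()}
winner = max(scores, key=scores.get)
print(f"Selected from Group 1: {winner} (score {scores[winner]:.2f})")
```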
The Projects
There are three groups of proposals, and from within each group we will choose the single predicted-best project. Each group corresponds to the mentor(s) for its projects. To read through each of the proposals and the bios for each mentor, please see the Google Doc. Currently we plan to close the markets and choose the projects on Monday the 12th of October. Below we provide a quick overview of each proposal and link to its respective prediction market.
Group 1: Cameron Tice
Goal Crystallisation: Investigating whether AI models that fake alignment perfectly during training retain their original goals or have them changed despite following through with deceptive alignment.
Post-training order and CoT Monitorability: Examining whether different post-training orders (applying RLHF after reasoning training) affect models' tendency to obfuscate deceptive reasoning in their chain-of-thought outputs.
Online Learning for Research Sabotage Mitigation: Attempting to use online learning to train AI systems away from research sabotage by creating deliberately underperforming model organisms and applying RL to try and improve their research performance.
Group 2: Puria Radmard & Shi Feng
Salient features of self-models: Testing whether LLMs have genuine self-models or just recognize stylistic patterns by examining if self-recognition training generalizes across different types of content.
Exploring more metacognitive capabilities of LLMs: Investigating whether LLMs can metacognitively monitor their own internal probability distributions and predictive models, with implications for deceptive alignment and AI safety.
Group 3: Lennie Wells
Model organisms resisting generalisation: Testing whether AI models learn the distribution of tasks under which humans have good oversight, and resist generalisation beyond this distribution.
Detection game: Running a ‘detection game’ to investigate how we can best prompt trusted monitors to detect research sabotage.
Research sabotage dataset: Creating a public dataset of tasks reflecting current and future AI safety research that can be used to study underelicitation and sandbagging.
Model Emulation: Can we use LLMs to predict other LLMs' capabilities?
Go trade!
We hope to use prediction markets to effectively choose which research projects we should pursue, as well as to conduct a fun experiment on the effectiveness of Futarchy for real-world decision making. The incentive structure of a prediction market motivates those who have good research taste or insights to implicitly share their beliefs and knowledge with us, helping us make the best decision possible. That said, anyone is free to join in and trade, and the more people who do, the better the markets perform. So we need your help! Please read through the proposals, trade on the markets, and be a part of history by taking part in this experiment!