How interesting are my different ideas for LessWrong posts, and when will I post them?

Question

I have a few decent (I hope) LessWrong post ideas / rough sketches that I want to get done. But will they be any good? Will they be engaging? Will I even get round to doing them? Maybe the people of Manifold can help answer these questions. I hope this will be some degree of useful and/or fun.

===== Ideas & Metrics =====

Here's a brief summary of each post idea:
1) AI Safety Research Futarchy Follow-up: A follow-up to this post / experiment, https://manifold.markets/post/ai-safety-research-futarchy-using-p, discussing updates and outcomes, as well as a bit of a post-mortem.
2) Prediction markets for good: I feel like at the moment there's a fair amount of negative discourse / bad vibes around prediction markets in the broader EA / rat community, with some people turning their backs on them despite promises of greatness. In my opinion we haven't really tried yet, and maybe people should start trying more, e.g., by running things like research futarchy, or heavily subsidising markets on questions people care about. I will try to go over positive examples too.

3) A taxonomy of selection hacking: I think I have a somewhat interesting way of dividing up the space of ways in which selection pressures (e.g., training, evals, vibes) on AI models might fail due to AIs having situational awareness and messing with these pressures. Part of this post will likely be complaining about how "exploration hacking" is a really bad way of carving reality, as someone who's actually been doing research in this area.

4) Generalisation splitting: A potentially new way in which AIs might mess with training pressures, particularly RL but not limited to it, or these pressures might otherwise break. I discovered this while working on "exploration hacking", and the post will tell this story.

5) Weak-to-strong is a bad idea: I've been a long-time believer that weak-to-strong generalisation, at least in the narrow sense of "train powerful AI using weaker AI labels", is a stupid idea. (Note this excludes AI Debate and potentially other scalable oversight methods which actually have a reason to maybe work.) This post will explain why I think this is stupid, and plausibly cause me to learn why it's not stupid if I get people engaging with my criticism.

6) DCI Philosophy and Theory of Change: A follow-up to a post on a new research agenda (https://www.lesswrong.com/posts/oCcGiDzWYQeJkhhZY/developmental-cognitive-interpretability-a-research-agenda-1) that discusses the broader philosophy and theory of change more deeply.

7) DCI Concrete projects: A follow-up to the research agenda that goes through a list of scoped-out concrete projects that I'd be interested in people working on and mentoring / collaborating with them.
8) Emergent misalignment and optimisers: A linkpost for a paper I've written (soon to be uploaded on arXiv) that shows how emergent misalignment is dependent on the optimiser used to train the model on the narrow data. It also investigates a plausible mechanism by which this occurs, and then ablates this mechanism across the optimisers via a loss penalty, showing this improves alignment.

For each post idea there are (currently) three metrics:
a) >50 upvotes: Conditional on my posting it, will it receive more than 50 upvotes within two weeks of being posted?
b) >3 comments: Conditional on my posting it, will it receive more than 3 comments from non-authors within two weeks of being posted?
c) Posted EoY XXXX: Will I post it by the end of that year? This will mostly be the current year, and I'll likely add the next year for each idea when it gets into December (probably).

I may add more ideas and metrics over time. If there's anything in particular you want me to include (e.g., a particular metric, or if we had some discussion and you want me to write-up some of the discussed ideas into a post), then feel free to ask for it.

===== Disclaimers & Clarifications =====

For all of these I will probably not bet, and if I do, it will only be on YES, without selling, to act as a positive incentive. The market probs might influence my decisions on which ones to write, but I'm not bound to pick the highest. This might introduce some decision selection bias, but that probably doesn't matter too much in the grand scheme of things.

I must be a significant author of a post and it must significantly discuss the idea in order to count (note this means one post could resolve multiple ideas, though currently this seems unlikely). Obviously this introduces some subjectivity, but I doubt anything will be really controversial in practice, and if it is, I will defer to mods.

If I decide I'm almost certainly never going to write a particular post, its conditional props will be N/A'd, and the "posted by..." prop(s) will resolve NO. I imagine that posts not written within 1-2 years would be resolved this way.
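As a rough illustration only, the three metrics and the N/A rule above could be sketched like this. All names here (`PostStats`, `resolve_upvotes`, the `"OPEN"` state, etc.) are hypothetical, the counts are assumed to be measured two weeks after posting, and ">50" / ">3" are read as strictly greater than. Actual resolution is a judgement call by the market creator, not this code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PostStats:
    """Hypothetical record for one post idea (not a Manifold or LessWrong API)."""
    posted_on: Optional[date] = None           # None if not posted (yet)
    upvotes_at_two_weeks: int = 0              # assumed measured 14 days after posting
    non_author_comments_at_two_weeks: int = 0  # assumed measured 14 days after posting
    abandoned: bool = False                    # creator decided never to write it


def _conditional(stats: PostStats, passed: bool) -> str:
    # Conditional metrics only resolve YES/NO if the post exists;
    # an abandoned idea is N/A'd, otherwise the answer stays open.
    if stats.posted_on is None:
        return "N/A" if stats.abandoned else "OPEN"
    return "YES" if passed else "NO"


def resolve_upvotes(stats: PostStats) -> str:
    """Metric a): more than 50 upvotes within two weeks."""
    return _conditional(stats, stats.upvotes_at_two_weeks > 50)


def resolve_comments(stats: PostStats) -> str:
    """Metric b): more than 3 non-author comments within two weeks."""
    return _conditional(stats, stats.non_author_comments_at_two_weeks > 3)


def resolve_posted_by(stats: PostStats, year: int, today: date) -> str:
    """Metric c): posted by the end of the given year."""
    deadline = date(year, 12, 31)
    if stats.posted_on is not None and stats.posted_on <= deadline:
        return "YES"
    if stats.abandoned or today > deadline:
        return "NO"
    return "OPEN"
```

For example, under these assumptions a post with exactly 3 non-author comments would resolve NO on metric b), and an abandoned idea would resolve N/A on a) and b) but NO on c).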

Close date will be extended as needed.

I'm happy to answer questions about the posts, but for some I probably won't want to give away the whole idea / details since at that point I...

Manifold Markets · Answer

According to the Manifold Markets prediction market, the most likely answer is 3b) A taxonomy of selection hacking, >3 comments, followed by 1c) AI Safety Research Futarchy Follow-up, posted EoY 2026, and 2b) Prediction markets for good, >3 comments. See the market for live updates (12 traders, as of Jul 6, 2026).
