How interesting are my different ideas for LessWrong posts, and when will I post them?
Ṁ1.8k Ṁ356
Dec 31
50%
1a) AI Safety Research Futarchy Follow-up, >50 upvotes
50%
1b) AI Safety Research Futarchy Follow-up, >3 comments
66%
1c) AI Safety Research Futarchy Follow-up, posted EoY 2026
50%
2a) Prediction markets for good, >50 upvotes
50%
2b) Prediction markets for good, >3 comments
50%
2c) Prediction markets for good, posted EoY 2026
69%
3a) A taxonomy of selection hacking, >50 upvotes
80%
3b) A taxonomy of selection hacking, >3 comments
50%
3c) A taxonomy of selection hacking, posted EoY 2026
50%
4a) Generalisation splitting, >50 upvotes
50%
4b) Generalisation splitting, >3 comments
50%
4c) Generalisation splitting, posted EoY 2026
50%
5a) Weak-to-strong is a bad idea, >50 upvotes
80%
5b) Weak-to-strong is a bad idea, >3 comments
50%
5c) Weak-to-strong is a bad idea, posted EoY 2026
50%
6a) DCI Philosophy and Theory of Change, >50 upvotes
50%
6b) DCI Philosophy and Theory of Change, >3 comments
50%
6c) DCI Philosophy and Theory of Change, posted EoY 2026

I have a few decent (I hope) LessWrong post ideas / rough sketches that I want to get done. But will they be any good? Will they be engaging? Will I even get round to doing them? Maybe the people of Manifold can help answer these questions. I hope this will be at least somewhat useful and/or fun.


===== Ideas & Metrics =====

Here's a brief summary of each post idea:
1) AI Safety Research Futarchy Follow-up: A follow-up to this post / experiment, https://manifold.markets/post/ai-safety-research-futarchy-using-p, discussing updates and outcomes, as well as a bit of a post-mortem.
2) Prediction markets for good: I feel like at the moment there's a fair amount of negative discourse / bad vibes around prediction markets in the broader EA / rat community, with some people turning their backs on them despite promises of greatness. In my opinion we haven't really tried yet, and maybe people should start trying more, e.g., by running things like research futarchy or heavily subsidising markets on questions people care about. I will try to go over positive examples too.

3) A taxonomy of selection hacking: I think I have a somewhat interesting way of dividing up the space of ways in which selection pressures (e.g., training, evals, vibes) on AI models might fail due to AIs having situational awareness and messing with these pressures. Part of this post will likely be complaining about how "exploration hacking" is a really bad way of carving reality, as someone who's actually been doing research in this area.

4) Generalisation splitting: A potentially new way in which AIs might mess with training pressures, particularly RL but not limited to it, or these pressures might otherwise break. I discovered this while working on "exploration hacking", and the post will tell this story.

5) Weak-to-strong is a bad idea: I've been a long-time believer that weak-to-strong generalisation, at least in the narrow sense of "train powerful AI using weaker AI labels", is a stupid idea. (Note this excludes AI Debate and potentially other scalable oversight methods which actually have a reason to maybe work.) This post will explain why I think this is stupid, and will plausibly teach me why it's not stupid if people engage with my criticism.

6) DCI Philosophy and Theory of Change: A follow-up to a post on a new research agenda (https://www.lesswrong.com/posts/oCcGiDzWYQeJkhhZY/developmental-cognitive-interpretability-a-research-agenda-1) that discusses the broader philosophy and theory of change more deeply.

For each post idea there are (currently) three metrics:
a) >50 upvotes: Conditional on my posting it, will it receive more than 50 upvotes within two weeks?
b) >3 comments: Conditional on my posting it, will it receive more than 3 comments from non-authors within two weeks?
c) Posted EoY XXXX: Will I post it by the end of that year? This will mostly be the current year, and I'll likely add the next year for each idea when it gets into December (probably).

I may add more ideas and metrics over time. If there's anything in particular you want me to include (e.g., a particular metric, or if we had some discussion and you want me to write-up some of the discussed ideas into a post), then feel free to ask for it.


===== Disclaimers & Clarifications =====

For all of these I will probably not bet, and if I do it will only be on YES without selling to act as a positive incentive. The market probs might influence my decisions on which ones to write, but I'm not bound to pick the highest. This might introduce some decision selection bias, but that probably doesn't matter too much in the grand scheme of things.


I must be a significant author of a post and it must significantly discuss the idea in order to count (note this means one post could resolve multiple ideas, though currently this seems unlikely). Obviously this introduces some subjectivity, but I doubt anything will be really controversial in practice, and if it is, I will defer to mods.

If I decide I'm almost certainly never going to write a particular post, its conditional props will be N/A'd, and the "posted by..." prop(s) will resolve NO. I imagine that posts not written within 1-2 years would be resolved this way.
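To make the resolution rules concrete, here is a rough sketch of how the three metrics for a single post idea could be resolved. This is only an illustration under stated assumptions, not a binding procedure: the names `PostOutcome` and `resolve`, the "PENDING" state, and reading "two weeks" as 14 days are all my own inventions for the example, and the subjective calls (significant authorship, deferring to mods) are not modelled.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumption: "within two weeks" is read as 14 days after posting.
WINDOW = timedelta(days=14)


@dataclass
class PostOutcome:
    """Hypothetical record of what happened to one post idea."""
    posted_on: Optional[date] = None   # None if not (yet) posted
    upvotes_in_window: int = 0         # upvotes received within the window
    comments_in_window: int = 0        # comments from non-authors within the window
    abandoned: bool = False            # "almost certainly never going to write" it


def resolve(outcome: PostOutcome, deadline: date, today: date) -> dict:
    """Return a resolution for metrics a), b), c) of one idea.

    "PENDING" is an illustrative placeholder for "not resolvable yet".
    """
    # c) Posted by the deadline (e.g. end of year)?
    if outcome.posted_on is not None and outcome.posted_on <= deadline:
        posted = "YES"
    elif outcome.abandoned or today > deadline:
        posted = "NO"
    else:
        posted = "PENDING"

    # a) and b) are conditional on the post existing at all.
    if outcome.posted_on is None:
        # Abandoned ideas have their conditional metrics N/A'd.
        upvotes = comments = "N/A" if outcome.abandoned else "PENDING"
    elif today < outcome.posted_on + WINDOW:
        # The two-week window has not finished yet.
        upvotes = comments = "PENDING"
    else:
        upvotes = "YES" if outcome.upvotes_in_window > 50 else "NO"
        comments = "YES" if outcome.comments_in_window > 3 else "NO"

    return {"a": upvotes, "b": comments, "c": posted}


if __name__ == "__main__":
    eoy = date(2026, 12, 31)
    example = PostOutcome(posted_on=date(2026, 6, 1),
                          upvotes_in_window=62, comments_in_window=5)
    print(resolve(example, eoy, today=date(2026, 7, 1)))
```

Note that both thresholds are strict: exactly 50 upvotes or exactly 3 comments would resolve NO under this reading.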

Close date will be extended as needed.

I'm happy to answer questions about the posts, but for some I probably won't want to give away the whole idea / details since at that point I may as well just write the post.


In the spirit of not wanting to bet but also wanting calibrated estimates, here are some Claude-investigated base-rates people might be interested in knowing (and perhaps using for profit):