Prop bets for my new research focus/agenda, Developmental Cognitive Interpretability
2028
75%
8) At least one paper from the agenda gets published at a top conference or has a companion LessWrong post with >100 upvotes within 1 year
62%
2) I reflectively endorse focussing on it 1 year later
60%
7) At least one paper from the agenda gets published at a top conference or has a companion LessWrong post with >100 upvotes within 6 months
54%
3) I consider the work produced useful 6 months later
50%
1) I reflectively endorse focussing on it 6 months later
50%
4) I consider the work produced useful 1 year later
50%
5) A prominent figure in AI safety publicly endorses the agenda or some research produced by it within 6 months
50%
6) A prominent figure in AI safety publicly endorses the agenda or some research produced by it within 1 year
50%
9) At least three papers from the agenda get published at top conferences or have companion LessWrong posts with >100 upvotes within 2 years

I have a new LessWrong post with a collaborator discussing a new AI Safety research agenda, Developmental Cognitive Interpretability, which is aimed at trying to predict how AI systems will generalise from their training to deployment.

But will it be any good? Time to see what Manifold thinks!

I will not trade on props with subjective, vibes-based resolution criteria (specifically 1-4). On the others (specifically 5-9), where there might otherwise be incentive issues, I will only bet YES and will not sell. If we end up in a more-subjective-than-expected grey area for these questions, I will defer resolution to the mods.

I might add new props I'm interested in and I'm open to suggestions on this.

Specific resolution criteria clarifications:
1-4) Will be evaluated according to my subjective feeling at the end of their time horizon.
5-9) Will resolve YES on event occurrence, and NO if the time limit for them runs out.
1, 2) Resolves YES if I think that focussing on this agenda was worthwhile, and NO if not. Note that if I pivot away because I think another research direction is more worthwhile, this still resolves according to my later view of whether, knowing how things played out, the time I did spend on the agenda was worth it (e.g., we do one more project that ends up being impactful, but then I pivot to something else).
3, 4) Resolves YES if I think the agenda essentially bore fruit. This excludes the first paper, which has already been released as a pre-print. Note this is somewhat independent of 1, 2: I can imagine worlds where I endorse having worked on it but no useful work was produced (e.g., nothing gets published within a year, it helps me indirectly by up-skilling, it gives me a new perspective from which I pivot to something else, etc.), and worlds where there's useful work but I don't endorse having worked on it (e.g., there was an obvious-in-hindsight other thing that would've been more impactful had I worked on it instead).
5, 6) This is going to be very subjective, but by "prominent figure" I'm thinking of researchers at ~MATS mentor level and above. Excludes myself and my collaborator.
7, 8, 9) "Top conference" means ICML, NeurIPS, or ICLR. Note that the first paper does count for this. Also note that the paper counts (one for 7 and 8, three for 9) are not mistakes. Each paper has to list either myself or my collaborator as an author, and has to be an on-topic, direct product of the research agenda. There are grey areas here, but I think it should be fairly obvious how to resolve these in most possible worlds.

I'm happy to provide more clarifications if needed, so ask questions before trading if you're worried.
