What % of Alignment Forum karma will be pro-interpretability vs anti this year?
Sep 13

On 2024/09/13 I will sample one post from all Alignment Forum posts published between 2023/09/13 and 2024/09/13 that express an opinion on whether prosaic interpretability is net useful for aligning future, dangerous AI. Each post's chance of being picked is proportional to its karma. (So a post with 4 karma is twice as likely to get picked as one with 2 karma.)

If the sampled post contributes to prosaic interpretability or is in favor of past/future interpretability research, this question resolves to "yes".
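For anyone who wants to sanity-check the odds, here is a rough sketch of how a karma-weighted draw like this could work. It is not necessarily the script I'll use. The `Post` record, its field names, and both function names are made up for illustration, and skipping posts with zero or negative karma is an assumption, since the rules above don't cover that case.

```python
import random
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical record; the field names are illustrative only.
    title: str
    karma: int
    pro_interp: bool  # True if labeled as contributing to / in favor of interpretability


def eligible(posts):
    # Assumption: posts with zero or negative karma can never be drawn.
    return [p for p in posts if p.karma > 0]


def yes_probability(posts):
    """Share of total karma held by pro-interpretability posts."""
    pool = eligible(posts)
    total = sum(p.karma for p in pool)
    if total == 0:
        raise ValueError("no posts with positive karma")
    return sum(p.karma for p in pool if p.pro_interp) / total


def sample_post(posts, rng=None):
    """Draw one post with probability proportional to its karma."""
    pool = eligible(posts)
    if not pool:
        raise ValueError("no posts with positive karma")
    rng = rng or random.Random()
    return rng.choices(pool, weights=[p.karma for p in pool], k=1)[0]
```

Under these assumptions, the chance of a "yes" resolution equals `yes_probability(posts)`, i.e. the pro-interpretability share of total karma, which is what the question title is asking about.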

I won't vote on this. I hope, but do not guarantee, to maintain an updated list of the posts I'll sample from, with their labels, somewhere here.

