Will someone strengthen our Goodhart's Law result?
Closes 2024. 54% chance.

We (Thomas Kwa and Drake Thomas) have a LessWrong post where we build a toy model of Goodhart’s Law and prove things about the quantity $Q = \lim_{t \to \infty} E[V | X + V > t]$ when $X$ and $V$ are independent random variables.
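To get a feel for the quantity $Q$, here is a rough numerical sketch that evaluates $E[V | X + V > t]$ for a few finite $t$. The distributions are assumptions chosen for illustration and are not taken from the post: $V$ is standard normal throughout, and $X$ is either standard normal (a light-tailed case) or Pareto with minimum 1 and shape 2 (a subexponential case). The helper names (`conditional_mean_v`, `normal_tail`, `pareto_tail`) are hypothetical, and the sketch does not check whether these choices meet the exact conditions of the theorems.

```python
import math


def normal_pdf(v):
    """Density of a standard normal at v."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def normal_tail(x):
    """P(X > x) for X standard normal (assumed light-tailed example)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pareto_tail(x, alpha=2.0):
    """P(X > x) for X Pareto with minimum 1 (assumed heavy-tailed example)."""
    return 1.0 if x <= 1.0 else x ** (-alpha)


def conditional_mean_v(t, x_tail, lo=-12.0, hi=12.0, steps=4800):
    """Approximate E[V | X + V > t] for V ~ N(0, 1) independent of X.

    Uses E[V 1{X+V>t}] = integral of v * pdf(v) * P(X > t - v) dv and
    P(X+V>t)           = integral of     pdf(v) * P(X > t - v) dv,
    both by the trapezoid rule on [lo, hi]; the step size cancels in the ratio.
    """
    h = (hi - lo) / steps
    num = 0.0
    den = 0.0
    for i in range(steps + 1):
        v = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        mass = weight * normal_pdf(v) * x_tail(t - v)
        num += v * mass
        den += mass
    return num / den


if __name__ == "__main__":
    for t in (2, 4, 8):
        print("normal X, t =", t, "->", conditional_mean_v(t, normal_tail))
    for t in (10, 100, 1000):
        print("Pareto X, t =", t, "->", conditional_mean_v(t, pareto_tail))
```

Under these assumptions, the normal case should give values that keep growing with $t$ (consistent with $Q = \infty$), while the Pareto case should give values that shrink toward 0 as $t$ grows (consistent with $Q = 0$). This is only a finite-$t$ illustration and proves nothing about the limit.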

This market resolves YES iff someone other than us (a) substantially strengthens one of our results, or (b) treads nontrivial ground in the non-independent case. The work must be published by the resolution date.

Examples of work that would result in YES resolution:

  • Criterion (a): Weakening “subexponential” to “long-tailed” or “heavy-tailed” in our proof for when Q=0. (This would require some regularity condition.)

  • Criterion (a): Weakening our light-tailed condition in our proof for when Q=infinity.

  • Criterion (b): Proving that Q=0 or Q=infinity in any case more general than an explicitly parametrized class of joint distributions, such that Drake and I couldn’t prove it ourselves in 15 minutes.

  • Criterion (b): Proving that Q=0 or Q=infinity in a toy model of AI alignment through oversight that we find insightful, even if the proof is easy.

The resolution criteria may change to keep the spirit of the question, or to discourage manipulation. I will not trade in this market.
