Will there be a widely accepted precise definition of mesaoptimizer by 2026?
27% chance

There is currently no rigorous definition of a mesaoptimizer. Internal search is typically presented as the central example of mesaoptimization, but no rigorous procedure exists for classifying whether a given network counts as a mesaoptimizer.

A definition counts as precise if, in principle, there exists a procedure for determining whether a given circuit/NN is a mesaoptimizer. Alternatively, if mesaoptimization is (or can be) characterized by a continuous quantity, there must exist a procedure for computing that quantity for a given circuit/NN. The procedure does not have to be efficient or feasible in practice. It may be probabilistic, provided its error can be made arbitrarily small.
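As an illustration of what "error can be made arbitrarily small" can mean (this is not part of the resolution criteria): a standard technique is to repeat a probabilistic procedure and take a majority vote. The sketch below assumes the runs are independent and that each run errs with some fixed probability `p_err` below 1/2. It computes the exact error of the majority vote. `majority_error` is a hypothetical helper name.

```python
from math import comb


def majority_error(p_err: float, runs: int) -> float:
    """Exact probability that a majority vote over `runs` independent runs
    is wrong, when each run is wrong with probability `p_err`.

    Assumes runs are independent; `runs` must be odd so there are no ties.
    """
    if runs % 2 == 0:
        raise ValueError("use an odd number of runs to avoid ties")
    # The vote is wrong exactly when more than half of the runs are wrong.
    return sum(
        comb(runs, k) * p_err**k * (1 - p_err) ** (runs - k)
        for k in range(runs // 2 + 1, runs + 1)
    )


if __name__ == "__main__":
    # With p_err < 1/2, the vote's error shrinks as the number of runs grows.
    for n in (1, 11, 51, 101):
        print(n, majority_error(0.3, n))
```

For any `p_err` below 1/2, Hoeffding's inequality bounds the vote's error by `exp(-2 * runs * (0.5 - p_err) ** 2)`, so the error can be pushed below any target by adding more runs. If `p_err` is 1/2 or higher, repetition does not help.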

Resolves Yes if such a definition exists and is accepted by a majority of the alignment community. This holds even if a different but synonymous term is used, in which case a majority of the alignment community should also accept that the term refers to substantially the same thing that "mesaoptimizer" refers to today. Resolves No if there are multiple competing definitions, each with substantial mindshare, or if the concept of inner misalignment is no longer relevant in a way that leaves no meaningful concept corresponding to mesaoptimization.

As always, in the event of ambiguity I will resolve things in the spirit of the question.
