Will there be a widely accepted precise definition of mesaoptimizer by 2026?

At present there is no rigorous definition of mesaoptimizer. Internal search is typically presented as a central example of mesaoptimization, but no rigorous procedure exists for classifying whether a given network counts as a mesaoptimizer.

A definition is precise if there exists, in theory, some procedure for determining whether a given circuit/NN is a mesaoptimizer or, if mesaoptimization is or can be characterized by a continuous quantity, a procedure for determining that quantity given a circuit/NN. This procedure does not have to be efficient or feasible in practice. It may be probabilistic, provided the error can be made arbitrarily small.
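To illustrate what "probabilistic, with arbitrarily small error" could mean, here is a minimal sketch, not a real mesaoptimizer test. It assumes a hypothetical yes/no procedure that is wrong with some fixed probability below one half, and shows that a majority vote over repeated independent runs drives the overall error down as the number of runs grows. The names `make_noisy_check` and `amplify` are invented for this example.

```python
import random
from typing import Callable


def make_noisy_check(truth: bool, error_rate: float,
                     rng: random.Random) -> Callable[[], bool]:
    """Hypothetical stand-in for a probabilistic classification procedure.

    Each call returns the true answer, except that with probability
    `error_rate` it returns the opposite answer.
    """
    def check() -> bool:
        wrong = rng.random() < error_rate
        return (not truth) if wrong else truth
    return check


def amplify(noisy_check: Callable[[], bool], trials: int) -> bool:
    """Majority vote over independent runs of a noisy yes/no procedure."""
    if trials % 2 == 0:
        trials += 1  # use an odd number of runs so the vote cannot tie
    yes_votes = sum(1 for _ in range(trials) if noisy_check())
    return yes_votes * 2 > trials


if __name__ == "__main__":
    rng = random.Random(0)
    check = make_noisy_check(truth=True, error_rate=0.3, rng=rng)
    # A single run is wrong about 30% of the time; the vote is far more reliable.
    print(amplify(check, trials=1001))
```

Under these assumptions the per-run error only has to stay below one half; the number of runs then controls how small the final error is, which is the sense in which a probabilistic procedure could still count as precise.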

Resolves Yes if such a definition exists and is accepted by a majority of the alignment community, even if a different but synonymous term is used (in which case it should be accepted by a majority of the alignment community that this term refers to substantially the same thing as the term mesaoptimizer does today). Resolves No if there are multiple competing definitions each with substantial mindshare, or if the concept of inner misalignment is no longer relevant in such a way that there exists no meaningful concept corresponding to mesaoptimization.

As always, in the event of ambiguity I will resolve things in the spirit of the question.

@LeoGao Given the ambiguity, I'll buy in if you agree to sell your position and not trade on the market.

This looks interesting to me as a layperson. I don't have a clue what mesaoptimization means, but since there is no precise definition, looking it up may not be terribly useful either. The market criteria are vague to me because the definition of mesaoptimization appears to be circular and thus inherently ambiguous.

I don't doubt that the spirit of the market is clear to people in the field, but wonder what the minimum AI development/ethics knowledge bar for entry into the market is.

How much more of a primer would the market creator be interested in adding to make this question more accessible to laypeople? Or is it better to keep the bar high for prediction accuracy?

How would you resolve a precise definition that is agreed to capture a significant subset of mesaoptimizers but leaves out known edge cases?

predicts NO

@vluzko I will decide whether I consider those edge cases crucial. I'll take into consideration whether the issue feels like the kind of thing that a) might ever come up in practice and b) might be a symptom of a more general kind of problem that hasn't been entirely ruled out. If you have any specific examples, I can tell you how I would resolve them.
