
This market will resolve to the results of a poll posted in this market, and possibly in Rationality/AI alignment Discords, at market close (the poll will run for at least 1 week but no more than 3). Please review the definition below and provide arguments as desired. Feel free to suggest alterations to how this market is run - if I think they are worthwhile, I am willing to N/A this market, manalink 15 mana to every market participant and 100 to the suggester, and recreate it as suggested.
In "The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents"
(2012), Nick Bostrom defines Goal Content Integrity as seen below. Is this broadly true for agents whose goals can be approximated as utility function?
"Goal-content integrity
An agent is more likely to act in the future to maximize the realization of its present final goals if it still has those goals in the future. This gives the agent a present instrumental reason to prevent alterations of its final goals. (This argument applies only to final goals. In order to attain its final goals, an intelligent agent will of course routinely want to change its subgoals in light of new information and insight.) Goal-content integrity for final goals is in a sense even more fundamental than survival as a convergent instrumental motivation. Among humans, the opposite may seem to be the case, but that is because survival is usually part of our final goals. For software agents, which can easily switch bodies or create exact duplicates of themselves, preservation of self as a particular implementation or a particular physical object need not be an important instrumental value. Advanced software agents might also be able to swap memories, download skills, and radically modify their cognitive architecture and personalities. A population of such agents might operate more like a “functional soup” than a society composed of distinct semi-permanent persons.
For some purposes, processes in such a system might be better individuated as teleological threads, based on their final values, rather than on the basis of bodies, personalities, memories, or abilities. In such scenarios, goal-continuity might be said to constitute a key aspect of survival. Even so, there are situations in which an agent may intentionally change its own final goals. Such situations can arise when any of the following factors is significant:
Social signaling
When others can perceive an agent’s goals and use that information to infer instrumentally relevant dispositions or other correlated attributes, it can be in the agent’s interest to modify its goals to make whatever desired impression. For example, an agent might miss out on beneficial deals if potential partners cannot trust it to fulfill its side of the bargain. In order to make credible commitments, an agent might therefore wish to adopt as a final goal the honoring of its earlier commitments, and to allow others to verify that it has indeed adopted this goal. Agents that could flexibly and transparently modify their own goals could use this ability to enforce deals among one another.
Social preferences
Others may also have preferences about an agent's goals. The agent could then have reason to modify its goals, either to satisfy or to frustrate those preferences.
Preferences concerning own goal content
An agent might have some final goal concerned with the agent’s own goal content. For example, the agent might have a final goal to become the type of agent that is motivated by certain values, such as compassion.
Storage costs
If the cost of storing or processing some part of an agent’s utility function is large compared to the chance that a situation will arise in which applying that part of the utility function will make a difference, then the agent has an instrumental reason to simplify its goal content, and it may trash that part of the utility function.
We humans often seem happy to let our final goals and values drift. This might often be because we do not know precisely what they are. We obviously want our beliefs about our final goals and values to be able to change in light of continuing self-discovery or changing self-presentation needs. However, there are cases in which we willingly change the goals and values themselves, not just our beliefs or interpretations of them. For example, somebody deciding to have a child might predict that they will come to value the child for its own sake, even though at the time of the decision they may not particularly value their future child or even like children in general.
Humans are complicated, and many factors might be at play in a situation like this. For instance, one might have a final value that involves becoming the kind of person who cares about some other individual for his or her own sake (here one places a final value on having a certain final value). Alternatively, one might have a final value that involves having certain experiences and occupying a certain social role; and becoming a parent—and undergoing an associated goal shift—might be a necessary part of that. Human goals can also have inconsistent content; and so some people might want to modify some of their final goals to reduce the inconsistencies."
@EhMe11 Apologies - I am going to resolve this to N/A. The last several weeks have been very distracting in my personal life, and I do not feel I have the time to run or advertise a decent-quality poll.
@RobertCousineau If anyone is frustrated by this, please like this comment and I will manalink you 25 mana as an apology.