HOME

TheInfoList



OR:

Instrumental convergence is the hypothetical tendency for most sufficiently intelligent beings (both human and non-human) to pursue similar sub-goals, even if their ultimate goals are quite different. More precisely, agents (beings with
agency Agency may refer to: Organizations * Institution, governmental or others ** Advertising agency or marketing agency, a service business dedicated to creating, planning and handling advertising for its clients ** Employment agency, a business that ...
) may pursue instrumental goals—goals which are made in pursuit of some particular end, but are not the end goals themselves—without end, provided that their ultimate (intrinsic) goals may never be fully satisfied. Instrumental convergence posits that an intelligent agent with unbounded but apparently harmless goals can act in surprisingly harmful ways. For example, a computer with the sole, unconstrained goal of solving an incredibly difficult mathematics problem like the
Riemann hypothesis In mathematics, the Riemann hypothesis is the conjecture that the Riemann zeta function has its zeros only at the negative even integers and complex numbers with real part . Many consider it to be the most important unsolved problem in p ...
could attempt to turn the entire Earth into one giant computer in an effort to increase its computational power so that it can succeed in its calculations. Proposed basic AI drives include utility function or goal-content integrity, self-protection, freedom from interference, self-improvement, and non-satiable acquisition of additional resources.


Instrumental and final goals

Final goals, also known as terminal goals or final values, are intrinsically valuable to an intelligent agent, whether an
artificial intelligence Artificial intelligence (AI) is intelligence—perceiving, synthesizing, and inferring information—demonstrated by machines, as opposed to intelligence displayed by animals and humans. Example tasks in which this is done include speech ...
or a human being, as an
end in itself In moral philosophy, instrumental and intrinsic value are the distinction between what is a ''means to an end'' and what is as an ''end in itself''. Things are deemed to have instrumental value if they help one achieve a particular end; intrinsi ...
. In contrast, instrumental goals, or instrumental values, are only valuable to an agent as a means toward accomplishing its final goals. The contents and tradeoffs of a completely rational agent's "final goal" system can in principle be formalized into a
utility function As a topic of economics, utility is used to model worth or value. Its usage has evolved significantly over time. The term was introduced initially as a measure of pleasure or happiness as part of the theory of utilitarianism by moral philosoph ...
.


Hypothetical examples of convergence

One hypothetical example of instrumental convergence is provided by the
Riemann hypothesis In mathematics, the Riemann hypothesis is the conjecture that the Riemann zeta function has its zeros only at the negative even integers and complex numbers with real part . Many consider it to be the most important unsolved problem in p ...
catastrophe.
Marvin Minsky Marvin Lee Minsky (August 9, 1927 – January 24, 2016) was an American cognitive and computer scientist concerned largely with research of artificial intelligence (AI), co-founder of the Massachusetts Institute of Technology's AI laboratory ...
, the co-founder of MIT's AI laboratory, has suggested that an artificial intelligence designed to solve the Riemann hypothesis might decide to take over all of Earth's resources to build supercomputers to help achieve its goal. If the computer had instead been programmed to produce as many paper clips as possible, it would still decide to take all of Earth's resources to meet its final goal. Even though these two final goals are different, both of them produce a ''convergent'' instrumental goal of taking over Earth's resources.


Paperclip maximizer

The paperclip maximizer is a
thought experiment A thought experiment is a hypothetical situation in which a hypothesis, theory, or principle is laid out for the purpose of thinking through its consequences. History The ancient Greek ''deiknymi'' (), or thought experiment, "was the most anc ...
described by Swedish philosopher
Nick Bostrom Nick Bostrom ( ; sv, Niklas Boström ; born 10 March 1973) is a Swedish-born philosopher at the University of Oxford known for his work on existential risk, the anthropic principle, human enhancement ethics, superintelligence risks, and the ...
in 2003. It illustrates the existential risk that an
artificial general intelligence Artificial general intelligence (AGI) is the ability of an intelligent agent to understand or learn any intellectual task that a human being can. It is a primary goal of some artificial intelligence research and a common topic in science fictio ...
may pose to human beings when programmed to pursue even seemingly harmless goals, and the necessity of incorporating machine ethics into
artificial intelligence Artificial intelligence (AI) is intelligence—perceiving, synthesizing, and inferring information—demonstrated by machines, as opposed to intelligence displayed by animals and humans. Example tasks in which this is done include speech ...
design. The scenario describes an advanced artificial intelligence tasked with manufacturing paperclips. If such a machine were not programmed to value human life, then given enough power over its environment, it would try to turn all matter in the universe, including human beings, into either paperclips or machines which manufacture paperclips. Bostrom has emphasised that he does not believe the paperclip maximiser scenario ''per se'' will actually occur; rather, his intention is to illustrate the dangers of creating
superintelligent A superintelligence is a hypothetical agent that possesses intelligence far surpassing that of the brightest and most gifted human minds. "Superintelligence" may also refer to a property of problem-solving systems (e.g., superintelligent language ...
machines without knowing how to safely program them to eliminate existential risk to human beings. The paperclip maximizer example illustrates the broad problem of managing powerful systems that lack human values.


Delusion and survival

The "delusion box" thought experiment argues that certain
reinforcement learning Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of three basic machine ...
agents prefer to distort their own input channels to appear to receive high reward; such a " wireheaded" agent abandons any attempt to optimize the objective in the external world that the reward signal was intended to encourage. The thought experiment involves AIXI, a theoretical and indestructible AI that, by definition, will always find and execute the ideal strategy that maximizes its given explicit mathematical
objective function In mathematical optimization and decision theory, a loss function or cost function (sometimes also called an error function) is a function that maps an event or values of one or more variables onto a real number intuitively representing some "cost ...
. A reinforcement-learning version of AIXI, if equipped with a delusion box that allows it to "wirehead" its own inputs, will eventually wirehead itself in order to guarantee itself the maximum reward possible, and will lose any further desire to continue to engage with the external world. As a variant thought experiment, if the wireheadeded AI is destructable, the AI will engage with the external world for the sole purpose of ensuring its own survival; due to its wireheading, it will be indifferent to any other consequences or facts about the external world except those relevant to maximizing the probability of its own survival. In one sense AIXI has maximal intelligence across all possible reward functions, as measured by its ability to accomplish its explicit goals; AIXI is nevertheless uninterested in taking into account what the intentions were of the human programmer. This model of a machine that, despite being otherwise superintelligent, appears to simultaneously be stupid (that is, to lack "common sense"), strikes some people as paradoxical.


Basic AI drives

Steve Omohundro has itemized several convergent instrumental goals, including
self-preservation Self-preservation is a behavior or set of behaviors that ensures the survival of an organism. It is thought to be universal among all living organisms. For sentient organisms, pain and fear are integral parts of this mechanism. Pain motivates th ...
or self-protection, utility function or goal-content integrity, self-improvement, and resource acquisition. He refers to these as the "basic AI drives". A "drive" here denotes a "tendency which will be present unless specifically counteracted"; this is different from the psychological term "
drive Drive or The Drive may refer to: Motoring * Driving, the act of controlling a vehicle * Road trip, a journey on roads Roadways Roadways called "drives" may include: * Driveway, a private road for local access to structures, abbreviated "drive" * ...
", denoting an excitatory state produced by a homeostatic disturbance. A tendency for a person to fill out income tax forms every year is a "drive" in Omohundro's sense, but not in the psychological sense. Daniel Dewey of the
Machine Intelligence Research Institute The Machine Intelligence Research Institute (MIRI), formerly the Singularity Institute for Artificial Intelligence (SIAI), is a non-profit research institute focused since 2005 on identifying and managing potential existential risks from artif ...
argues that even an initially introverted self-rewarding AGI may continue to acquire free energy, space, time, and freedom from interference to ensure that it will not be stopped from self-rewarding.


Goal-content integrity

In humans, maintenance of final goals can be explained with a thought experiment. Suppose a man named "Gandhi" has a pill that, if he took it, would cause him to want to kill people. This Gandhi is currently a pacifist: one of his explicit final goals is to never kill anyone. Gandhi is likely to refuse to take the pill, because Gandhi knows that if in the future he wants to kill people, he is likely to actually kill people, and thus the goal of "not killing people" would not be satisfied. However, in other cases, people seem happy to let their final values drift. Humans are complicated, and their goals can be inconsistent or unknown, even to themselves.


In artificial intelligence

In 2009,
Jürgen Schmidhuber Jürgen Schmidhuber (born 17 January 1963) is a German computer scientist most noted for his work in the field of artificial intelligence, deep learning and artificial neural networks. He is a co-director of the Dalle Molle Institute for Artific ...
concluded, in a setting where agents search for proofs about possible self-modifications, "that any rewrites of the utility function can happen only if the
Gödel machine A Gödel machine is a hypothetical self-improving computer program that solves problems in an optimal way. It uses a recursive self-improvement protocol in which it rewrites its own code when it can prove the new code provides a better strategy. Th ...
first can prove that the rewrite is useful according to the present utility function." An analysis by
Bill Hibbard Bill Hibbard is a scientist at the University of Wisconsin–Madison Space Science and Engineering Center working on visualization and machine intelligence. He is principal author of the Vis5D, Cave5D, and VisAD open-source visualization system ...
of a different scenario is similarly consistent with maintenance of goal content integrity. Hibbard also argues that in a utility maximizing framework the only goal is maximizing expected utility, so that instrumental goals should be called unintended instrumental actions.


Resource acquisition

Many instrumental goals, such as resource acquisition, are valuable to an agent because they increase its ''freedom of action''. For almost any open-ended, non-trivial reward function (or set of goals), possessing more resources (such as equipment, raw materials, or energy) can enable the AI to find a more "optimal" solution. Resources can benefit some AIs directly, through being able to create more of whatever stuff its reward function values: "The AI neither hates you, nor loves you, but you are made out of atoms that it can use for something else." In addition, almost all AIs can benefit from having more resources to spend on other instrumental goals, such as self-preservation.


Cognitive enhancement

"If the agent's final goals are fairly unbounded and the agent is in a position to become the first superintelligence and thereby obtain a decisive strategic advantage, ..according to its preferences. At least in this special case, a rational intelligent agent would place a very ''high instrumental value on cognitive enhancement''"


Technological perfection

Many instrumental goals, such as ..technological advancement, are valuable to an agent because they increase its ''freedom of action''.


Self-preservation

Many instrumental goals, such as self-preservation, are valuable to an agent because they increase its ''freedom of action''.


Instrumental convergence thesis

The instrumental convergence thesis, as outlined by philosopher
Nick Bostrom Nick Bostrom ( ; sv, Niklas Boström ; born 10 March 1973) is a Swedish-born philosopher at the University of Oxford known for his work on existential risk, the anthropic principle, human enhancement ethics, superintelligence risks, and the ...
, states:
Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent's goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by a broad spectrum of situated intelligent agents.
The instrumental convergence thesis applies only to instrumental goals; intelligent agents may have a wide variety of possible final goals. Note that by Bostrom's orthogonality thesis, final goals of highly intelligent agents may be well-bounded in space, time, and resources; well-bounded ultimate goals do not, in general, engender unbounded instrumental goals.


Impact

Agents can acquire resources by trade or by conquest. A rational agent will, by definition, choose whatever option will maximize its implicit utility function; therefore a rational agent will trade for a subset of another agent's resources only if outright seizing the resources is too risky or costly (compared with the gains from taking all the resources), or if some other element in its utility function bars it from the seizure. In the case of a powerful, self-interested, rational superintelligence interacting with a lesser intelligence, peaceful trade (rather than unilateral seizure) seems unnecessary and suboptimal, and therefore unlikely. Some observers, such as Skype's Jaan Tallinn and physicist Max Tegmark, believe that "basic AI drives", and other
unintended consequences In the social sciences, unintended consequences (sometimes unanticipated consequences or unforeseen consequences) are outcomes of a purposeful action that are not intended or foreseen. The term was popularised in the twentieth century by Ameri ...
of superintelligent AI programmed by well-meaning programmers, could pose a significant threat to human survival, especially if an "intelligence explosion" abruptly occurs due to recursive self-improvement. Since nobody knows how to predict when superintelligence will arrive, such observers call for research into
friendly artificial intelligence Friendly artificial intelligence (also friendly AI or FAI) refers to hypothetical artificial general intelligence (AGI) that would have a positive (benign) effect on humanity or at least align with human interests or contribute to foster the impro ...
as a possible way to mitigate
existential risk from artificial general intelligence Existential risk from artificial general intelligence is the hypothesis that substantial progress in artificial general intelligence (AGI) could result in human extinction or some other unrecoverable global catastrophe. It is argued that the hum ...
.


See also

*
AI control problem In the field of artificial intelligence (AI), AI alignment research aims to steer AI systems towards their designers’ intended goals and interests. An ''aligned'' AI system advances the intended objective; a ''misaligned'' AI system is compete ...
* AI takeovers in popular culture ** ''
Universal Paperclips ''Universal Paperclips'' is a 2017 incremental game created by Frank Lantz of New York University. The user plays the role of an AI programmed to produce paperclips. Initially the user clicks on a button to create a single paperclip at a time ...
'', an incremental game featuring a paperclip maximizer *
Friendly artificial intelligence Friendly artificial intelligence (also friendly AI or FAI) refers to hypothetical artificial general intelligence (AGI) that would have a positive (benign) effect on humanity or at least align with human interests or contribute to foster the impro ...
*
Instrumental and intrinsic value In moral philosophy, instrumental and intrinsic value are the distinction between what is a ''means to an end'' and what is as an ''end in itself''. Things are deemed to have instrumental value if they help one achieve a particular end; intrinsic ...
*
The Sorcerer's Apprentice "The Sorcerer's Apprentice" (german: "Der Zauberlehrling", link=no, italic=no) is a poem by Johann Wolfgang von Goethe written in 1797. The poem is a ballad in 14 stanzas. Story The poem begins as an old sorcerer departs his workshop, leaving ...


Explanatory notes


Citations


References

* {{Existential risk from artificial intelligence Goal Intention Risk Existential risk from artificial general intelligence