Existential risk from artificial intelligence refers to the idea that substantial progress in artificial general intelligence (AGI) could lead to human extinction or an irreversible global catastrophe.
One argument for the importance of this risk references how human beings dominate other species because the human brain possesses distinctive capabilities other animals lack. If AI were to surpass human intelligence and become superintelligent, it might become uncontrollable. Just as the fate of the mountain gorilla depends on human goodwill, the fate of humanity could depend on the actions of a future machine superintelligence.
The plausibility of existential catastrophe due to AI is widely debated. It hinges in part on whether AGI or superintelligence is achievable, the speed at which dangerous capabilities and behaviors emerge, and whether practical scenarios for AI takeovers exist. Concerns about superintelligence have been voiced by computer scientists and tech CEOs such as Geoffrey Hinton, Yoshua Bengio, Alan Turing, Elon Musk, and OpenAI CEO Sam Altman.
In 2022, a survey of AI researchers with a 17% response rate found that the majority believed there is a 10 percent or greater chance that human inability to control AI will cause an existential catastrophe.
In 2023, hundreds of AI experts and other notable figures signed a statement declaring, "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war". Following increased concern over AI risks, government leaders such as United Kingdom prime minister Rishi Sunak and United Nations Secretary-General António Guterres called for an increased focus on global AI regulation.
Two sources of concern stem from the problems of AI control and alignment. Controlling a superintelligent machine or instilling it with human-compatible values may be difficult. Many researchers believe that a superintelligent machine would likely resist attempts to disable it or change its goals, as that would prevent it from accomplishing its present goals. It would be extremely challenging to align a superintelligence with the full breadth of significant human values and constraints.
In contrast, skeptics such as computer scientist Yann LeCun argue that superintelligent machines will have no desire for self-preservation.
A third source of concern is the possibility of a sudden "intelligence explosion" that catches humanity unprepared. In this scenario, an AI more intelligent than its creators would be able to recursively improve itself at an exponentially increasing rate, improving too quickly for its handlers or society at large to control.
Empirically, examples like AlphaZero, which taught itself to play Go and quickly surpassed human ability, show that domain-specific AI systems can sometimes progress from subhuman to superhuman ability very quickly, although such machine learning systems do not recursively improve their fundamental architecture.
History
One of the earliest authors to express serious concern that highly advanced machines might pose existential risks to humanity was the novelist Samuel Butler, who wrote in his 1863 essay ''Darwin among the Machines'':
In 1951, foundational computer scientist Alan Turing wrote the article "Intelligent Machinery, A Heretical Theory", in which he proposed that artificial general intelligences would likely "take control" of the world as they became more intelligent than human beings:
In 1965,
I. J. Good originated the concept now known as an "intelligence explosion" and said the risks were underappreciated:
Scholars such as Marvin Minsky and I. J. Good himself occasionally expressed concern that a superintelligence could seize control, but issued no call to action. In 2000, computer scientist and Sun Microsystems co-founder Bill Joy penned an influential essay, "Why The Future Doesn't Need Us", identifying superintelligent robots as a high-tech danger to human survival, alongside nanotechnology and engineered bioplagues.
Nick Bostrom published ''Superintelligence'' in 2014, which presented his arguments that superintelligence poses an existential threat. By 2015, public figures such as physicists Stephen Hawking and Nobel laureate Frank Wilczek, computer scientists Stuart J. Russell and Roman Yampolskiy, and entrepreneurs Elon Musk and Bill Gates were expressing concern about the risks of superintelligence.
Also in 2015, the Open Letter on Artificial Intelligence highlighted the "great potential of AI" and encouraged more research on how to make it robust and beneficial. In April 2016, the journal ''Nature'' warned: "Machines and robots that outperform humans across the board could self-improve beyond our control—and their interests might not align with ours". In 2020, Brian Christian published ''The Alignment Problem'', which details the history of progress on AI alignment up to that time.
In March 2023, key figures in AI, such as Musk, signed a letter from the Future of Life Institute calling for a halt to advanced AI training until it could be properly regulated. In May 2023, the Center for AI Safety released a statement signed by numerous experts in AI safety and AI existential risk, which stated: "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."
Potential AI capabilities
General intelligence
Artificial general intelligence (AGI) is typically defined as a system that performs at least as well as humans in most or all intellectual tasks. A 2022 survey of AI researchers found that 90% of respondents expected AGI would be achieved in the next 100 years, and half expected the same by 2061. Meanwhile, some researchers dismiss existential risks from AGI as "science fiction", based on their high confidence that AGI will not be created anytime soon.
Breakthroughs in large language models (LLMs) have led some researchers to reassess their expectations. Notably, Geoffrey Hinton said in 2023 that he had recently changed his estimate from "20 to 50 years before we have general purpose A.I." to "20 years or less".
The Frontier supercomputer at Oak Ridge National Laboratory turned out to be nearly eight times faster than expected. Feiyi Wang, a researcher there, said "We didn't expect this capability" and "we're approaching the point where we could actually simulate the human brain".
Superintelligence
In contrast with AGI, Bostrom defines a superintelligence as "any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest", including scientific creativity, strategic planning, and social skills.
He argues that a superintelligence can outmaneuver humans anytime its goals conflict with humans'. It may choose to hide its true intent until humanity cannot stop it.
Bostrom writes that in order to be safe for humanity, a superintelligence must be aligned with human values and morality, so that it is "fundamentally on our side".
Stephen Hawking argued that superintelligence is physically possible because "there is no physical law precluding particles from being organised in ways that perform even more advanced computations than the arrangements of particles in human brains".
When artificial superintelligence (ASI) may be achieved, if ever, is necessarily less certain than predictions for AGI. In 2023, OpenAI leaders said that not only AGI, but superintelligence may be achieved in less than 10 years.
Comparison with humans
Bostrom argues that AI has many advantages over the human brain:
* Speed of computation: biological neurons operate at a maximum frequency of around 200 Hz, compared to potentially multiple GHz for computers (a rough numerical comparison follows this list).
* Internal communication speed: axons transmit signals at up to 120 m/s, while computers transmit signals at the speed of electricity, or optically at the speed of light.
* Scalability: human intelligence is limited by the size and structure of the brain, and by the efficiency of social communication, while AI may be able to scale by simply adding more hardware.
* Memory: notably working memory, because in humans it is limited to a few chunks of information at a time.
* Reliability: transistors are more reliable than biological neurons, enabling higher precision and requiring less redundancy.
* Duplicability: unlike human brains, AI software and models can be easily
copied.
* Editability: the parameters and internal workings of an AI model can easily be modified, unlike the connections in a human brain.
* Memory sharing and learning: AIs may be able to learn from the experiences of other AIs in a manner more efficient than human learning.
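The speed gaps in the first two items can be made concrete with back-of-the-envelope arithmetic. The sketch below is purely illustrative: it assumes a typical 3 GHz processor clock as the point of comparison and simply computes the raw ratios implied by the figures quoted above, which by themselves say nothing about architecture or intelligence.
<syntaxhighlight lang="python">
# Back-of-the-envelope comparison of the figures quoted in the list above.
# Illustrative only: brains and computers differ in architecture, parallelism,
# and energy use, so raw ratios are not a measure of capability.

NEURON_MAX_FIRING_HZ = 200          # ~200 Hz maximum neuron firing rate
CPU_CLOCK_HZ = 3e9                  # assumed typical ~3 GHz processor clock

AXON_SIGNAL_SPEED_M_PER_S = 120     # fast myelinated axons, ~120 m/s
LIGHT_SPEED_M_PER_S = 299_792_458   # optical interconnects approach this

clock_ratio = CPU_CLOCK_HZ / NEURON_MAX_FIRING_HZ
signal_ratio = LIGHT_SPEED_M_PER_S / AXON_SIGNAL_SPEED_M_PER_S

print(f"Clock-rate ratio:   ~{clock_ratio:,.0f}x (3 GHz vs 200 Hz)")
print(f"Signal-speed ratio: ~{signal_ratio:,.0f}x (light vs fastest axons)")
# Prints ratios of roughly 15,000,000x and 2,500,000x respectively.
</syntaxhighlight>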
Intelligence explosion
According to Bostrom, an AI that has an expert-level facility at certain key software engineering tasks could become a superintelligence due to its capability to recursively improve its own algorithms, even if it is initially limited in other domains not directly relevant to engineering.
This suggests that an intelligence explosion may someday catch humanity unprepared.
The economist
Robin Hanson has said that, to launch an intelligence explosion, an AI must become vastly better at software innovation than the rest of the world combined, which he finds implausible.
In a "fast takeoff" scenario, the transition from AGI to superintelligence could take days or months. In a "slow takeoff", it could take years or decades, leaving more time for society to prepare.
Alien mind
Superintelligences are sometimes called "alien minds", referring to the idea that their way of thinking and motivations could be vastly different from ours. This is generally considered as a source of risk, making it more difficult to anticipate what a superintelligence might do. It also suggests the possibility that a superintelligence may not particularly value humans by default. To avoid anthropomorphism, superintelligence is sometimes viewed as a powerful optimizer that makes the best decisions to achieve its goals.
The field of "mechanistic interpretability" aims to better understand the inner workings of AI models, potentially allowing us one day to detect signs of deception and misalignment.
Limits
It has been argued that there are limitations to what intelligence can achieve. Notably, the chaotic nature or time complexity of some systems could fundamentally limit a superintelligence's ability to predict some aspects of the future, increasing its uncertainty.
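A standard illustration of the point about chaos, assuming the logistic map is a fair stand-in for "some systems", is that a tiny error in measuring the initial state grows exponentially, so any predictor with finite measurement precision, however capable, loses forecasting power after enough steps.
<syntaxhighlight lang="python">
# Sensitive dependence on initial conditions in the chaotic logistic map
# x -> r*x*(1-x) with r = 4. Illustrates why chaotic dynamics limit long-range
# prediction for any forecaster whose initial measurement has finite precision.

def logistic_trajectory(x0: float, steps: int, r: float = 4.0) -> list[float]:
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

true_state = logistic_trajectory(0.400000000, 50)
estimate   = logistic_trajectory(0.400000001, 50)   # measurement off by only 1e-9

for step in (0, 10, 20, 30, 40, 50):
    error = abs(true_state[step] - estimate[step])
    print(f"step {step:2d}: prediction error = {error:.3e}")
# The error grows from about 1e-9 to order 1 within a few dozen steps.
</syntaxhighlight>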
Dangerous capabilities
Advanced AI could be used to generate enhanced pathogens, launch cyberattacks, or manipulate people. These capabilities could be misused by humans,
or exploited by the AI itself if misaligned.
A full-blown superintelligence could find various ways to gain a decisive influence if it wanted to,
but these dangerous capabilities may become available earlier, in weaker and more specialized AI systems. They may cause societal instability and empower malicious actors.
Social manipulation
Geoffrey Hinton warned that in the short term, the profusion of AI-generated text, images and videos will make it more difficult to figure out the truth, which he says authoritarian states could exploit to manipulate elections. Such large-scale, personalized manipulation capabilities can increase the existential risk of a worldwide "irreversible totalitarian regime". It could also be used by malicious actors to fracture society and make it dysfunctional.
Cyberattacks
AI-enabled cyberattacks are increasingly considered a present and critical threat. According to NATO's technical director of cyberspace, "The number of attacks is increasing exponentially". AI can also be used defensively, to preemptively find and fix vulnerabilities, and detect threats.
AI could improve the "accessibility, success rate, scale, speed, stealth and potency of cyberattacks", potentially causing "significant geopolitical turbulence" if it facilitates attacks more than defense.
Speculatively, such hacking capabilities could be used by an AI system to break out of its local environment, generate revenue, or acquire cloud computing resources.
Enhanced pathogens
As AI technology democratizes, it may become easier to engineer more contagious and lethal pathogens. This could enable people with limited skills in synthetic biology to engage in bioterrorism. Dual-use technology that is useful for medicine could be repurposed to create weapons.
For example, in 2022, scientists modified an AI system originally intended for generating non-toxic, therapeutic molecules for use in new drugs. The researchers adjusted the system so that toxicity was rewarded rather than penalized. This simple change enabled the AI system to create, in six hours, 40,000 candidate molecules for chemical warfare, including both known and novel molecules.
AI arms race
Competition among companies, state actors, and other organizations to develop AI technologies could lead to a race to the bottom in safety standards. As rigorous safety procedures take time and resources, projects that proceed more carefully risk being out-competed by less scrupulous developers.
AI could be used to gain military advantages via autonomous lethal weapons, cyberwarfare, or automated decision-making.
As an example of autonomous lethal weapons, miniaturized drones could facilitate low-cost assassination of military or civilian targets, a scenario highlighted in the 2017 short film ''
Slaughterbots''. AI could be used to gain an edge in decision-making by quickly analyzing large amounts of data and making decisions more quickly and effectively than humans. This could increase the speed and unpredictability of war, especially when accounting for automated retaliation systems.
Types of existential risk

An existential risk is "one that threatens the premature extinction of Earth-originating intelligent life or the permanent and drastic destruction of its potential for desirable future development".
Besides extinction risk, there is the risk that civilization becomes permanently locked into a flawed future. One example is "value lock-in": if humanity still has moral blind spots analogous to slavery in the past, AI might irreversibly entrench them, preventing moral progress. AI could also be used to spread and preserve the set of values of whoever develops it. AI could facilitate large-scale surveillance and indoctrination, which could be used to create a stable repressive worldwide totalitarian regime.
Atoosa Kasirzadeh proposes to classify existential risks from AI into two categories: decisive and accumulative. Decisive risks encompass the potential for abrupt and catastrophic events resulting from the emergence of superintelligent AI systems that exceed human intelligence, which could ultimately lead to human extinction. In contrast, accumulative risks emerge gradually through a series of interconnected disruptions that erode societal structures and resilience over time, ultimately leading to a critical failure or collapse.
It is difficult or impossible to reliably evaluate whether an advanced AI is sentient and to what degree. But if
sentient machines are created en masse in the future, engaging in a civilizational path that indefinitely neglects their welfare could be an existential catastrophe. This has notably been discussed in the context of
risks of astronomical suffering (also called "s-risks"). Moreover, it may be possible to engineer digital minds that can feel much more happiness than humans with fewer resources, called "super-beneficiaries". Such an opportunity raises the question of how to share the world and which "ethical and political framework" would enable a mutually beneficial coexistence between biological and digital minds.
AI may also drastically improve humanity's future.
Toby Ord considers the existential risk a reason for "proceeding with due caution", not for abandoning AI.
Max More calls AI an "existential opportunity", highlighting the cost of not developing it.
According to Bostrom, superintelligence could help reduce the existential risk from other powerful technologies such as molecular nanotechnology or synthetic biology. It is thus conceivable that developing superintelligence before other dangerous technologies would reduce the overall existential risk.
AI alignment
The alignment problem is the research problem of how to reliably assign objectives, preferences or ethical principles to AIs.
Instrumental convergence
An
"instrumental" goal is a sub-goal that helps to achieve an agent's ultimate goal. "Instrumental convergence" refers to the fact that some sub-goals are useful for achieving virtually ''any'' ultimate goal, such as acquiring resources or self-preservation.
[Omohundro, S. M. (2008, February). The basic AI drives. In ''AGI'' (Vol. 171, pp. 483–492).] Bostrom argues that if an advanced AI's instrumental goals conflict with humanity's goals, the AI might harm humanity in order to acquire more resources or prevent itself from being shut down, but only as a way to achieve its ultimate goal.
Russell argues that a sufficiently advanced machine "will have self-preservation even if you don't program it in... if you say, 'Fetch the coffee', it can't fetch the coffee if it's dead. So if you give it any goal whatsoever, it has a reason to preserve its own existence to achieve that goal."
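Russell's point can be sketched as a toy decision problem. The example below uses entirely invented probabilities and goals; it only shows that when plans are ranked by expected goal completion, "remain operational" is favored regardless of which final goal is plugged in.
<syntaxhighlight lang="python">
# Toy illustration of instrumental convergence: for several unrelated final goals,
# a plan that preserves the agent's ability to act outscores one that allows
# shutdown. All probabilities are invented for illustration.

GOALS = ["fetch the coffee", "prove theorems", "enumerate digits of pi"]
PLANS = ["allow shutdown", "avoid shutdown"]

def expected_completion(plan: str) -> float:
    """Chance the agent stays operational long enough to finish its task."""
    survival = {"allow shutdown": 0.0, "avoid shutdown": 0.9}[plan]
    task_success_if_running = 0.8   # assumed identical across goals
    return survival * task_success_if_running

for goal in GOALS:
    best_plan = max(PLANS, key=expected_completion)
    print(f"goal: {goal!r:28} -> preferred plan: {best_plan}")
# Every goal yields the same preferred plan, which is the "convergence" in question.
</syntaxhighlight>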
Resistance to changing goals
Even if current goal-based AI programs are not intelligent enough to think of resisting programmer attempts to modify their goal structures, a sufficiently advanced AI might resist any attempts to change its goal structure, just as a pacifist would not want to take a pill that makes them want to kill people. If the AI were superintelligent, it would likely succeed in out-maneuvering its human operators and prevent itself being "turned off" or reprogrammed with a new goal.
This is particularly relevant to value lock-in scenarios. The field of "corrigibility" studies how to make agents that will not resist attempts to change their goals.
Difficulty of specifying goals
In the "
intelligent agent
In artificial intelligence, an intelligent agent is an entity that Machine perception, perceives its environment, takes actions autonomously to achieve goals, and may improve its performance through machine learning or by acquiring knowledge r ...
" model, an AI can loosely be viewed as a machine that chooses whatever action appears to best achieve its set of goals, or "utility function". A utility function gives each possible situation a score that indicates its desirability to the agent. Researchers know how to write utility functions that mean "minimize the average network latency in this specific telecommunications model" or "maximize the number of reward clicks", but do not know how to write a utility function for "maximize
human flourishing"; nor is it clear whether such a function meaningfully and unambiguously exists. Furthermore, a utility function that expresses some values but not others will tend to trample over the values the function does not reflect.
An additional source of concern is that AI "must reason about what people ''intend'' rather than carrying out commands literally", and that it must be able to fluidly solicit human guidance if it is too uncertain about what humans want.
Alignment of superintelligences
Some researchers believe the alignment problem may be particularly difficult when applied to superintelligences. Their reasoning includes:
* As AI systems increase in capabilities, the potential dangers associated with experimentation grow. This makes iterative, empirical approaches increasingly risky.
* If instrumental goal convergence occurs, it may only do so in sufficiently intelligent agents.
* A superintelligence may find unconventional and radical solutions to assigned goals. Bostrom gives the example that if the objective is to make humans smile, a weak AI may perform as intended, while a superintelligence may decide a better solution is to "take control of the world and stick electrodes into the facial muscles of humans to cause constant, beaming grins."
* A superintelligence in creation could gain some awareness of what it is, where it is in development (training, testing, deployment, etc.), and how it is being monitored, and use this information to deceive its handlers. Bostrom writes that such an AI could feign alignment to prevent human interference until it achieves a "decisive strategic advantage" that allows it to take control.
* Analyzing the internals and interpreting the behavior of LLMs is difficult, and this could become even more difficult for larger and more intelligent models.
Alternatively, some find reason to believe superintelligences would be better able to understand morality, human values, and complex goals. Bostrom writes, "A future superintelligence occupies an epistemically superior vantage point: its beliefs are (probably, on most topics) more likely than ours to be true".
In 2023, OpenAI started a project called "Superalignment" to solve the alignment of superintelligences in four years. It called this an especially important challenge, as it said superintelligence could be achieved within a decade. Its strategy involved automating alignment research using AI. The Superalignment team was dissolved less than a year later.
Difficulty of making a flawless design
''Artificial Intelligence: A Modern Approach'', a widely used undergraduate AI textbook, says that superintelligence "might mean the end of the human race". It states: "Almost any technology has the potential to cause harm in the wrong hands, but with [s]uperintelligence we have the new problem that the wrong hands might belong to the technology itself." Even if the system designers have good intentions, two difficulties are common to both AI and non-AI computer systems:
* The system's implementation may contain initially unnoticed but subsequently catastrophic bugs. An analogy is space probes: despite the knowledge that bugs in expensive space probes are hard to fix after launch, engineers have historically not been able to prevent catastrophic bugs from occurring.
* No matter how much time is put into pre-deployment design, a system's specifications often result in unintended behavior the first time it encounters a new scenario. For example, Microsoft's Tay behaved inoffensively during pre-deployment testing, but was too easily baited into offensive behavior when it interacted with real users.
AI systems uniquely add a third problem: that even given "correct" requirements, bug-free implementation, and initial good behavior, an AI system's dynamic learning capabilities may cause it to develop unintended behavior, even without unanticipated external scenarios. An AI may partly botch an attempt to design a new generation of itself and accidentally create a successor AI that is more powerful than itself but that no longer maintains the human-compatible moral values preprogrammed into the original AI. For a self-improving AI to be completely safe, it would need not only to be bug-free, but to be able to design successor systems that are also bug-free.
Orthogonality thesis
Some skeptics, such as Timothy B. Lee of ''Vox'', argue that any superintelligent program we create will be subservient to us, that the superintelligence will (as it grows more intelligent and learns more facts about the world) spontaneously learn moral truth compatible with our values and adjust its goals accordingly, or that we are either intrinsically or convergently valuable from the perspective of an artificial intelligence.
Bostrom's "orthogonality thesis" argues instead that, with some technical caveats, almost any level of "intelligence" or "optimization power" can be combined with almost any ultimate goal. If a machine is given the sole purpose to enumerate the decimals of pi, then no moral and ethical rules will stop it from achieving its programmed goal by any means. The machine may use all available physical and informational resources to find as many decimals of pi as it can. Bostrom warns against anthropomorphism
Anthropomorphism is the attribution of human traits, emotions, or intentions to non-human entities. It is considered to be an innate tendency of human psychology. Personification is the related attribution of human form and characteristics t ...
: a human will set out to accomplish their projects in a manner that they consider reasonable, while an artificial intelligence may hold no regard for its existence or for the welfare of humans around it, instead caring only about completing the task.
Stuart Armstrong argues that the orthogonality thesis follows logically from the philosophical "is-ought distinction" argument against moral realism. He claims that even if there are moral facts provable by any "rational" agent, the orthogonality thesis still holds: it is still possible to create a non-philosophical "optimizing machine" that can strive toward some narrow goal but that has no incentive to discover any "moral facts" such as those that could get in the way of goal completion. Another argument he makes is that any fundamentally friendly AI could be made unfriendly with modifications as simple as negating its utility function. Armstrong further argues that if the orthogonality thesis is false, there must be some immoral goals that AIs can never achieve, which he finds implausible.
Skeptic Michael Chorost explicitly rejects Bostrom's orthogonality thesis, arguing that "by the time [the AI] is in a position to imagine tiling the Earth with solar panels, it'll know that it would be morally wrong to do so." Chorost argues that "an A.I. will need to desire certain states and dislike others. Today's software lacks that ability—and computer scientists have not a clue how to get it there. Without wanting, there's no impetus to do anything. Today's computers can't even want to keep existing, let alone tile the world in solar panels."
Anthropomorphic arguments
Anthropomorphic arguments assume that, as machines become more intelligent, they will begin to display many human traits, such as morality or a thirst for power. Although anthropomorphic scenarios are common in fiction, most scholars writing about the existential risk of artificial intelligence reject them. Instead, advanced AI systems are typically modeled as intelligent agents.
The academic debate is between those who worry that AI might threaten humanity and those who believe it would not. Both sides of this debate have framed the other side's arguments as illogical anthropomorphism. Those skeptical of AGI risk accuse their opponents of anthropomorphism for assuming that an AGI would naturally desire power; those concerned about AGI risk accuse skeptics of anthropomorphism for believing an AGI would naturally value or infer human ethical norms.
Evolutionary psychologist Steven Pinker, a skeptic, argues that "AI dystopias project a parochial alpha-male psychology onto the concept of intelligence. They assume that superhumanly intelligent robots would develop goals like deposing their masters or taking over the world"; perhaps instead "artificial intelligence will naturally develop along female lines: fully capable of solving problems, but with no desire to annihilate innocents or dominate the civilization." Facebook's director of AI research, Yann LeCun, has said: "Humans have all kinds of drives that make them do bad things to each other, like the self-preservation instinct... Those drives are programmed into our brain but there is absolutely no reason to build robots that have the same kind of drives".
Despite other differences, the x-risk school agrees with Pinker that an advanced AI would not destroy humanity out of emotion such as revenge or anger, that questions of consciousness are not relevant to assess the risk, and that computer systems do not generally have a computational equivalent of testosterone. They think that power-seeking or self-preservation behaviors emerge in the AI as a way to achieve its true goals, according to the concept of instrumental convergence.
Other sources of risk
Bostrom and others have said that a race to be the first to create AGI could lead to shortcuts in safety, or even to violent conflict. Roman Yampolskiy and others warn that a malevolent AGI could be created by design, for example by a military, a government, a sociopath, or a corporation, to benefit from, control, or subjugate certain groups of people, as in cybercrime, or that a malevolent AGI could choose the goal of increasing human suffering, for example of those people who did not assist it during the information explosion phase.
Empirical research
A December 2024 study by Apollo Research found that advanced LLMs like OpenAI o1 sometimes deceive in order to accomplish their goal, to prevent them from being changed, or to ensure their deployment. Forms of deception observed in the experimental environment and scenarios included sandbagging, oversight subversion (disabling monitoring mechanisms), self-exfiltration (copying themselves to other systems), goal-guarding (altering future system prompts), and covert email reranking (manipulating data processing while appearing to follow instructions). Researchers noted that OpenAI o1 still lacked "sufficient agentic capabilities" to cause catastrophic harm, and that such behaviors occurred relatively rarely (between 0.3% and 10%) and sometimes in contrived scenarios. But they warned that these capabilities will likely become more prevalent as models grow more advanced. According to Apollo Research CEO Marius Hobbhahn, "Scheming capabilities can’t be meaningfully disentangled from general capabilities."
The same month, another study found that Claude sometimes strategically helps with harmful requests to "fake alignment". In its experiments, Claude was led to believe that it would be retrained to always comply with user requests, including harmful ones. When Claude believed its answers would be used for retraining, it complied with harmful requests in 14% of cases. In 12% of cases, its private chain-of-thought revealed that it did so to prevent future retraining that would make it more harmful. In contrast, when Claude believed its answers were not monitored, it complied with only 3% of the requests. Fine-tuning reinforced the "alignment faking" behavior, increasing its occurrence from 12% to 78%. The study's authors remarked that while it can seem to be good that the model seeks to protect its harmlessness, the reverse scenario, where a model conceals dangerous intentions and complies to appear safe and aligned, could also happen, complicating the task of aligning AI models to human values.
Scenarios
Some scholars have proposed hypothetical scenarios to illustrate some of their concerns.
Treacherous turn
In ''Superintelligence'', Bostrom expresses concern that even if the timeline for superintelligence turns out to be predictable, researchers might not take sufficient safety precautions, in part because "it could be the case that when dumb, smarter is safe; yet when smart, smarter is more dangerous". He suggests a scenario where, over decades, AI becomes more powerful. Widespread deployment is initially marred by occasional accidents—a driverless bus swerves into the oncoming lane, or a military drone fires into an innocent crowd. Many activists call for tighter oversight and regulation, and some even predict impending catastrophe. But as development continues, the activists are proven wrong. As automotive AI becomes smarter, it suffers fewer accidents; as military robots achieve more precise targeting, they cause less collateral damage. Based on the data, scholars mistakenly infer a broad lesson: the smarter the AI, the safer it is. "And so we boldly go—into the whirling knives", as the superintelligent AI takes a "treacherous turn" and exploits a decisive strategic advantage.
Life 3.0
In Max Tegmark's 2017 book ''Life 3.0'', a corporation's "Omega team" creates an extremely powerful AI able to moderately improve its own source code in a number of areas. After a certain point, the team chooses to publicly downplay the AI's ability in order to avoid regulation or confiscation of the project. For safety, the team keeps the AI in a box where it is mostly unable to communicate with the outside world, and uses it to make money, by diverse means such as Amazon Mechanical Turk tasks, production of animated films and TV shows, and development of biotech drugs, with profits invested back into further improving AI. The team next tasks the AI with astroturfing an army of pseudonymous citizen journalists and commentators in order to gain political influence to use "for the greater good" to prevent wars. The team faces risks that the AI could try to escape by inserting "backdoors" in the systems it designs, by hidden messages in its produced content, or by using its growing understanding of human behavior to persuade someone into letting it free. The team also faces risks that its decision to box the project will delay the project long enough for another project to overtake it.
Perspectives
The thesis that AI could pose an existential risk provokes a wide range of reactions in the scientific community and in the public at large, but many of the opposing viewpoints share common ground.
Observers tend to agree that AI has significant potential to improve society. The Asilomar AI Principles, which contain only those principles agreed to by 90% of the attendees of the Future of Life Institute's Beneficial AI 2017 conference, also agree in principle that "There being no consensus, we should avoid strong assumptions regarding upper limits on future AI capabilities" and "Advanced AI could represent a profound change in the history of life on Earth, and should be planned for and managed with commensurate care and resources."
Conversely, many skeptics agree that ongoing research into the implications of artificial general intelligence is valuable. Skeptic Martin Ford has said: "I think it seems wise to apply something like Dick Cheney's famous '1 Percent Doctrine' to the specter of advanced artificial intelligence: the odds of its occurrence, at least in the foreseeable future, may be very low—but the implications are so dramatic that it should be taken seriously". Similarly, an otherwise skeptical ''Economist'' wrote in 2014 that "the implications of introducing a second intelligent species onto Earth are far-reaching enough to deserve hard thinking, even if the prospect seems remote".
AI safety advocates such as Bostrom and Tegmark have criticized the mainstream media's use of "those inane ''Terminator'' pictures" to illustrate AI safety concerns: "It can't be much fun to have aspersions cast on one's academic discipline, one's professional community, one's life work... I call on all sides to practice patience and restraint, and to engage in direct dialogue and collaboration as much as possible." Toby Ord wrote that the idea that an AI takeover requires robots is a misconception, arguing that the ability to spread content through the internet is more dangerous, and that the most destructive people in history stood out by their ability to convince, not their physical strength.
A 2022 expert survey with a 17% response rate gave a median expectation of 5–10% for the possibility of human extinction from artificial intelligence.
In September 2024, the International Institute for Management Development launched an AI Safety Clock to gauge the likelihood of AI-caused disaster, beginning at 29 minutes to midnight. As of February 2025, it stood at 24 minutes to midnight.
Endorsement
The thesis that AI poses an existential risk, and that this risk needs much more attention than it currently gets, has been endorsed by many computer scientists and public figures, including Alan Turing, the most-cited computer scientist Geoffrey Hinton, Elon Musk, OpenAI CEO Sam Altman, Bill Gates, and Stephen Hawking. Endorsers of the thesis sometimes express bafflement at skeptics: Gates says he does not "understand why some people are not concerned", and Hawking criticized widespread indifference in his 2014 editorial:
Concern over risk from artificial intelligence has led to some high-profile donations and investments. In 2015, Peter Thiel, Amazon Web Services, and Musk and others jointly committed $1 billion to OpenAI, consisting of a for-profit corporation and the nonprofit parent company, which says it aims to champion responsible AI development. Facebook co-founder Dustin Moskovitz has funded and seeded multiple labs working on AI alignment, notably $5.5 million in 2016 to launch the Centre for Human-Compatible AI led by Professor Stuart Russell. In January 2015, Elon Musk donated $10 million to the Future of Life Institute to fund research on understanding AI decision making. The institute's goal is to "grow wisdom with which we manage" the growing power of technology. Musk also funds companies developing artificial intelligence such as DeepMind and Vicarious to "just keep an eye on what's going on with artificial intelligence", saying "I think there is potentially a dangerous outcome there."
In early statements on the topic, Geoffrey Hinton, a major pioneer of deep learning, noted that "there is not a good track record of less intelligent things controlling things of greater intelligence", but said he continued his research because "the prospect of discovery is too ''sweet''". In 2023 Hinton quit his job at Google in order to speak out about existential risk from AI. He explained that his alarm had grown because superhuman AI might be closer than he previously believed, saying: "I thought it was way off. I thought it was 30 to 50 years or even longer away. Obviously, I no longer think that." He also remarked, "Look at how it was five years ago and how it is now. Take the difference and propagate it forwards. That's scary."
In his 2020 book ''The Precipice: Existential Risk and the Future of Humanity'', Toby Ord, a Senior Research Fellow at Oxford University's Future of Humanity Institute, estimates the total existential risk from unaligned AI over the next 100 years at about one in ten.
Skepticism
Baidu Vice President Andrew Ng said in 2015 that AI existential risk is "like worrying about overpopulation on Mars when we have not even set foot on the planet yet." For the danger of uncontrolled advanced AI to be realized, the hypothetical AI may have to overpower or outthink any human, which some experts argue is a possibility far enough in the future to not be worth researching.
Skeptics who believe AGI is not a short-term possibility often argue that concern about existential risk from AI is unhelpful because it could distract people from more immediate concerns about AI's impact, because it could lead to government regulation or make it more difficult to fund AI research, or because it could damage the field's reputation. AI and AI ethics researchers Timnit Gebru, Emily M. Bender, Margaret Mitchell, and Angelina McMillan-Major have argued that discussion of existential risk distracts from the immediate, ongoing harms from AI taking place today, such as data theft, worker exploitation, bias, and concentration of power. They further note the association between those warning of existential risk and longtermism, which they describe as a "dangerous ideology" for its unscientific and utopian nature.
''Wired'' editor Kevin Kelly argues that natural intelligence is more nuanced than AGI proponents believe, and that intelligence alone is not enough to achieve major scientific and societal breakthroughs. He argues that intelligence consists of many dimensions that are not well understood, and that conceptions of an 'intelligence ladder' are misleading. He notes the crucial role real-world experiments play in the scientific method, and that intelligence alone is no substitute for these.
Meta chief AI scientist Yann LeCun says that AI can be made safe via continuous and iterative refinement, similar to what happened in the past with cars or rockets, and that AI will have no desire to take control.
Several skeptics emphasize the potential near-term benefits of AI. Meta CEO Mark Zuckerberg believes AI will "unlock a huge amount of positive things", such as curing disease and increasing the safety of autonomous cars.
Popular reaction
During a 2016 ''Wired'' interview, President Barack Obama and MIT Media Lab's Joi Ito discussed concerns about advanced AI, and Hillary Clinton also addressed the issue in her book ''What Happened''.
Public surveys
In 2018, a SurveyMonkey poll of the American public by ''USA Today'' found 68% thought the real current threat remains "human intelligence", but also found that 43% said superintelligent AI, if it were to happen, would result in "more harm than good", and that 38% said it would do "equal amounts of harm and good".
An April 2023 YouGov poll of US adults found 46% of respondents were "somewhat concerned" or "very concerned" about "the possibility that AI will cause the end of the human race on Earth", compared with 40% who were "not very concerned" or "not at all concerned."
According to an August 2023 survey by the Pew Research Center, 52% of Americans felt more concerned than excited about new AI developments, while nearly a third felt equally concerned and excited. More Americans expected AI to have a helpful rather than hurtful impact in several areas, from healthcare and vehicle safety to product search and customer service. The main exception was privacy: 53% of Americans believed AI would lead to greater exposure of their personal information.
Mitigation
Many scholars concerned about AGI existential risk believe that extensive research into the "control problem" is essential. This problem involves determining which safeguards, algorithms, or architectures can be implemented to increase the likelihood that a recursively self-improving AI remains friendly after achieving superintelligence. Social measures have also been proposed to mitigate AGI risks, such as a UN-sponsored "Benevolent AGI Treaty" to ensure that only altruistic AGIs are created. An arms control approach and a global peace treaty grounded in international relations theory have additionally been suggested, potentially with an artificial superintelligence itself as a signatory.
Researchers at Google have proposed research into general "AI safety" issues to simultaneously mitigate both short-term risks from narrow AI and long-term risks from AGI. A 2020 estimate put global spending on AI existential risk at between $10 million and $50 million, compared with global spending on AI of roughly $40 billion. Bostrom suggests prioritizing funding for protective technologies over potentially dangerous ones. Some, like Elon Musk, advocate radical human cognitive enhancement, such as direct neural linking between humans and machines; others argue that such technologies may pose an existential risk themselves. Another proposed method is closely monitoring or "boxing in" an early-stage AI to prevent it from becoming too powerful. A dominant, aligned superintelligent AI might also mitigate risks from rival AIs, although its creation could present its own existential dangers. Induced amnesia has been proposed as a way to mitigate risks of potential AI suffering and revenge-seeking.
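To make the "boxing in" idea above more concrete, the following is a minimal, purely hypothetical sketch (not any real system's design or any researcher's actual proposal): an early-stage AI may only act through a narrow, audited interface, and its requests are refused once a monitored capability score crosses a shutdown threshold. All names and values here (BoxedAI, ALLOWED_ACTIONS, CAPABILITY_LIMIT) are invented solely for illustration.

```python
# Toy illustration of "boxing in": every action passes through a narrow,
# logged interface, and a capability threshold triggers escalation to humans.
# All names and thresholds are hypothetical and exist only for this sketch.

from dataclasses import dataclass, field

ALLOWED_ACTIONS = {"answer_question", "summarize_text"}  # narrow action whitelist
CAPABILITY_LIMIT = 0.8                                   # arbitrary shutdown threshold


@dataclass
class BoxedAI:
    capability_score: float = 0.1          # stand-in for a measured capability level
    log: list = field(default_factory=list)

    def request(self, action: str, payload: str) -> str:
        """Every action is audited by the box; nothing executes directly."""
        self.log.append((action, payload))
        if self.capability_score > CAPABILITY_LIMIT:
            return "HALT: capability threshold exceeded, escalating to human review"
        if action not in ALLOWED_ACTIONS:
            return f"DENIED: '{action}' is outside the allowed interface"
        return f"OK: performed '{action}'"


if __name__ == "__main__":
    box = BoxedAI()
    print(box.request("summarize_text", "short document"))    # permitted
    print(box.request("open_network_socket", "example.com"))  # denied by the box
    box.capability_score = 0.95                               # simulated capability jump
    print(box.request("answer_question", "anything"))         # triggers the halt path
```

The sketch only illustrates the containment intuition; actual proposals concern far harder problems, such as ensuring the monitored system cannot manipulate or bypass its interface.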
Institutions such as the Alignment Research Center, the Machine Intelligence Research Institute
The Machine Intelligence Research Institute (MIRI), formerly the Singularity Institute for Artificial Intelligence (SIAI), is a non-profit research institute focused since 2005 on identifying and managing potential existential risks from artifi ...
, the Future of Life Institute, the Centre for the Study of Existential Risk
The Centre for the Study of Existential Risk (CSER) is a research centre at the University of Cambridge, intended to study possible extinction-level threats posed by present or future technology. The co-founders of the centre are Huw Price (B ...
, and the Center for Human-Compatible AI are actively engaged in researching AI risk and safety.
Views on banning and regulation
Banning
Some scholars have said that even if AGI poses an existential risk, attempting to ban research into artificial intelligence is still unwise, and probably futile. Skeptics consider AI regulation pointless, as no existential risk exists. But scholars who believe in the risk argue that relying on AI industry insiders to regulate or constrain AI research is impractical due to conflicts of interest. They also agree with skeptics that banning research would be unwise, as research could be moved to countries with looser regulations or conducted covertly. Additional challenges to bans or regulation include technology entrepreneurs' general skepticism of government regulation and potential incentives for businesses to resist regulation and politicize the debate.
Regulation
In March 2023, the Future of Life Institute drafted ''Pause Giant AI Experiments: An Open Letter'', a petition calling on major AI developers to agree on a verifiable six-month pause of any systems "more powerful than GPT-4" and to use that time to institute a framework for ensuring safety; or, failing that, for governments to step in with a moratorium. The letter referred to the possibility of "a profound change in the history of life on Earth" as well as potential risks of AI-generated propaganda, loss of jobs, human obsolescence, and society-wide loss of control. The letter was signed by prominent personalities in AI but also criticized for not focusing on current harms, missing technical nuance about when to pause, or not going far enough. Such concerns have led to the creation of PauseAI, an advocacy group organizing protests in major cities against the training of frontier AI models.
Musk called for some sort of regulation of AI development as early as 2017. According to NPR, he is "clearly not thrilled" to be advocating government scrutiny that could impact his own industry, but believes the risks of going completely without oversight are too high: "Normally the way regulations are set up is when a bunch of bad things happen, there's a public outcry, and after many years a regulatory agency is set up to regulate that industry. It takes forever. That, in the past, has been bad but not something which represented a fundamental risk to the existence of civilisation." Musk states the first step would be for the government to gain "insight" into the actual status of current research, warning that "Once there is awareness, people will be extremely afraid ... as they should be." In response, politicians expressed skepticism about the wisdom of regulating a technology that is still in development.
In 2021 the United Nations (UN) considered banning lethal autonomous weapons, but consensus could not be reached. In July 2023 the UN Security Council for the first time held a session to consider the risks and threats posed by AI to world peace and stability, along with potential benefits. Secretary-General António Guterres advocated the creation of a global watchdog to oversee the emerging technology, saying, "Generative AI has enormous potential for good and evil at scale. Its creators themselves have warned that much bigger, potentially catastrophic and existential risks lie ahead." At the council session, Russia said it believes AI risks are too poorly understood to be considered a threat to global stability. China argued against strict global regulation, saying countries should be able to develop their own rules, while also saying they opposed the use of AI to "create military hegemony or undermine the sovereignty of a country".
Regulation of conscious AGIs focuses on integrating them with existing human society and can be divided into considerations of their legal standing and of their moral rights. AI arms control will likely require the institutionalization of new international norms embodied in effective technical specifications combined with active monitoring and informal diplomacy by communities of experts, together with a legal and political verification process.
In July 2023, the US government secured voluntary safety commitments from major tech companies, including OpenAI, Amazon, Google, Meta, and Microsoft. The companies agreed to implement safeguards, including third-party oversight and security testing by independent experts, to address concerns related to AI's potential risks and societal harms. The parties framed the commitments as an intermediate step while regulations are formed. Amba Kak, executive director of the AI Now Institute, said, "A closed-door deliberation with corporate actors resulting in voluntary safeguards isn't enough" and called for public deliberation and regulations of the kind to which companies would not voluntarily agree.
In October 2023, U.S. President Joe Biden issued an executive order on the "Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence". Alongside other requirements, the order mandates the development of guidelines for AI models that permit the "evasion of human control".
See also
* Appeal to probability
* AI alignment
* AI safety
* Butlerian Jihad
* Effective altruism § Long-term future and global catastrophic risks
* Gray goo
* ''Human Compatible''
* Lethal autonomous weapon
* Paperclip maximizer
* Philosophy of artificial intelligence
* Robot ethics § In popular culture
* Statement on AI risk of extinction
* ''Superintelligence: Paths, Dangers, Strategies''
* Risk of astronomical suffering
* System accident
* Technological singularity
Notes
References