AlphaGo Zero is a version of DeepMind's Go software AlphaGo. AlphaGo's team published an article in ''Nature'' in October 2017 introducing AlphaGo Zero, a version created without using data from human games, and stronger than any previous version. By playing games against itself, AlphaGo Zero surpassed the strength of AlphaGo Lee in three days, winning 100 games to 0; reached the level of AlphaGo Master in 21 days; and exceeded all previous versions in 40 days.

Training artificial intelligence (AI) without datasets derived from human experts has significant implications for the development of AI with superhuman skills, as expert data is "often expensive, unreliable, or simply unavailable." Demis Hassabis, the co-founder and CEO of DeepMind, said that AlphaGo Zero was so powerful because it was "no longer constrained by the limits of human knowledge". Furthermore, AlphaGo Zero performed better than standard deep reinforcement learning models (such as Deep Q-Network implementations) due to its integration of Monte Carlo tree search.

David Silver, one of the first authors of DeepMind's papers published in ''Nature'' on AlphaGo, said that it is possible to have generalized AI algorithms by removing the need to learn from humans. Google later developed AlphaZero, a generalized version of AlphaGo Zero that could play chess and Shōgi in addition to Go. In December 2017, AlphaZero beat the 3-day version of AlphaGo Zero by winning 60 games to 40, and with 8 hours of training it outperformed AlphaGo Lee on an Elo scale. AlphaZero also defeated a top chess program (Stockfish) and a top Shōgi program (Elmo).

Architecture

The network in AlphaGo Zero is a ResNet with two heads.
* The stem of the network takes as input a 17×19×19 tensor representation of the Go board.
** 8 channels are the positions of the current player's stones from the last eight time steps. (1 if there is a stone, 0 otherwise. If a time step falls before the beginning of the game, the channel is 0 in all positions.)
** 8 channels are the positions of the other player's stones from the last eight time steps.
** 1 channel is all 1 if black is to move, and 0 otherwise.
* The body is a ResNet with either 20 or 40 residual blocks and 256 channels.
* There are two heads, a policy head and a value head.
** The policy head outputs a logit array of size 19 × 19 + 1, representing the logit of making a move at each of the points, plus the logit of passing.
** The value head outputs a number in the range (−1, +1), representing the expected score for the current player: −1 represents the current player losing, and +1 winning.
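The input encoding described in the list above can be illustrated with a minimal sketch. It is not DeepMind's code: the board representation (0 for empty, 1 for black, 2 for white), the function names, and the exact channel order (own stones first, then the opponent's, then the colour plane) are assumptions made for illustration.

```python
from typing import List

BOARD_SIZE = 19
HISTORY = 8                                    # past time steps kept per player
POLICY_SIZE = BOARD_SIZE * BOARD_SIZE + 1      # 361 points plus one pass move

# Hypothetical board format: 19x19 grid, 0 = empty, 1 = black, 2 = white.
Board = List[List[int]]
Plane = List[List[int]]


def constant_plane(fill: int = 0) -> Plane:
    """Return a 19x19 plane filled with a single value."""
    return [[fill] * BOARD_SIZE for _ in range(BOARD_SIZE)]


def stone_plane(board: Board, colour: int) -> Plane:
    """Return a plane with 1 where `colour` has a stone, 0 elsewhere."""
    return [[1 if cell == colour else 0 for cell in row] for row in board]


def encode_position(history: List[Board], black_to_move: bool) -> List[Plane]:
    """Build the 17 input planes; history[-1] is the current board."""
    me, opponent = (1, 2) if black_to_move else (2, 1)
    recent = history[-HISTORY:][::-1]          # most recent board first
    planes: List[Plane] = []
    for colour in (me, opponent):
        for step in range(HISTORY):
            if step < len(recent):
                planes.append(stone_plane(recent[step], colour))
            else:
                # Time step before the start of the game: all zeros.
                planes.append(constant_plane(0))
    # Final plane: all 1 if black is to move, all 0 otherwise.
    planes.append(constant_plane(1 if black_to_move else 0))
    return planes


# Usage: one black stone on the board, white to move.
board = constant_plane(0)
board[3][3] = 1
planes = encode_position([board], black_to_move=False)
print(len(planes), len(planes[0]), len(planes[0][0]))
```

With white to move, the black stone appears in the first of the opponent's planes (index 8 in this ordering), and the final colour plane is all zeros.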

Training

AlphaGo Zero's neural network was trained using TensorFlow, with 64 GPU workers and 19 CPU parameter servers. Only four TPUs were used for inference. The neural network initially knew nothing about Go beyond the rules. Unlike earlier versions of AlphaGo, Zero only perceived the board's stones, rather than having some rare human-programmed edge cases to help recognize unusual Go board positions. The AI engaged in reinforcement learning, playing against itself until it could anticipate its own moves and how those moves would affect the game's outcome. In the first three days AlphaGo Zero played 4.9 million games against itself in quick succession. It appeared to develop the skills required to beat top humans within just a few days, whereas the earlier AlphaGo took months of training to achieve the same level. Training cost 3×10²³ FLOPs, ten times that of AlphaZero. For comparison, the researchers also trained a version of AlphaGo Zero using human games, AlphaGo Master, and found that it learned more quickly, but actually performed more poorly in the long run. DeepMind submitted its initial findings in a paper to ''Nature'' in April 2017, which was then published in October 2017.
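The figures quoted above can be turned into rough rates with a little arithmetic. The sketch below only rearranges numbers stated in the text; the variable names are illustrative, and the per-second rate is an average over wall-clock time that says nothing about how games were distributed across hardware.

```python
SECONDS_PER_DAY = 24 * 60 * 60

games_played = 4.9e6          # self-play games in the first three days (from the text)
days = 3
agz_training_flops = 3e23     # AlphaGo Zero training cost (from the text)
ratio_to_alphazero = 10       # "ten times that of AlphaZero" (from the text)

games_per_day = games_played / days
games_per_second = games_played / (days * SECONDS_PER_DAY)
alphazero_flops = agz_training_flops / ratio_to_alphazero

print(f"{games_per_day:,.0f} games per day")
print(f"{games_per_second:.1f} games per second")
print(f"{alphazero_flops:.0e} FLOPs implied for AlphaZero")
```

On these numbers, self-play averaged roughly 1.6 million games per day, or about 19 games per second, and the stated ratio implies a training cost on the order of 3×10²² FLOPs for AlphaZero.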

Hardware cost

The hardware cost for a single AlphaGo Zero system in 2017, including the four TPUs, has been quoted as around $25 million.

Applications

According to Hassabis, AlphaGo's algorithms are likely to be of the most benefit to domains that require an intelligent search through an enormous space of possibilities, such as protein folding (see AlphaFold) or accurately simulating chemical reactions. AlphaGo's techniques are probably less useful in domains that are difficult to simulate, such as learning how to drive a car. DeepMind stated in October 2017 that it had already started active work on attempting to use AlphaGo Zero technology for protein folding, and stated it would soon publish new findings.

Reception

AlphaGo Zero was widely regarded as a significant advance, even when compared with its groundbreaking predecessor, AlphaGo. Oren Etzioni of the Allen Institute for Artificial Intelligence called AlphaGo Zero "a very impressive technical result", both in "their ability to do it" and in "their ability to train the system in 40 days, on four TPUs". ''The Guardian'' called it a "major breakthrough for artificial intelligence", citing Eleni Vasilaki of Sheffield University and Tom Mitchell of Carnegie Mellon University, who called it an impressive feat and an "outstanding engineering accomplishment" respectively. Mark Pesce of the University of Sydney called AlphaGo Zero "a big technological advance" taking us into "undiscovered territory".

Gary Marcus, a psychologist at New York University, has cautioned that for all we know, AlphaGo may contain "implicit knowledge that the programmers have about how to construct machines to play problems like Go", and that it will need to be tested in other domains before one can be sure that its base architecture is effective at much more than playing Go. In contrast, DeepMind is "confident that this approach is generalisable to a large number of domains".

In response to the reports, South Korean Go professional Lee Sedol said, "The previous version of AlphaGo wasn’t perfect, and I believe that’s why AlphaGo Zero was made." On the potential for AlphaGo's development, Lee said he would have to wait and see, but also said it would affect young Go players. Mok Jin-seok, who directs the South Korean national Go team, said the Go world has already been imitating the playing styles of previous versions of AlphaGo and creating new ideas from them, and he is hopeful that new ideas will come out of AlphaGo Zero. Mok also added that general trends in the Go world are now being influenced by AlphaGo's playing style. "At first, it was hard to understand and I almost felt like I was playing against an alien. However, having had a great amount of experience, I’ve become used to it," Mok said. "We are now past the point where we debate the gap between the capability of AlphaGo and humans. It’s now between computers." Mok has reportedly already begun analyzing the playing style of AlphaGo Zero along with players from the national team. "Though having watched only a few matches, we received the impression that AlphaGo Zero plays more like a human than its predecessors," Mok said. Chinese Go professional Ke Jie commented on the remarkable accomplishments of the new program: "A pure self-learning AlphaGo is the strongest. Humans seem redundant in front of its self-improvement."

Comparison with predecessors

AlphaZero

On 5 December 2017, the DeepMind team released a preprint on arXiv introducing AlphaZero, a program using a generalized version of AlphaGo Zero's approach, which achieved within 24 hours a superhuman level of play in chess, shogi, and Go, defeating the world-champion programs Stockfish and Elmo and the 3-day version of AlphaGo Zero in each case.

AlphaZero (AZ) is a more generalized variant of the AlphaGo Zero (AGZ) algorithm, and is able to play shogi and chess as well as Go. Differences between AZ and AGZ include:
* AZ has hard-coded rules for setting search hyperparameters.
* The neural network is now updated continually.
* Chess (unlike Go) can end in a tie; therefore AZ can take into account the possibility of a tie game.

An open source program, Leela Zero, based on the ideas from the AlphaGo papers is available. It uses a GPU instead of the TPUs that recent versions of AlphaGo rely on.
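The tie-handling difference between AZ and AGZ listed above can be sketched as a mapping from game results to value targets. This is an illustration, not DeepMind's implementation: the `Result` enum and `outcome_value` function are hypothetical names, and scoring a tie as 0, midway between the −1 and +1 of the value head, is an assumption consistent with that range.

```python
from enum import Enum


class Result(Enum):
    WIN = "win"
    LOSS = "loss"
    DRAW = "draw"


def outcome_value(result: Result, draws_possible: bool) -> float:
    """Value target from the current player's point of view."""
    if result is Result.WIN:
        return 1.0
    if result is Result.LOSS:
        return -1.0
    # Remaining case is a draw, which only some games allow.
    if not draws_possible:
        raise ValueError("this game cannot end in a draw")
    return 0.0  # assumed midpoint between losing (-1) and winning (+1)


# Usage: a drawn chess game versus a won game of Go.
print(outcome_value(Result.DRAW, draws_possible=True))
print(outcome_value(Result.WIN, draws_possible=False))
```

A Go-only system needs just the two outer values, while a system that also plays chess has to represent the third outcome somewhere in the same range.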
