Self-play
Self-play is a technique for improving the performance of reinforcement learning agents. Intuitively, agents learn to improve their performance by playing "against themselves".

Definition and motivation

In multi-agent reinforcement learning experiments, researchers try to optimize the performance of a learning agent on a given task, in cooperation or competition with one or more other agents. These agents learn by trial and error, and researchers may choose to have the learning algorithm play the role of two or more of the different agents. When successfully executed, this technique has a double advantage:

1. It provides a straightforward way to determine the actions of the other agents, resulting in a meaningful challenge.
2. It increases the amount of experience that can be used to improve the policy, by a factor of two or more, since the viewpoints of each of the different agents can be used for learning.

Czarnecki et al. argue that most of the games that people play for fun are "Ga ...
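The two advantages described above can be illustrated with a minimal sketch. Everything in it is an assumption chosen for illustration and does not come from the article: the game (a single-pile Nim variant in which players remove 1 to 3 stones and whoever takes the last stone wins), the tabular Monte Carlo value update, and all function names and parameters. One shared value table chooses the moves for both players, and the moves of both players are used for learning.

```python
import random
from collections import defaultdict

MAX_TAKE = 3  # assumed rule: a player removes 1 to 3 stones per turn


def legal_moves(stones):
    return [a for a in range(1, MAX_TAKE + 1) if a <= stones]


def greedy_move(q, stones):
    # Pick the move with the highest estimated outcome for the mover.
    return max(legal_moves(stones), key=lambda a: q[(stones, a)])


def choose_move(q, stones, epsilon, rng):
    # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
    if rng.random() < epsilon:
        return rng.choice(legal_moves(stones))
    return greedy_move(q, stones)


def self_play_episode(q, counts, start, epsilon, rng):
    stones, player, winner = start, 0, None
    history = {0: [], 1: []}  # (pile, move) pairs, one list per player
    while stones > 0:
        # The same table q decides for whichever player is to move.
        move = choose_move(q, stones, epsilon, rng)
        history[player].append((stones, move))
        stones -= move
        if stones == 0:
            winner = player  # taking the last stone wins
        player = 1 - player
    # Learn from both viewpoints: +1 for the winner's moves, -1 for the loser's.
    for p in (0, 1):
        outcome = 1.0 if p == winner else -1.0
        for key in history[p]:
            counts[key] += 1
            q[key] += (outcome - q[key]) / counts[key]  # running average


def train(episodes=20000, max_start=10, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)
    counts = defaultdict(int)
    for _ in range(episodes):
        self_play_episode(q, counts, rng.randint(1, max_start), epsilon, rng)
    return q


if __name__ == "__main__":
    q = train()
    for pile in (5, 6, 7):
        print(pile, greedy_move(q, pile))
```

In this toy game the learned greedy policy should leave the opponent with a multiple of four stones whenever it can. A tabular method is used only to keep the sketch short; it is not a claim about how any particular system implements self-play.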

Multi-agent Reinforcement Learning
Multi-agent reinforcement learning (MARL) is a sub-field of reinforcement learning. It focuses on studying the behavior of multiple learning agents that coexist in a shared environment. Each agent is motivated by its own rewards and takes actions to advance its own interests; in some environments these interests are opposed to the interests of other agents, resulting in complex group dynamics. Multi-agent reinforcement learning is closely related to game theory, especially repeated games, as well as to multi-agent systems. Its study combines the search for algorithms that maximize rewards with a more sociological set of concepts. While research in single-agent reinforcement learning is concerned with finding the algorithm that earns the most points for one agent, research in multi-agent reinforcement learning evaluates and quantifies social metrics, such as cooperation, reciprocity, equity, social influence, language and discrimination.

Definition ...


AlphaZero
AlphaZero is a computer program developed by the artificial intelligence research company DeepMind to master the games of chess, shogi and Go. The algorithm uses an approach similar to AlphaGo Zero. On December 5, 2017, the DeepMind team released a preprint paper introducing AlphaZero, which achieved a superhuman level of play in these three games by defeating the world-champion programs Stockfish (chess), Elmo (shogi), and the three-day version of AlphaGo Zero (Go). In each case it made use of custom tensor processing units (TPUs) that the Google programs were optimized to use. AlphaZero was trained solely via self-play, using 5,000 first-generation TPUs to generate the games and 64 second-generation TPUs to train the neural networks, all in parallel, with no access to opening books or endgame tables. After four hours of training, DeepMind estimated AlphaZero wa ...

Tabula Rasa
Tabula rasa (Latin for "blank slate") is the idea of individuals being born empty of any built-in mental content, so that all knowledge comes from later perceptions or sensory experiences. Proponents typically form the extreme "nurture" side of the nature versus nurture debate, arguing that humans are born without any "natural" psychological traits and that all aspects of one's personality, social and emotional behaviour, knowledge, or sapience are later imprinted by one's environment onto the mind as one would onto a wax tablet. This idea is the central view posited in the theory of knowledge known as empiricism. Empiricists disagree with the doctrines of innatism or rationalism, which hold that the mind is born already in possession of specific knowledge or rational capacity.

Etymology

Tabula rasa is a Latin phrase often translated as "clean slate" in English and originates from the Roman tabula, a wax-covered tablet used for notes, which was blanked ...


Reinforcement Learning
Reinforcement learning (RL) is an interdisciplinary area of machine learning and optimal control concerned with how an intelligent agent should take actions in a dynamic environment in order to maximize a reward signal. Reinforcement learning is one of the three basic machine learning paradigms, alongside supervised learning and unsupervised learning. Reinforcement learning differs from supervised learning in not needing labelled input-output pairs to be presented, and in not needing sub-optimal actions to be explicitly corrected. Instead, the focus is on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge) with the goal of maximizing the cumulative reward (the feedback of which might be incomplete or delayed). The search for this balance is known as the exploration–exploitation dilemma. The environment is typically stated in the form of a Markov decision process (MDP), as many reinforcement learning algorithms use dyn ...
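The exploration–exploitation trade-off mentioned above can be made concrete with a small sketch. It assumes a multi-armed bandit with Gaussian rewards and an epsilon-greedy rule, neither of which is taken from the article; the function name, arm means, and parameters are hypothetical choices for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a Gaussian multi-armed bandit."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running mean reward per arm
    pulls = [0] * n_arms        # how often each arm was chosen
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: choose the arm that currently looks best.
            arm = max(range(n_arms), key=lambda i: estimates[i])
        reward = rng.gauss(true_means[arm], 1.0)  # noisy reward signal
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward
    return estimates, pulls, total_reward


if __name__ == "__main__":
    estimates, pulls, total = epsilon_greedy_bandit([0.2, 0.5, 1.0])
    print("pulls per arm:", pulls)
    print("estimated means:", [round(e, 2) for e in estimates])
```

A larger `epsilon` spends more steps gathering information about arms that look worse, while a smaller one commits earlier to the current best estimate; neither setting is presented here as a recommended value.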

Chess
Chess is a board game for two players. It is an abstract strategy game that involves no hidden information and no elements of chance. It is played on a square board consisting of 64 squares arranged in an 8×8 grid. The players, referred to as "White" and "Black", each control sixteen pieces: one king, one queen, two rooks, two bishops, two knights, and eight pawns, with each type of piece having a different pattern of movement. An enemy piece may be captured (removed from the board) by moving one's own piece onto the square it occupies. The object of the game is to "checkmate" (threaten with inescapable capture) the enemy king. There are also several ways a game can end in a draw. The recorded history of chess goes back to at least the emergence of chaturanga, also thought to be an ancesto ...

Shogi
Shogi, also known as Japanese chess, is a strategy board game for two players. It is one of the most popular board games in Japan and is in the same family of games as Western chess, chaturanga, xiangqi, Indian chess, and janggi. Shōgi means general's (shō) board game (gi). Shogi was the earliest historical chess-related game to allow captured pieces to be returned to the board by the capturing player. This drop rule is speculated to have been invented in the 15th century and possibly connected to the practice of 15th-century mercenaries switching loyalties when captured instead of being killed. The earliest predecessor of the game, chaturanga, originated in India in the 6th century, and the game was likely transmitted to Japan via China or Korea sometime after the Nara period ("Shogi", Encyclopædia Britannica, 2002). Shogi in its present form was played as early as the 16th century, while a direct ancesto ...

Go (game)
Go is an abstract strategy board game for two players in which the aim is to fence off more territory than the opponent. The game was invented in China more than 2,500 years ago and is believed to be the oldest board game continuously played to the present day. A 2016 survey of the International Go Federation's 75 member nations found that there are over 46 million people worldwide who know how to play Go, and over 20 million current players, the majority of whom live in East Asia. The playing pieces are called stones. One player uses the white stones and the other black stones. The players take turns placing their stones on the vacant intersections (points) on the board. Once placed, stones may not be moved, but captured stones are immediately removed from the board. A single stone (or connected group of stones) is captured when surrounded by the opponent's stones on all Orthogona ...

Diplomacy (game)
Diplomacy is a strategic board game created by Allan B. Calhamer in 1954 and released commercially in the United States in 1959. Its main distinctions from most board wargames are its negotiation phases (players spend much of their time forming and betraying alliances with other players and forming beneficial strategies) and the absence of dice and other game elements that produce random effects. Set in Europe in the years leading to the First World War, Diplomacy is played by two to seven players, each controlling the armed forces of a major European power (or, with fewer players, multiple powers). Each player aims to move their few starting units and defeat those of others to win possession of a majority of strategic cities and provinces marked as "supply centers" on the map; these supply centers allow players who control them to produce more units. Following each round of player negotiations, each player can issue attack and support orders, which are then executed d ...

Stratego
Stratego is a strategy board game for two players on a board of 10×10 squares. Each player controls 40 pieces representing individual officer and soldier ranks in an army. The pieces have Napoleonic insignia. The objective of the game is either to find and capture the opponent's Flag or to capture all movable enemy pieces so that the opponent cannot make any further moves. Stratego has simple enough rules for young children to play but a depth of strategy that is also appealing to adults. The game is a slightly modified copy of an early 20th-century French game named L'Attaque ("The Attack"), and has been in production in Europe since World War II and in the United States since 1961. There are now two- and four-player versions, versions with 10, 30 or 40 pieces per player, and boards with smaller sizes (number of spaces). There are also variant pieces and different rulesets. The Internationa ...


LessWrong
LessWrong (also written Less Wrong) is a community blog and forum focused on discussion of cognitive biases, philosophy, psychology, economics, rationality, and artificial intelligence, among other topics. It is associated with the rationalist community.

Purpose

LessWrong describes itself as an online forum and community aimed at improving human reasoning, rationality, and decision-making, with the goal of helping its users hold more accurate beliefs and achieve their personal objectives. The best known posts of LessWrong are "The Sequences", a series of essays which aim to describe how to avoid the typical failure modes of human reasoning with the goal of improving decision-making and the evaluation of evidence. One suggestion is the use of Bayes' theorem as a decision-making tool. There is also a focus on psychological barriers that prevent good decision-making, including fear conditioning and cognitive biases that have be ...