Self-play is a technique for improving the performance of reinforcement learning agents. Intuitively, agents learn to improve their performance by playing "against themselves".
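As an illustration of playing "against oneself", the following is a minimal sketch, not taken from any particular system. It assumes a toy stone-taking game (remove 1 to 3 stones, whoever takes the last stone wins) and a simple tabular Monte Carlo update; the game, the function names, and all hyperparameters are hypothetical choices made for this example. A single value table chooses the moves for both seats, and the moves of both seats are used as training data.

```python
import random

MAX_TAKE = 3  # assumed rule: remove 1-3 stones; taking the last stone wins


def legal_moves(pile):
    return range(1, min(MAX_TAKE, pile) + 1)


def choose(q, pile, epsilon, rng):
    """Epsilon-greedy move for whichever player is to move."""
    moves = list(legal_moves(pile))
    if rng.random() < epsilon:
        return rng.choice(moves)
    return max(moves, key=lambda m: q.get((pile, m), 0.0))


def train_self_play(episodes=20000, max_pile=10, alpha=0.05, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {}  # one shared table: (pile, move) -> estimated outcome for the mover
    for _ in range(episodes):
        pile, player, history = rng.randint(1, max_pile), 0, []
        while pile > 0:
            move = choose(q, pile, epsilon, rng)  # same policy plays both seats
            history.append((player, pile, move))
            pile -= move
            player = 1 - player
        winner = 1 - player  # the player who made the last move
        # Learn from both viewpoints: +1 for the winner's moves, -1 for the loser's.
        for mover, state, move in history:
            target = 1.0 if mover == winner else -1.0
            old = q.get((state, move), 0.0)
            q[(state, move)] = old + alpha * (target - old)
    return q


def greedy_move(q, pile):
    """Best move according to the learned table."""
    return max(legal_moves(pile), key=lambda m: q.get((pile, m), 0.0))
```

With these assumed settings, the greedy policy read from `q` typically learns to leave the opponent a multiple of four stones, which is the known winning strategy for this variant of the game.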

Definition and motivation

In multi-agent reinforcement learning experiments, researchers try to optimize the performance of a learning agent on a given task, in cooperation or competition with one or more other agents. These agents learn by trial and error, and researchers may choose to have the learning algorithm play the role of two or more of the different agents. When successfully executed, this technique has a double advantage:
# It provides a straightforward way to determine the actions of the other agents, resulting in a meaningful challenge.
# It increases the amount of experience that can be used to improve the policy by a factor of two or more, since the viewpoints of each of the different agents can be used for learning.

Czarnecki et al. argue that most of the games that people play for fun are "Games of Skill", meaning games whose space of all possible strategies looks like a spinning top. In more detail, the space of strategies can be partitioned into sets $L_1, L_2, ..., L_n$ such that for any $i < j$, $\pi_i \in L_i$, and $\pi_j \in L_j$, the strategy $\pi_j$ beats the strategy $\pi_i$. Then, in population-based self-play, if the population is larger than $\max_i |L_i|$, the algorithm would converge to the best possible strategy.
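The role of the population size can be illustrated with a toy construction. The sketch below is not the algorithm or the games studied by Czarnecki et al.; it assumes a hypothetical layered game in which any strategy in a higher layer beats any strategy in a lower one, while strategies within a layer beat each other cyclically, as in rock-paper-scissors. It also assumes a learner that always adds the weakest strategy that beats every current population member. The layer sizes and all names are made up for the example.

```python
from collections import deque

# Hypothetical layer sizes |L_1|, ..., |L_n| (odd, so each layer forms a cycle).
LAYER_SIZES = [3, 5, 3, 1]


def beats(a, b, sizes=LAYER_SIZES):
    """Return True if strategy a beats strategy b; a strategy is (layer, index)."""
    (layer_a, idx_a), (layer_b, idx_b) = a, b
    if layer_a != layer_b:
        return layer_a > layer_b       # a higher layer always wins
    width = sizes[layer_a]
    gap = (idx_a - idx_b) % width      # cyclic relation inside a layer
    return 1 <= gap <= (width - 1) // 2


def population_self_play(pop_size, sizes=LAYER_SIZES, max_steps=200):
    """Repeatedly add the weakest strategy that beats the whole population."""
    strategies = [(layer, i) for layer, w in enumerate(sizes) for i in range(w)]
    population = deque([strategies[0]], maxlen=pop_size)
    for _ in range(max_steps):
        challenger = next(
            (s for s in strategies if all(beats(s, p, sizes) for p in population)),
            None,
        )
        if challenger is None:         # nothing beats the population: stop
            break
        population.append(challenger)  # the oldest member is dropped when full
    return population[-1]              # most recently added strategy
```

In this toy setup, `population_self_play(1)` keeps cycling inside the bottom layer and `population_self_play(2)` gets stuck in the five-strategy layer, while `population_self_play(6)`, whose population is larger than the largest layer, ends at the single top-layer strategy `(3, 0)`.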

Usage

Self-play is used by the AlphaZero program to improve its performance in the games of chess, shogi and Go. Self-play is also used to train the Cicero AI system to outperform humans at the game of Diplomacy. The technique is also used in training the DeepNash system to play the game Stratego.

Connections to other disciplines

Self-play has been compared to the epistemological concept of tabula rasa, which describes the way that humans acquire knowledge from a "blank slate".
