Thompson sampling,
named after
William R. Thompson
William Robin Thompson (June 29, 1887January 30, 1972) was a Canadian entomologist and also wrote on the philosophy of science in his book ''Science and Common Sense: An Aristotelian Excursion'' (1937). He specialized in the biological control of ...
, is a
heuristic
A heuristic (; ), or heuristic technique, is any approach to problem solving or self-discovery that employs a practical method that is not guaranteed to be optimal, perfect, or rational, but is nevertheless sufficient for reaching an immediat ...
for choosing actions that addresses the
exploration-exploitation dilemma
The exploration-exploitation dilemma, also known as the explore-exploit tradeoff, is a fundamental concept in decision-making that arises in many domains. It is depicted as the balancing act between two opposing strategies. Exploitation involves c ...
in the
multi-armed bandit
In probability theory and machine learning, the multi-armed bandit problem (sometimes called the ''K''- or ''N''-armed bandit problem) is a problem in which a fixed limited set of resources must be allocated between competing (alternative) choices ...
problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
Description
Consider a set of contexts
, a set of actions
, and rewards in
. In each round, the player obtains a context
, plays an action
and receives a reward
following a distribution that depends on the context and the issued action. The aim of the player is to play actions such as to maximize the cumulative rewards.
The elements of Thompson sampling are as follows:
# a likelihood function
;
# a set
of parameters
of the distribution of
;
# a prior distribution
on these parameters;
# past observations triplets
;
# a posterior distribution
, where
is the likelihood function.
Thompson sampling consists in playing the action
according to the probability that it maximizes the expected reward; action
is chosen with probability
:
where
is the
indicator function
In mathematics, an indicator function or a characteristic function of a subset of a set is a function that maps elements of the subset to one, and all other elements to zero. That is, if is a subset of some set , one has \mathbf_(x)=1 if x ...
.
In practice, the rule is implemented by sampling. In each round, parameters
are sampled from the posterior
, and an action
chosen that maximizes