Simultaneous perturbation stochastic approximation (SPSA) is an algorithmic method for optimizing systems with multiple unknown parameters. It is a type of stochastic approximation algorithm. As an optimization method, it is appropriately suited to large-scale population models, adaptive modeling, simulation optimization, and atmospheric modeling. Many examples are presented at the SPSA website http://www.jhuapl.edu/SPSA. A comprehensive book on the subject is Bhatnagar et al. (2013). An early paper on the subject is Spall (1987) and the foundational paper providing the key theory and justification is Spall (1992).
SPSA is a descent method capable of finding global minima, sharing this property with other methods such as simulated annealing. Its main feature is the gradient approximation, which requires only two measurements of the objective function, regardless of the dimension of the optimization problem. Recall that we want to find the optimal control $u^*$ with loss function $J(u)$:

:$u^* = \arg\min_{u \in U} J(u).$
Both Finite Differences Stochastic Approximation (FDSA) and SPSA use the same iterative process:

:$u_{n+1} = u_n - a_n \hat{g}_n(u_n),$

where $u_n$ represents the $n$-th iterate, $\hat{g}_n(u_n)$ is the estimate of the gradient of the objective function $g(u) = \partial J / \partial u$ evaluated at $u_n$, and $\{a_n\}$ is a positive number sequence converging to 0. If $u_n$ is a ''p''-dimensional vector, the $i$-th component of the symmetric finite difference gradient estimator is:

:FD: $(\hat{g}_n(u_n))_i = \frac{J(u_n + c_n e_i) - J(u_n - c_n e_i)}{2 c_n}, \qquad 1 \le i \le p,$

where $e_i$ is the unit vector with a 1 in the $i$-th place, and $c_n$ is a small positive number that decreases with ''n''. With this method, ''2p'' evaluations of ''J'' for each $\hat{g}_n$ are needed. Clearly, when ''p'' is large, this estimator loses efficiency.
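As an illustration (a Python sketch, not part of the original article; the function and variable names are my own), the symmetric finite-difference estimator above can be written as:

```python
import numpy as np

def fd_gradient(J, u, c):
    """Symmetric finite-difference estimate of the gradient of J at u.

    Costs 2p evaluations of J (two per coordinate), where p = len(u).
    """
    p = len(u)
    g = np.empty(p)
    for i in range(p):
        e = np.zeros(p)      # unit vector with a 1 in the i-th place
        e[i] = 1.0
        g[i] = (J(u + c * e) - J(u - c * e)) / (2.0 * c)
    return g

# Example on a smooth quadratic: J(u) = sum(u_i^2), true gradient 2u.
J = lambda u: float(np.dot(u, u))
u = np.array([1.0, -2.0, 0.5])
print(fd_gradient(J, u, c=1e-4))  # close to [2.0, -4.0, 1.0]
```

Note the loop over all ''p'' coordinates: this is exactly where the ''2p'' evaluations of ''J'' per gradient estimate come from.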
Let now $\Delta_n$ be a random perturbation vector. The $i$-th component of the stochastic perturbation gradient estimator is:

:SP: $(\hat{g}_n(u_n))_i = \frac{J(u_n + c_n \Delta_n) - J(u_n - c_n \Delta_n)}{2 c_n (\Delta_n)_i}.$

Remark that FD perturbs only one direction at a time, while the SP estimator disturbs all directions at the same time (the numerator is identical in all ''p'' components). The number of loss function measurements needed in the SPSA method for each $\hat{g}_n$ is always 2, independent of the dimension ''p''. Thus, SPSA uses ''p'' times fewer function evaluations than FDSA, which makes it much more efficient.
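By contrast with the finite-difference estimator, the SP estimator needs a single perturbation vector and only two evaluations of ''J'' in total. A minimal Python sketch (names are my own, not from the article), using a Rademacher (±1) perturbation, which satisfies the conditions of the lemma below:

```python
import numpy as np

def spsa_gradient(J, u, c, rng):
    """Simultaneous-perturbation gradient estimate of J at u.

    Uses only TWO evaluations of J, regardless of p = len(u).
    """
    # Rademacher perturbation: each component is +1 or -1.
    delta = rng.choice([-1.0, 1.0], size=len(u))
    y_plus = J(u + c * delta)
    y_minus = J(u - c * delta)
    # The numerator is the same for every component; only the
    # divisor c * delta_i changes.
    return (y_plus - y_minus) / (2.0 * c * delta)

rng = np.random.default_rng(0)
J = lambda u: float(np.dot(u, u))
u = np.array([1.0, -2.0, 0.5])
print(spsa_gradient(J, u, c=1e-2, rng=rng))  # one noisy estimate
```

A single call returns a noisy estimate; it only tracks the true gradient on average, which is the content of the convergence lemma below.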
Simple experiments with ''p=2'' showed that SPSA converges in the same number of iterations as FDSA. The latter follows approximately the steepest descent direction, behaving like the gradient method. On the other hand, SPSA, with its random search direction, does not follow the gradient path exactly. On average, though, it tracks it nearly, because the gradient approximation is an almost unbiased estimator of the gradient, as shown in the following lemma.
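The full iteration $u_{n+1} = u_n - a_n \hat{g}_n(u_n)$ can be sketched as follows for a ''p=2'' problem. The decay exponents 0.602 and 0.101 are commonly used in practice, but the specific gain constants here are illustrative assumptions, not values from the article:

```python
import numpy as np

def spsa_minimize(J, u0, n_iter=2000, a=0.5, c=0.1, A=100,
                  alpha=0.602, gamma=0.101, seed=0):
    """Minimize J by SPSA with decaying gain sequences
    a_n = a/(n+1+A)^alpha and c_n = c/(n+1)^gamma.

    Each iteration costs exactly two evaluations of J,
    independent of the dimension of u.
    """
    rng = np.random.default_rng(seed)
    u = np.asarray(u0, dtype=float)
    for n in range(n_iter):
        a_n = a / (n + 1 + A) ** alpha
        c_n = c / (n + 1) ** gamma
        delta = rng.choice([-1.0, 1.0], size=u.size)
        ghat = (J(u + c_n * delta) - J(u - c_n * delta)) / (2.0 * c_n * delta)
        u = u - a_n * ghat
    return u

# p = 2 example: J has its minimum at (1, -3).
J = lambda u: (u[0] - 1.0) ** 2 + (u[1] + 3.0) ** 2
print(spsa_minimize(J, [5.0, 5.0]))  # approaches [1, -3]
```

Counting evaluations makes the efficiency claim concrete: 2000 iterations cost 4000 evaluations of ''J'' here, whereas FDSA would cost ''2p'' = 4 evaluations per iteration, i.e. 8000.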
Convergence lemma

Denote by

:$b_n = E\left[\hat{g}_n \mid u_n\right] - \nabla J(u_n)$

the bias in the estimator $\hat{g}_n$. Assume that the components $(\Delta_n)_i$ are all mutually independent with zero mean and bounded second moments, and that $E\left(|(\Delta_n)_i|^{-1}\right)$ is uniformly bounded. Then $b_n \to 0$ w.p. 1.
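The lemma can be illustrated numerically (an illustrative sketch, not from the article): averaging many independent SP estimates at a fixed point recovers the true gradient, since a Rademacher perturbation satisfies the zero-mean, bounded-second-moment, and bounded-inverse-moment conditions:

```python
import numpy as np

rng = np.random.default_rng(42)
J = lambda u: float(np.dot(u, u))   # true gradient is 2u
u = np.array([1.0, -2.0, 0.5])
c = 0.1

# Average N independent simultaneous-perturbation estimates.
N = 20000
est = np.zeros_like(u)
for _ in range(N):
    delta = rng.choice([-1.0, 1.0], size=u.size)  # Rademacher
    est += (J(u + c * delta) - J(u - c * delta)) / (2.0 * c * delta)
est /= N

print(est)  # approximately the true gradient [2.0, -4.0, 1.0]
```

Each individual estimate is very noisy, but the empirical mean lands near $\nabla J(u)$, which is exactly the near-unbiasedness the lemma asserts.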
Sketch of the proof

The main idea is to use conditioning on $\Delta_n$ to express