Natural evolution strategies (NES) are a family of
numerical optimization algorithms for
black box
In science, computing, and engineering, a black box is a system which can be viewed in terms of its inputs and outputs (or transfer characteristics), without any knowledge of its internal workings. Its implementation is "opaque" (black). The te ...
problems. Similar in spirit to
evolution strategies, they iteratively update the (continuous) parameters of a ''search distribution'' by following the natural gradient towards higher expected fitness.
Method
The general procedure is as follows: the ''parameterized'' search distribution is used to produce a batch of search points, and the
fitness function {{no footnotes, date=May 2015
A fitness function is a particular type of objective function that is used to summarise, as a single figure of merit, how close a given design solution is to achieving the set aims. Fitness functions are used in genet ...
is evaluated at each such point. The distribution’s parameters (which include ''strategy parameters'') allow the algorithm to adaptively capture the (local) structure of the fitness function. For example, in the case of a
Gaussian distribution
In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is
:
f(x) = \frac e^
The parameter \mu i ...
, this comprises the mean and the
covariance matrix. From the samples, NES estimates a search gradient on the parameters towards higher expected fitness. NES then performs a gradient ascent step along the
natural gradient, a second order method which, unlike the plain gradient, renormalizes the update with respect to uncertainty. This step is crucial, since it prevents oscillations, premature convergence, and undesired effects stemming from a given parameterization. The entire process reiterates until a stopping criterion is met.
All members of the NES family operate based on the same principles. They differ in the type of
probability distribution
In probability theory and statistics, a probability distribution is the mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomeno ...
and the gradient
approximation
An approximation is anything that is intentionally similar but not exactly equal to something else.
Etymology and usage
The word ''approximation'' is derived from Latin ''approximatus'', from ''proximus'' meaning ''very near'' and the prefix '' ...
method used. Different search spaces require different search distributions; for example, in low dimensionality it can be highly beneficial to model the full covariance matrix. In high dimensions, on the other hand, a more scalable alternative is to limit the covariance to the
diagonal
In geometry, a diagonal is a line segment joining two vertices of a polygon or polyhedron, when those vertices are not on the same edge. Informally, any sloping line is called diagonal. The word ''diagonal'' derives from the ancient Gree ...
only. In addition, highly multi-modal search spaces may benefit from more
heavy-tailed distributions (such as
Cauchy
Baron Augustin-Louis Cauchy (, ; ; 21 August 178923 May 1857) was a French mathematician, engineer, and physicist who made pioneering contributions to several branches of mathematics, including mathematical analysis and continuum mechanics. He ...
, as opposed to the Gaussian). A last distinction arises between distributions where we can analytically compute the natural gradient, and more general distributions where we need to estimate it from samples.
Search gradients
Let
denote the parameters of the search distribution
and
the fitness function evaluated at
. NES then pursues the objective of maximizing the ''expected fitness under the search distribution''
::
through
gradient ascent. The gradient can be rewritten as
::
:::
:::
:::
:::