Nelder–Mead method

The Nelder–Mead method (also downhill simplex method, amoeba method, or polytope method) is a numerical method used to find the minimum or maximum of an objective function in a multidimensional space. It is a direct search method (based on function comparison) and is often applied to nonlinear optimization problems for which derivatives may not be known. However, the Nelder–Mead technique is a heuristic search method that can converge to non-stationary points on problems that can be solved by alternative methods (Yu, Wen Ci. 1979. "Positive basis and a class of direct search techniques". ''Scientia Sinica'' [''Zhongguo Kexue'']: 53–68; Yu, Wen Ci. 1979. "The convergent property of the simplex evolutionary technique". ''Scientia Sinica'' [''Zhongguo Kexue'']: 69–77). The Nelder–Mead technique was proposed by John Nelder and Roger Mead in 1965, as a development of the method of Spendley et al.


Overview

The method uses the concept of a simplex, which is a special polytope of ''n'' + 1 vertices in ''n'' dimensions. Examples of simplices include a line segment on a line, a triangle on a plane, a tetrahedron in three-dimensional space, and so forth.

The method approximates a local optimum of a problem with ''n'' variables when the objective function varies smoothly and is unimodal. Typical implementations minimize functions, and we maximize f(\mathbf x) by minimizing -f(\mathbf x).

For example, a suspension bridge engineer has to choose how thick each strut, cable, and pier must be. These elements are interdependent, but it is not easy to visualize the impact of changing any specific element. Simulation of such complicated structures is often extremely computationally expensive, possibly taking upwards of hours per execution. The Nelder–Mead method requires, in the original variant, no more than two evaluations per iteration, except for the ''shrink'' operation described later, which makes it attractive compared to some other direct-search optimization methods. However, the overall number of iterations to the proposed optimum may be high.

Nelder–Mead in ''n'' dimensions maintains a set of ''n'' + 1 test points arranged as a simplex. It then extrapolates the behavior of the objective function measured at each test point in order to find a new test point to replace one of the old test points, and so the technique progresses. The simplest approach is to replace the worst point with a point reflected through the centroid of the remaining ''n'' points. If this point is better than the best current point, then we can try stretching exponentially out along this line. On the other hand, if this new point is not much better than the previous value, then we are stepping across a valley, so we shrink the simplex towards a better point.

An intuitive explanation of the algorithm, from ''Numerical Recipes'':
The downhill simplex method now takes a series of steps, most steps just moving the point of the simplex where the function is largest (“highest point”) through the opposite face of the simplex to a lower point. These steps are called reflections, and they are constructed to conserve the volume of the simplex (and hence maintain its nondegeneracy). When it can do so, the method expands the simplex in one or another direction to take larger steps. When it reaches a “valley floor”, the method contracts itself in the transverse direction and tries to ooze down the valley. If there is a situation where the simplex is trying to “pass through the eye of a needle”, it contracts itself in all directions, pulling itself in around its lowest (best) point.
Unlike modern optimization methods, the Nelder–Mead heuristic can converge to a non-stationary point unless the problem satisfies stronger conditions than are necessary for modern methods. Modern improvements over the Nelder–Mead heuristic have been known since 1979.

Many variations exist depending on the actual nature of the problem being solved. A common variant uses a constant-size, small simplex that roughly follows the gradient direction (which gives steepest descent). Visualize a small triangle on an elevation map flip-flopping its way down a valley to a local bottom. This method is also known as the flexible polyhedron method. It tends, however, to perform poorly against the method described in this article, because it makes small, unnecessary steps in areas of little interest.
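To make the basic reflection step concrete, here is a minimal numeric sketch in Python. The triangle coordinates and the NumPy dependency are illustrative choices, and the step implicitly uses the reflection coefficient \alpha = 1 defined in the next section:

<syntaxhighlight lang="python">
import numpy as np

# A simplex in two dimensions is a triangle; coordinates chosen arbitrarily.
worst = np.array([0.0, 0.0])   # vertex with the largest function value
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

# Reflect the worst vertex through the centroid of the remaining n points.
centroid = (a + b) / 2                      # (0.5, 0.5)
reflected = centroid + (centroid - worst)   # (1.0, 1.0)
</syntaxhighlight>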


One possible variation of the NM algorithm

(This approximates the procedure in the original Nelder–Mead article.)

We are trying to minimize the function f(\mathbf x), where \mathbf x \in \mathbb{R}^n. Our current test points are \mathbf x_1, \ldots, \mathbf x_{n+1}.

1. Order according to the values at the vertices:
: f(\mathbf x_1) \leq f(\mathbf x_2) \leq \cdots \leq f(\mathbf x_{n+1}).
: Check whether the method should stop; see Termination below. (This check is sometimes inappropriately called "convergence".)

2. Calculate \mathbf x_o, the centroid of all points except \mathbf x_{n+1}.

3. Reflection
: Compute the reflected point \mathbf x_r = \mathbf x_o + \alpha (\mathbf x_o - \mathbf x_{n+1}) with \alpha > 0.
: If the reflected point is better than the second worst, but not better than the best, i.e. f(\mathbf x_1) \leq f(\mathbf x_r) < f(\mathbf x_n),
:: then obtain a new simplex by replacing the worst point \mathbf x_{n+1} with the reflected point \mathbf x_r, and go to step 1.

4. Expansion
: If the reflected point is the best point so far, f(\mathbf x_r) < f(\mathbf x_1),
:: then compute the expanded point \mathbf x_e = \mathbf x_o + \gamma (\mathbf x_r - \mathbf x_o) with \gamma > 1.
:: If the expanded point is better than the reflected point, f(\mathbf x_e) < f(\mathbf x_r),
::: then obtain a new simplex by replacing the worst point \mathbf x_{n+1} with the expanded point \mathbf x_e and go to step 1;
::: else obtain a new simplex by replacing the worst point \mathbf x_{n+1} with the reflected point \mathbf x_r and go to step 1.

5. Contraction
: Here it is certain that f(\mathbf x_r) \geq f(\mathbf x_n). (Note that \mathbf x_n is the second worst, or "next to the worst", point.)
: If f(\mathbf x_r) < f(\mathbf x_{n+1}),
:: then compute the contracted point on the outside \mathbf x_c = \mathbf x_o + \rho(\mathbf x_r - \mathbf x_o) with 0 < \rho \leq 0.5.
:: If the contracted point is better than the reflected point, i.e. f(\mathbf x_c) < f(\mathbf x_r),
::: then obtain a new simplex by replacing the worst point \mathbf x_{n+1} with the contracted point \mathbf x_c and go to step 1;
::: else go to step 6.
: If f(\mathbf x_r) \geq f(\mathbf x_{n+1}),
:: then compute the contracted point on the inside \mathbf x_c = \mathbf x_o + \rho(\mathbf x_{n+1} - \mathbf x_o) with 0 < \rho \leq 0.5.
:: If the contracted point is better than the worst point, i.e. f(\mathbf x_c) < f(\mathbf x_{n+1}),
::: then obtain a new simplex by replacing the worst point \mathbf x_{n+1} with the contracted point \mathbf x_c and go to step 1;
::: else go to step 6.

6. Shrink
: Replace all points except the best (\mathbf x_1) with
: \mathbf x_i = \mathbf x_1 + \sigma(\mathbf x_i - \mathbf x_1), and go to step 1.

Note: \alpha, \gamma, \rho and \sigma are respectively the reflection, expansion, contraction and shrink coefficients. Standard values are \alpha = 1, \gamma = 2, \rho = 1/2 and \sigma = 1/2.

For the reflection, since \mathbf x_{n+1} is the vertex with the highest associated value among the vertices, we can expect to find a lower value at the reflection of \mathbf x_{n+1} in the opposite face formed by all vertices \mathbf x_i except \mathbf x_{n+1}. For the expansion, if the reflection point \mathbf x_r is the new minimum along the vertices, we can expect to find interesting values along the direction from \mathbf x_o to \mathbf x_r. Concerning the contraction, if f(\mathbf x_r) > f(\mathbf x_n), we can expect that a better value will be inside the simplex formed by all the vertices \mathbf x_i. Finally, the shrink handles the rare case that contracting away from the largest point increases f, something that cannot happen sufficiently close to a non-singular minimum. In that case we contract towards the lowest point in the expectation of finding a simpler landscape. However, Nash notes that finite-precision arithmetic can sometimes fail to actually shrink the simplex, and implemented a check that the size is actually reduced.
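The variant above translates almost line for line into code. The following is a minimal sketch in Python, assuming the standard coefficient values and a fixed-step initial simplex (see Initial simplex below); the function and parameter names are illustrative choices, and production code would normally use a maintained implementation such as SciPy's scipy.optimize.minimize with method='Nelder-Mead'.

<syntaxhighlight lang="python">
import numpy as np

def nelder_mead(f, x_start, alpha=1.0, gamma=2.0, rho=0.5, sigma=0.5,
                step=0.1, tol=1e-8, max_iter=1000):
    """Minimize f with the Nelder-Mead variant described above.

    Illustrative sketch: function values are re-evaluated rather than
    cached, for clarity at the cost of extra evaluations.
    """
    # Initial simplex: the starting point plus one point per dimension,
    # each offset by a fixed step along that coordinate axis.
    x0 = np.asarray(x_start, dtype=float)
    simplex = [x0] + [x0 + step * np.eye(len(x0))[i] for i in range(len(x0))]

    for _ in range(max_iter):
        # Step 1: order the vertices by function value, best first.
        simplex.sort(key=f)
        fvals = [f(v) for v in simplex]
        # Termination test (see Termination below).
        if np.std(fvals) < tol:
            break
        # Step 2: centroid of all points except the worst.
        x_o = np.mean(simplex[:-1], axis=0)
        worst = simplex[-1]
        # Step 3: reflection.
        x_r = x_o + alpha * (x_o - worst)
        if fvals[0] <= f(x_r) < fvals[-2]:
            simplex[-1] = x_r
            continue
        # Step 4: expansion, when the reflected point is the best so far.
        if f(x_r) < fvals[0]:
            x_e = x_o + gamma * (x_r - x_o)
            simplex[-1] = x_e if f(x_e) < f(x_r) else x_r
            continue
        # Step 5: contraction, outside or inside depending on whether
        # the reflected point beats the current worst point.
        if f(x_r) < fvals[-1]:
            x_c = x_o + rho * (x_r - x_o)      # outside
            if f(x_c) < f(x_r):
                simplex[-1] = x_c
                continue
        else:
            x_c = x_o + rho * (worst - x_o)    # inside
            if f(x_c) < fvals[-1]:
                simplex[-1] = x_c
                continue
        # Step 6: shrink every point towards the best one.
        simplex = [simplex[0]] + [simplex[0] + sigma * (v - simplex[0])
                                  for v in simplex[1:]]

    return min(simplex, key=f)
</syntaxhighlight>

For example, applied to the Rosenbrock banana function, the iterates approach the minimum at (1, 1):

<syntaxhighlight lang="python">
rosen = lambda x: (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2
print(nelder_mead(rosen, [-1.2, 1.0]))
</syntaxhighlight>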


Initial simplex

The initial simplex is important. Indeed, a too-small initial simplex can lead to a local search, and consequently the NM method can more easily get stuck; so this simplex should depend on the nature of the problem. However, the original article suggested a simplex where an initial point is given as \mathbf x_1, with the others generated with a fixed step along each dimension in turn. Thus the method is sensitive to the scaling of the variables that make up \mathbf x.
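A brief sketch of this construction, together with a scale-aware alternative used by several implementations; the 5% and 0.00025 perturbation constants follow the convention popularized by Matlab's fminsearch and are illustrative, not canonical:

<syntaxhighlight lang="python">
import numpy as np

def initial_simplex(x_start, step=0.1, relative=False):
    """Build the n + 1 starting vertices from a single point.

    relative=False: fixed step along each axis, as in the original article.
    relative=True:  per-component relative perturbation, which reduces
                    the sensitivity to the scaling of the variables.
    """
    x0 = np.asarray(x_start, dtype=float)
    simplex = [x0]
    for i in range(len(x0)):
        v = x0.copy()
        if relative:
            v[i] = 1.05 * v[i] if v[i] != 0.0 else 0.00025
        else:
            v[i] += step
        simplex.append(v)
    return simplex
</syntaxhighlight>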


Termination

Criteria are needed to break the iterative cycle. Nelder and Mead used the sample standard deviation of the function values of the current simplex. If this falls below some tolerance, then the cycle is stopped and the lowest point in the simplex is returned as a proposed optimum. Note that a very "flat" function may have almost equal function values over a large domain, so that the solution will be sensitive to the tolerance. Nash adds the test for shrinkage as another termination criterion. Note that programs terminate, while iterations may converge.
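A sketch of Nelder and Mead's criterion, assuming the function values at the n + 1 vertices are available as a list; the default tolerance is an illustrative choice:

<syntaxhighlight lang="python">
import numpy as np

def should_stop(fvals, tol=1e-8):
    # Stop when the sample standard deviation of the function values
    # at the simplex vertices falls below the tolerance.
    return np.std(fvals, ddof=1) < tol
</syntaxhighlight>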


See also

* Derivative-free optimization
* COBYLA
* NEWUOA
* LINCOA
* Nonlinear conjugate gradient method
* Levenberg–Marquardt algorithm
* Broyden–Fletcher–Goldfarb–Shanno or BFGS method
* Differential evolution
* Pattern search (optimization)
* CMA-ES


External links


* Nelder–Mead (Downhill Simplex) explanation and visualization with the Rosenbrock banana function
* John Burkardt: Nelder–Mead code in Matlab – note that a variation of the Nelder–Mead method is also implemented by the Matlab function fminsearch
* https://github.com/fchollet/nelder-mead – nelder-mead, a Python implementation of the Nelder–Mead method
* SOVA 1.0 (freeware) – Simplex Optimization for Various Applications
* HillStormer, a practical tool for nonlinear, multivariate and linear constrained Simplex Optimization by Nelder–Mead