In mathematics, a smooth maximum of an indexed family x_1, \ldots, x_n of numbers is a smooth approximation to the maximum function \max(x_1,\ldots,x_n), meaning a parametric family of functions m_\alpha(x_1,\ldots,x_n) such that for every \alpha, the function m_\alpha is smooth, and the family converges to the maximum function as \alpha\to\infty. The concept of smooth minimum is similarly defined. In many cases, a single family approximates both: maximum as the parameter goes to positive infinity, minimum as the parameter goes to negative infinity; in symbols, m_\alpha\to\max as \alpha\to\infty and m_\alpha\to\min as \alpha\to-\infty. The term can also be used loosely for a specific smooth function that behaves similarly to a maximum, without necessarily being part of a parametrized family.


Examples

For large positive values of the parameter \alpha > 0, the following formulation is a smooth, differentiable approximation of the maximum function. For negative values of the parameter that are large in absolute value, it approximates the minimum.

: \mathcal{S}_\alpha (x_1,\ldots,x_n) = \frac{\sum_{i=1}^n x_i e^{\alpha x_i}}{\sum_{i=1}^n e^{\alpha x_i}}

\mathcal{S}_\alpha has the following properties:
# \mathcal{S}_\alpha\to \max as \alpha\to\infty
# \mathcal{S}_0 is the arithmetic mean of its inputs
# \mathcal{S}_\alpha\to \min as \alpha\to -\infty

The gradient of \mathcal{S}_\alpha is closely related to softmax and is given by

: \nabla_{x_i}\mathcal{S}_\alpha (x_1,\ldots,x_n) = \frac{e^{\alpha x_i}}{\sum_{j=1}^n e^{\alpha x_j}} \left[ 1 + \alpha \left( x_i - \mathcal{S}_\alpha (x_1,\ldots,x_n) \right) \right].

This makes the softmax function useful for optimization techniques that use gradient descent.
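As an illustration, here is a minimal Python sketch of this exponentially weighted smooth maximum and its gradient (the function names smoothmax and smoothmax_grad are mine, not from the source), using the usual shift-by-the-maximum trick so the exponentials cannot overflow:

```python
import numpy as np

def smoothmax(x, alpha):
    """Exponentially weighted smooth maximum S_alpha(x)."""
    x = np.asarray(x, dtype=float)
    z = alpha * x
    # Shifting by z.max() before exponentiating avoids overflow;
    # the common factor exp(-z.max()) cancels in the ratio.
    w = np.exp(z - z.max())
    return float(np.sum(x * w) / np.sum(w))

def smoothmax_grad(x, alpha):
    """Gradient: softmax(alpha * x)_i * (1 + alpha * (x_i - S_alpha(x)))."""
    x = np.asarray(x, dtype=float)
    z = alpha * x
    w = np.exp(z - z.max())
    p = w / np.sum(w)          # softmax weights
    s = float(np.sum(x * p))   # S_alpha(x)
    return p * (1.0 + alpha * (x - s))

x = [1.0, 2.0, 3.0]
print(smoothmax(x, 100.0))   # ~3.0: approaches max as alpha -> +infinity
print(smoothmax(x, 0.0))     # 2.0: arithmetic mean at alpha = 0
print(smoothmax(x, -100.0))  # ~1.0: approaches min as alpha -> -infinity
```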


LogSumExp

Another smooth maximum is LogSumExp:

: \mathrm{LSE}_\alpha(x_1, \ldots, x_n) = \frac{1}{\alpha}\log\left( \exp(\alpha x_1) + \cdots + \exp( \alpha x_n) \right)

This can also be normalized if the x_i are all non-negative, yielding a function with domain [0,\infty)^n and range [0, \infty):

: g(x_1, \ldots, x_n) = \log\left( \exp(x_1) + \cdots + \exp(x_n) - (n - 1) \right)

The (n - 1) term corrects for the fact that \exp(0) = 1 by canceling out all but one zero exponential, and \log 1 = 0 if all x_i are zero.
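A minimal Python sketch of \mathrm{LSE}_\alpha with the standard max-shift for numerical stability (the name lse_max is mine; for the \alpha = 1 case SciPy provides scipy.special.logsumexp):

```python
import numpy as np

def lse_max(x, alpha=1.0):
    """LSE_alpha(x) = (1/alpha) * log(exp(alpha*x_1) + ... + exp(alpha*x_n))."""
    z = alpha * np.asarray(x, dtype=float)
    m = z.max()
    # log(sum exp(z)) = m + log(sum exp(z - m)); the shift prevents overflow.
    return float((m + np.log(np.sum(np.exp(z - m)))) / alpha)

x = [1.0, 2.0, 3.0]
print(lse_max(x, 1.0))    # ~3.408: overestimates max by at most log(n)/alpha
print(lse_max(x, 100.0))  # ~3.0
```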


p-Norm

Another smooth maximum is the p-norm:

: \|(x_1, \ldots, x_n)\|_p = \left( |x_1|^p + \cdots + |x_n|^p \right)^{1/p}

which converges to \|(x_1, \ldots, x_n)\|_\infty = \max_i |x_i| as p \to \infty.

An advantage of the p-norm is that it is a norm. As such it is "scale invariant" (homogeneous): \|(\lambda x_1, \ldots, \lambda x_n)\|_p = |\lambda| \cdot \|(x_1, \ldots, x_n)\|_p, and it satisfies the triangle inequality.
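A minimal Python sketch (the name p_norm is mine) that factors out the largest absolute entry so that raising to a large p does not overflow:

```python
import numpy as np

def p_norm(x, p):
    """||x||_p = (|x_1|^p + ... + |x_n|^p)^(1/p); tends to max_i |x_i| as p grows."""
    a = np.abs(np.asarray(x, dtype=float))
    m = a.max()
    if m == 0.0:
        return 0.0
    # Factor out the largest entry: ||x||_p = m * ||x / m||_p, so (a/m)**p <= 1.
    return float(m * np.sum((a / m) ** p) ** (1.0 / p))

x = [1.0, -2.0, 3.0]
for p in (2, 10, 100):
    print(p, p_norm(x, p))  # decreases toward max_i |x_i| = 3 as p increases
```

Since the p-norm approximates \max_i |x_i|, it agrees with the maximum itself only when the inputs are non-negative.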


Other choices of smoothing function

For two values, a smooth maximum can be built from a smooth approximation to the absolute value:

: \max_\varepsilon(a, b) = \frac{a + b + |a - b|_\varepsilon}{2} = \frac{a + b + \sqrt{(a - b)^2 + \varepsilon}}{2}

where |x|_\varepsilon = \sqrt{x^2 + \varepsilon} and \varepsilon \geq 0 is a parameter. As \varepsilon \to 0, |\cdot|_\varepsilon \to |\cdot| and thus \max_\varepsilon \to \max.
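A sketch of this two-argument smooth maximum (the names smooth_abs and smooth_max2 are mine):

```python
import math

def smooth_abs(x, eps):
    """|x|_eps = sqrt(x^2 + eps), a smooth approximation of |x| for eps > 0."""
    return math.sqrt(x * x + eps)

def smooth_max2(a, b, eps):
    """max_eps(a, b) = (a + b + |a - b|_eps) / 2."""
    return 0.5 * (a + b + smooth_abs(a - b, eps))

for eps in (1.0, 1e-2, 1e-6):
    print(eps, smooth_max2(1.0, 3.0, eps))  # tends to max(1, 3) = 3 as eps -> 0
```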


See also

* LogSumExp
* Softmax function
* Generalized mean

