mathematics Mathematics is a field of study that discovers and organizes methods, Mathematical theory, theories and theorems that are developed and Mathematical proof, proved for the needs of empirical sciences and mathematics itself. There are many ar ...

, a smooth maximum of an

indexed family In mathematics, a family, or indexed family, is informally a collection of objects, each associated with an index from some index set. For example, a family of real numbers, indexed by the set of integers, is a collection of real numbers, wher ...

''x''₁, ..., ''x''_''n'' of numbers is a smooth approximation to the

maximum In mathematical analysis, the maximum and minimum of a function (mathematics), function are, respectively, the greatest and least value taken by the function. Known generically as extremum, they may be defined either within a given Interval (ma ...

function

\max(x_1,\ldots,x_n),

meaning a

parametric family In mathematics and its applications, a parametric family or a parameterized family is a indexed family, family of objects (a set of related objects) whose differences depend only on the chosen values for a set of parameters. Common examples are p ...

of functions

m_\alpha(x_1,\ldots,x_n)

such that for every , the function is smooth, and the family converges to the maximum function as . The concept of smooth minimum is similarly defined. In many cases, a single family approximates both: maximum as the parameter goes to positive infinity, minimum as the parameter goes to negative infinity; in symbols, as and as . The term can also be used loosely for a specific smooth function that behaves similarly to a maximum, without necessarily being part of a parametrized family.

Examples

Boltzmann operator

For large positive values of the parameter

\alpha > 0

, the following formulation is a smooth,

differentiable In mathematics, a differentiable function of one real variable is a function whose derivative exists at each point in its domain. In other words, the graph of a differentiable function has a non- vertical tangent line at each interior point in ...

approximation of the maximum function. For negative values of the parameter that are large in absolute value, it approximates the minimum. :

\mathcal_\alpha (x_1,\ldots,x_n) = \frac

\mathcal_\alpha

has the following properties: #

\mathcal_\alpha\to \max

\alpha\to\infty

\mathcal_0

is the

arithmetic mean In mathematics and statistics, the arithmetic mean ( ), arithmetic average, or just the ''mean'' or ''average'' is the sum of a collection of numbers divided by the count of numbers in the collection. The collection is often a set of results fr ...

of its inputs #

\mathcal_\alpha\to \min

\alpha\to -\infty

The gradient of

\mathcal_

is closely related to

softmax The softmax function, also known as softargmax or normalized exponential function, converts a tuple of real numbers into a probability distribution of possible outcomes. It is a generalization of the logistic function to multiple dimensions, a ...

and is given by :

\nabla_\mathcal_\alpha (x_1,\ldots,x_n) = \frac + \alpha(x_i - \mathcal_\alpha (x_1,\ldots,x_n))

This makes the softmax function useful for optimization techniques that use

gradient descent Gradient descent is a method for unconstrained mathematical optimization. It is a first-order iterative algorithm for minimizing a differentiable multivariate function. The idea is to take repeated steps in the opposite direction of the gradi ...

. This operator is sometimes called the Boltzmann operator, after the

Boltzmann distribution In statistical mechanics and mathematics, a Boltzmann distribution (also called Gibbs distribution Translated by J.B. Sykes and M.J. Kearsley. See section 28) is a probability distribution or probability measure that gives the probability tha ...

LogSumExp

Another smooth maximum is LogSumExp: :

\mathrm_\alpha(x_1, \ldots,  x_n) = \frac \log \sum_^n \exp \alpha x_i

This can also be normalized if the

x_i

are all non-negative, yielding a function with domain

is defined as follows:
: \mathrm_\alpha(x) = \frac \log \frac \sum_^n \exp \alpha x_i It is a Non-expansive function">non-expansive operator. As \alpha \to \infty, it acts like a maximum. As \alpha \to 0, it acts like an arithmetic mean. As \alpha \to -\infty, it acts like a minimum. This operator can be viewed as a particular instantiation of the quasi-arithmetic mean. It can also be derived from information theoretical principles as a way of regularizing policies with a cost function defined by KL divergence. The operator has previously been utilized in other areas, such as power engineering.

p-Norm

Another smooth maximum is the

p-norm In mathematics, the spaces are function spaces defined using a natural generalization of the -norm for finite-dimensional vector spaces. They are sometimes called Lebesgue spaces, named after Henri Lebesgue , although according to the Bourbaki ...

: :

\,  (x_1, \ldots,  x_n) \, _p = \left( \sum_^n , x_i, ^p \right)^\frac

which converges to

\,  (x_1, \ldots, x_n) \, _\infty = \max_ , x_i,

p \to \infty

. An advantage of the p-norm is that it is a

norm Norm, the Norm or NORM may refer to: In academic disciplines * Normativity, phenomenon of designating things as good or bad * Norm (geology), an estimate of the idealised mineral content of a rock * Norm (philosophy), a standard in normative e ...

. As such it is scale invariant (

homogeneous Homogeneity and heterogeneity are concepts relating to the uniformity of a substance, process or image. A homogeneous feature is uniform in composition or character (i.e., color, shape, size, weight, height, distribution, texture, language, i ...

\,  (\lambda x_1, \ldots,  \lambda x_n) \, _p = , \lambda,  \cdot \,  (x_1, \ldots,  x_n) \, _p

, and it satisfies the

triangle inequality In mathematics, the triangle inequality states that for any triangle, the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side. This statement permits the inclusion of Degeneracy (mathematics)#T ...

Smooth maximum unit

The following binary operator is called the Smooth Maximum Unit (SMU): :

\begin
\textstyle\max_\varepsilon(a, b)
&= \frac \\
&= \frac
\end

where

\varepsilon \geq 0

is a parameter. As

\varepsilon \to 0

, \cdot, _\varepsilon \to , \cdot,

and thus

\textstyle\max_\varepsilon \to \max

References