The LogSumExp (LSE) (also called RealSoftMax or multivariable softplus)

function Function or functionality may refer to: Computing * Function key, a type of key on computer keyboards * Function model, a structured representation of processes in a system * Function object or functor or functionoid, a concept of object-orie ...

is a

smooth maximum In mathematics, a smooth maximum of an indexed family ''x''1, ..., ''x'n'' of numbers is a smooth approximation to the maximum function \max(x_1,\ldots,x_n), meaning a parametric family of functions m_\alpha(x_1,\ldots,x_n) such that ...

– a smooth

approximation An approximation is anything that is intentionally similar but not exactly equal to something else. Etymology and usage The word ''approximation'' is derived from Latin ''approximatus'', from ''proximus'' meaning ''very near'' and the prefix ...

to the

maximum In mathematical analysis, the maximum and minimum of a function (mathematics), function are, respectively, the greatest and least value taken by the function. Known generically as extremum, they may be defined either within a given Interval (ma ...

function, mainly used by

machine learning Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...

algorithms. It is defined as the

logarithm In mathematics, the logarithm of a number is the exponent by which another fixed value, the base, must be raised to produce that number. For example, the logarithm of to base is , because is to the rd power: . More generally, if , the ...

of the sum of the

exponentials Exponential may refer to any of several mathematical topics related to exponentiation, including: *Exponential function, also: **Matrix exponential, the matrix analogue to the above *Exponential decay, decrease at a rate proportional to value *Expo ...

of the arguments:

\mathrm(x_1, \dots, x_n) = \log\left( \exp(x_1) + \cdots + \exp(x_n) \right).

Properties

The LogSumExp function domain is

\R^n

, the

real coordinate space In mathematics, the real coordinate space or real coordinate ''n''-space, of dimension , denoted or , is the set of all ordered -tuples of real numbers, that is the set of all sequences of real numbers, also known as '' coordinate vectors''. ...

, and its codomain is

\R

, the

real line A number line is a graphical representation of a straight line that serves as spatial representation of numbers, usually graduated like a ruler with a particular origin (geometry), origin point representing the number zero and evenly spaced mark ...

. It is an approximation to the maximum

\max_i x_i

with the following bounds

\max \leq \mathrm(x_1, \dots, x_n) \leq \max + \log(n).

The first inequality is strict unless

n = 1

. The second inequality is strict unless all arguments are equal. (Proof: Let

m = \max_i x_i

. Then

\exp(m) \leq \sum_^n \exp(x_i) \leq n \exp(m)

. Applying the logarithm to the inequality gives the result.) In addition, we can scale the function to make the bounds tighter. Consider the function

\frac 1 t \mathrm(tx_1, \dots, tx_n)

. Then

\max < \frac 1 t \mathrm(tx_1, \dots, tx_n) \leq \max + \frac.

(Proof: Replace each

x_i

with

tx_i

for some

t>0

in the inequalities above, to give

\max < \mathrm(tx_1, \dots, tx_n)\leq \max + \log(n).

and, since

t>0

t \max < \mathrm(tx_1, \dots, tx_n)\leq t\max + \log(n).

finally, dividing by

t

gives the result.) Also, if we multiply by a negative number instead, we of course find a comparison to the

\min

function:

\min - \frac \leq \frac 1  \mathrm(-tx) < \min.

The LogSumExp function is

convex Convex or convexity may refer to: Science and technology * Convex lens, in optics Mathematics * Convex set, containing the whole line segment that joins points ** Convex polygon, a polygon which encloses a convex set of points ** Convex polytop ...

, and is

strictly increasing In mathematical writing, the term strict refers to the property of excluding equality and equivalence and often occurs in the context of inequality and monotonic functions. It is often attached to a technical term to indicate that the exclusiv ...

everywhere in its domain. It is not strictly convex, since it is

affine Affine may describe any of various topics concerned with connections or affinities. It may refer to: * Affine, a Affinity_(law)#Terminology, relative by marriage in law and anthropology * Affine cipher, a special case of the more general substi ...

(linear plus a constant) on the diagonal and parallel lines: :

\mathrm(x_1 + c, \dots, x_n + c) =\mathrm(x_1, \dots, x_n) + c.

Other than this direction, it is strictly convex (the Hessian has rank ), so for example restricting to a

hyperplane In geometry, a hyperplane is a generalization of a two-dimensional plane in three-dimensional space to mathematical spaces of arbitrary dimension. Like a plane in space, a hyperplane is a flat hypersurface, a subspace whose dimension is ...

that is transverse to the diagonal results in a strictly convex function. See

\mathrm_0^+

, below. Writing

\mathbf = (x_1, \dots, x_n),

the

partial derivative In mathematics, a partial derivative of a function of several variables is its derivative with respect to one of those variables, with the others held constant (as opposed to the total derivative, in which all variables are allowed to vary). P ...

s are:

\frac = 
\frac,

which means the

gradient In vector calculus, the gradient of a scalar-valued differentiable function f of several variables is the vector field (or vector-valued function) \nabla f whose value at a point p gives the direction and the rate of fastest increase. The g ...

of LogSumExp is the

softmax function The softmax function, also known as softargmax or normalized exponential function, converts a tuple of real numbers into a probability distribution of possible outcomes. It is a generalization of the logistic function to multiple dimensions, a ...

. The

convex conjugate In mathematics and mathematical optimization, the convex conjugate of a function is a generalization of the Legendre transformation which applies to non-convex functions. It is also known as Legendre–Fenchel transformation, Fenchel transformati ...

of LogSumExp is the negative entropy.

log-sum-exp trick for log-domain calculations

The LSE function is often encountered when the usual arithmetic computations are performed on a

logarithmic scale A logarithmic scale (or log scale) is a method used to display numerical data that spans a broad range of values, especially when there are significant differences among the magnitudes of the numbers involved. Unlike a linear Scale (measurement) ...

, as in

log probability In probability theory and computer science, a log probability is simply a logarithm of a probability. The use of log probabilities means representing probabilities on a logarithmic scale (-\infty, 0], instead of the standard , 1/math> unit interva ...

. Similar to multiplication operations in linear-scale becoming simple additions in log-scale, an addition operation in linear-scale becomes the LSE in log-scale:

\mathrm(\log(x_1), ..., \log(x_n)) = \log(x_1 + \dots + x_n)

A common purpose of using log-domain computations is to increase accuracy and avoid underflow and overflow problems when very small or very large numbers are represented directly (i.e. in a linear domain) using limited-precision floating point numbers. Unfortunately, the use of LSE directly in this case can again cause overflow/underflow problems. Therefore, the following equivalent must be used instead (especially when the accuracy of the above 'max' approximation is not sufficient).

\mathrm(x_1, \dots, x_n) = x^* + \log\left( \exp(x_1-x^*)+ \cdots + \exp(x_n-x^*) \right)

where

x^* = \max

Many math libraries such as IT++ provide a default routine of LSE and use this formula internally.

A strictly convex log-sum-exp type function

LSE is convex but not strictly convex. We can define a strictly convex log-sum-exp type function by adding an extra argument set to zero:

\mathrm_0^+(x_1,...,x_n) = \mathrm(0,x_1,...,x_n)

This function is a proper Bregman generator (strictly convex and

differentiable In mathematics, a differentiable function of one real variable is a function whose derivative exists at each point in its domain. In other words, the graph of a differentiable function has a non- vertical tangent line at each interior point in ...

). It is encountered in machine learning, for example, as the cumulant of the multinomial/binomial family. In

tropical analysis In the mathematical discipline of idempotent analysis, tropical analysis is the study of the tropical semiring. Applications The max tropical semiring can be used appropriately to determine marking times within a given Petri net and a vector fil ...

, this is the sum in the

log semiring In mathematics, in the field of tropical analysis, the log semiring is the semiring structure on the logarithmic scale, obtained by considering the extended real numbers as logarithms. That is, the operations of addition and multiplication are def ...

References

{{refend Logarithms

Properties

log-sum-exp trick for log-domain calculations

A strictly convex log-sum-exp type function

See also

References