Jensen's inequality
In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proved by Jensen in 1906, building on an earlier proof of the same inequality for doubly-differentiable functions by Otto Hölder in 1889. Given its generality, the inequality appears in many forms depending on the context, some of which are presented below. In its simplest form the inequality states that the convex transformation of a mean is less than or equal to the mean applied after convex transformation; it is a simple corollary that the opposite is true of concave transformations.

Jensen's inequality generalizes the statement that the secant line of a convex function lies ''above'' the graph of the function, which is Jensen's inequality for two points: the secant line consists of weighted means of the convex function (for ''t'' ∈ [0,1]),

:t f(x_1) + (1-t) f(x_2),

while the graph of the function is the convex function of the weighted means,

:f(t x_1 + (1-t) x_2).

Thus, Jensen's inequality is

:f(t x_1 + (1-t) x_2) \leq t f(x_1) + (1-t) f(x_2).

In the context of probability theory, it is generally stated in the following form: if ''X'' is a random variable and \varphi is a convex function, then

:\varphi(\operatorname{E}[X]) \leq \operatorname{E}\left[\varphi(X)\right].

The difference between the two sides of the inequality, \operatorname{E}\left[\varphi(X)\right] - \varphi\left(\operatorname{E}[X]\right), is called the Jensen gap.


Statements

The classical form of Jensen's inequality involves several numbers and weights. The inequality can be stated quite generally using either the language of measure theory or (equivalently) probability. In the probabilistic setting, the inequality can be further generalized to its ''full strength''.


Finite form

For a real convex function \varphi, numbers x_1, x_2, \ldots, x_n in its domain, and positive weights a_i, Jensen's inequality can be stated as:

:\varphi\left(\frac{\sum a_i x_i}{\sum a_i}\right) \leq \frac{\sum a_i \varphi(x_i)}{\sum a_i}

and the inequality is reversed if \varphi is concave, which is

:\varphi\left(\frac{\sum a_i x_i}{\sum a_i}\right) \geq \frac{\sum a_i \varphi(x_i)}{\sum a_i}.

Equality holds if and only if x_1 = x_2 = \cdots = x_n or \varphi is linear on a domain containing x_1, x_2, \ldots, x_n.

As a particular case, if the weights a_i are all equal, then the two inequalities above become

:\varphi\left(\frac{\sum x_i}{n}\right) \leq \frac{\sum \varphi(x_i)}{n} \qquad \text{and} \qquad \varphi\left(\frac{\sum x_i}{n}\right) \geq \frac{\sum \varphi(x_i)}{n}.

For instance, the function \log(x) is ''concave'', so substituting \varphi(x) = \log(x) in the concave case establishes the (logarithm of the) familiar arithmetic-mean/geometric-mean inequality:

:\log\!\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right) \geq \frac{\log(x_1) + \log(x_2) + \cdots + \log(x_n)}{n}, \quad \text{that is} \quad \frac{x_1 + x_2 + \cdots + x_n}{n} \geq \sqrt[n]{x_1 x_2 \cdots x_n}.

A common application has ''x'' as a function of another variable (or set of variables) ''t'', that is, x_i = g(t_i). All of this carries directly over to the general continuous case: the weights a_i are replaced by a non-negative integrable function f(x), such as a probability distribution, and the summations are replaced by integrals.
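A minimal numerical sketch of the finite form (assuming NumPy is available; the points and weights are arbitrary illustrative values) that checks the weighted inequality and the AM–GM special case:

```python
import numpy as np

# Arbitrary illustrative data: positive points and positive weights.
x = np.array([0.5, 2.0, 3.0, 7.0])
a = np.array([1.0, 2.0, 1.0, 4.0])
w = a / a.sum()  # normalized weights

phi = np.exp  # a convex function

# Weighted finite form: phi(weighted mean) <= weighted mean of phi.
lhs = phi(np.dot(w, x))
rhs = np.dot(w, phi(x))
assert lhs <= rhs

# AM-GM as the concave case phi = log with equal weights:
# log(arithmetic mean) >= mean of logs, i.e. AM >= GM.
am = x.mean()
gm = np.exp(np.log(x).mean())
assert am >= gm
print(f"Jensen: {lhs:.4f} <= {rhs:.4f};  AM = {am:.4f} >= GM = {gm:.4f}")
```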


Measure-theoretic and probabilistic form

Let (\Omega, A, \mu) be a probability space. Let f : \Omega \to \R be a \mu-measurable function and \varphi : \R \to \R be convex. Then:

:\varphi\left(\int_\Omega f \,\mathrm{d}\mu\right) \leq \int_\Omega \varphi \circ f \,\mathrm{d}\mu.

In real analysis, we may require an estimate on

:\varphi\left(\int_a^b f(x)\, dx\right)

where a, b \in \R, and f\colon [a, b] \to \R is a non-negative Lebesgue-integrable function. In this case, the Lebesgue measure of [a, b] need not be unity. However, by integration by substitution, the interval can be rescaled so that it has measure unity. Then Jensen's inequality can be applied to get

:\varphi\left(\frac{1}{b-a}\int_a^b f(x)\, dx\right) \le \frac{1}{b-a} \int_a^b \varphi(f(x)) \,dx.

The same result can be equivalently stated in a probability theory setting, by a simple change of notation. Let (\Omega, \mathfrak{F}, \operatorname{P}) be a probability space, ''X'' an integrable real-valued random variable and \varphi a convex function. Then:

:\varphi\left(\operatorname{E}[X]\right) \leq \operatorname{E}\left[\varphi(X)\right].

In this probability setting, the measure \mu is intended as a probability \operatorname{P}, the integral with respect to \mu as an expected value \operatorname{E}, and the function f as a random variable ''X''.

Note that the equality holds if and only if \varphi is a linear function on some convex set A such that \operatorname{P}(X \in A) = 1 (which follows by inspecting the measure-theoretical proof below).
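An illustrative Monte Carlo check of the probabilistic form (a sketch assuming NumPy; the exponential distribution and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=100_000)  # an integrable random variable

phi = np.square  # a convex function

lhs = phi(X.mean())   # phi(E[X]), estimated from the sample mean
rhs = phi(X).mean()   # E[phi(X)], estimated by averaging phi over samples
print(f"phi(E[X]) ~ {lhs:.4f} <= E[phi(X)] ~ {rhs:.4f}")
# Here the Jensen gap is exactly Var(X), which is 1 for Exponential(1).
```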


General inequality in a probabilistic setting

More generally, let ''T'' be a real topological vector space, and ''X'' a ''T''-valued integrable random variable. In this general setting, ''integrable'' means that there exists an element \operatorname{E}[X] in ''T'', such that for any element ''z'' in the dual space of ''T'': \operatorname{E}|\langle z, X \rangle| < \infty, and \langle z, \operatorname{E}[X]\rangle = \operatorname{E}[\langle z, X \rangle]. Then, for any measurable convex function \varphi and any sub-σ-algebra \mathfrak{G} of \mathfrak{F}:

:\varphi\left(\operatorname{E}\left[X \mid\mathfrak{G}\right]\right) \leq \operatorname{E}\left[\varphi(X)\mid\mathfrak{G}\right].

Here \operatorname{E}[\cdot\mid\mathfrak{G}] stands for the expectation conditioned to the σ-algebra \mathfrak{G}. This general statement reduces to the previous ones when the topological vector space ''T'' is the real axis, and \mathfrak{G} is the trivial σ-algebra \{\varnothing, \Omega\} (where \varnothing is the empty set, and \Omega is the sample space).
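A finite, discrete sketch of the conditional statement (assuming NumPy; the fair-die setup and the parity sub-σ-algebra are illustrative choices, not from the original text):

```python
import numpy as np

# X uniform on a fair die; condition on the parity of the outcome.
faces = np.arange(1, 7)
phi = np.square  # convex

for parity in (0, 1):
    block = faces[faces % 2 == parity]  # an atom of the sub-sigma-algebra
    lhs = phi(block.mean())             # phi(E[X | parity])
    rhs = phi(block).mean()             # E[phi(X) | parity]
    print(f"parity={parity}: {lhs:.3f} <= {rhs:.3f}")
    assert lhs <= rhs
```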


A sharpened and generalized form

Let ''X'' be a one-dimensional random variable with mean \mu and variance \sigma^2 \ge 0. Let \varphi(x) be a twice differentiable function, and define the function

:h(x) \triangleq \frac{\varphi(x) - \varphi(\mu)}{(x-\mu)^2} - \frac{\varphi'(\mu)}{x-\mu}.

Then

:\sigma^2 \inf_x \frac{\varphi''(x)}{2} \le \sigma^2 \inf_x h(x) \leq \operatorname{E}\left[\varphi(X)\right] - \varphi\left(\operatorname{E}[X]\right) \le \sigma^2 \sup_x h(x) \le \sigma^2 \sup_x \frac{\varphi''(x)}{2}.

In particular, when \varphi(x) is convex, then \varphi''(x) \ge 0, and the standard form of Jensen's inequality immediately follows for the case where \varphi(x) is additionally assumed to be twice differentiable.
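As a consistency check (a worked example added here, not part of the original statement), take \varphi(x) = x^2, so that \varphi(\mu) = \mu^2 and \varphi'(\mu) = 2\mu. Then

:h(x) = \frac{x^2 - \mu^2}{(x-\mu)^2} - \frac{2\mu}{x-\mu} = \frac{(x+\mu) - 2\mu}{x-\mu} = 1,

so \inf_x h(x) = \sup_x h(x) = 1 and the chain collapses to the exact identity \operatorname{E}[X^2] - (\operatorname{E}[X])^2 = \sigma^2.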


Proofs

Jensen's inequality can be proved in several ways, and three different proofs corresponding to the different statements above will be offered. Before embarking on these mathematical derivations, however, it is worth analyzing an intuitive graphical argument based on the probabilistic case where ''X'' is a real number. Assuming a hypothetical distribution of ''X'' values, one can immediately identify the position of \operatorname{E}[X] and its image \varphi(\operatorname{E}[X]) in the graph. Noticing that for convex mappings the corresponding distribution of \varphi(X) values is increasingly "stretched out" for increasing values of ''X'', it is easy to see that the distribution of \varphi(X) is broader in the interval corresponding to X > X_0 and narrower in X < X_0 for any X_0; in particular, this is also true for X_0 = \operatorname{E}[X]. Consequently, in this picture the expectation of \varphi(X) will always shift upwards with respect to the position of \varphi(\operatorname{E}[X]). A similar reasoning holds if the distribution of ''X'' covers a decreasing portion of the convex function, or both a decreasing and an increasing portion of it. This "proves" the inequality, i.e.

:\varphi(\operatorname{E}[X]) \leq \operatorname{E}\left[\varphi(X)\right],

with equality when \varphi(X) is not strictly convex, e.g. when it is a straight line, or when ''X'' follows a degenerate distribution (i.e. is a constant). The proofs below formalize this intuitive notion.


Proof 1 (finite form)

If \lambda_1 and \lambda_2 are two arbitrary nonnegative real numbers such that \lambda_1 + \lambda_2 = 1, then convexity of \varphi implies

:\forall x_1, x_2: \qquad \varphi\left(\lambda_1 x_1 + \lambda_2 x_2\right) \leq \lambda_1\,\varphi(x_1) + \lambda_2\,\varphi(x_2).

This can be generalized: if \lambda_1, \ldots, \lambda_n are nonnegative real numbers such that \lambda_1 + \cdots + \lambda_n = 1, then

:\varphi(\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_n x_n) \leq \lambda_1\,\varphi(x_1) + \lambda_2\,\varphi(x_2) + \cdots + \lambda_n\,\varphi(x_n),

for any x_1, \ldots, x_n.

The ''finite form'' of Jensen's inequality can be proved by induction: by the convexity hypothesis, the statement is true for ''n'' = 2. Suppose the statement is true for some ''n'', so

:\varphi\left(\sum_{i=1}^{n}\lambda_i x_i\right) \leq \sum_{i=1}^{n}\lambda_i \varphi\left(x_i\right)

for any \lambda_1, \ldots, \lambda_n such that \lambda_1 + \cdots + \lambda_n = 1. One needs to prove it for ''n'' + 1. At least one of the \lambda_i is strictly smaller than 1, say \lambda_{n+1}; therefore by the convexity inequality:

:\begin{align} \varphi\left(\sum_{i=1}^{n+1}\lambda_i x_i\right) &= \varphi\left((1-\lambda_{n+1})\sum_{i=1}^{n} \frac{\lambda_i}{1-\lambda_{n+1}} x_i + \lambda_{n+1} x_{n+1} \right) \\ &\leq (1-\lambda_{n+1})\, \varphi\left(\sum_{i=1}^{n} \frac{\lambda_i}{1-\lambda_{n+1}} x_i \right) + \lambda_{n+1}\,\varphi(x_{n+1}). \end{align}

Since \lambda_1 + \cdots + \lambda_{n+1} = 1,

:\sum_{i=1}^{n} \frac{\lambda_i}{1-\lambda_{n+1}} = 1,

applying the induction hypothesis gives

:\varphi\left(\sum_{i=1}^{n}\frac{\lambda_i}{1-\lambda_{n+1}} x_i\right) \leq \sum_{i=1}^{n}\frac{\lambda_i}{1-\lambda_{n+1}}\, \varphi(x_i),

therefore

:\varphi\left(\sum_{i=1}^{n+1}\lambda_i x_i\right) \leq (1-\lambda_{n+1}) \sum_{i=1}^{n}\frac{\lambda_i}{1-\lambda_{n+1}}\, \varphi(x_i) + \lambda_{n+1}\,\varphi(x_{n+1}) = \sum_{i=1}^{n+1}\lambda_i \varphi(x_i).

We deduce that the inequality is true for ''n'' + 1; by the principle of mathematical induction it follows that the result is also true for all integers ''n'' greater than 2.

In order to obtain the general inequality from this finite form, one needs to use a density argument. The finite form can be rewritten as:

:\varphi\left(\int x\,d\mu_n(x) \right) \leq \int \varphi(x)\,d\mu_n(x),

where \mu_n is a measure given by an arbitrary convex combination of Dirac deltas:

:\mu_n = \sum_{i=1}^n \lambda_i \delta_{x_i}.

Since convex functions are continuous, and since convex combinations of Dirac deltas are weakly dense in the set of probability measures (as could be easily verified), the general statement is obtained simply by a limiting procedure.


Proof 2 (measure-theoretic form)

Let g be a real-valued \mu-integrable function on a probability space \Omega, and let \varphi be a convex function on the real numbers. Since \varphi is convex, at each real number x we have a nonempty set of subderivatives, which may be thought of as lines touching the graph of \varphi at x, but which are at or below the graph of \varphi at all points (support lines of the graph).

Now, if we define

:x_0 := \int_\Omega g\, d\mu,

because of the existence of subderivatives for convex functions, we may choose a and b such that

:ax + b \leq \varphi(x),

for all real x and

:ax_0 + b = \varphi(x_0).

But then we have that

:\varphi \circ g(\omega) \geq ag(\omega) + b

for almost all \omega \in \Omega. Since we have a probability measure, the integral is monotone with \mu(\Omega) = 1 so that

:\int_\Omega \varphi \circ g\, d\mu \geq \int_\Omega (ag + b)\, d\mu = a\int_\Omega g\, d\mu + b\int_\Omega d\mu = ax_0 + b = \varphi(x_0) = \varphi\left(\int_\Omega g\, d\mu \right),

as desired.
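A small numerical illustration of the support-line argument (a sketch assuming NumPy; \varphi = \exp and the discrete weights are arbitrary stand-ins for g and \mu):

```python
import numpy as np

# Discrete stand-in for (g, mu): values of g with probability weights.
g = np.array([0.2, 1.0, 2.5, 4.0])
mu = np.array([0.1, 0.4, 0.3, 0.2])  # sums to 1

phi = np.exp
x0 = np.dot(mu, g)        # x0 = integral of g with respect to mu

# Support line at x0: since exp is differentiable, take a = phi'(x0).
a = np.exp(x0)            # derivative of exp at x0
b = phi(x0) - a * x0      # so that a*x0 + b = phi(x0)

xs = np.linspace(-1.0, 5.0, 1001)
assert np.all(a * xs + b <= phi(xs) + 1e-12)  # line stays below the graph

# Integrating the pointwise bound phi(g) >= a*g + b reproduces Jensen:
assert np.dot(mu, phi(g)) >= a * x0 + b        # right side equals phi(x0)
print(phi(x0), np.dot(mu, phi(g)))
```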


Proof 3 (general inequality in a probabilistic setting)

Let ''X'' be an integrable random variable that takes values in a real topological vector space ''T''. Since \varphi: T \to \R is convex, for any x, y \in T, the quantity

:\frac{\varphi(x+\theta y)-\varphi(x)}{\theta}

is decreasing as \theta approaches 0^+. In particular, the ''subdifferential'' of \varphi evaluated at ''x'' in the direction ''y'' is well-defined by

:(D\varphi)(x)\cdot y := \lim_{\theta \downarrow 0} \frac{\varphi(x+\theta y)-\varphi(x)}{\theta} = \inf_{\theta \neq 0} \frac{\varphi(x+\theta y)-\varphi(x)}{\theta}.

The subdifferential is linear in ''y'' (establishing this rigorously requires the Hahn–Banach theorem) and, since the infimum taken in the right-hand side of the previous formula is smaller than the value of the same term for \theta = 1, one gets

:\varphi(x) \leq \varphi(x+y) - (D\varphi)(x)\cdot y.

In particular, for an arbitrary sub-σ-algebra \mathfrak{G} we can evaluate the last inequality when x = \operatorname{E}[X\mid\mathfrak{G}],\ y = X - \operatorname{E}[X\mid\mathfrak{G}] to obtain

:\varphi\left(\operatorname{E}[X\mid\mathfrak{G}]\right) \leq \varphi(X) - (D\varphi)\left(\operatorname{E}[X\mid\mathfrak{G}]\right)\cdot \left(X - \operatorname{E}[X\mid\mathfrak{G}]\right).

Now, if we take the expectation conditioned to \mathfrak{G} on both sides of the previous expression, we get the result since:

:\operatorname{E}\left[\left[(D\varphi)\left(\operatorname{E}[X\mid\mathfrak{G}]\right)\cdot \left(X - \operatorname{E}[X\mid\mathfrak{G}]\right)\right]\mid\mathfrak{G}\right] = (D\varphi)\left(\operatorname{E}[X\mid\mathfrak{G}]\right)\cdot \operatorname{E}\left[\left(X - \operatorname{E}[X\mid\mathfrak{G}]\right)\mid\mathfrak{G}\right] = 0,

by the linearity of the subdifferential in the ''y'' variable, and the following well-known property of the conditional expectation:

:\operatorname{E}\left[\left(\operatorname{E}[X\mid\mathfrak{G}]\right)\mid\mathfrak{G}\right] = \operatorname{E}[X\mid\mathfrak{G}].


Applications and special cases


Form involving a probability density function

Suppose \Omega is a measurable subset of the real line and ''f''(''x'') is a non-negative function such that

:\int_{-\infty}^\infty f(x)\,dx = 1.

In probabilistic language, ''f'' is a probability density function. Then Jensen's inequality becomes the following statement about convex integrals: If ''g'' is any real-valued measurable function and \varphi is convex over the range of ''g'', then

:\varphi\left(\int_{-\infty}^\infty g(x)f(x)\, dx\right) \le \int_{-\infty}^\infty \varphi(g(x))\, f(x)\, dx.

If ''g''(''x'') = ''x'', then this form of the inequality reduces to a commonly used special case:

:\varphi\left(\int_{-\infty}^\infty x\, f(x)\, dx\right) \le \int_{-\infty}^\infty \varphi(x)\,f(x)\, dx.

This is applied in Variational Bayesian methods.
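A quadrature-based sketch of the density form (assuming SciPy is available; the standard normal density and the choices g(x) = x and \varphi = \exp are illustrative):

```python
import numpy as np
from scipy import integrate, stats

f = stats.norm(loc=0.0, scale=1.0).pdf  # a probability density
g = lambda x: x                          # any measurable function
phi = np.exp                             # convex over the range of g

mean_g, _ = integrate.quad(lambda x: g(x) * f(x), -np.inf, np.inf)
mean_phig, _ = integrate.quad(lambda x: phi(g(x)) * f(x), -np.inf, np.inf)

# For the standard normal, E[e^X] = e^(1/2) ~ 1.6487 >= e^0 = 1.
print(f"phi(E[g]) = {phi(mean_g):.4f} <= E[phi(g)] = {mean_phig:.4f}")
```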


Example: even moments of a random variable

If g(x) = x^{2n}, and ''X'' is a random variable, then ''g'' is convex as

:\frac{d^2 g}{dx^2}(x) = 2n(2n - 1)x^{2n-2} \geq 0 \quad \forall\ x \in \R

and so

:g(\operatorname{E}[X]) = (\operatorname{E}[X])^{2n} \leq \operatorname{E}[X^{2n}].

In particular, if some even moment 2''n'' of ''X'' is finite, ''X'' has a finite mean. An extension of this argument shows ''X'' has finite moments of every order l \in \N dividing ''n''.
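For example (a check with ''n'' = 1, added for illustration): \varphi(x) = x^2 gives (\operatorname{E}[X])^2 \leq \operatorname{E}[X^2], which is precisely the statement that the variance \operatorname{E}[X^2] - (\operatorname{E}[X])^2 is non-negative.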


Alternative finite form

Let \Omega = \{x_1, \ldots, x_n\} and take \mu to be the counting measure on \Omega; then the general form reduces to a statement about sums:

:\varphi\left(\sum_{i=1}^{n} g(x_i)\lambda_i \right) \le \sum_{i=1}^{n} \varphi(g(x_i))\, \lambda_i,

provided that \lambda_i \geq 0 and

:\lambda_1 + \cdots + \lambda_n = 1.

There is also an infinite discrete form.


Statistical physics

Jensen's inequality is of particular importance in statistical physics when the convex function is an exponential, giving:

:e^{\operatorname{E}[X]} \leq \operatorname{E}\left[e^X\right],

where the expected values are with respect to some probability distribution in the random variable ''X''.

Proof: Let \varphi(x) = e^x in

:\varphi\left(\operatorname{E}[X]\right) \leq \operatorname{E}\left[\varphi(X)\right].
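As a concrete instance (a standard worked example, not from the original text): if X \sim \mathcal{N}(\mu, \sigma^2), the moment generating function gives

:\operatorname{E}\left[e^X\right] = e^{\mu + \sigma^2/2} \geq e^{\mu} = e^{\operatorname{E}[X]},

with the Jensen gap growing with the variance \sigma^2.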


Information theory

If p(x) is the true probability density for ''X'', and q(x) is another density, then applying Jensen's inequality for the random variable Y(X) = q(X)/p(X) and the convex function \varphi(y) = -\log(y) gives

:\operatorname{E}[\varphi(Y)] \ge \varphi(\operatorname{E}[Y]).

Therefore:

:-D(p(x)\,\|\,q(x)) = \int p(x) \log\left(\frac{q(x)}{p(x)}\right) dx \le \log\left(\int p(x)\, \frac{q(x)}{p(x)}\,dx\right) = \log\left(\int q(x)\,dx\right) = 0,

a result called Gibbs' inequality. It shows that the average message length is minimised when codes are assigned on the basis of the true probabilities ''p'' rather than any other distribution ''q''. The quantity that is non-negative is called the Kullback–Leibler divergence of ''q'' from ''p''. Since -\log(x) is a strictly convex function for x > 0, it follows that equality holds when p(x) equals q(x) almost everywhere.
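A discrete numeric sketch of this non-negativity (assuming NumPy; the two distributions are arbitrary examples):

```python
import numpy as np

p = np.array([0.5, 0.25, 0.125, 0.125])  # "true" distribution
q = np.array([0.25, 0.25, 0.25, 0.25])   # any other distribution

# KL(p || q) = sum p log(p/q) >= 0, with equality iff p == q.
kl = np.sum(p * np.log(p / q))
print(f"D(p||q) = {kl:.4f} >= 0")
assert kl >= 0
```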


Rao–Blackwell theorem

If ''L'' is a convex function and \mathfrak{G} a sub-sigma-algebra, then, from the conditional version of Jensen's inequality, we get

:L(\operatorname{E}[\delta(X) \mid \mathfrak{G}]) \le \operatorname{E}[L(\delta(X)) \mid \mathfrak{G}] \quad \Longrightarrow \quad \operatorname{E}[L(\operatorname{E}[\delta(X) \mid \mathfrak{G}])] \le \operatorname{E}[L(\delta(X))].

So if δ(''X'') is some estimator of an unobserved parameter θ given a vector of observables ''X''; and if ''T''(''X'') is a sufficient statistic for θ; then an improved estimator, in the sense of having a smaller expected loss ''L'', can be obtained by calculating

:\delta_1(X) = \operatorname{E}_{\theta}[\delta(X') \mid T(X') = T(X)],

the expected value of δ with respect to θ, taken over all possible vectors of observations ''X'' compatible with the same value of ''T''(''X'') as that observed. Further, because ''T'' is a sufficient statistic, \delta_1(X) does not depend on θ, and hence is itself a statistic. This result is known as the Rao–Blackwell theorem.
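An illustrative simulation (a sketch assuming NumPy; the Bernoulli model, the crude estimator \delta(X) = X_1, and the sample sizes are all hypothetical choices). For i.i.d. Bernoulli(θ) data with T(X) = \sum_i X_i, conditioning X_1 on ''T'' yields the sample mean, and the simulation shows the resulting drop in mean squared error:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, trials = 0.3, 10, 50_000

X = rng.binomial(1, theta, size=(trials, n))

delta = X[:, 0]           # crude unbiased estimator: the first observation
delta1 = X.mean(axis=1)   # E[X_1 | sum(X)] = sample mean (Rao-Blackwellized)

mse = lambda est: np.mean((est - theta) ** 2)
print(f"MSE(delta)  = {mse(delta):.4f}")   # ~ theta*(1-theta)   = 0.21
print(f"MSE(delta1) = {mse(delta1):.4f}")  # ~ theta*(1-theta)/n = 0.021
```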


Financial performance simulation

A popular method of measuring investment performance is the internal rate of return (IRR), which is the rate by which a series of uncertain future cash flows are discounted using present value theory so that the sum of the future cash flows equals the initial investment. While it is tempting to perform Monte Carlo simulation of the IRR, Jensen's inequality introduces a bias, due to the fact that the IRR function is a curved function and the expectation operator is a linear function.
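A minimal Monte Carlo sketch of this bias (assuming NumPy; the cash-flow model, the lognormal noise, and the bisection-based IRR solver are all illustrative constructions, not a prescribed method). The simulated mean of the IRRs generally differs from the IRR of the expected cash flows:

```python
import numpy as np

rng = np.random.default_rng(2)

def irr(cashflows, lo=-0.99, hi=10.0, iters=100):
    """Solve NPV(r) = 0 by bisection; cashflows[0] is the initial outlay."""
    npv = lambda r: sum(c / (1 + r) ** t for t, c in enumerate(cashflows))
    for _ in range(iters):
        mid = (lo + hi) / 2
        # NPV decreases in r for this sign pattern, so move the bracket.
        lo, hi = (mid, hi) if npv(mid) > 0 else (lo, mid)
    return (lo + hi) / 2

# Initial outlay of 100, then three uncertain inflows with mean 50 each.
trials = 5_000
outlay = np.full((trials, 1), -100.0)
inflows = 50.0 * rng.lognormal(mean=-0.125, sigma=0.5, size=(trials, 3))
flows = np.hstack([outlay, inflows])

mean_of_irrs = np.mean([irr(f) for f in flows])
irr_of_means = irr(flows.mean(axis=0))  # IRR of the expected cash flows
print(f"E[IRR] ~ {mean_of_irrs:.4f}  vs  IRR(E[flows]) = {irr_of_means:.4f}")
```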


See also

* Karamata's inequality for a more general inequality
* Popoviciu's inequality
* Law of averages
* A proof without words of Jensen's inequality


References

* Tristan Needham (1993). "A Visual Explanation of Jensen's Inequality", ''American Mathematical Monthly'' 100(8):768–771.
* Sam Savage (2012). ''The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty'' (1st ed.). Wiley. ISBN 978-0471381976.


External links


* Jensen's Operator Inequality of Hansen and Pedersen.