In probability theory, Markov's inequality gives an upper bound for the probability that a non-negative function of a random variable is greater than or equal to some positive constant. It is named after the Russian mathematician Andrey Markov, although it appeared earlier in the work of Pafnuty Chebyshev (Markov's teacher), and many sources, especially in analysis, refer to it as Chebyshev's inequality (sometimes calling it the first Chebyshev inequality, while referring to Chebyshev's inequality as the second Chebyshev inequality) or Bienaymé's inequality. Markov's inequality (and other similar inequalities) relates probabilities to expectations, and provides (frequently loose but still useful) bounds for the cumulative distribution function of a random variable.


Statement

If X is a nonnegative random variable and a > 0, then the probability that X is at least a is at most the expectation of X divided by a:

:\operatorname{P}(X \geq a) \leq \frac{\operatorname{E}(X)}{a}.

Let a = \tilde{a} \cdot \operatorname{E}(X) (where \tilde{a} > 0); then we can rewrite the previous inequality as

:\operatorname{P}(X \geq \tilde{a} \cdot \operatorname{E}(X)) \leq \frac{1}{\tilde{a}}.

In the language of measure theory, Markov's inequality states that if (X, \Sigma, \mu) is a measure space, f is a measurable extended real-valued function, and \varepsilon > 0, then

:\mu(\{x \in X : |f(x)| \geq \varepsilon\}) \leq \frac{1}{\varepsilon} \int_X |f| \, d\mu.

This measure-theoretic definition is sometimes referred to as Chebyshev's inequality.
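As a quick illustration (not part of the classical statement), the bound can be checked by simulation. The following Python sketch compares the empirical tail probability of an exponentially distributed sample with the Markov bound E(X)/a; the exponential distribution, sample size, and thresholds are arbitrary choices made only for this example.

    import numpy as np

    # A minimal sketch checking Markov's inequality by simulation.
    # X is exponential with mean 2 (an arbitrary nonnegative example).
    rng = np.random.default_rng(0)
    x = rng.exponential(scale=2.0, size=100_000)

    for a in [1.0, 2.0, 5.0, 10.0]:
        empirical = np.mean(x >= a)   # estimate of P(X >= a)
        bound = x.mean() / a          # Markov bound E(X)/a (sample mean as a proxy for E(X))
        print(f"a={a:5.1f}  P(X>=a)~{empirical:.4f}  bound={bound:.4f}")

In every case the empirical tail probability stays below the bound, though the bound is usually far from tight.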


Extended version for monotonically increasing functions

If \varphi is a monotonically increasing nonnegative function for the nonnegative reals, X is a random variable, a \geq 0, and \varphi(a) > 0, then

:\operatorname{P}(|X| \ge a) \le \frac{\operatorname{E}(\varphi(|X|))}{\varphi(a)}.

An immediate corollary, using higher moments of X supported on values larger than 0, is

:\operatorname{P}(|X| \ge a) \le \frac{\operatorname{E}(|X|^n)}{a^n}.
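A small numerical comparison may help here; it is only a sketch, and the standard normal distribution, the threshold a = 3, and the moment orders are assumptions made for illustration. It shows that, for the same threshold, the second-moment bound can be much sharper than the first-moment (plain Markov) bound.

    import numpy as np

    # A minimal sketch comparing the moment bounds P(|X| >= a) <= E(|X|^n) / a^n
    # for n = 1 and n = 2, i.e. phi(t) = t^n in the extended inequality.
    rng = np.random.default_rng(1)
    x = rng.standard_normal(1_000_000)
    a = 3.0

    empirical = np.mean(np.abs(x) >= a)
    bound_n1 = np.mean(np.abs(x)) / a        # phi(t) = t
    bound_n2 = np.mean(x**2) / a**2          # phi(t) = t^2
    print(empirical, bound_n1, bound_n2)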


Proofs

We separate the case in which the measure space is a probability space from the more general case because the probability case is more accessible for the general reader.


Intuition

If X is a nonnegative random variable and a > 0, we can decompose its expectation by conditioning:

:\operatorname{E}(X) = \operatorname{P}(X < a)\cdot \operatorname{E}(X \mid X < a) + \operatorname{P}(X \geq a)\cdot \operatorname{E}(X \mid X \geq a),

where \operatorname{E}(X \mid X < a) is larger than or equal to 0 because the random variable X is non-negative, and \operatorname{E}(X \mid X \geq a) is larger than or equal to a because the conditional expectation only takes into account values larger than or equal to a which X can take. Hence, intuitively,

:\operatorname{E}(X) \geq \operatorname{P}(X \geq a)\cdot \operatorname{E}(X \mid X \geq a) \geq a \cdot \operatorname{P}(X \geq a),

which directly leads to \operatorname{P}(X \geq a) \leq \frac{\operatorname{E}(X)}{a}.
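This decomposition can be checked numerically. The sketch below is only illustrative: the exponential sample and the threshold a = 2 are arbitrary assumptions.

    import numpy as np

    # A minimal numerical check of the conditioning identity behind the intuition:
    # E(X) = P(X < a) E(X | X < a) + P(X >= a) E(X | X >= a) >= a P(X >= a).
    rng = np.random.default_rng(2)
    x = rng.exponential(scale=1.0, size=200_000)
    a = 2.0

    below, above = x[x < a], x[x >= a]
    lhs = x.mean()                                             # E(X)
    rhs = (len(below) / len(x)) * below.mean() + (len(above) / len(x)) * above.mean()
    lower = a * np.mean(x >= a)                                # a * P(X >= a)
    print(lhs, rhs, lower)   # lhs and rhs agree, and both exceed lower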


Probability-theoretic proof

Method 1: From the definition of expectation (writing f for the probability density of X; the discrete case is analogous, with sums in place of integrals):

:\operatorname{E}(X) = \int_{-\infty}^{\infty} x f(x) \, dx.

However, X is a non-negative random variable, thus

:\operatorname{E}(X) = \int_{-\infty}^{\infty} x f(x) \, dx = \int_0^{\infty} x f(x) \, dx.

From this we can derive

:\operatorname{E}(X) = \int_0^a x f(x) \, dx + \int_a^{\infty} x f(x) \, dx \ge \int_a^{\infty} x f(x) \, dx \ge \int_a^{\infty} a f(x) \, dx = a \int_a^{\infty} f(x) \, dx = a \operatorname{P}(X \ge a).

From here, dividing through by a allows us to see that

:\operatorname{P}(X \ge a) \le \operatorname{E}(X)/a.

Method 2: For any event E, let I_E be the indicator random variable of E, that is, I_E = 1 if E occurs and I_E = 0 otherwise. Using this notation, we have I_{(X \geq a)} = 1 if the event X \geq a occurs, and I_{(X \geq a)} = 0 if X < a. Then, given a > 0,

:a I_{(X \geq a)} \leq X,

which is clear if we consider the two possible cases: if X < a, then I_{(X \geq a)} = 0, and so a I_{(X \geq a)} = 0 \leq X; otherwise X \geq a, for which I_{(X \geq a)} = 1 and so a I_{(X \geq a)} = a \leq X.

Since \operatorname{E} is monotone, taking the expectation of both sides of an inequality cannot reverse it. Therefore,

:\operatorname{E}(a I_{(X \geq a)}) \leq \operatorname{E}(X).

Now, using linearity of expectation, the left side of this inequality is the same as

:a \operatorname{E}(I_{(X \geq a)}) = a(1 \cdot \operatorname{P}(X \geq a) + 0 \cdot \operatorname{P}(X < a)) = a \operatorname{P}(X \geq a).

Thus we have

:a \operatorname{P}(X \geq a) \leq \operatorname{E}(X),

and since a > 0, we can divide both sides by a.
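Both methods lend themselves to a direct numerical check. The sketch below mirrors Method 2; the gamma distribution and the threshold are arbitrary assumptions, chosen only to provide a nonnegative sample.

    import numpy as np

    # A minimal sketch of the Method 2 argument: a * 1{X >= a} <= X pointwise,
    # so taking expectations gives a * P(X >= a) <= E(X).
    rng = np.random.default_rng(3)
    x = rng.gamma(shape=2.0, scale=1.5, size=100_000)   # arbitrary nonnegative example
    a = 4.0

    indicator = (x >= a).astype(float)
    assert np.all(a * indicator <= x)                   # the pointwise inequality
    print(a * indicator.mean(), "<=", x.mean())         # a * P(X >= a) <= E(X)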


Measure-theoretic proof

We may assume that the function f is non-negative, since only its absolute value enters the inequality. Now, consider the real-valued function s on X given by

: s(x) = \begin{cases} \varepsilon, & \text{if } f(x) \geq \varepsilon \\ 0, & \text{if } f(x) < \varepsilon \end{cases}

Then 0 \leq s(x) \leq f(x). By the definition of the Lebesgue integral,

: \int_X f(x) \, d\mu \geq \int_X s(x) \, d\mu = \varepsilon \, \mu(\{x \in X : f(x) \geq \varepsilon\}),

and since \varepsilon > 0, both sides can be divided by \varepsilon, obtaining

:\mu(\{x \in X : f(x) \geq \varepsilon\}) \leq \frac{1}{\varepsilon} \int_X f \, d\mu.


Corollaries


Chebyshev's inequality

Chebyshev's inequality uses the variance to bound the probability that a random variable deviates far from the mean. Specifically,

:\operatorname{P}(|X-\operatorname{E}(X)| \geq a) \leq \frac{\operatorname{Var}(X)}{a^2},

for any a > 0. Here \operatorname{Var}(X) is the variance of X, defined as

: \operatorname{Var}(X) = \operatorname{E}[(X - \operatorname{E}(X))^2].

Chebyshev's inequality follows from Markov's inequality by considering the random variable

: (X - \operatorname{E}(X))^2

and the constant a^2, for which Markov's inequality reads

: \operatorname{P}((X - \operatorname{E}(X))^2 \ge a^2) \le \frac{\operatorname{E}[(X - \operatorname{E}(X))^2]}{a^2} = \frac{\operatorname{Var}(X)}{a^2}.

This argument can be summarized (where "MI" indicates use of Markov's inequality):

:\operatorname{P}(|X-\operatorname{E}(X)| \geq a) = \operatorname{P}\left((X-\operatorname{E}(X))^2 \geq a^2\right) \,\overset{\mathrm{MI}}{\leq}\, \frac{\operatorname{E}\left[(X-\operatorname{E}(X))^2\right]}{a^2} = \frac{\operatorname{Var}(X)}{a^2}.
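The reduction can also be checked by simulation. In the following sketch the normal distribution and the thresholds are arbitrary assumptions; the point is that the event |X − E(X)| ≥ a coincides with the event (X − E(X))² ≥ a², to which Markov's inequality applies.

    import numpy as np

    # A minimal sketch: Chebyshev's bound Var(X)/a^2 obtained by applying Markov's
    # inequality to the nonnegative variable (X - E(X))^2 with the constant a^2.
    rng = np.random.default_rng(4)
    x = rng.normal(loc=10.0, scale=2.0, size=500_000)
    centered_sq = (x - x.mean()) ** 2

    for a in [2.0, 4.0, 6.0]:
        tail_abs = np.mean(np.abs(x - x.mean()) >= a)
        tail_sq = np.mean(centered_sq >= a**2)    # same event, rewritten
        bound = centered_sq.mean() / a**2         # Var(X) / a^2
        print(a, tail_abs, tail_sq, bound)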


Other corollaries

1. The "monotonic" result can be demonstrated by:

:\operatorname{P}(|X| \ge a) = \operatorname{P}\big(\varphi(|X|) \ge \varphi(a)\big) \,\overset{\mathrm{MI}}{\leq}\, \frac{\operatorname{E}(\varphi(|X|))}{\varphi(a)}.

2. The result that, for a nonnegative random variable X, the quantile function of X satisfies

:Q_X(1-p) \leq \frac{\operatorname{E}(X)}{p},

the proof using

:p \leq \operatorname{P}(X \geq Q_X(1-p)) \,\overset{\mathrm{MI}}{\leq}\, \frac{\operatorname{E}(X)}{Q_X(1-p)}.

A numerical sketch of this quantile bound is given after this list.

3. Let M \succeq 0 be a self-adjoint matrix-valued random variable and a > 0. Then

: \operatorname{P}(M \npreceq a \cdot I) \leq \frac{\operatorname{tr}(\operatorname{E}(M))}{a}

can be shown in a similar manner.
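The quantile bound in the second item can be illustrated with a short simulation; the exponential distribution and the values of p below are assumptions made only for this sketch.

    import numpy as np

    # A minimal sketch checking the quantile bound Q_X(1 - p) <= E(X) / p
    # on an exponential sample (an arbitrary nonnegative example).
    rng = np.random.default_rng(5)
    x = rng.exponential(scale=3.0, size=200_000)

    for p in [0.5, 0.1, 0.01]:
        q = np.quantile(x, 1 - p)   # empirical (1 - p)-quantile
        bound = x.mean() / p        # E(X) / p
        print(f"p={p:5.2f}  Q_X(1-p)~{q:8.3f}  E(X)/p={bound:8.3f}")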


Examples

Assuming no income is negative, Markov's inequality shows that no more than 1/5 of the population can have more than 5 times the average income.


See also

* Paley–Zygmund inequality – a corresponding lower bound
* Concentration inequality – a summary of tail bounds on random variables




External links


The formal proof of Markov's inequality in the Mizar system.