Big O in probability notation

The order in probability notation is used in probability theory and statistical theory in direct parallel to the big ''O'' notation that is standard in mathematics. Where the big ''O'' notation deals with the convergence of sequences or sets of ordinary numbers, the order in probability notation deals with convergence of sets of random variables, where convergence is in the sense of convergence in probability.


Definitions


Small ''o'': convergence in probability

For a set of random variables ''X''''n'' and corresponding set of constants ''a''''n'' (both indexed by ''n'', which need not be discrete), the notation

:X_n = o_p(a_n)

means that the set of values ''X''''n''/''a''''n'' converges to zero in probability as ''n'' approaches an appropriate limit. Equivalently, ''X''''n'' = o''p''(''a''''n'') can be written as ''X''''n''/''a''''n'' = o''p''(1), i.e.

:\lim_{n \to \infty} P\left[\left|\frac{X_n}{a_n}\right| \geq \varepsilon\right] = 0,

for every positive ''ε'' (Bishop ''et al.'' 1975).
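As an illustration, under the added assumption (not part of the definition above) that ''X''1, ''X''2, … are independent and identically distributed with finite mean ''μ'', the sample mean \bar{X}_n satisfies, by the weak law of large numbers,

:\bar{X}_n - \mu = o_p(1),

since P\left(\left|\bar{X}_n - \mu\right| \geq \varepsilon\right) \to 0 for every ''ε'' > 0.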


Big ''O'': stochastic boundedness

The notation

:X_n = O_p(a_n) \text{ as } n\to\infty

means that the set of values ''X''''n''/''a''''n'' is stochastically bounded. That is, for any ''ε'' > 0, there exists a finite ''M'' > 0 and a finite ''N'' > 0 such that

:P\left(\left|\frac{X_n}{a_n}\right| > M\right) < \varepsilon,\; \forall \; n > N.
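Continuing the illustrative i.i.d. setting above (an assumption added here for illustration), if the ''X''''i'' also have finite variance ''σ''², the central limit theorem implies that \sqrt{n}\,(\bar{X}_n - \mu) converges in distribution to a normal limit; a sequence that converges in distribution is stochastically bounded, so

:\bar{X}_n - \mu = O_p\left(n^{-1/2}\right).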


Comparison of the two definitions

The difference between the definitions is subtle. If one uses the definition of the limit, one gets:

* Big O_p(1): \forall \varepsilon \quad \exists N_\varepsilon, \delta_\varepsilon \quad \text{such that} \quad P(|X_n| \geq \delta_\varepsilon) \leq \varepsilon \quad \forall \, n > N_\varepsilon
* Small o_p(1): \forall \varepsilon, \delta \quad \exists N_{\varepsilon,\delta} \quad \text{such that} \quad P(|X_n| \geq \delta) \leq \varepsilon \quad \forall \, n > N_{\varepsilon,\delta}

The difference lies in the \delta: for stochastic boundedness, it suffices that there exists one (arbitrarily large) \delta satisfying the inequality, and \delta is allowed to depend on \varepsilon (hence the \delta_\varepsilon). For convergence, on the other hand, the statement has to hold not only for one but for every (arbitrarily small) \delta. In a sense, this means that the sequence must be bounded, with a bound that gets smaller as the sample size increases. This shows that if a sequence is o_p(1), then it is O_p(1), i.e. convergence in probability implies stochastic boundedness. But the reverse does not hold.
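A simple example of the gap (an illustration added here, not taken from the definitions above): take X_n = Z for every ''n'', where ''Z'' is a single standard normal random variable. Choosing ''M'' as a sufficiently high quantile of |Z| shows that X_n = O_p(1), but for any fixed small \delta > 0 the probability P(|X_n| \geq \delta) is a constant positive number that does not tend to zero, so X_n is not o_p(1).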


Example

If (X_n) is a stochastic sequence such that each element has finite variance, then

:X_n - E(X_n) = O_p\left(\sqrt{\operatorname{Var}(X_n)}\right)

(see Theorem 14.4-1 in Bishop ''et al.''). If, moreover, a_n^{-2}\operatorname{Var}(X_n) = \operatorname{Var}(a_n^{-1}X_n) is a null sequence for a sequence (a_n) of real numbers, then a_n^{-1}(X_n - E(X_n)) converges to zero in probability by Chebyshev's inequality, so

:X_n - E(X_n) = o_p(a_n).
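Spelling out the Chebyshev step (a short derivation added for clarity): for any \delta > 0,

:P\left(\left|a_n^{-1}(X_n - E(X_n))\right| \geq \delta\right) \leq \frac{\operatorname{Var}(a_n^{-1}X_n)}{\delta^2} = \frac{a_n^{-2}\operatorname{Var}(X_n)}{\delta^2},

which tends to zero whenever a_n^{-2}\operatorname{Var}(X_n) is a null sequence, giving the stated o_p(a_n) conclusion.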


References

* Yvonne M. Bishop, Stephen E. Fienberg, Paul W. Holland (1975, 2007). ''Discrete Multivariate Analysis''. Springer.