Azuma's inequality

In probability theory, the Azuma–Hoeffding inequality (named after Kazuoki Azuma and Wassily Hoeffding) gives a concentration result for the values of martingales that have bounded differences.

Suppose \{X_k : k = 0, 1, 2, \dots\} is a martingale (or super-martingale) and

: |X_k - X_{k-1}| \leq c_k

almost surely. Then for all positive integers ''N'' and all positive reals ''\epsilon'',

: \text{P}(X_N - X_0 \geq \epsilon) \leq \exp\left(\frac{-\epsilon^2}{2\sum_{k=1}^N c_k^2}\right).

And symmetrically (when ''X''''k'' is a sub-martingale):

: \text{P}(X_N - X_0 \leq -\epsilon) \leq \exp\left(\frac{-\epsilon^2}{2\sum_{k=1}^N c_k^2}\right).

If ''X'' is a martingale, using both inequalities above and applying the union bound (Boole's inequality) allows one to obtain a two-sided bound:

: \text{P}(|X_N - X_0| \geq \epsilon) \leq 2\exp\left(\frac{-\epsilon^2}{2\sum_{k=1}^N c_k^2}\right).
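As a numerical sanity check (not part of the original statement), the two-sided bound can be compared against a simulated simple ±1 random walk, for which each ''c''''k'' = 1; the parameter values below are illustrative assumptions:

```python
import math
import random

def azuma_two_sided_bound(n, eps, c=1.0):
    """Two-sided Azuma bound: P(|X_n - X_0| >= eps) <= 2 exp(-eps^2 / (2 sum c_k^2))."""
    return 2.0 * math.exp(-eps**2 / (2.0 * n * c**2))

def empirical_tail(n, eps, trials=20000, seed=0):
    """Fraction of simulated +/-1 random walks of length n with |X_n| >= eps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = sum(rng.choice((-1, 1)) for _ in range(n))
        if abs(x) >= eps:
            hits += 1
    return hits / trials

n, eps = 100, 30
print(empirical_tail(n, eps), "<=", azuma_two_sided_bound(n, eps))
```

The empirical tail probability falls well below the Azuma bound, as expected (the bound is valid but typically not tight for this walk).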


Proof

The proof uses the same idea as the proof of the general form of Azuma's inequality listed below. In fact, this statement can be viewed as a direct corollary of the general form.


A general form of Azuma's inequality


Limitation of the vanilla Azuma's inequality

Note that the vanilla Azuma's inequality requires symmetric bounds on the martingale increments, i.e. -c_t \leq X_t - X_{t-1} \leq c_t. So, if the known bound is asymmetric, e.g. a_t \leq X_t - X_{t-1} \leq b_t, then to use Azuma's inequality one needs to choose c_t = \max(|a_t|, |b_t|), which may waste information on the boundedness of X_t - X_{t-1}. However, this issue can be resolved, and one can obtain a tighter probability bound with the following general form of Azuma's inequality.


Statement

Let \left\{X_0, X_1, \dots\right\} be a martingale (or supermartingale) with respect to a filtration \left\{\mathcal{F}_0, \mathcal{F}_1, \dots\right\}. Assume there are predictable processes \left\{A_0, A_1, \dots\right\} and \left\{B_0, B_1, \dots\right\} with respect to \left\{\mathcal{F}_0, \mathcal{F}_1, \dots\right\}, i.e. for all t, A_t, B_t are \mathcal{F}_{t-1}-measurable, and constants 0 < c_1, c_2, \dots < \infty such that

: A_t \leq X_t - X_{t-1} \leq B_t \quad \text{and} \quad B_t - A_t \leq c_t

almost surely. Then for all \epsilon > 0,

: \text{P}(X_n - X_0 \geq \epsilon) \leq \exp\left(-\frac{2\epsilon^2}{\sum_{t=1}^n c_t^2}\right).

Since a submartingale is a supermartingale with signs reversed, if instead \left\{X_0, X_1, \dots\right\} is a martingale (or submartingale), we have

: \text{P}(X_n - X_0 \leq -\epsilon) \leq \exp\left(-\frac{2\epsilon^2}{\sum_{t=1}^n c_t^2}\right).

If \left\{X_0, X_1, \dots\right\} is a martingale, then, since it is both a supermartingale and a submartingale, applying the union bound to the two inequalities above yields the two-sided bound:

: \text{P}(|X_n - X_0| \geq \epsilon) \leq 2\exp\left(-\frac{2\epsilon^2}{\sum_{t=1}^n c_t^2}\right).


Proof

We will prove the supermartingale case only, as the rest are analogous. By the Doob decomposition, we can decompose the supermartingale \left\{X_t\right\} as X_t = Y_t + Z_t, where \left\{Y_t, \mathcal{F}_t\right\} is a martingale and \left\{Z_t, \mathcal{F}_t\right\} is a nonincreasing predictable sequence (note that if \left\{X_t\right\} itself is a martingale, then Z_t = 0). From A_t \leq X_t - X_{t-1} \leq B_t, we have

: -(Z_t - Z_{t-1}) + A_t \leq Y_t - Y_{t-1} \leq -(Z_t - Z_{t-1}) + B_t.

Applying the Chernoff bound to Y_n - Y_0, we have for \epsilon > 0,

: \begin{align} \text{P}(Y_n - Y_0 \geq \epsilon) & \leq \underset{s > 0}{\min} \ e^{-s\epsilon} \mathbb{E}\left[e^{s(Y_n - Y_0)}\right] \\ & = \underset{s > 0}{\min} \ e^{-s\epsilon} \mathbb{E}\left[\exp\left(s \sum_{t=1}^{n} (Y_t - Y_{t-1})\right)\right] \\ & = \underset{s > 0}{\min} \ e^{-s\epsilon} \mathbb{E}\left[\exp\left(s \sum_{t=1}^{n-1} (Y_t - Y_{t-1})\right) \mathbb{E}\left[\exp\left(s(Y_n - Y_{n-1})\right) \mid \mathcal{F}_{n-1}\right]\right] \end{align}

For the inner expectation term, since (i) \mathbb{E}[Y_n - Y_{n-1} \mid \mathcal{F}_{n-1}] = 0, as \left\{Y_t\right\} is a martingale; (ii) -(Z_n - Z_{n-1}) + A_n \leq Y_n - Y_{n-1} \leq -(Z_n - Z_{n-1}) + B_n; (iii) -(Z_n - Z_{n-1}) + A_n and -(Z_n - Z_{n-1}) + B_n are both \mathcal{F}_{n-1}-measurable, as \left\{Z_t\right\} is a predictable process; and (iv) B_n - A_n \leq c_n, applying Hoeffding's lemma gives

: \mathbb{E}\left[\exp\left(s(Y_n - Y_{n-1})\right) \mid \mathcal{F}_{n-1}\right] \leq \exp\left(\frac{s^2 (B_n - A_n)^2}{8}\right) \leq \exp\left(\frac{s^2 c_n^2}{8}\right).

Repeating this step for t = n-1, \dots, 1, one gets

: \text{P}(Y_n - Y_0 \geq \epsilon) \leq \underset{s > 0}{\min} \ e^{-s\epsilon} \exp\left(\frac{s^2 \sum_{t=1}^n c_t^2}{8}\right).

The minimum is achieved at s = \frac{4\epsilon}{\sum_{t=1}^n c_t^2}, so we have

: \text{P}(Y_n - Y_0 \geq \epsilon) \leq \exp\left(-\frac{2\epsilon^2}{\sum_{t=1}^n c_t^2}\right).

Finally, since X_n - X_0 = (Y_n - Y_0) + (Z_n - Z_0) and Z_n - Z_0 \leq 0, as \left\{Z_t\right\} is nonincreasing, the event \left\{X_n - X_0 \geq \epsilon\right\} implies \left\{Y_n - Y_0 \geq \epsilon\right\}, and therefore

: \text{P}(X_n - X_0 \geq \epsilon) \leq \text{P}(Y_n - Y_0 \geq \epsilon) \leq \exp\left(-\frac{2\epsilon^2}{\sum_{t=1}^n c_t^2}\right). \square
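The final optimization over ''s'' can be checked numerically. The sketch below (with arbitrary assumed values of \epsilon and \sum c_t^2) evaluates the pre-minimization Chernoff expression at the claimed minimizer and compares it with a grid search and with the closed-form minimum:

```python
import math

def chernoff_objective(s, eps, sum_c2):
    """exp(-s*eps) * exp(s^2 * sum_c2 / 8): the bound before minimizing over s."""
    return math.exp(-s * eps + s**2 * sum_c2 / 8.0)

eps, sum_c2 = 5.0, 10.0
s_star = 4.0 * eps / sum_c2                      # claimed minimizer s = 4 eps / sum c_t^2
minimum = chernoff_objective(s_star, eps, sum_c2)
closed_form = math.exp(-2.0 * eps**2 / sum_c2)   # claimed minimum value
# a coarse grid search over s should not beat the claimed minimizer
grid_min = min(chernoff_objective(0.01 * k, eps, sum_c2) for k in range(1, 1000))
print(minimum, closed_form, grid_min)
```

Setting the derivative of -s\epsilon + s^2 \sum c_t^2 / 8 to zero gives exactly s^* = 4\epsilon / \sum c_t^2, and substituting back yields the exponent -2\epsilon^2 / \sum c_t^2.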


Remark

Note that by setting A_t = -c_t, B_t = c_t, we recover the vanilla Azuma's inequality. Note that for either a submartingale or a supermartingale, only one side of Azuma's inequality holds: we cannot say much about how fast a submartingale with bounded increments rises (or a supermartingale falls). This general form of Azuma's inequality applied to the Doob martingale gives McDiarmid's inequality, which is common in the analysis of randomized algorithms.


Simple example of Azuma's inequality for coin flips

Let ''F''''i'' be a sequence of independent and identically distributed random coin flips (i.e., let ''F''''i'' be equally likely to be −1 or 1, independent of the other values of ''F''''i''). Defining X_i = \sum_{j=1}^i F_j yields a martingale with |X_k - X_{k-1}| \leq 1, allowing us to apply Azuma's inequality. Specifically, we get

: \text{P}(X_n > t) \leq \exp\left(\frac{-t^2}{2n}\right).

For example, if we set ''t'' proportional to ''n'', then this tells us that although the ''maximum'' possible value of ''X''''n'' scales linearly with ''n'', the ''probability'' that the sum scales linearly with ''n'' decreases exponentially fast with ''n''. If we set t = \sqrt{2 n \ln n} we get:

: \text{P}(X_n > \sqrt{2 n \ln n}) \leq \exp\left(\frac{-2 n \ln n}{2n}\right) = 1/n,

which means that the probability of deviating more than \sqrt{2 n \ln n} approaches 0 as ''n'' goes to infinity.
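This coin-flip bound can be tested by simulation (a sketch; the choice n = 200 and the number of trials are assumptions for illustration):

```python
import math
import random

def coin_flip_tail(n, trials=20000, seed=1):
    """Empirical P(X_n > sqrt(2 n ln n)) for X_n a sum of n fair +/-1 flips,
    alongside the Azuma bound 1/n."""
    t = math.sqrt(2.0 * n * math.log(n))
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials)
               if sum(rng.choice((-1, 1)) for _ in range(n)) > t)
    return hits / trials, 1.0 / n

emp, bound = coin_flip_tail(200)
print(emp, "<=", bound)
```

For n = 200 the threshold is about 3.3 standard deviations of X_n, so the empirical tail is far below the 1/n bound, consistent with (but looser than) the Gaussian tail one would expect from the central limit theorem.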


Remark

A similar inequality was proved under weaker assumptions by Sergei Bernstein in 1937. Hoeffding proved this result for independent variables rather than martingale differences, and also observed that slight modifications of his argument establish the result for martingale differences (see page 9 of his 1963 paper).


See also

* Concentration inequality - a summary of tail bounds on random variables.


References

* Azuma, K. (1967). "Weighted sums of certain dependent random variables". ''Tôhoku Mathematical Journal''. 19 (3): 357–367.
* Bernstein, S. N. (1937). (vol. 4, item 22 in the collected works)
* Hoeffding, W. (1963). "Probability inequalities for sums of bounded random variables". ''Journal of the American Statistical Association''. 58 (301): 13–30.
* {{Cite book |first1=A. P. |last1=Godbole |first2=P. |last2=Hitczenko |title=Beyond the method of bounded differences |journal=DIMACS Series in Discrete Mathematics and Theoretical Computer Science |volume=41 |pages=43–58 |year=1998 |mr=1630408 |doi=10.1090/dimacs/041/03 |isbn=9780821808276}}