For certain applications in linear algebra, it is useful to know properties of the probability distribution of the largest eigenvalue of a finite sum of random matrices. Suppose \{\mathbf{X}_k\} is a finite sequence of random matrices. Analogous to the well-known Chernoff bound for sums of scalars, a bound on the following is sought for a given parameter ''t'':
: \Pr \left\{ \lambda_{\max}\left( \sum_k \mathbf{X}_k \right) \geq t \right\}
The following theorems answer this general question under various assumptions; these assumptions are named below by analogy to their classical, scalar counterparts. All of these theorems can be found in (Tropp 2010), as the specific application of a general result which is derived below. A summary of related works is given.


Matrix Gaussian and Rademacher series


Self-adjoint matrices case

Consider a finite sequence \{\mathbf{A}_k\} of fixed, self-adjoint matrices with dimension d, and let \{\xi_k\} be a finite sequence of independent standard normal or independent Rademacher random variables. Then, for all t\geq0,
: \Pr \left\{ \lambda_{\max}\left( \sum_k \xi_k \mathbf{A}_k \right) \geq t \right\} \leq d \cdot e^{-t^2/2\sigma^2}
where
: \sigma^2 = \bigg\Vert \sum_k \mathbf{A}^2_k \bigg\Vert.
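As a minimal numerical sketch (not from the source; the matrices, threshold, and sample size below are arbitrary illustrative choices), the bound can be compared against a Monte Carlo estimate of the tail probability for a small Rademacher series:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 30
sym = lambda M: (M + M.T) / 2

# Fixed self-adjoint matrices A_k (arbitrary illustrative choices).
A = [sym(rng.normal(size=(d, d))) for _ in range(n)]

# Variance parameter sigma^2 = || sum_k A_k^2 || (spectral norm).
sigma2 = np.linalg.norm(sum(Ak @ Ak for Ak in A), ord=2)

# Tail bound d * exp(-t^2 / (2 sigma^2)) at an illustrative threshold t.
t = 2.0 * np.sqrt(sigma2)
bound = d * np.exp(-t**2 / (2 * sigma2))

# Monte Carlo estimate of Pr{ lambda_max( sum_k eps_k A_k ) >= t } with Rademacher signs.
trials = 2000
hits = 0
for _ in range(trials):
    eps = rng.choice([-1.0, 1.0], size=n)
    S = sum(e * Ak for e, Ak in zip(eps, A))
    hits += np.linalg.eigvalsh(S)[-1] >= t
print(f"empirical tail ~ {hits / trials:.4f}  <=  bound {bound:.4f}")
```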


Rectangular case

Consider a finite sequence \{\mathbf{B}_k\} of fixed matrices with dimension d_1\times d_2, and let \{\xi_k\} be a finite sequence of independent standard normal or independent Rademacher random variables. Define the variance parameter
: \sigma^2 = \max \left\{ \bigg\Vert \sum_k \mathbf{B}_k\mathbf{B}_k^* \bigg\Vert, \bigg\Vert \sum_k \mathbf{B}_k^*\mathbf{B}_k \bigg\Vert \right\}.
Then, for all t\geq0,
: \Pr \left\{ \bigg\Vert \sum_k \xi_k \mathbf{B}_k \bigg\Vert \geq t \right\} \leq (d_1+d_2) \cdot e^{-t^2/2\sigma^2}.


Matrix Chernoff inequalities

The classical Chernoff bounds concern the sum of independent, nonnegative, and uniformly bounded random variables. In the matrix setting, the analogous theorem concerns a sum of positive-semidefinite random matrices subject to a uniform eigenvalue bound.


Matrix Chernoff I

Consider a finite sequence \{\mathbf{X}_k\} of independent, random, self-adjoint matrices with dimension d. Assume that each random matrix satisfies
: \mathbf{X}_k \succeq \mathbf{0} \quad \text{and} \quad \lambda_{\max}(\mathbf{X}_k) \leq R
almost surely. Define
: \mu_{\min} = \lambda_{\min}\left( \sum_k \mathbb{E}\, \mathbf{X}_k \right) \quad \text{and} \quad \mu_{\max} = \lambda_{\max}\left( \sum_k \mathbb{E}\, \mathbf{X}_k \right).
Then
: \Pr \left\{ \lambda_{\min}\left( \sum_k \mathbf{X}_k \right) \leq (1-\delta)\mu_{\min} \right\} \leq d \cdot \left[ \frac{e^{-\delta}}{(1-\delta)^{1-\delta}} \right]^{\mu_{\min}/R} \quad \text{for } \delta\in [0,1)\text{, and}
: \Pr \left\{ \lambda_{\max}\left( \sum_k \mathbf{X}_k \right) \geq (1+\delta)\mu_{\max} \right\} \leq d \cdot \left[ \frac{e^{\delta}}{(1+\delta)^{1+\delta}} \right]^{\mu_{\max}/R} \quad \text{for } \delta \geq 0.
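A minimal sketch (the rank-one ensemble, dimensions, and value of δ below are illustrative assumptions, not from the source) shows how the two Chernoff I bounds are evaluated once μ_min and μ_max are known:

```python
import numpy as np

d, n, R = 4, 100, 1.0
delta = 0.5

# Toy ensemble (an assumption of this sketch): each X_k = v v^T with v uniform on the
# sphere of radius sqrt(R), so X_k >= 0 and lambda_max(X_k) <= R almost surely,
# and E X_k = (R/d) * I.
EX_sum = n * (R / d) * np.eye(d)
mu_min = np.linalg.eigvalsh(EX_sum)[0]
mu_max = np.linalg.eigvalsh(EX_sum)[-1]

# Matrix Chernoff I bounds for the lower and upper tails.
lower_tail = d * (np.exp(-delta) / (1 - delta) ** (1 - delta)) ** (mu_min / R)
upper_tail = d * (np.exp(delta) / (1 + delta) ** (1 + delta)) ** (mu_max / R)
print(f"Pr{{lambda_min(sum) <= {(1 - delta) * mu_min:.1f}}} <= {lower_tail:.4f}")
print(f"Pr{{lambda_max(sum) >= {(1 + delta) * mu_max:.1f}}} <= {upper_tail:.4f}")
```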


Matrix Chernoff II

Consider a sequence \{\mathbf{X}_k : k = 1, 2, \ldots, n\} of independent, random, self-adjoint matrices with dimension d that satisfy
: \mathbf{X}_k \succeq \mathbf{0} \quad \text{and} \quad \lambda_{\max}(\mathbf{X}_k) \leq 1
almost surely. Compute the minimum and maximum eigenvalues of the average expectation,
: \bar{\mu}_{\min} = \lambda_{\min}\left( \frac{1}{n} \sum_{k=1}^n \mathbb{E}\, \mathbf{X}_k \right) \quad \text{and} \quad \bar{\mu}_{\max} = \lambda_{\max}\left( \frac{1}{n} \sum_{k=1}^n \mathbb{E}\, \mathbf{X}_k \right).
Then
: \Pr \left\{ \lambda_{\min}\left( \frac{1}{n} \sum_{k=1}^n \mathbf{X}_k \right) \leq \alpha \right\} \leq d \cdot e^{-n D(\alpha \Vert \bar{\mu}_{\min})} \quad \text{for } 0 \leq \alpha \leq \bar{\mu}_{\min}\text{, and}
: \Pr \left\{ \lambda_{\max}\left( \frac{1}{n} \sum_{k=1}^n \mathbf{X}_k \right) \geq \alpha \right\} \leq d \cdot e^{-n D(\alpha \Vert \bar{\mu}_{\max})} \quad \text{for } \bar{\mu}_{\max} \leq \alpha \leq 1.
The binary information divergence is defined as
: D(a\Vert u) = a \left( \log a - \log u \right) + (1-a)\left( \log(1-a)-\log(1-u) \right)
for a,u\in[0,1].
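The following small helper (a sketch, not from the source; the function name binary_divergence and all parameter values are ours) evaluates the binary information divergence and the corresponding Chernoff II upper-tail bound:

```python
import numpy as np

def binary_divergence(a: float, u: float) -> float:
    """D(a||u) = a*log(a/u) + (1-a)*log((1-a)/(1-u)) for a, u in (0, 1)."""
    return a * (np.log(a) - np.log(u)) + (1 - a) * (np.log(1 - a) - np.log(1 - u))

# Illustrative parameters (not from the source).
n, d = 200, 10
mu_max_bar = 0.3       # lambda_max of the average expectation
alpha = 0.45           # threshold with mu_max_bar <= alpha <= 1

upper_tail = d * np.exp(-n * binary_divergence(alpha, mu_max_bar))
print(f"Pr{{lambda_max(mean) >= {alpha}}} <= {upper_tail:.3e}")
```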


Matrix Bennett and Bernstein inequalities

In the scalar setting, the Bennett and Bernstein inequalities describe the upper tail of a sum of independent, zero-mean random variables that are either bounded or subexponential. In the matrix case, the analogous results concern a sum of zero-mean random matrices.


Bounded case

Consider a finite sequence \{\mathbf{X}_k\} of independent, random, self-adjoint matrices with dimension d. Assume that each random matrix satisfies
: \mathbb{E}\,\mathbf{X}_k = \mathbf{0} \quad \text{and} \quad \lambda_{\max}(\mathbf{X}_k) \leq R
almost surely. Compute the norm of the total variance,
: \sigma^2 = \bigg\Vert \sum_k \mathbb{E}\, (\mathbf{X}^2_k) \bigg\Vert.
Then, the following chain of inequalities holds for all t \geq 0:
: \begin{align} \Pr \left\{ \lambda_{\max}\left( \sum_k \mathbf{X}_k \right) \geq t \right\} & \leq d \cdot \exp \left( -\frac{\sigma^2}{R^2} \cdot h\left( \frac{Rt}{\sigma^2} \right) \right) \\ & \leq d \cdot \exp \left( \frac{-t^2/2}{\sigma^2 + Rt/3} \right) \\ & \leq \begin{cases} d \cdot \exp ( -3t^2/8\sigma^2 ) \quad & \text{for } t\leq \sigma^2/R; \\ d \cdot \exp ( -3t/8R ) \quad & \text{for } t\geq \sigma^2/R. \end{cases} \end{align}
The function h(u) is defined as h(u) = (1+u) \log (1+u)-u for u \geq 0.
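A quick numerical sketch (the values of d, R, σ², and t below are illustrative choices, not from the source) can confirm that the three displayed bounds are ordered as claimed:

```python
import numpy as np

def h(u):
    """Bennett function h(u) = (1+u)*log(1+u) - u."""
    return (1 + u) * np.log1p(u) - u

# Illustrative parameters (not from the source).
d, R, sigma2 = 8, 1.0, 4.0

for t in [1.0, 3.0, 6.0, 12.0]:
    bennett   = d * np.exp(-(sigma2 / R**2) * h(R * t / sigma2))
    bernstein = d * np.exp(-(t**2 / 2) / (sigma2 + R * t / 3))
    if t <= sigma2 / R:
        simple = d * np.exp(-3 * t**2 / (8 * sigma2))
    else:
        simple = d * np.exp(-3 * t / (8 * R))
    # The chain of inequalities: Bennett <= Bernstein <= simplified bound.
    print(f"t={t:5.1f}  Bennett {bennett:.3e} <= Bernstein {bernstein:.3e} <= {simple:.3e}")
```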


Subexponential case

Consider a finite sequence \{\mathbf{X}_k\} of independent, random, self-adjoint matrices with dimension d. Assume that
: \mathbb{E}\,\mathbf{X}_k = \mathbf{0} \quad \text{and} \quad \mathbb{E}\,(\mathbf{X}_k^p) \preceq \frac{p!}{2}\cdot R^{p-2} \mathbf{A}_k^2
for p=2,3,4,\ldots. Compute the variance parameter,
: \sigma^2 = \bigg\Vert \sum_k \mathbf{A}^2_k \bigg\Vert.
Then, the following chain of inequalities holds for all t \geq 0:
: \begin{align} \Pr \left\{ \lambda_{\max}\left( \sum_k \mathbf{X}_k \right) \geq t \right\} & \leq d \cdot \exp \left( \frac{-t^2/2}{\sigma^2 + Rt} \right) \\ & \leq \begin{cases} d \cdot \exp ( -t^2/4\sigma^2 ) \quad & \text{for } t\leq \sigma^2/R; \\ d \cdot \exp ( -t/4R ) \quad & \text{for } t\geq \sigma^2/R. \end{cases} \end{align}


Rectangular case

Consider a finite sequence \{\mathbf{Z}_k\} of independent, random matrices with dimension d_1\times d_2. Assume that each random matrix satisfies
: \mathbb{E}\,\mathbf{Z}_k = \mathbf{0} \quad \text{and} \quad \Vert \mathbf{Z}_k \Vert \leq R
almost surely. Define the variance parameter
: \sigma^2 = \max \left\{ \bigg\Vert \sum_k \mathbb{E}\,(\mathbf{Z}_k\mathbf{Z}_k^*) \bigg\Vert, \bigg\Vert \sum_k \mathbb{E}\,(\mathbf{Z}_k^*\mathbf{Z}_k) \bigg\Vert \right\}.
Then, for all t \geq 0,
: \Pr \left\{ \bigg\Vert \sum_k \mathbf{Z}_k \bigg\Vert \geq t \right\} \leq (d_1+d_2) \cdot \exp \left( \frac{-t^2/2}{\sigma^2 + Rt/3} \right)
holds.


Matrix Azuma, Hoeffding, and McDiarmid inequalities


Matrix Azuma

The scalar version of Azuma's inequality states that a scalar martingale exhibits normal concentration about its mean value, and the scale for deviations is controlled by the total maximum squared range of the difference sequence. The following is the extension to the matrix setting. Consider a finite adapted sequence \{\mathbf{X}_k\} of self-adjoint matrices with dimension d, and a fixed sequence \{\mathbf{A}_k\} of self-adjoint matrices that satisfy
: \mathbb{E}_{k-1}\,\mathbf{X}_k = \mathbf{0} \quad \text{and} \quad \mathbf{X}_k^2 \preceq \mathbf{A}_k^2
almost surely. Compute the variance parameter
: \sigma^2 = \bigg\Vert \sum_k \mathbf{A}^2_k \bigg\Vert.
Then, for all t \geq 0,
: \Pr \left\{ \lambda_{\max}\left( \sum_k \mathbf{X}_k \right) \geq t \right\} \leq d \cdot e^{-t^2/8\sigma^2}.
The constant 1/8 can be improved to 1/2 when there is additional information available. One case occurs when each summand \mathbf{X}_k is conditionally symmetric. Another example requires the assumption that \mathbf{X}_k commutes almost surely with \mathbf{A}_k.


Matrix Hoeffding

Placing the additional assumption that the summands in Matrix Azuma are independent gives a matrix extension of Hoeffding's inequalities. Consider a finite sequence \{\mathbf{X}_k\} of independent, random, self-adjoint matrices with dimension d, and let \{\mathbf{A}_k\} be a sequence of fixed self-adjoint matrices. Assume that each random matrix satisfies
: \mathbb{E}\,\mathbf{X}_k = \mathbf{0} \quad \text{and} \quad \mathbf{X}_k^2 \preceq \mathbf{A}_k^2
almost surely. Then, for all t \geq 0,
: \Pr \left\{ \lambda_{\max}\left( \sum_k \mathbf{X}_k \right) \geq t \right\} \leq d \cdot e^{-t^2/8\sigma^2}
where
: \sigma^2 = \bigg\Vert \sum_k \mathbf{A}^2_k \bigg\Vert.
An improvement of this result was established in subsequent work: for all t \geq 0,
: \Pr \left\{ \lambda_{\max}\left( \sum_k \mathbf{X}_k \right) \geq t \right\} \leq d \cdot e^{-t^2/2\sigma^2}
where
: \sigma^2 = \frac{1}{2}\bigg\Vert \sum_k \mathbf{A}^2_k + \mathbb{E}\,\mathbf{X}^2_k \bigg\Vert \leq \bigg\Vert \sum_k \mathbf{A}^2_k \bigg\Vert.
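As a minimal sketch (the matrices and the distribution of the summands below are illustrative assumptions, not from the source), one can check numerically that the improved variance parameter never exceeds the original one:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 6, 20
sym = lambda M: (M + M.T) / 2

# Fixed bounds A_k; random summands X_k = u_k * A_k with u_k ~ Uniform[-1, 1]
# (an assumption of this sketch), so E X_k = 0, X_k^2 <= A_k^2 and E X_k^2 = A_k^2 / 3.
A = [sym(rng.normal(size=(d, d))) for _ in range(n)]
S_A = sum(Ak @ Ak for Ak in A)
S_EX = sum(Ak @ Ak / 3 for Ak in A)

sigma2_orig = np.linalg.norm(S_A, ord=2)
sigma2_improved = 0.5 * np.linalg.norm(S_A + S_EX, ord=2)
print(f"improved sigma^2 = {sigma2_improved:.3f} <= original sigma^2 = {sigma2_orig:.3f}")
```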


Matrix bounded difference (McDiarmid)

In the scalar setting, McDiarmid's inequality provides one common way of bounding the differences by applying Azuma's inequality to a Doob martingale. A version of the bounded differences inequality holds in the matrix setting. Let \{Z_k\} be an independent family of random variables, and let \mathbf{H} be a function that maps n variables to a self-adjoint matrix of dimension d. Consider a sequence \{\mathbf{A}_k\} of fixed self-adjoint matrices that satisfy
: \left( \mathbf{H}(z_1,\ldots,z_k,\ldots,z_n) - \mathbf{H}(z_1,\ldots,z'_k,\ldots,z_n) \right)^2 \preceq \mathbf{A}_k^2,
where z_i and z'_i range over all possible values of Z_i for each index i. Compute the variance parameter
: \sigma^2 = \bigg\Vert \sum_k \mathbf{A}^2_k \bigg\Vert.
Then, for all t \geq 0,
: \Pr \left\{ \lambda_{\max}\left( \mathbf{H}(\mathbf{z}) - \mathbb{E}\,\mathbf{H}(\mathbf{z}) \right) \geq t \right\} \leq d \cdot e^{-t^2/8\sigma^2},
where \mathbf{z}=(Z_1,\ldots,Z_n). An improvement of this result was established in subsequent work: for all t \geq 0,
: \Pr \left\{ \lambda_{\max}\left( \mathbf{H}(\mathbf{z}) - \mathbb{E}\,\mathbf{H}(\mathbf{z}) \right) \geq t \right\} \leq d \cdot e^{-t^2/\sigma^2},
where \mathbf{z}=(Z_1,\ldots,Z_n) and \sigma^2 = \bigg\Vert \sum_k \mathbf{A}^2_k \bigg\Vert.


Survey of related theorems

The first bounds of this type were derived by Ahlswede and Winter. Recall the theorem above for self-adjoint matrix Gaussian and Rademacher bounds: for a finite sequence \{\mathbf{A}_k\} of fixed, self-adjoint matrices with dimension d and for \{\xi_k\} a finite sequence of independent standard normal or independent Rademacher random variables,
: \Pr \left\{ \lambda_{\max}\left( \sum_k \xi_k \mathbf{A}_k \right) \geq t \right\} \leq d \cdot e^{-t^2/2\sigma^2}
where
: \sigma^2 = \bigg\Vert \sum_k \mathbf{A}^2_k \bigg\Vert.
Ahlswede and Winter would give the same result, except with
: \sigma_{AW}^2 = \sum_k \lambda_{\max} \left(\mathbf{A}_k^2 \right).
By comparison, the \sigma^2 in the theorem above commutes the sum \Sigma and \lambda_{\max}; that is, it is the largest eigenvalue of the sum rather than the sum of the largest eigenvalues. It is never larger than the Ahlswede–Winter value (by the triangle inequality for the norm), but it can be much smaller. Therefore, the theorem above gives a tighter bound than the Ahlswede–Winter result. The chief contribution of Ahlswede and Winter was the extension of the Laplace-transform method used to prove the scalar Chernoff bound (see Chernoff bound#Additive form (absolute error)) to the case of self-adjoint matrices. The procedure is given in the derivation below. All of the recent works on this topic follow this same procedure, and the chief differences arise in subsequent steps. Ahlswede and Winter use the Golden–Thompson inequality to proceed, whereas Tropp uses Lieb's theorem. Suppose one wished to vary the length of the series (''n'') and the dimensions of the matrices (''d'') while keeping the right-hand side approximately constant. Then ''n'' must vary approximately as the log of ''d''. Several papers have attempted to establish a bound without a dependence on dimensions. Rudelson and Vershynin give a result for matrices which are the outer product of two vectors. Another work provides a result without the dimensional dependence for low-rank matrices. The original result was derived independently of the Ahlswede–Winter approach, but a later proof obtains a similar result using the Ahlswede–Winter approach. Finally, Oliveira proves a result for matrix martingales independently of the Ahlswede–Winter framework, and Tropp slightly improves on that result using the Ahlswede–Winter framework. Neither result is presented in this article.
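A short numerical sketch (with arbitrary illustrative matrices, not taken from the source) makes the comparison concrete: the spectral norm of the sum of squares is never larger than the sum of the largest eigenvalues, and is often much smaller.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 8, 40
sym = lambda M: (M + M.T) / 2

A = [sym(rng.normal(size=(d, d))) for _ in range(n)]

# sigma^2 from the theorem above: largest eigenvalue of the sum of squares.
sigma2 = np.linalg.norm(sum(Ak @ Ak for Ak in A), ord=2)

# Ahlswede-Winter parameter: sum of the largest eigenvalues of the squares.
sigma2_aw = sum(np.linalg.eigvalsh(Ak @ Ak)[-1] for Ak in A)

print(f"sigma^2 = {sigma2:.2f} <= sigma_AW^2 = {sigma2_aw:.2f}")
```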


Derivation and proof


Ahlswede and Winter

The Laplace transform argument found in Ahlswede and Winter's work is a significant result in its own right: Let \mathbf{Y} be a random self-adjoint matrix. Then
: \Pr \left\{ \lambda_{\max}(\mathbf{Y}) \geq t \right\} \leq \inf_{\theta > 0} \left\{ e^{-\theta t} \cdot \mathbb{E}\, \operatorname{tr}\, e^{\theta \mathbf{Y}} \right\}.
To prove this, fix \theta > 0. Then
: \begin{align} \Pr \left\{ \lambda_{\max}(\mathbf{Y}) \geq t \right\} &= \Pr \left\{ \lambda_{\max}(\theta\mathbf{Y}) \geq \theta t \right\} \\ &= \Pr \left\{ e^{\lambda_{\max}(\theta\mathbf{Y})} \geq e^{\theta t} \right\} \\ &\leq e^{-\theta t}\, \mathbb{E}\, e^{\lambda_{\max}(\theta\mathbf{Y})} \\ &\leq e^{-\theta t}\, \mathbb{E}\, \operatorname{tr}\, e^{\theta\mathbf{Y}} \end{align}
The second-to-last inequality is Markov's inequality. The last inequality holds since e^{\lambda_{\max}(\theta\mathbf{Y})} = \lambda_{\max} (e^{\theta\mathbf{Y}}) \leq \operatorname{tr} (e^{\theta\mathbf{Y}}). Since the left-most quantity is independent of \theta, the infimum over \theta>0 remains an upper bound for it. Thus, our task is to understand \mathbb{E}\, \operatorname{tr} (e^{\theta\mathbf{Y}}). Since trace and expectation are both linear, we can commute them, so it is sufficient to consider \mathbb{E}\, e^{\theta\mathbf{Y}} =: \mathbf{M}_{\mathbf{Y}}(\theta), which we call the matrix generating function. This is where the methods of Ahlswede–Winter and Tropp diverge. The immediately following presentation follows Ahlswede and Winter. The Golden–Thompson inequality
implies that
: \operatorname{tr}\, \mathbf{M}_{\mathbf{X}_1 + \mathbf{X}_2}(\theta) \leq \operatorname{tr} \left[ \left( \mathbb{E}\, e^{\theta \mathbf{X}_1} \right) \left( \mathbb{E}\, e^{\theta \mathbf{X}_2} \right) \right] = \operatorname{tr}\, \mathbf{M}_{\mathbf{X}_1}(\theta)\, \mathbf{M}_{\mathbf{X}_2}(\theta),
where we used the linearity of expectation several times. Suppose \mathbf{Y} = \sum_k \mathbf{X}_k. We can find an upper bound for \operatorname{tr}\, \mathbf{M}_{\mathbf{Y}}(\theta) by iterating this result. Noting that \operatorname{tr}(\mathbf{AB}) \leq \operatorname{tr}(\mathbf{A})\, \lambda_{\max}(\mathbf{B}) for positive-semidefinite \mathbf{A} and \mathbf{B}, then
: \operatorname{tr}\, \mathbf{M}_{\mathbf{Y}}(\theta) \leq \operatorname{tr} \left[ \left( \mathbb{E}\, e^{\theta \sum_{k=1}^{n-1} \mathbf{X}_k} \right) \left( \mathbb{E}\, e^{\theta \mathbf{X}_n} \right) \right] \leq \operatorname{tr} \left( \mathbb{E}\, e^{\theta \sum_{k=1}^{n-1} \mathbf{X}_k} \right) \lambda_{\max} \left( \mathbb{E}\, e^{\theta \mathbf{X}_n} \right).
Iterating this, we get
: \operatorname{tr}\, \mathbf{M}_{\mathbf{Y}}(\theta) \leq (\operatorname{tr}\, \mathbf{I}) \left[ \prod_k \lambda_{\max} \left( \mathbb{E}\, e^{\theta \mathbf{X}_k} \right) \right] = d\, e^{\sum_k \lambda_{\max} \left( \log \mathbb{E}\, e^{\theta \mathbf{X}_k} \right)}
So far we have found a bound with an infimum over \theta. In turn, this can be bounded. At any rate, one can see how the Ahlswede–Winter bound arises as the sum of largest eigenvalues.
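A small sketch (illustrative matrices only, not from the source) can verify the Golden–Thompson inequality \operatorname{tr}\, e^{\mathbf{A}+\mathbf{B}} \leq \operatorname{tr}(e^{\mathbf{A}} e^{\mathbf{B}}) that drives the Ahlswede–Winter iteration:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(4)
d = 5
sym = lambda M: (M + M.T) / 2

# Two arbitrary self-adjoint (here real symmetric) matrices.
A = sym(rng.normal(size=(d, d)))
B = sym(rng.normal(size=(d, d)))

lhs = np.trace(expm(A + B))          # tr e^{A+B}
rhs = np.trace(expm(A) @ expm(B))    # tr (e^A e^B)
print(f"Golden-Thompson: {lhs:.4f} <= {rhs:.4f}")
```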


Tropp

The major contribution of Tropp's work is the application of Lieb's theorem where Ahlswede and Winter had applied the Golden–Thompson inequality. Tropp's corollary is the following: If \mathbf{H} is a fixed self-adjoint matrix and \mathbf{X} is a random self-adjoint matrix, then
: \mathbb{E}\, \operatorname{tr}\, e^{\mathbf{H}+\mathbf{X}} \leq \operatorname{tr}\, e^{\mathbf{H}+\log( \mathbb{E}\, e^{\mathbf{X}})}
Proof: Let \mathbf{Y} = e^{\mathbf{X}}. Then Lieb's theorem tells us that
: f(\mathbf{Y}) = \operatorname{tr}\, e^{\mathbf{H}+\log(\mathbf{Y})}
is concave. The final step is to use Jensen's inequality to move the expectation inside the function:
: \mathbb{E}\, \operatorname{tr}\, e^{\mathbf{H}+\log(\mathbf{Y})} \leq \operatorname{tr}\, e^{\mathbf{H}+\log(\mathbb{E}\,\mathbf{Y})}.
This gives us the major result of the paper: the subadditivity of the log of the matrix generating function.
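A quick sketch (illustrative matrices and a two-point random \mathbf{X}, assumptions of this example rather than the source) can check the corollary \mathbb{E}\, \operatorname{tr}\, e^{\mathbf{H}+\mathbf{X}} \leq \operatorname{tr}\, e^{\mathbf{H}+\log \mathbb{E}\, e^{\mathbf{X}}} numerically:

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(5)
d = 4
sym = lambda M: (M + M.T) / 2

# Fixed self-adjoint H, and a random self-adjoint X taking two values with equal probability.
H = sym(rng.normal(size=(d, d)))
X1, X2 = sym(rng.normal(size=(d, d))), sym(rng.normal(size=(d, d)))

# Left side: E tr e^{H+X} computed exactly over the two-point distribution.
lhs = 0.5 * (np.trace(expm(H + X1)) + np.trace(expm(H + X2)))

# Right side: tr e^{H + log(E e^X)}.
EeX = 0.5 * (expm(X1) + expm(X2))
rhs = np.trace(expm(H + logm(EeX).real))
print(f"E tr e^(H+X) = {lhs:.4f} <= {rhs:.4f} = tr e^(H + log E e^X)")
```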


Subadditivity of log mgf

Let \{\mathbf{X}_k\} be a finite sequence of independent, random self-adjoint matrices. Then for all \theta \in \mathbb{R},
: \operatorname{tr}\, \mathbf{M}_{\sum_k \mathbf{X}_k} (\theta) \leq \operatorname{tr}\, e^{\sum_k \log \mathbf{M}_{\mathbf{X}_k}(\theta)}
Proof: It is sufficient to let \theta=1. Expanding the definitions, we need to show that
: \mathbb{E}\, \operatorname{tr}\, e^{\sum_k \mathbf{X}_k} \leq \operatorname{tr}\, e^{\sum_k \log \mathbb{E}\, e^{\mathbf{X}_k}}.
To complete the proof, we use the law of total expectation. Let \mathbb{E}_k be the expectation conditioned on \mathbf{X}_1, \ldots, \mathbf{X}_k. Since we assume all the \mathbf{X}_i are independent,
: \mathbb{E}_{k-1}\, e^{\mathbf{X}_k} = \mathbb{E}\, e^{\mathbf{X}_k}.
Define \mathbf{\Xi}_k = \log \mathbb{E}_{k-1}\, e^{\mathbf{X}_k} = \log \mathbf{M}_{\mathbf{X}_k}(1). Finally, we have
: \begin{align} \mathbb{E}\, \operatorname{tr}\, e^{\sum_{k=1}^n \mathbf{X}_k} & = \mathbb{E}_0 \cdots \mathbb{E}_{n-1}\, \operatorname{tr}\, e^{\sum_{k=1}^{n-1} \mathbf{X}_k + \mathbf{X}_n} \\ &\leq \mathbb{E}_0 \cdots \mathbb{E}_{n-2}\, \operatorname{tr}\, e^{\sum_{k=1}^{n-1} \mathbf{X}_k + \log(\mathbb{E}_{n-1}\, e^{\mathbf{X}_n})} \\ &= \mathbb{E}_0 \cdots \mathbb{E}_{n-2}\, \operatorname{tr}\, e^{\sum_{k=1}^{n-2} \mathbf{X}_k + \mathbf{X}_{n-1} + \mathbf{\Xi}_n} \\ & \vdots \\ & = \operatorname{tr}\, e^{\sum_{k=1}^n \mathbf{\Xi}_k} \end{align}
where at every step m we use Tropp's corollary with
: \mathbf{H}_m = \sum_{k=1}^{m-1} \mathbf{X}_k + \sum_{k=m+1}^n \mathbf{\Xi}_k
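Under the same illustrative two-point setup as above (an assumption of this sketch, not of the source), the subadditivity claim can be checked numerically for a few independent summands by enumerating all outcomes:

```python
import itertools
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(6)
d, n = 3, 3
sym = lambda M: (M + M.T) / 2

# Each X_k takes one of two symmetric values with probability 1/2 each.
values = [[sym(rng.normal(size=(d, d))) for _ in range(2)] for _ in range(n)]

# Left side: E tr exp(sum_k X_k), computed exactly over all 2^n outcomes.
lhs = np.mean([np.trace(expm(sum(values[k][c[k]] for k in range(n))))
               for c in itertools.product(range(2), repeat=n)])

# Right side: tr exp(sum_k log E e^{X_k}).
log_mgfs = [logm(0.5 * (expm(v[0]) + expm(v[1]))).real for v in values]
rhs = np.trace(expm(sum(log_mgfs)))
print(f"tr M_sum(1) = {lhs:.4f} <= {rhs:.4f} = tr exp(sum_k log M_k(1))")
```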


Master tail bound

The following is immediate from the previous result:
: \Pr \left\{ \lambda_{\max}\left( \sum_k \mathbf{X}_k \right) \geq t \right\} \leq \inf_{\theta > 0} \left\{ e^{-\theta t}\, \operatorname{tr}\, e^{\sum_k \log \mathbf{M}_{\mathbf{X}_k}(\theta)} \right\}
All of the theorems given above are derived from this bound; the theorems consist in various ways to bound the infimum. These steps are significantly simpler than the proofs given.
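Continuing the illustrative two-point setup from the sketches above (all parameters are arbitrary assumptions of this example), the master bound can be evaluated by minimizing over a grid of \theta values:

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(7)
d, n = 3, 5
sym = lambda M: (M + M.T) / 2

# Independent summands X_k, each a centered two-point random symmetric matrix.
pairs = []
for _ in range(n):
    B = sym(rng.normal(size=(d, d)))
    pairs.append((B, -B))            # X_k = +/- B with probability 1/2 each

t = 4.0
thetas = np.linspace(0.01, 3.0, 300)

def master_bound(theta):
    # e^{-theta t} * tr exp( sum_k log E e^{theta X_k} )
    log_mgfs = [logm(0.5 * (expm(theta * P) + expm(theta * M))).real for P, M in pairs]
    return np.exp(-theta * t) * np.trace(expm(sum(log_mgfs)))

best = min(master_bound(th) for th in thetas)
print(f"Pr{{lambda_max(sum_k X_k) >= {t}}} <= {best:.4f}")
```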


References

* Tropp, J. (2010). "User-friendly tail bounds for sums of random matrices". Foundations of Computational Mathematics. 12 (4): 389–434. arXiv:1004.4389. doi:10.1007/s10208-011-9099-z.