For certain applications in linear algebra, it is useful to know properties of the probability distribution of the largest eigenvalue of a finite sum of random matrices. Suppose \{X_k\} is a finite sequence of random matrices. Analogous to the well-known Chernoff bound for sums of scalars, a bound on the following is sought for a given parameter ''t'':
:\Pr \left\{ \lambda_{\max}\left( \sum_k X_k \right) \geq t \right\}
The following theorems answer this general question under various assumptions; these assumptions are named below by analogy to their classical, scalar counterparts. All of these theorems can be found in the survey by Tropp cited below, as specific applications of a general result which is derived below. A summary of related works is given at the end of this article.
Matrix Gaussian and Rademacher series
Self-adjoint matrices case
Consider a finite sequence \{A_k\} of fixed, self-adjoint matrices with dimension ''d'', and let \{\xi_k\} be a finite sequence of independent standard normal or independent Rademacher random variables.
Then, for all t \geq 0,
:\Pr \left\{ \lambda_{\max}\left( \sum_k \xi_k A_k \right) \geq t \right\} \leq d \cdot e^{-t^2/2\sigma^2}
where
:\sigma^2 = \left\Vert \sum_k A_k^2 \right\Vert.
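As a numerical sketch of how the quantities in this bound are computed (the dimensions, seed, and threshold below are illustrative choices, not from the source), the following Python/NumPy snippet draws a Gaussian matrix series, evaluates the variance parameter σ² = ‖Σ_k A_k²‖, and checks by Monte Carlo that the empirical tail stays below d·e^{−t²/2σ²}:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 30

# Fixed self-adjoint (here: real symmetric) matrices A_k.
A = rng.standard_normal((n, d, d))
A = (A + A.transpose(0, 2, 1)) / 2

# Matrix variance parameter: sigma^2 = || sum_k A_k^2 ||  (spectral norm).
sigma2 = np.linalg.norm(np.einsum('kij,kjl->il', A, A), ord=2)

def tail_bound(t):
    """Right-hand side of the self-adjoint matrix Gaussian series bound."""
    return d * np.exp(-t**2 / (2 * sigma2))

# Monte Carlo estimate of P{ lambda_max( sum_k xi_k A_k ) >= t }.
t, trials, hits = 2.5 * np.sqrt(sigma2), 2000, 0
for _ in range(trials):
    xi = rng.standard_normal(n)           # standard normal coefficients
    S = np.einsum('k,kij->ij', xi, A)
    hits += np.linalg.eigvalsh(S).max() >= t
print(hits / trials, "<=", tail_bound(t))
```

The same sketch works for the Rademacher case by replacing the coefficients with `rng.choice([-1.0, 1.0], size=n)`.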
Rectangular case
Consider a finite sequence \{B_k\} of fixed matrices with dimension d_1 \times d_2, and let \{\xi_k\} be a finite sequence of independent standard normal or independent Rademacher random variables.
Define the variance parameter
:\sigma^2 = \max\left\{ \left\Vert \sum_k B_k B_k^* \right\Vert, \left\Vert \sum_k B_k^* B_k \right\Vert \right\}.
Then, for all t \geq 0,
:\Pr \left\{ \left\Vert \sum_k \xi_k B_k \right\Vert \geq t \right\} \leq (d_1 + d_2) \cdot e^{-t^2/2\sigma^2}.
Matrix Chernoff inequalities
The classical
Chernoff bounds concern the sum of independent, nonnegative, and uniformly bounded random variables.
In the matrix setting, the analogous theorem concerns a sum of
positive-semidefinite random matrices subjected to a uniform eigenvalue bound.
Matrix Chernoff I
Consider a finite sequence \{X_k\} of independent, random, self-adjoint matrices with dimension ''d''.
Assume that each random matrix satisfies
:X_k \succeq \mathbf{0} \quad \text{and} \quad \lambda_{\max}(X_k) \leq R
almost surely.
Define
:\mu_{\min} = \lambda_{\min}\left( \sum_k \mathbb{E}\,X_k \right) \quad \text{and} \quad \mu_{\max} = \lambda_{\max}\left( \sum_k \mathbb{E}\,X_k \right).
Then
:\Pr \left\{ \lambda_{\min}\left( \sum_k X_k \right) \leq (1-\delta)\mu_{\min} \right\} \leq d \cdot \left[ \frac{e^{-\delta}}{(1-\delta)^{1-\delta}} \right]^{\mu_{\min}/R} \quad \text{for } \delta \in [0, 1), \text{ and}
:\Pr \left\{ \lambda_{\max}\left( \sum_k X_k \right) \geq (1+\delta)\mu_{\max} \right\} \leq d \cdot \left[ \frac{e^{\delta}}{(1+\delta)^{1+\delta}} \right]^{\mu_{\max}/R} \quad \text{for } \delta \geq 0.
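To make these quantities concrete, here is a hedged Python/NumPy sketch. The distribution is an illustrative choice (Bernoulli-thinned fixed PSD matrices, so that the expectations are exact and the assumptions hold by construction); none of the parameter values come from the source:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, p, R = 4, 60, 0.3, 1.0

# Fixed PSD matrices C_k rescaled so lambda_max(C_k) = 1; the summands are
# X_k = b_k * C_k with b_k ~ Bernoulli(p), hence X_k >= 0 and lambda_max(X_k) <= R.
G = rng.standard_normal((n, d, d))
C = np.einsum('kij,klj->kil', G, G)                     # C_k = G_k G_k^T is PSD
C /= np.linalg.eigvalsh(C).max(axis=1)[:, None, None]   # rescale to lambda_max = 1

M = p * C.sum(axis=0)                 # sum_k E[X_k] = p * sum_k C_k, exactly
mu_min, mu_max = np.linalg.eigvalsh(M).min(), np.linalg.eigvalsh(M).max()

def chernoff_upper(delta):
    """d * [e^delta / (1+delta)^(1+delta)]^(mu_max / R), valid for delta >= 0."""
    return d * (np.exp(delta) / (1 + delta)**(1 + delta))**(mu_max / R)

# Monte Carlo estimate of P{ lambda_max( sum_k X_k ) >= (1+delta) * mu_max }.
delta, trials, hits = 1.0, 1000, 0
for _ in range(trials):
    S = C[rng.random(n) < p].sum(axis=0)
    hits += np.linalg.eigvalsh(S).max() >= (1 + delta) * mu_max
print(hits / trials, "<=", chernoff_upper(delta))
```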
Matrix Chernoff II
Consider a sequence \{X_k : k = 1, 2, \ldots, n\} of independent, random, self-adjoint matrices with dimension ''d'' that satisfy
:X_k \succeq \mathbf{0} \quad \text{and} \quad \lambda_{\max}(X_k) \leq 1
almost surely.
Compute the minimum and maximum eigenvalues of the average expectation,
:\bar{\mu}_{\min} = \lambda_{\min}\left( \frac{1}{n} \sum_k \mathbb{E}\,X_k \right) \quad \text{and} \quad \bar{\mu}_{\max} = \lambda_{\max}\left( \frac{1}{n} \sum_k \mathbb{E}\,X_k \right).
Then
:\Pr \left\{ \lambda_{\min}\left( \frac{1}{n} \sum_k X_k \right) \leq \alpha \right\} \leq d \cdot e^{-n D(\alpha \Vert \bar{\mu}_{\min})} \quad \text{for } 0 \leq \alpha \leq \bar{\mu}_{\min}, \text{ and}
:\Pr \left\{ \lambda_{\max}\left( \frac{1}{n} \sum_k X_k \right) \geq \alpha \right\} \leq d \cdot e^{-n D(\alpha \Vert \bar{\mu}_{\max})} \quad \text{for } \bar{\mu}_{\max} \leq \alpha \leq 1.
The binary information divergence is defined as
:D(a \Vert u) = a \left( \log a - \log u \right) + (1-a) \left( \log(1-a) - \log(1-u) \right)
for a, u \in [0, 1].
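The binary information divergence is easy to evaluate directly. The sketch below (with purely illustrative values for ''d'', ''n'', α, and the mean eigenvalue) computes the resulting upper-tail bound and shows its exponential decay in ''n'':

```python
import numpy as np

def binary_divergence(a, u):
    """D(a || u) = a*log(a/u) + (1-a)*log((1-a)/(1-u)), for a, u in (0, 1)."""
    return a * np.log(a / u) + (1 - a) * np.log((1 - a) / (1 - u))

# Illustrative tail bound d * exp(-n * D(alpha || mu_max)) with mu_max <= alpha <= 1.
d, n, mu_max, alpha = 10, 200, 0.4, 0.55
bound = d * np.exp(-n * binary_divergence(alpha, mu_max))
print(bound)   # small: the bound decays exponentially in n for fixed alpha > mu_max
```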
Matrix Bennett and Bernstein inequalities
In the scalar setting, the Bennett and Bernstein inequalities describe the upper tail of a sum of independent, zero-mean random variables that are either bounded or subexponential. In the matrix case, the analogous results concern a sum of zero-mean random matrices.
Bounded case
Consider a finite sequence \{X_k\} of independent, random, self-adjoint matrices with dimension ''d''.
Assume that each random matrix satisfies
:\mathbb{E}\,X_k = \mathbf{0} \quad \text{and} \quad \lambda_{\max}(X_k) \leq R
almost surely.
Compute the norm of the total variance,
:\sigma^2 = \left\Vert \sum_k \mathbb{E}\left( X_k^2 \right) \right\Vert.
Then, the following chain of inequalities holds for all t \geq 0:
:\begin{align}
\Pr \left\{ \lambda_{\max}\left( \sum_k X_k \right) \geq t \right\}
&\leq d \cdot \exp\left( -\frac{\sigma^2}{R^2} \cdot h\!\left( \frac{Rt}{\sigma^2} \right) \right) \\
&\leq d \cdot \exp\left( \frac{-t^2/2}{\sigma^2 + Rt/3} \right) \\
&\leq \begin{cases} d \cdot \exp(-3t^2/8\sigma^2), & t \leq \sigma^2/R; \\ d \cdot \exp(-3t/8R), & t \geq \sigma^2/R. \end{cases}
\end{align}
The function h(u) is defined as h(u) = (1+u)\log(1+u) - u for u \geq 0.
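The chain of inequalities can be checked numerically. The following sketch (with illustrative values of ''d'', σ², and ''R'', not taken from the source) verifies at a few points ''t'' that the Bennett-style bound refines the Bernstein-style bound, which in turn refines the two-piece simplification:

```python
import numpy as np

def h(u):
    """h(u) = (1 + u) * log(1 + u) - u, defined for u >= 0."""
    return (1 + u) * np.log1p(u) - u

d, sigma2, R = 8, 2.0, 1.0        # illustrative dimension, variance, eigenvalue bound

def bennett(t):
    return d * np.exp(-(sigma2 / R**2) * h(R * t / sigma2))

def bernstein(t):
    return d * np.exp(-(t**2 / 2) / (sigma2 + R * t / 3))

def simplified(t):
    if t <= sigma2 / R:
        return d * np.exp(-3 * t**2 / (8 * sigma2))
    return d * np.exp(-3 * t / (8 * R))

# Each bound in the chain is at least as tight as the next one.
for t in [0.5, 1.0, 2.0, 4.0]:
    print(t, bennett(t) <= bernstein(t) + 1e-12 <= simplified(t) + 2e-12)
```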
Subexponential case
Consider a finite sequence \{X_k\} of independent, random, self-adjoint matrices with dimension ''d''.
Assume that
:\mathbb{E}\,X_k = \mathbf{0} \quad \text{and} \quad \mathbb{E}\left( X_k^p \right) \preceq \frac{p!}{2} \cdot R^{p-2} A_k^2
for p = 2, 3, 4, \ldots.
Compute the variance parameter,
:\sigma^2 = \left\Vert \sum_k A_k^2 \right\Vert.
Then, the following chain of inequalities holds for all t \geq 0:
:\begin{align}
\Pr \left\{ \lambda_{\max}\left( \sum_k X_k \right) \geq t \right\}
&\leq d \cdot \exp\left( \frac{-t^2/2}{\sigma^2 + Rt} \right) \\
&\leq \begin{cases} d \cdot \exp(-t^2/4\sigma^2), & t \leq \sigma^2/R; \\ d \cdot \exp(-t/4R), & t \geq \sigma^2/R. \end{cases}
\end{align}
Rectangular case
Consider a finite sequence \{Z_k\} of independent, random matrices with dimension d_1 \times d_2.
Assume that each random matrix satisfies
:\mathbb{E}\,Z_k = \mathbf{0} \quad \text{and} \quad \Vert Z_k \Vert \leq R
almost surely.
Define the variance parameter
:\sigma^2 = \max\left\{ \left\Vert \sum_k \mathbb{E}\left( Z_k Z_k^* \right) \right\Vert, \left\Vert \sum_k \mathbb{E}\left( Z_k^* Z_k \right) \right\Vert \right\}.
Then, for all t \geq 0,
:\Pr \left\{ \left\Vert \sum_k Z_k \right\Vert \geq t \right\} \leq (d_1 + d_2) \cdot \exp\left( \frac{-t^2/2}{\sigma^2 + Rt/3} \right)
holds.
[ User-friendly tail bounds for sums of random matrices]
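As with the self-adjoint case, the rectangular variance parameter is directly computable. The sketch below uses a Rademacher series over illustrative fixed rectangular matrices (so the zero-mean and boundedness assumptions hold by construction; all sizes and the threshold are arbitrary choices) and compares the rectangular Bernstein bound with a Monte Carlo tail estimate:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d1, d2, R = 40, 3, 5, 1.0

# Fixed rectangular matrices normalized to spectral norm 1; the summands are
# Z_k = eps_k * B_k with Rademacher signs eps_k, so E[Z_k] = 0 and ||Z_k|| <= R.
B = rng.standard_normal((n, d1, d2))
B /= np.linalg.norm(B, ord=2, axis=(1, 2))[:, None, None]

row = np.einsum('kij,klj->il', B, B)     # sum_k B_k B_k^T   (d1 x d1)
col = np.einsum('kji,kjl->il', B, B)     # sum_k B_k^T B_k   (d2 x d2)
sigma2 = max(np.linalg.norm(row, 2), np.linalg.norm(col, 2))

t = 3.0 * np.sqrt(sigma2)
bound = (d1 + d2) * np.exp(-(t**2 / 2) / (sigma2 + R * t / 3))

# Monte Carlo estimate of P{ || sum_k Z_k || >= t }  (spectral norm).
trials, hits = 1000, 0
for _ in range(trials):
    eps = rng.choice([-1.0, 1.0], size=n)
    S = np.einsum('k,kij->ij', eps, B)
    hits += np.linalg.norm(S, 2) >= t
print(hits / trials, "<=", bound)
```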
Matrix Azuma, Hoeffding, and McDiarmid inequalities
Matrix Azuma
The scalar version of Azuma's inequality states that a scalar martingale exhibits normal concentration about its mean value, and the scale for deviations is controlled by the total maximum squared range of the difference sequence.
The following is the extension to the matrix setting.
Consider a finite adapted sequence \{X_k\} of self-adjoint matrices with dimension ''d'', and a fixed sequence \{A_k\} of self-adjoint matrices that satisfy
:\mathbb{E}_{k-1}\,X_k = \mathbf{0} \quad \text{and} \quad X_k^2 \preceq A_k^2
almost surely.
Compute the variance parameter
:\sigma^2 = \left\Vert \sum_k A_k^2 \right\Vert.
Then, for all t \geq 0,
:\Pr \left\{ \lambda_{\max}\left( \sum_k X_k \right) \geq t \right\} \leq d \cdot e^{-t^2/8\sigma^2}.
The constant 1/8 can be improved to 1/2 when there is additional information available. One case occurs when each summand X_k is conditionally symmetric. Another example requires the assumption that X_k commutes almost surely with A_k.
Matrix Hoeffding
Placing the additional assumption that the summands in Matrix Azuma are independent gives a matrix extension of Hoeffding's inequalities.
Consider a finite sequence \{X_k\} of independent, random, self-adjoint matrices with dimension ''d'', and let \{A_k\} be a sequence of fixed self-adjoint matrices.
Assume that each random matrix satisfies
:\mathbb{E}\,X_k = \mathbf{0} \quad \text{and} \quad X_k^2 \preceq A_k^2
almost surely.
Then, for all t \geq 0,
:\Pr \left\{ \lambda_{\max}\left( \sum_k X_k \right) \geq t \right\} \leq d \cdot e^{-t^2/8\sigma^2}
where
:\sigma^2 = \left\Vert \sum_k A_k^2 \right\Vert.
An improvement of this result was established in later work:
for all t \geq 0,
:\Pr \left\{ \lambda_{\max}\left( \sum_k X_k \right) \geq t \right\} \leq d \cdot e^{-t^2/2\sigma^2}
where
:\sigma^2 = \frac{1}{2} \left\Vert \sum_k \left( A_k^2 + \mathbb{E}\,X_k^2 \right) \right\Vert.
Matrix bounded difference (McDiarmid)
In the scalar setting, McDiarmid's inequality provides one common way of bounding the differences by applying Azuma's inequality to a Doob martingale. A version of the bounded differences inequality holds in the matrix setting.
Let \{Z_k : k = 1, 2, \ldots, n\} be an independent family of random variables, and let H be a function that maps n variables to a self-adjoint matrix of dimension ''d''.
Consider a sequence \{A_k\} of fixed self-adjoint matrices that satisfy
:\left( H(z_1, \ldots, z_k, \ldots, z_n) - H(z_1, \ldots, z_k', \ldots, z_n) \right)^2 \preceq A_k^2,
where z_k and z_k' range over all possible values of Z_k for each index k.
Compute the variance parameter
:\sigma^2 = \left\Vert \sum_k A_k^2 \right\Vert.
Then, for all t \geq 0,
:\Pr \left\{ \lambda_{\max}\left( H(\mathbf{z}) - \mathbb{E}\,H(\mathbf{z}) \right) \geq t \right\} \leq d \cdot e^{-t^2/8\sigma^2},
where \mathbf{z} = (Z_1, \ldots, Z_n).
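A hedged sketch of the bounded-differences setup: take H(z) = Σ_k z_k A_k on Rademacher inputs, an illustrative choice for which E H = 0 and the difference bound (Δ_k)² ⪯ 4A_k² holds exactly; all sizes and the threshold are arbitrary. The snippet computes σ² and compares the bound with a Monte Carlo tail estimate:

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 4, 30

# H(z) = sum_k z_k A_k for z_k in {-1, +1}: changing the k-th coordinate
# changes H by (z_k - z_k') A_k, so a valid difference bound is (2 A_k)^2 = 4 A_k^2.
A = rng.standard_normal((n, d, d))
A = (A + A.transpose(0, 2, 1)) / (2 * np.sqrt(n))

sigma2 = np.linalg.norm(4 * np.einsum('kij,kjl->il', A, A), ord=2)

def mcdiarmid_bound(t):
    """d * exp(-t^2 / (8 sigma^2)), the matrix bounded-difference bound."""
    return d * np.exp(-t**2 / (8 * sigma2))

# Monte Carlo estimate of P{ lambda_max( H(z) - E H(z) ) >= t }; here E H = 0.
t, trials, hits = 4.0 * np.sqrt(sigma2), 2000, 0
for _ in range(trials):
    z = rng.choice([-1.0, 1.0], size=n)
    Hz = np.einsum('k,kij->ij', z, A)
    hits += np.linalg.eigvalsh(Hz).max() >= t
print(hits / trials, "<=", mcdiarmid_bound(t))
```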
An improvement of this result, giving a sharper constant in the exponent under the same bounded-differences assumption, was established in subsequent work.
Survey of related theorems
The first bounds of this type were derived by Ahlswede and Winter. Recall the theorem above for self-adjoint matrix Gaussian and Rademacher bounds:
For a finite sequence \{A_k\} of fixed, self-adjoint matrices with dimension ''d'' and for \{\xi_k\} a finite sequence of independent standard normal or independent Rademacher random variables, then
:\Pr \left\{ \lambda_{\max}\left( \sum_k \xi_k A_k \right) \geq t \right\} \leq d \cdot e^{-t^2/2\sigma^2}
where
:\sigma^2 = \left\Vert \sum_k A_k^2 \right\Vert.
Ahlswede and Winter would give the same result, except with
:\sigma_{AW}^2 = \sum_k \lambda_{\max}\left( A_k^2 \right).
By comparison, the \sigma^2 in the theorem above interchanges the sum and the largest-eigenvalue map; that is, it is the largest eigenvalue of the sum rather than the sum of the largest eigenvalues. It is never larger than the Ahlswede–Winter value (by the norm triangle inequality), but can be much smaller. Therefore, the theorem above gives a tighter bound than the Ahlswede–Winter result.
The chief contribution of Ahlswede and Winter was the extension of the Laplace-transform method used to prove the scalar Chernoff bound (see Chernoff bound#Additive form (absolute error)) to the case of self-adjoint matrices. The procedure is given in the derivation below. All of the recent works on this topic follow this same procedure, and the chief differences follow from subsequent steps. Ahlswede & Winter use the Golden–Thompson inequality to proceed, whereas Tropp uses Lieb's theorem.
Suppose one wished to vary the length of the series (''n'') and the dimensions of the matrices (''d'') while keeping the right-hand side approximately constant. Then ''n'' must vary approximately as the log of ''d''. Several papers have attempted to establish a bound without a dependence on dimensions. Rudelson and Vershynin give a result for matrices which are the outer product of two vectors. A further result removes the dimensional dependence for low-rank matrices; it was derived independently of the Ahlswede–Winter approach, although a similar result can be proven with that approach.
Finally, Oliveira proves a result for matrix martingales independently of the Ahlswede–Winter framework, and Tropp slightly improves on that result using the Ahlswede–Winter framework. Neither result is presented in this article.
Derivation and proof
Ahlswede and Winter
The Laplace-transform argument found in the work of Ahlswede and Winter is a significant result in its own right:
Let Y be a random self-adjoint matrix. Then
:\Pr \left\{ \lambda_{\max}(Y) \geq t \right\} \leq \inf_{\theta > 0} \left\{ e^{-\theta t} \cdot \mathbb{E}\left[ \operatorname{tr} e^{\theta Y} \right] \right\}.
To prove this, fix \theta > 0. Then
:\begin{align}
\Pr \left\{ \lambda_{\max}(Y) \geq t \right\} &= \Pr \left\{ \lambda_{\max}(\theta Y) \geq \theta t \right\} \\
&= \Pr \left\{ e^{\lambda_{\max}(\theta Y)} \geq e^{\theta t} \right\} \\
&\leq e^{-\theta t} \cdot \mathbb{E}\left[ e^{\lambda_{\max}(\theta Y)} \right] \\
&\leq e^{-\theta t} \cdot \mathbb{E}\left[ \operatorname{tr} e^{\theta Y} \right]
\end{align}
The second-to-last inequality is Markov's inequality. The last inequality holds since e^{\lambda_{\max}(\theta Y)} = \lambda_{\max}\left( e^{\theta Y} \right) \leq \operatorname{tr} e^{\theta Y}. Since the left-most quantity is independent of \theta, the infimum over \theta > 0 remains an upper bound for it.
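The key step here holds pointwise: the indicator of the event \{\lambda_{\max}(Y) \geq t\} is dominated by e^{-\theta t} \operatorname{tr} e^{\theta Y} for every sample, so it can be checked sample by sample. The sketch below uses an arbitrary Gaussian symmetric matrix model with illustrative scaling (not from the source) and computes the trace of the matrix exponential through the eigendecomposition:

```python
import numpy as np

rng = np.random.default_rng(2)
d, trials, t, theta = 4, 2000, 3.0, 2.0

hits, mgf = 0, 0.0
for _ in range(trials):
    G = rng.standard_normal((d, d))
    Y = (G + G.T) / 4                        # a random self-adjoint matrix
    lam = np.linalg.eigvalsh(Y)
    hits += lam.max() >= t
    mgf += np.exp(theta * lam).sum()         # tr e^{theta Y} = sum_i e^{theta lambda_i}

empirical = hits / trials
bound = np.exp(-theta * t) * (mgf / trials)  # e^{-theta t} * E[tr e^{theta Y}]
print(empirical, "<=", bound)
```

Because the domination is pointwise, the empirical frequency can never exceed the sample-average bound for any θ > 0; taking the infimum over θ then tightens the bound.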
Thus, our task is to understand