In mathematics, the square root of a matrix extends the notion of

square root In mathematics, a square root of a number is a number such that ; in other words, a number whose ''square'' (the result of multiplying the number by itself, or  ⋅ ) is . For example, 4 and −4 are square roots of 16, because . ...

from numbers to matrices. A matrix is said to be a square root of if the matrix product is equal to . Some authors use the name ''square root'' or the notation only for the specific case when is positive semidefinite, to denote the unique matrix that is positive semidefinite and such that (for real-valued matrices, where is the

transpose In linear algebra, the transpose of a matrix is an operator which flips a matrix over its diagonal; that is, it switches the row and column indices of the matrix by producing another matrix, often denoted by (among other notations). The tr ...

of ). Less frequently, the name ''square root'' may be used for any factorization of a positive semidefinite matrix as , as in the Cholesky factorization, even if . This distinct meaning is discussed in '.

Examples

In general, a matrix can have several square roots. In particular, if

A = B^2

then

A=(-B)^2

as well. The 2×2

identity matrix In linear algebra, the identity matrix of size n is the n\times n square matrix with ones on the main diagonal and zeros elsewhere. Terminology and notation The identity matrix is often denoted by I_n, or simply by I if the size is immaterial ...

\textstyle\begin1 & 0\\ 0 & 1\end

has infinitely many square roots. They are given by :

\begin \pm 1 &  0\\  0 & \pm 1\end

and

\begin a &  b\\  c & -a\end

where

(a, b, c)

are any numbers (real or complex) such that

a^2+bc=1

. In particular if

(a,b,t)

is any Pythagorean triple—that is, any set of positive integers such that

a^2 + b^2 = t^2

, then

\frac\begina & b\\ b & -a\end

is a square root matrix of

I

which is symmetric and has rational entries. Thus :

\begin1 & 0\\ 0 & 1\end= \begin0 & 1\\ 1 & 0\end^2 = \begin\frac & \frac\\ \frac & -\frac\end^2.

Minus identity has a square root, for example: :

-\begin1 & 0\\ 0 & 1\end = \begin0 & -1\\ 1 & 0\end^2,

which can be used to represent the

imaginary unit The imaginary unit or unit imaginary number () is a solution to the quadratic equation x^2+1=0. Although there is no real number with this property, can be used to extend the real numbers to what are called complex numbers, using addition a ...

and hence all

complex numbers In mathematics, a complex number is an element of a number system that extends the real numbers with a specific element denoted , called the imaginary unit and satisfying the equation i^= -1; every complex number can be expressed in the for ...

using 2×2 real matrices, see Matrix representation of complex numbers. Just as with the

real numbers In mathematics, a real number is a number that can be used to measure a ''continuous'' one-dimensional quantity such as a distance, duration or temperature. Here, ''continuous'' means that values can have arbitrarily small variations. Every ...

, a real matrix may fail to have a real square root, but have a square root with complex-valued entries. Some matrices have no square root. An example is the matrix

\begin0 & 1\\ 0 & 0\end.

While the square root of a nonnegative

integer An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign ( −1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the language ...

is either again an integer or an

irrational number In mathematics, the irrational numbers (from in- prefix assimilated to ir- (negative prefix, privative) + rational) are all the real numbers that are not rational numbers. That is, irrational numbers cannot be expressed as the ratio of two inte ...

, in contrast an ''integer matrix'' can have a square root whose entries are rational, yet non-integral, as in examples above.

Positive semidefinite matrices

A symmetric real ''n'' × ''n'' matrix is called '' positive semidefinite'' if

x^\textsf A x \geq 0

for all

x \in \mathbb^n

(here

x^\textsf

denotes the

, changing a column vector into a row vector). A square real matrix is positive semidefinite if and only if

A = B^\textsf B

for some matrix . There can be many different such matrices . A positive semidefinite matrix can also have many matrices such that

A = B B

. However, always has precisely one square root that is positive semidefinite (and hence symmetric). In particular, since is required to be symmetric,

B=B^\textsf

, so the two conditions

A = B B

A = B^\textsf B

are equivalent. For complex-valued matrices, the

conjugate transpose In mathematics, the conjugate transpose, also known as the Hermitian transpose, of an m \times n complex matrix \boldsymbol is an n \times m matrix obtained by transposing \boldsymbol and applying complex conjugate on each entry (the complex c ...

B^*

is used instead and positive semidefinite matrices are Hermitian, meaning

B^*=B

. This unique matrix is called the principal, non-negative, or positive square root (the latter in the case of positive definite matrices). The principal square root of a real positive semidefinite matrix is real. The principal square root of a positive definite matrix is positive definite; more generally, the rank of the principal square root of is the same as the rank of . The operation of taking the principal square root is continuous on this set of matrices. These properties are consequences of the holomorphic functional calculus applied to matrices. The existence and uniqueness of the principal square root can be deduced directly from the

Jordan normal form In linear algebra, a Jordan normal form, also known as a Jordan canonical form (JCF), is an upper triangular matrix of a particular form called a Jordan matrix representing a linear operator on a finite-dimensional vector space with respect to ...

(see below).

Matrices with distinct eigenvalues

An matrix with ''distinct nonzero eigenvalues'' has 2^''n'' square roots. Such a matrix, , has an eigendecomposition where is the matrix whose columns are eigenvectors of and is the diagonal matrix whose diagonal elements are the corresponding eigenvalues . Thus the square roots of are given by , where is any square root matrix of , which, for distinct eigenvalues, must be diagonal with diagonal elements equal to square roots of the diagonal elements of ; since there are two possible choices for a square root of each diagonal element of , there are 2^''n'' choices for the matrix . This also leads to a proof of the above observation, that a positive-definite matrix has precisely one positive-definite square root: a positive definite matrix has only positive eigenvalues, and each of these eigenvalues has only one positive square root; and since the eigenvalues of the square root matrix are the diagonal elements of , for the square root matrix to be itself positive definite necessitates the use of only the unique positive square roots of the original eigenvalues.

Solutions in closed form

If a matrix is idempotent, meaning

A^2 = A

, then by definition one of its square roots is the matrix itself.

Diagonal and triangular matrices

If is a

diagonal In geometry, a diagonal is a line segment joining two vertices of a polygon or polyhedron, when those vertices are not on the same edge. Informally, any sloping line is called diagonal. The word ''diagonal'' derives from the ancient Gree ...

''n'' × ''n'' matrix

D = \operatorname(\lambda_1,\dots,\lambda_n)

, then some of its square roots are diagonal matrices

\operatorname(\mu_1,\dots,\mu_n)

, where

\mu_i = \pm \sqrt

. If the diagonal elements of are real and non-negative then it is positive semidefinite, and if the square roots are taken with non-negative sign, the resulting matrix is the principal root of . A diagonal matrix may have additional non-diagonal roots if some entries on the diagonal are equal, as exemplified by the identity matrix above. If is an upper triangular matrix (meaning its entries are

u_ = 0

for

i > j

) and at most one of its diagonal entries is zero, then one upper triangular solution of the equation

B^2 = U

can be found as follows. Since the equation

u_ = b_^2

should be satisfied, let

b_

be the principal square root of the complex number

u_

. By the assumption

u_ \neq 0

, this guarantees that

b_ + b_ \neq 0

for all (because the principal square roots of complex numbers all lie on one half of the complex plane). From the equation :

u_ = b_ b_ + b_ b_ + b_ b_ + \dots + b_ b_

we deduce that

b_

can be computed recursively for

j-i

increasing from 1 to ''n''-1 as: :

b_ = \frac\left(u_ - b_ b_ - b_ b_ - \dots - b_ b_\right).

If is upper triangular but has multiple zeroes on the diagonal, then a square root might not exist, as exemplified by

\left(\begin0 & 1\\ 0 & 0\end\right)

. Note the diagonal entries of a triangular matrix are precisely its

eigenvalue In linear algebra, an eigenvector () or characteristic vector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. The corresponding eigenvalue, often denot ...

s (see Triangular matrix#Properties).

By diagonalization

An ''n'' × ''n'' matrix is diagonalizable if there is a matrix and a diagonal matrix such that . This happens if and only if has ''n''

eigenvector In linear algebra, an eigenvector () or characteristic vector of a linear transformation is a nonzero vector that changes at most by a scalar factor when that linear transformation is applied to it. The corresponding eigenvalue, often denote ...

s which constitute a basis for . In this case, can be chosen to be the matrix with the ''n'' eigenvectors as columns, and thus a square root of is :

R = V S V^~,

where is any square root of . Indeed, :

\left(V D^\frac V^\right)^2 = V D^\frac \left(V^ V\right) D^\frac V^ = V D V^ = A ~.

For example, the matrix

A = \left(\begin 33 & 24\\ 48 & 57\end\right)

can be diagonalized as , where :

V = \begin 1 & 1\\ 2 & -1\end

and

D = \begin 81 & 0\\ 0 & 9\end

. has principal square root :

D^\frac = \begin 9 & 0\\ 0 & 3\end

, giving the square root :

A^\frac = V D^\frac V^ = \begin 5 & 2\\ 4 & 7\end

. When is symmetric, the diagonalizing matrix can be made an

orthogonal matrix In linear algebra, an orthogonal matrix, or orthonormal matrix, is a real square matrix whose columns and rows are orthonormal vectors. One way to express this is Q^\mathrm Q = Q Q^\mathrm = I, where is the transpose of and is the identity ...

by suitably choosing the eigenvectors (see

spectral theorem In mathematics, particularly linear algebra and functional analysis, a spectral theorem is a result about when a linear operator or matrix can be diagonalized (that is, represented as a diagonal matrix in some basis). This is extremely useful b ...

). Then the inverse of is simply the transpose, so that :

R = V S V^\textsf~.

By Schur decomposition

Every complex-valued square matrix

A

, regardless of diagonalizability, has a Schur decomposition given by

A=QUQ^*

where

U

is upper triangular and

Q

is unitary (meaning The

s of

A

are exactly the diagonal entries of

U

; if at most one of them is zero, then the following is a square root :

A^\frac = Q U^\frac Q^*.

where a square root

U^\frac

of the upper triangular matrix

U

can be found as described above. If

A

is positive definite, then the eigenvalues are all positive reals, so the chosen diagonal of

U^\frac

also consists of positive reals. Hence the eigenvalues of

Q U^\frac Q^*

are positive reals, which means the resulting matrix is the principal root of

A

By Jordan decomposition

Similarly as for the Schur decomposition, every square matrix

A

can be decomposed as

A = P^ J P

where is invertible and is in

. To see that any complex matrix with positive eigenvalues has a square root of the same form, it suffices to check this for a Jordan block. Any such block has the form λ(''I'' + ''N'') with λ > 0 and ''N''

nilpotent In mathematics, an element x of a ring R is called nilpotent if there exists some positive integer n, called the index (or sometimes the degree), such that x^n=0. The term was introduced by Benjamin Peirce in the context of his work on the cl ...

. If is the binomial expansion for the square root (valid in , ''z'', < 1), then as a formal power series its square equals 1 + ''z''. Substituting ''N'' for ''z'', only finitely many terms will be non-zero and gives a square root of the Jordan block with eigenvalue . It suffices to check uniqueness for a Jordan block with λ = 1. The square constructed above has the form where ''L'' is polynomial in ''N'' without constant term. Any other square root ''T'' with positive eigenvalues has the form ''T'' = ''I'' + ''M'' with nilpotent, commuting with ''N'' and hence ''L''. But then . Since and commute, the matrix is nilpotent and is invertible with inverse given by a Neumann series. Hence . If is a matrix with positive eigenvalues and minimal polynomial , then the Jordan decomposition into generalized eigenspaces of can be deduced from the partial fraction expansion of . The corresponding projections onto the generalized eigenspaces are given by real polynomials in . On each eigenspace, has the form as above. The power series expression for the square root on the eigenspace show that the principal square root of has the form ''q''(''A'') where ''q''(''t'') is a polynomial with real coefficients.

Power series

Recall the formal power series

(1 - z)^\frac = \sum_^\infty (-1)^n \binom z^n

, which converges provided

\, z\,  \leq 1

(since the coefficients of the power series are summable). Plugging in

z = I - A

into this expression yields :

A^\frac := \sum_^\infty (-1)^n  (I - A)^n

provided that

\limsup_n\, (I - A)^n\, ^\frac < 1

. By virtue of Gelfand formula, that condition is equivalent to the requirement that the spectrum of

A

is contained within the disk

D(1,1) \subseteq \mathbb

. This method of defining or computing

A^\frac

is especially useful in the case where

A

is positive semi-definite. In that case, we have

\left\, I - \frac\right\,  \leq 1

and therefore

\left\, \left(I - \frac\right)^n\right\,  \leq \left\, I - \frac\right\, ^n \leq 1

, so that the expression

\, A\, ^\frac\left(\sum_^\infty (-1)^n \binom \left(I - \frac\right)^n\right)

defines a square root of

A

which moreover turns out to be the unique positive semi-definite root. This method remains valid to define square roots of operators on infinite-dimensional Banach or Hilbert spaces or certain elements of (C*) Banach algebras.

Iterative solutions

By Denman–Beavers iteration

Another way to find the square root of an matrix ''A'' is the Denman–Beavers square root iteration. Let ''Y''₀ = ''A'' and ''Z''₀ = ''I'', where ''I'' is the

. The iteration is defined by :

\begin
  Y_ &= \frac \left(Y_k + Z_k^\right), \\ 
  Z_ &= \frac \left(Z_k + Y_k^\right). 
\end

As this uses a pair of sequences of matrix inverses whose later elements change comparatively little, only the first elements have a high computational cost since the remainder can be computed from earlier elements with only a few passes of a variant of

Newton's method In numerical analysis, Newton's method, also known as the Newton–Raphson method, named after Isaac Newton and Joseph Raphson, is a root-finding algorithm which produces successively better approximations to the roots (or zeroes) of a real ...

for computing inverses, :

X_ = 2X_n - X_n B X_n.

With this, for later values of one would set

X_0 = Z_^

and

B = Z_k,

and then use

Z_k^ = X_n

for some small n (perhaps just 1), and similarly for

Y_k^.

Convergence is not guaranteed, even for matrices that do have square roots, but if the process converges, the matrix

Y_k

converges quadratically to a square root ^1/2, while

Z_k

converges to its inverse, ^−1/2.

By the Babylonian method

Yet another iterative method is obtained by taking the well-known formula of the Babylonian method for computing the square root of a real number, and applying it to matrices. Let ''X''₀ = ''I'', where ''I'' is the

. The iteration is defined by :

X_ = \frac \left(X_k + A X_k^\right)\,.

Again, convergence is not guaranteed, but if the process converges, the matrix

X_k

converges quadratically to a square root ''A''^1/2. Compared to Denman–Beavers iteration, an advantage of the Babylonian method is that only one matrix inverse need be computed per iteration step. On the other hand, as Denman–Beavers iteration uses a pair of sequences of matrix inverses whose later elements change comparatively little, only the first elements have a high computational cost since the remainder can be computed from earlier elements with only a few passes of a variant of

for computing inverses (see Denman–Beavers iteration above); of course, the same approach can be used to get the single sequence of inverses needed for the Babylonian method. However, unlike Denman–Beavers iteration, the Babylonian method is numerically unstable and more likely to fail to converge. The Babylonian method follows from

for the equation

X^2-A=0

and using

AX_k=X_k A

for

Square roots of positive operators

linear algebra Linear algebra is the branch of mathematics concerning linear equations such as: :a_1x_1+\cdots +a_nx_n=b, linear maps such as: :(x_1, \ldots, x_n) \mapsto a_1x_1+\cdots +a_nx_n, and their representations in vector spaces and through matric ...

and operator theory, given a

bounded Boundedness or bounded may refer to: Economics * Bounded rationality, the idea that human rationality in decision-making is bounded by the available information, the cognitive limitations, and the time available to make the decision * Bounded e ...

positive semidefinite operator (a non-negative operator) ''T'' on a complex Hilbert space, ''B'' is a square root of ''T'' if ''T'' = ''B* B'', where ''B*'' denotes the Hermitian adjoint of ''B''. According to the

, the continuous functional calculus can be applied to obtain an operator ''T''^1/2 such that ''T''^1/2 is itself positive and (''T''^1/2)² = ''T''. The operator ''T''^1/2 is the unique non-negative square root of ''T''. A bounded non-negative operator on a complex Hilbert space is self adjoint by definition. So Conversely, it is trivially true that every operator of the form ''B* B'' is non-negative. Therefore, an operator ''T'' is non-negative

if and only if In logic and related fields such as mathematics and philosophy, "if and only if" (shortened as "iff") is a biconditional logical connective between statements, where either both statements are true or both are false. The connective is bi ...

''T'' = ''B* B'' for some ''B'' (equivalently, ''T'' = ''CC*'' for some ''C''). The Cholesky factorization provides another particular example of square root, which should not be confused with the unique non-negative square root.

Unitary freedom of square roots

If ''T'' is a non-negative operator on a finite-dimensional Hilbert space, then all square roots of ''T'' are related by unitary transformations. More precisely, if ''T'' = ''A*A'' = ''B*B'', then there exists a unitary ''U'' such that ''A'' = ''UB''. Indeed, take ''B'' = ''T'' to be the unique non-negative square root of ''T''. If ''T'' is strictly positive, then ''B'' is invertible, and so is unitary: :

\begin
  U^*U &= \left(\left(B^*\right)^A^*\right)\left(AB^\right)
        = \left(B^*\right)^T \left(B^\right) \\
       &= \left(B^*\right)^ B^* B \left(B^\right) = I.
\end

If ''T'' is non-negative without being strictly positive, then the inverse of ''B'' cannot be defined, but the Moore–Penrose pseudoinverse ''B''⁺ can be. In that case, the operator is a

partial isometry In functional analysis a partial isometry is a linear map between Hilbert spaces such that it is an isometry on the orthogonal complement of its kernel. The orthogonal complement of its kernel is called the initial subspace and its range is cal ...

, that is, a unitary operator from the range of ''T'' to itself. This can then be extended to a unitary operator ''U'' on the whole space by setting it equal to the identity on the kernel of ''T''. More generally, this is true on an infinite-dimensional Hilbert space if, in addition, ''T'' has closed range. In general, if ''A'', ''B'' are closed and densely defined operators on a Hilbert space ''H'', and ''A* A'' = ''B* B'', then ''A = UB'' where ''U'' is a partial isometry.

Some applications

Square roots, and the unitary freedom of square roots, have applications throughout functional analysis and linear algebra.

Polar decomposition

If ''A'' is an invertible operator on a finite-dimensional Hilbert space, then there is a unique unitary operator ''U'' and positive operator ''P'' such that :

A = UP;

this is the polar decomposition of ''A''. The positive operator ''P'' is the unique positive square root of the positive operator ''A''^∗''A'', and ''U'' is defined by . If ''A'' is not invertible, then it still has a polar composition in which ''P'' is defined in the same way (and is unique). The unitary operator ''U'' is not unique. Rather it is possible to determine a "natural" unitary operator as follows: ''AP''⁺ is a unitary operator from the range of ''A'' to itself, which can be extended by the identity on the kernel of ''A''^∗. The resulting unitary operator ''U'' then yields the polar decomposition of ''A''.

Kraus operators

By Choi's result, a linear map :

\Phi : C^ \to C^

is completely positive if and only if it is of the form :

\Phi(A) = \sum_i ^k V_i A V_i^*

where ''k'' ≤ ''nm''. Let ⊂ C^{''n'' × ''n''} be the ''n''² elementary matrix units. The positive matrix :

M_\Phi = \left(\Phi \left(E_\right)\right)_ \in C^

is called the ''Choi matrix'' of Φ. The Kraus operators correspond to the, not necessarily square, square roots of ''M''_Φ: For any square root ''B'' of ''M''_Φ, one can obtain a family of Kraus operators ''V_i'' by undoing the Vec operation to each column ''b_i'' of ''B''. Thus all sets of Kraus operators are related by partial isometries.

Mixed ensembles

In quantum physics, a density matrix for an ''n''-level quantum system is an ''n'' × ''n'' complex matrix ''ρ'' that is positive semidefinite with trace 1. If ''ρ'' can be expressed as :

\rho = \sum_i p_i v_i v_i^*

where

p_i > 0

and Σ ''p_i'' = 1, the set :

\left\

is said to be an ensemble that describes the mixed state ''ρ''. Notice is not required to be orthogonal. Different ensembles describing the state ''ρ'' are related by unitary operators, via the square roots of ''ρ''. For instance, suppose :

\rho = \sum_j a_j a_j^*.

The trace 1 condition means :

\sum_j a_j ^* a_j = 1.

Let :

p_i = a_i ^* a_i,

and ''v_i'' be the normalized ''a_i''. We see that :

\left\

gives the mixed state ''ρ''.

Unscented Kalman Filter

In the Unscented Kalman Filter (UKF), the square root of the state error covariance matrix is required for the unscented transform which is the statistical linearization method used. A comparison between different matrix square root calculation methods within a UKF application of GPS/INS sensor fusion was presented, which indicated that the Cholesky decomposition method was best suited for UKF applications.

Notes

References

* * , Chapter IV, Reisz functional calculus * * * * * * * Matrix theory

Examples

Positive semidefinite matrices

Matrices with distinct eigenvalues

Solutions in closed form

Diagonal and triangular matrices

By diagonalization

By Schur decomposition

By Jordan decomposition

Power series

Iterative solutions

By Denman–Beavers iteration

By the Babylonian method

Square roots of positive operators

Unitary freedom of square roots

Some applications

Polar decomposition

Kraus operators

Mixed ensembles

Unscented Kalman Filter

See also

Notes

References