In mathematics, the generalized minimal residual method (GMRES) is an iterative method for the numerical solution of an indefinite nonsymmetric system of linear equations. The method approximates the solution by the vector in a Krylov subspace with minimal residual. The Arnoldi iteration is used to find this vector.
The GMRES method was developed by
Yousef Saad and Martin H. Schultz in 1986. It is a generalization and improvement of the
MINRES method due to Paige and Saunders in 1975. The MINRES method requires the matrix to be symmetric, but has the advantage that it only requires handling of three vectors. GMRES is a special case of the
DIIS method developed by Peter Pulay in 1980. DIIS is applicable to non-linear systems.
The method
Denote the Euclidean norm of any vector ''v'' by <math>\|v\|</math>. Denote the (square) system of linear equations to be solved by
:<math>Ax = b.</math>
The matrix ''A'' is assumed to be invertible of size ''m''-by-''m''. Furthermore, it is assumed that ''b'' is normalized, i.e., that <math>\|b\| = 1</math>.

The ''n''-th Krylov subspace for this problem is
:<math>K_n = K_n(A, r_0) = \operatorname{span}\{r_0, Ar_0, A^2 r_0, \ldots, A^{n-1} r_0\},</math>
where <math>r_0 = b - Ax_0</math> is the initial residual given an initial guess <math>x_0</math>. Clearly <math>K_n = \operatorname{span}\{b, Ab, \ldots, A^{n-1}b\}</math> if <math>x_0 = 0</math>.
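For illustration, with <math>x_0 = 0</math> the raw Krylov basis could be generated in MATLAB / GNU Octave as below (a hypothetical sketch with sample sizes; in practice this basis is never formed explicitly, for the reason explained next):

% Build the raw Krylov basis K = [b, A*b, ..., A^(n-1)*b] for x0 = 0
m = 50; n = 8;                       % sample problem size and subspace dimension
A = randn(m);
b = randn(m, 1); b = b / norm(b);    % normalized right-hand side
K = zeros(m, n);
K(:, 1) = b;
for j = 2:n
    K(:, j) = A * K(:, j-1);         % next power of A applied to b
end
cond(K)                              % typically huge: the columns are nearly dependent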
GMRES approximates the exact solution of <math>Ax = b</math> by the vector <math>x_n \in x_0 + K_n</math> that minimizes the Euclidean norm of the residual <math>r_n = b - Ax_n</math>.

The vectors <math>r_0, Ar_0, \ldots, A^{n-1}r_0</math> might be close to linearly dependent, so instead of this basis, the Arnoldi iteration is used to find orthonormal vectors <math>q_1, q_2, \ldots, q_n</math> which form a basis for <math>K_n</math>. In particular, <math>q_1 = \|r_0\|^{-1} r_0</math>.

Therefore, the vector <math>x_n</math> can be written as <math>x_n = x_0 + Q_n y_n</math> with <math>y_n \in \mathbb{R}^n</math>, where <math>Q_n</math> is the ''m''-by-''n'' matrix formed by <math>q_1, \ldots, q_n</math>.
The Arnoldi process also produces an (<math>n+1</math>)-by-<math>n</math> upper Hessenberg matrix <math>\tilde{H}_n</math> with
:<math>AQ_n = Q_{n+1} \tilde{H}_n.</math>
For symmetric matrices, a symmetric tri-diagonal matrix is actually achieved, resulting in the MINRES method.
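For example, for <math>n = 4</math> the factorization reads
:<math>A Q_4 = Q_5 \begin{bmatrix} h_{11} & h_{12} & h_{13} & h_{14} \\ h_{21} & h_{22} & h_{23} & h_{24} \\ 0 & h_{32} & h_{33} & h_{34} \\ 0 & 0 & h_{43} & h_{44} \\ 0 & 0 & 0 & h_{54} \end{bmatrix},</math>
where only the first subdiagonal of <math>\tilde{H}_4</math> is nonzero below the diagonal; in the symmetric case the entries above the first superdiagonal vanish as well, leaving the tridiagonal matrix used by MINRES.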
Because the columns of <math>Q_n</math> are orthonormal, we have
:<math>\|r_n\| = \|b - Ax_n\| = \|\tilde{H}_n y_n - \beta e_1\|,</math>
where
:<math>e_1 = (1, 0, 0, \ldots, 0)^T</math>
is the first vector in the standard basis of <math>\mathbb{R}^{n+1}</math>, and
:<math>\beta = \|r_0\| = \|b - Ax_0\|,</math>
<math>x_0</math> being the first trial vector (usually zero). Hence, <math>x_n</math> can be found by minimizing the Euclidean norm of the residual
:<math>\tilde{r}_n = \tilde{H}_n y_n - \beta e_1.</math>
This is a linear least squares problem of size ''n''.
This yields the GMRES method. On the ''n''-th iteration:
# calculate <math>q_n</math> with the Arnoldi method;
# find the <math>y_n</math> which minimizes <math>\|\tilde{r}_n\|</math>;
# compute <math>x_n = x_0 + Q_n y_n</math>;
# repeat if the residual is not yet small enough (a compact sketch of this loop follows).
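The following is a minimal, unpreconditioned sketch of this loop in MATLAB / GNU Octave, with an illustrative test matrix; it re-solves the small least-squares problem from scratch via backslash each iteration rather than using the Givens-rotation update described later. A fuller worked version appears under "Example code".

% Minimal GMRES sketch (illustrative data; no restarts, no preconditioner)
m = 50;
A = randn(m) + 10*eye(m);            % sample well-conditioned test matrix
b = randn(m, 1);
x0 = zeros(m, 1);
tol = 1e-10;
r0 = b - A * x0;
beta = norm(r0);
Q(:, 1) = r0 / beta;                 % q_1 = r_0 / ||r_0||
for n = 1:m
    % step 1: one Arnoldi step (modified Gram-Schmidt)
    v = A * Q(:, n);
    for j = 1:n
        H(j, n) = Q(:, j)' * v;
        v = v - H(j, n) * Q(:, j);
    end
    H(n+1, n) = norm(v);             % a zero here would be a "lucky breakdown"
    Q(:, n+1) = v / H(n+1, n);
    % step 2: find y minimizing ||H*y - beta*e1||; backslash solves
    % the overdetermined system in the least-squares sense
    e1 = zeros(n+1, 1); e1(1) = 1;
    y = H(1:n+1, 1:n) \ (beta * e1);
    % step 4: stop once the residual norm is small enough
    if norm(H(1:n+1, 1:n) * y - beta * e1) < tol * norm(b)
        break;
    end
end
x = x0 + Q(:, 1:n) * y;              % step 3: assemble the iterate x_n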
At every iteration, a matrix-vector product <math>Aq_n</math> must be computed. This costs about <math>2m^2</math> floating-point operations for general dense matrices of size <math>m</math>, but the cost can decrease to <math>O(m)</math> for sparse matrices. In addition to the matrix-vector product, <math>O(nm)</math> floating-point operations must be computed at the ''n''-th iteration.
Convergence
The ''n''-th iterate minimizes the residual in the Krylov subspace <math>K_n</math>. Since every subspace is contained in the next subspace, the residual does not increase. After ''m'' iterations, where ''m'' is the size of the matrix ''A'', the Krylov space ''K''<sub>''m''</sub> is the whole of <math>\mathbb{R}^m</math> and hence the GMRES method arrives at the exact solution. However, the idea is that after a small number of iterations (relative to ''m''), the vector ''x''<sub>''n''</sub> is already a good approximation to the exact solution.

This does not happen in general. Indeed, a theorem of Greenbaum, Pták and Strakoš states that for every nonincreasing sequence ''a''<sub>1</sub>, …, ''a''<sub>''m''−1</sub>, ''a''<sub>''m''</sub> = 0, one can find a matrix ''A'' such that <math>\|r_n\| = a_n</math> for all ''n'', where ''r''<sub>''n''</sub> is the residual defined above. In particular, it is possible to find a matrix for which the residual stays constant for ''m'' − 1 iterations, and only drops to zero at the last iteration.
In practice, though, GMRES often performs well. This can be proven in specific situations. If the symmetric part of ''A'', that is <math>M = \tfrac{1}{2}(A^T + A)</math>, is positive definite, then
:<math>\|r_n\| \leq \left( 1 - \frac{\lambda_{\min}^2(M)}{\lambda_{\max}(A^T A)} \right)^{n/2} \|r_0\|,</math>
where <math>\lambda_{\min}(M)</math> and <math>\lambda_{\max}(A^T A)</math> denote the smallest and largest eigenvalue of the indicated matrices, respectively.
If ''A'' is symmetric and positive definite, then we even have
:<math>\|r_n\| \leq \left( \frac{\kappa_2(A)^2 - 1}{\kappa_2(A)^2} \right)^{n/2} \|r_0\|,</math>
where <math>\kappa_2(A)</math> denotes the condition number of ''A'' in the Euclidean norm.
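For example, for a symmetric positive definite matrix with <math>\kappa_2(A) = 10</math> this bound gives a reduction factor of <math>(1 - 10^{-2})^{1/2} \approx 0.995</math> per iteration, so it guarantees only about one decimal digit of residual reduction per 460 iterations; observed convergence is often much faster than this worst-case estimate.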
In the general case, where ''A'' is not positive definite, we have
:<math>\frac{\|r_n\|}{\|b\|} \leq \inf_{p \in P_n} \kappa_2(V) \max_{\lambda \in \sigma(A)} |p(\lambda)|,</math>
where ''P''<sub>''n''</sub> denotes the set of polynomials of degree at most ''n'' with ''p''(0) = 1, ''V'' is the matrix appearing in the spectral decomposition of ''A'', and ''σ''(''A'') is the spectrum of ''A''. Roughly speaking, this says that fast convergence occurs when the eigenvalues of ''A'' are clustered away from the origin and ''A'' is not too far from normality.
All these inequalities bound only the residuals instead of the actual error, that is, the distance between the current iterate ''x''<sub>''n''</sub> and the exact solution.
Extensions of the method
Like other iterative methods, GMRES is usually combined with a
preconditioning method in order to speed up convergence.
The cost of the iterations grows as O(''n''<sup>2</sup>), where ''n'' is the iteration number. Therefore, the method is sometimes restarted after a number, say ''k'', of iterations, with ''x''<sub>''k''</sub> as the initial guess. The resulting method is called GMRES(''k'') or restarted GMRES. For non-positive definite matrices, this method may suffer from stagnation in convergence, as the restarted subspace is often close to the earlier subspace; a sketch of the restart loop is given below.
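A minimal sketch of the restart driver in MATLAB / GNU Octave, assuming a function gmres_cycle with the same interface as the gmres function in the "Example code" section below (the name gmres_cycle is hypothetical, chosen to avoid shadowing MATLAB's built-in gmres):

% Restarted GMRES, GMRES(k): run k inner iterations, then restart
% from the current iterate until the residual is small enough
A = gallery('poisson', 22);          % sample sparse test matrix (484-by-484)
b = ones(size(A, 1), 1);
k = 20; tol = 1e-8; max_restarts = 50;
x = zeros(size(b));                  % initial guess
for restart = 1:max_restarts
    [x, e] = gmres_cycle(A, b, x, k, tol);   % k steps from the current x
    if e(end) <= tol                 % relative residual target reached
        break;
    end
end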
The shortcomings of GMRES and restarted GMRES are addressed by the recycling of Krylov subspaces in GCRO-type methods such as GCROT and GCRODR.
Recycling of Krylov subspaces in GMRES can also speed up convergence when sequences of linear systems need to be solved.
Comparison with other solvers
The Arnoldi iteration reduces to the
Lanczos iteration for symmetric matrices. The corresponding
Krylov subspace method is the minimal residual method (MinRes) of Paige and Saunders. Unlike in the unsymmetric case, the MinRes method is given by a three-term recurrence relation. It can be shown that there is no Krylov subspace method for general matrices that is given by a short recurrence relation and yet minimizes the norms of the residuals as GMRES does.
Another class of methods builds on the
unsymmetric Lanczos iteration, in particular the
BiCG method. These use a three-term recurrence relation, but they do not attain the minimum residual, and hence the residual does not decrease monotonically for these methods. Convergence is not even guaranteed.
The third class is formed by methods like
CGS and
BiCGSTAB. These also work with a three-term recurrence relation (hence, without optimality) and they can even terminate prematurely without achieving convergence. The idea behind these methods is to choose the generating polynomials of the iteration sequence suitably.
None of these three classes is the best for all matrices; there are always examples in which one class outperforms the others. Therefore, multiple solvers are tried in practice to see which one is the best for a given problem.
Solving the least squares problem
One part of the GMRES method is to find the vector <math>y_n</math> which minimizes
:<math>\|\tilde{H}_n y_n - \beta e_1\|.</math>
Note that <math>\tilde{H}_n</math> is an (''n'' + 1)-by-''n'' matrix, hence it gives an over-constrained linear system of ''n'' + 1 equations for ''n'' unknowns.

The minimum can be computed using a QR decomposition: find an (''n'' + 1)-by-(''n'' + 1) orthogonal matrix Ω<sub>''n''</sub> and an (''n'' + 1)-by-''n'' upper triangular matrix <math>\tilde{R}_n</math> such that
:<math>\Omega_n \tilde{H}_n = \tilde{R}_n.</math>
The triangular matrix has one more row than it has columns, so its bottom row consists of zeros. Hence, it can be decomposed as
:<math>\tilde{R}_n = \begin{bmatrix} R_n \\ 0 \end{bmatrix},</math>
where <math>R_n</math> is an ''n''-by-''n'' (thus square) triangular matrix.
The QR decomposition can be updated cheaply from one iteration to the next, because the Hessenberg matrices differ only by a row of zeros and a column:
:<math>\tilde{H}_{n+1} = \begin{bmatrix} \tilde{H}_n & h_{n+1} \\ 0 & h_{n+2,n+1} \end{bmatrix},</math>
where ''h''<sub>''n''+1</sub> = (''h''<sub>1,''n''+1</sub>, …, ''h''<sub>''n''+1,''n''+1</sub>)<sup>T</sup>. This implies that premultiplying the Hessenberg matrix with Ω<sub>''n''</sub>, augmented with zeroes and a row with multiplicative identity, yields almost a triangular matrix:
:<math>\begin{bmatrix} \Omega_n & 0 \\ 0 & 1 \end{bmatrix} \tilde{H}_{n+1} = \begin{bmatrix} R_n & r_{n+1} \\ 0 & \rho \\ 0 & \sigma \end{bmatrix}.</math>
This would be triangular if ''σ'' is zero. To remedy this, one needs the Givens rotation
:<math>G_n = \begin{bmatrix} I_n & 0 & 0 \\ 0 & c_n & s_n \\ 0 & -s_n & c_n \end{bmatrix},</math>
where
:<math>c_n = \frac{\rho}{\sqrt{\rho^2 + \sigma^2}} \quad\text{and}\quad s_n = \frac{\sigma}{\sqrt{\rho^2 + \sigma^2}}.</math>
With this Givens rotation, we form
:<math>\Omega_{n+1} = G_n \begin{bmatrix} \Omega_n & 0 \\ 0 & 1 \end{bmatrix}.</math>
Indeed,
:<math>\Omega_{n+1} \tilde{H}_{n+1} = \begin{bmatrix} R_n & r_{n+1} \\ 0 & \sqrt{\rho^2 + \sigma^2} \\ 0 & 0 \end{bmatrix}</math>
is a triangular matrix.
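The effect of the rotation can be checked numerically in MATLAB / GNU Octave (a minimal sketch with made-up values for ''ρ'' and ''σ''):

% Verify that the 2-by-2 Givens block eliminates sigma
rho = 3; sigma = 4;                  % sample entries below R_n
t = hypot(rho, sigma);               % sqrt(rho^2 + sigma^2) = 5
c = rho / t; s = sigma / t;          % c_n and s_n as defined above
G = [c, s; -s, c];                   % the active block of G_n
G * [rho; sigma]                     % returns [5; 0]: sigma is annihilated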
Given the QR decomposition, the minimization problem is easily solved by noting that
:<math>\|\tilde{H}_n y_n - \beta e_1\| = \|\Omega_n (\tilde{H}_n y_n - \beta e_1)\| = \|\tilde{R}_n y_n - \beta \Omega_n e_1\|.</math>
Denoting the vector <math>\beta \Omega_n e_1</math> by
:<math>\tilde{g}_n = \begin{bmatrix} g_n \\ \gamma_n \end{bmatrix}</math>
with ''g''<sub>''n''</sub> ∈ <math>\mathbb{R}^n</math> and ''γ''<sub>''n''</sub> ∈ <math>\mathbb{R}</math>, this is
:<math>\|\tilde{H}_n y_n - \beta e_1\| = \left\| \begin{bmatrix} R_n \\ 0 \end{bmatrix} y_n - \begin{bmatrix} g_n \\ \gamma_n \end{bmatrix} \right\|.</math>
The vector ''y'' that minimizes this expression is given by
:<math>y_n = R_n^{-1} g_n.</math>
Again, the vectors <math>g_n</math> are easy to update.
[Stoer and Bulirsch, §8.7.2]
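As a sanity check, the triangular solve can be compared against a dense QR factorization and MATLAB / GNU Octave's backslash (a sketch with a hypothetical random Hessenberg matrix):

% Solve min ||Htilde*y - beta*e1|| via QR and compare with backslash
n = 5; beta = 1.0;                   % sample values
Htilde = triu(randn(n+1, n), -1);    % random (n+1)-by-n upper Hessenberg matrix
e1 = zeros(n+1, 1); e1(1) = 1;
[Qf, R] = qr(Htilde);                % Htilde = Qf*R, so Omega_n = Qf'
g = beta * (Qf' * e1);               % rotated right-hand side [g_n; gamma_n]
y = R(1:n, 1:n) \ g(1:n);            % y_n = R_n^{-1} g_n
norm(y - Htilde \ (beta * e1))       % agrees up to rounding error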
Example code
Regular GMRES (MATLAB / GNU Octave)
function [x, e] = gmres(A, b, x, max_iterations, threshold)
n = length(A);
m = max_iterations;
% use x as the initial vector
r = b - A * x;
b_norm = norm(b);
error = norm(r) / b_norm;
% initialize the 1D vectors
sn = zeros(m, 1);
cs = zeros(m, 1);
%e1 = zeros(n, 1);
e1 = zeros(m+1, 1);
e1(1) = 1;
e = [error];
r_norm = norm(r);
Q(:,1) = r / r_norm;
beta = r_norm * e1; %Note: this is not the beta scalar in section "The method" above but the beta scalar multiplied by e1
for k = 1:m
% run arnoldi
[H(1:k+1, k), Q(:, k+1)] = arnoldi(A, Q, k);
% eliminate the last element in the k-th column of H and update the rotation matrix
[H(1:k+1, k), cs(k), sn(k)] = apply_givens_rotation(H(1:k+1, k), cs, sn, k);
% update the residual vector
beta(k + 1) = -sn(k) * beta(k);
beta(k) = cs(k) * beta(k);
error = abs(beta(k + 1)) / b_norm;
% save the error
e = [e; error];
if (error <= threshold)
break;
end
end
% if threshold is not reached, k = m at this point (and not m+1)
% calculate the result
y = H(1:k, 1:k) \ beta(1:k);
x = x + Q(:, 1:k) * y;
end
%----------------------------------------------------%
% Arnoldi Function %
%----------------------------------------------------%
function [h, q] = arnoldi(A, Q, k)
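% (Sketch: the listing is truncated here; the helper bodies below are
% reconstructions consistent with how they are called in the main loop.)
% One Arnoldi step: orthogonalize A*q_k against q_1..q_k (modified Gram-Schmidt)
  h = zeros(k+1, 1);
  q = A * Q(:, k);                   % new Krylov vector
  for i = 1:k
    h(i) = q' * Q(:, i);
    q = q - h(i) * Q(:, i);
  end
  h(k+1) = norm(q);
  q = q / h(k+1);
end

%----------------------------------------------------%
%         Applying Givens Rotation to H col          %
%----------------------------------------------------%
function [h, cs_k, sn_k] = apply_givens_rotation(h, cs, sn, k)
  % apply the rotations from previous iterations to the new column
  for i = 1:k-1
    temp   =  cs(i) * h(i) + sn(i) * h(i+1);
    h(i+1) = -sn(i) * h(i) + cs(i) * h(i+1);
    h(i)   = temp;
  end
  % compute the new rotation coefficients and eliminate h(k+1)
  t = sqrt(h(k)^2 + h(k+1)^2);
  cs_k = h(k) / t;
  sn_k = h(k+1) / t;
  h(k)   = cs_k * h(k) + sn_k * h(k+1);
  h(k+1) = 0.0;
end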