In numerical analysis, catastrophic cancellation is the phenomenon that subtracting good approximations to two nearby numbers may yield a very bad approximation to the difference of the original numbers. For example, if there are two studs, one L_1 = 254.5\,\text{cm} long and the other L_2 = 253.5\,\text{cm} long, and they are measured with a ruler that is good only to the centimeter, then the approximations could come out to be \tilde L_1 = 255\,\text{cm} and \tilde L_2 = 253\,\text{cm}. These may be good approximations, in relative error, to the true lengths: the approximations are in error by less than 2% of the true lengths, |L_1 - \tilde L_1|/|L_1| < 2\%. However, if the ''approximate'' lengths are subtracted, the difference will be \tilde L_1 - \tilde L_2 = 255\,\text{cm} - 253\,\text{cm} = 2\,\text{cm}, even though the true difference between the lengths is L_1 - L_2 = 254.5\,\text{cm} - 253.5\,\text{cm} = 1\,\text{cm}. The difference of the approximations, 2\,\text{cm}, is in error by 100% of the magnitude of the difference of the true values, 1\,\text{cm}. Catastrophic cancellation may happen even if the difference is computed exactly, as in the example above—it is not a property of any particular kind of arithmetic like
floating-point arithmetic; rather, it is inherent to subtraction, when the ''inputs'' are approximations themselves. Indeed, in floating-point arithmetic, when the inputs are close enough, the floating-point difference is computed exactly, by the Sterbenz lemma
—there is no rounding error introduced by the floating-point subtraction operation.


Formal analysis

Formally, catastrophic cancellation happens because subtraction is ill-conditioned at nearby inputs: even if approximations \tilde x = x (1 + \delta_x) and \tilde y = y (1 + \delta_y) have small relative errors |\delta_x| = |x - \tilde x|/|x| and |\delta_y| = |y - \tilde y|/|y| from true values x and y, respectively, the relative error of the approximate ''difference'' \tilde x - \tilde y from the true difference x - y is inversely proportional to the true difference: \begin{align} \tilde x - \tilde y &= x (1 + \delta_x) - y (1 + \delta_y) = x - y + x \delta_x - y \delta_y \\ &= x - y + (x - y) \frac{x \delta_x - y \delta_y}{x - y} \\ &= (x - y) \biggl(1 + \frac{x \delta_x - y \delta_y}{x - y}\biggr). \end{align} Thus, the relative error of the exact difference \tilde x - \tilde y of the approximations from the difference x - y of the true numbers is \left|\frac{x \delta_x - y \delta_y}{x - y}\right|, which can be arbitrarily large if the true inputs x and y are close.
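
The bound can be checked numerically. The following C program is a sketch (not part of the original article) that applies the formula above to the stud-length example from the introduction; the variable names are illustrative only.

    /* Sketch: relative errors of ~0.2% in the inputs become a 100% relative
     * error in the difference, exactly as (x*delta_x - y*delta_y)/(x - y) predicts. */
    #include <stdio.h>

    int main(void)
    {
        double x = 254.5, y = 253.5;    /* true values (the stud lengths) */
        double xt = 255.0, yt = 253.0;  /* approximations from the coarse ruler */

        double delta_x = (xt - x) / x;  /* ~ +0.2% */
        double delta_y = (yt - y) / y;  /* ~ -0.2% */
        double rel_diff = ((xt - yt) - (x - y)) / (x - y);  /* 100% */

        printf("relative error in x:          %g\n", delta_x);
        printf("relative error in y:          %g\n", delta_y);
        printf("relative error in difference: %g\n", rel_diff);
        /* matches the formula derived above */
        printf("predicted by the formula:     %g\n", (x*delta_x - y*delta_y) / (x - y));
        return 0;
    }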


In numerical algorithms

Subtracting nearby numbers in floating-point arithmetic does not always cause catastrophic cancellation, or even any error—by the
Sterbenz lemma
, if the numbers are close enough the floating-point difference is exact. But cancellation may ''amplify'' errors in the inputs that arose from rounding in other floating-point arithmetic.


Example: Difference of squares

Given numbers x and y, the naive attempt to compute the mathematical function x^2 - y^2 by the floating-point arithmetic \operatorname{fl}(\operatorname{fl}(x^2) - \operatorname{fl}(y^2)) is subject to catastrophic cancellation when x and y are close in magnitude, because the subtraction can expose the rounding errors in the squaring. The alternative factoring (x + y) (x - y), evaluated by the floating-point arithmetic \operatorname{fl}(\operatorname{fl}(x + y) \cdot \operatorname{fl}(x - y)), avoids catastrophic cancellation because it avoids introducing rounding error leading into the subtraction. For example, if x = 1 + 2^{-29} \approx 1.0000000018626451 and y = 1 + 2^{-30} \approx 1.0000000009313226, then the true value of the difference x^2 - y^2 is 2^{-29} \cdot (1 + 2^{-30} + 2^{-31}) \approx 1.8626451518330422 \times 10^{-9}. In IEEE 754 binary64 arithmetic, evaluating the alternative factoring (x + y) (x - y) gives the correct result exactly (with no rounding), but evaluating the naive expression x^2 - y^2 gives the floating-point number 2^{-29} = 1.8626451\underline{49230957} \times 10^{-9}, of which less than half the digits are correct and the other (underlined) digits reflect the missing terms 2^{-30} + 2^{-31}, lost due to rounding when calculating the intermediate squared values.
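
The contrast can be reproduced with a short C program. The following is a sketch (not from the article) using the values of x and y above; in IEEE 754 binary64 arithmetic it prints the two results quoted in the text.

    /* Naive x^2 - y^2 versus the factored form (x + y)*(x - y) in binary64. */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double x = 1.0 + ldexp(1.0, -29);   /* 1 + 2^-29 */
        double y = 1.0 + ldexp(1.0, -30);   /* 1 + 2^-30 */

        double naive    = x*x - y*y;          /* squaring rounds away the low-order bits */
        double factored = (x + y) * (x - y);  /* x - y is exact by the Sterbenz lemma */

        printf("naive:    %.17g\n", naive);     /* 1.862645149230957e-09: digits after 1.8626451 are wrong */
        printf("factored: %.17g\n", factored);  /* 1.8626451518330422e-09: correct */
        return 0;
    }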


Example: Complex arcsine

When computing the complex arcsine function, one may be tempted to use the logarithmic formula directly: \arcsin(z) = i \log \bigl(\sqrt{1 - z^2} - i z\bigr). However, suppose z = i y for y \ll 0. Then \sqrt{1 - z^2} \approx -y and i z = -y; call the difference between them \varepsilon—a very small difference, nearly zero. If \sqrt{1 - z^2} is evaluated in floating-point arithmetic giving \operatorname{fl}\bigl(\sqrt{1 - z^2}\bigr) = \sqrt{1 - z^2}\,(1 + \delta) with any error \delta \ne 0, where \operatorname{fl}(\cdots) denotes floating-point rounding, then computing the difference \sqrt{1 - z^2}\,(1 + \delta) - i z of two nearby numbers, both very close to -y, may amplify the error \delta in one input by a factor of 1/\varepsilon—a very large factor because \varepsilon was nearly zero. For instance, if z = -1234567 i, the true value of \arcsin(z) is approximately -14.71937803983977i, but using the naive logarithmic formula in IEEE 754 binary64 arithmetic may give -14.719\ldots i, with only five out of sixteen digits correct and the remainder all garbage. In the case of z = i y for y < 0, using the identity \arcsin(z) = -\arcsin(-z) avoids cancellation because \sqrt{1 - (-z)^2} = \sqrt{1 - z^2} \approx -y but i (-z) = -i z = y, so the subtraction is effectively addition with the same sign, which does not cancel.
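
As a rough illustration (not from the article), the following C99 program evaluates the naive logarithmic formula alongside the standard library's casin for z = -1234567i; the exact garbage digits produced by the naive formula depend on the platform's csqrt and clog implementations.

    /* Naive logarithmic formula for arcsin(z) versus the library's casin. */
    #include <stdio.h>
    #include <complex.h>

    int main(void)
    {
        double complex z = -1234567.0 * I;

        /* naive: arcsin(z) = i*log(sqrt(1 - z^2) - i*z); the subtraction cancels catastrophically */
        double complex naive = I * clog(csqrt(1.0 - z*z) - I*z);

        /* the C library's casin, for comparison */
        double complex lib = casin(z);

        printf("naive: %.17g %+.17gi\n", creal(naive), cimag(naive));
        printf("casin: %.17g %+.17gi\n", creal(lib), cimag(lib));
        /* the imaginary parts typically agree only in the first few digits */
        return 0;
    }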


Example: Radix conversion

Numerical constants in software programs are often written in decimal, such as in the C fragment double x = 1.000000000000001; to declare and initialize an IEEE 754 binary64 variable named x. However, 1.000000000000001 is not a binary64 floating-point number; the nearest one, which x will be initialized to in this fragment, is 1.0000000000000011102230246251565404236316680908203125 = 1 + 5\cdot 2^{-52}. Although the radix conversion from decimal floating-point to binary floating-point only incurs a small relative error, catastrophic cancellation may amplify it into a much larger one:

    double x = 1.000000000000001;   // rounded to 1 + 5*2^-52
    double y = 1.000000000000002;   // rounded to 1 + 9*2^-52
    double z = y - x;               // difference is exactly 4*2^-52

The difference 1.000000000000002 - 1.000000000000001 is 0.000000000000001 = 1.0\times 10^{-15}. The relative errors of x from 1.000000000000001 and of y from 1.000000000000002 are both below 10^{-15} = 0.0000000000001\%, and the floating-point subtraction y - x is computed exactly by the Sterbenz lemma. But even though the inputs are good approximations, and even though the subtraction is computed exactly, the difference of the ''approximations'' \tilde y - \tilde x = (1 + 9\cdot 2^{-52}) - (1 + 5\cdot 2^{-52}) = 4\cdot 2^{-52} \approx 8.88\times 10^{-16} has a relative error of over 11\% from the difference 1.0\times 10^{-15} of the original values as written in decimal: catastrophic cancellation amplified a tiny error in radix conversion into a large error in the output.
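
A complete, runnable version of the fragment (a sketch; the comparison against the intended decimal difference is added here and is not part of the original) makes the 11% figure directly visible:

    #include <stdio.h>

    int main(void)
    {
        double x = 1.000000000000001;   /* rounded to 1 + 5*2^-52 */
        double y = 1.000000000000002;   /* rounded to 1 + 9*2^-52 */
        double z = y - x;               /* exactly 4*2^-52, by the Sterbenz lemma */

        double intended = 1.0e-15;      /* difference of the constants as written in decimal */

        printf("computed z     = %.17g\n", z);          /* 8.8817841970012523e-16 */
        printf("intended       = %.17g\n", intended);   /* 1.0000000000000001e-15 */
        printf("relative error = %.3g\n", (intended - z) / intended);  /* about 0.112, i.e. over 11% */
        return 0;
    }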


Benign cancellation

Cancellation is sometimes useful and desirable in numerical algorithms. For example, the 2Sum and Fast2Sum algorithms both rely on such cancellation after a rounding error in order to compute exactly, as a floating-point number itself, what the error was in a floating-point addition operation. The function \log(1 + x), if evaluated naively at points 0 < x \lll 1, will lose most of the digits of x in rounding \operatorname{fl}(1 + x). However, the function \log(1 + x) itself is well-conditioned at inputs near 0. Rewriting it as \log(1 + x) \approx x \frac{\log(\operatorname{fl}(1 + x))}{\operatorname{fl}(1 + x) - 1} exploits cancellation in \hat x := \operatorname{fl}(1 + x) - 1 to avoid the error from \log(1 + x) evaluated directly. This works because the cancellation in the numerator \log(\operatorname{fl}(1 + x)) = \hat x + O(\hat x^2) and the cancellation in the denominator \hat x = \operatorname{fl}(1 + x) - 1 counteract each other; the function \mu(\xi) = \log(1 + \xi)/\xi is well-enough conditioned near zero that \mu(\hat x) gives a good approximation to \mu(x), and thus x\cdot \mu(\hat x) gives a good approximation to x\cdot \mu(x) = \log(1 + x).
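
The rewriting can be sketched in C as follows (an illustrative implementation, not from the article; practical libraries expose this functionality as log1p, which is used here only for comparison):

    /* log(1 + x) for small x via the benign-cancellation rewriting described above. */
    #include <stdio.h>
    #include <math.h>

    static double log1p_rewritten(double x)
    {
        double w = 1.0 + x;             /* rounds; loses low-order digits of x */
        if (w == 1.0)
            return x;                   /* 1 + x rounded to 1: log(1 + x) ~ x  */
        return x * log(w) / (w - 1.0);  /* the two cancellations counteract each other */
    }

    int main(void)
    {
        double x = 1e-10;

        printf("naive log(1 + x): %.17g\n", log(1.0 + x));    /* wrong after a few digits */
        printf("rewritten:        %.17g\n", log1p_rewritten(x));
        printf("library log1p:    %.17g\n", log1p(x));
        return 0;
    }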

