Decimal floating-point (DFP) arithmetic refers to both a representation and operations on decimal floating-point numbers. Working directly with decimal (base-10) fractions can avoid the rounding errors that otherwise typically occur when converting between decimal fractions (common in human-entered data, such as measurements or financial information) and binary (base-2) fractions.

The advantage of decimal floating-point representation over decimal fixed-point and integer representation is that it supports a much wider range of values. For example, while a fixed-point representation that allocates 8 decimal digits and 2 decimal places can represent the numbers 123456.78, 8765.43, 123.00, and so on, a floating-point representation with 8 decimal digits could also represent 1.2345678, 1234567.8, 0.000012345678, 12345678000000000, and so on. This wider range can dramatically slow the accumulation of rounding errors during successive calculations; for example, the Kahan summation algorithm can be used in floating point to add many numbers with no asymptotic accumulation of rounding error.
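
The difference is easy to see in software. Here is a minimal sketch using Python's standard decimal module (one of the implementations listed below); the binary result shown in the comment is what CPython typically prints and may differ in its last digits on other platforms:

 from decimal import Decimal

 # Binary floating point cannot represent 0.1 or 0.2 exactly, so a small
 # conversion error is already present before any arithmetic is done:
 print(0.1 + 0.2)                       # 0.30000000000000004
 print(0.1 + 0.2 == 0.3)                # False

 # Decimal floating point keeps the human-entered fractions exact:
 print(Decimal('0.1') + Decimal('0.2'))                     # 0.3
 print(Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))   # True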


Implementations

Early mechanical uses of decimal floating point are evident in the abacus, slide rule, the Smallwood calculator, and some other calculators that support entries in scientific notation. In the case of the mechanical calculators, the exponent is often treated as side information that is accounted for separately.

The IBM 650 computer supported an 8-digit decimal floating-point format in 1953. The otherwise binary Wang VS machine supported a 64-bit decimal floating-point format in 1977. The floating-point support library for the Motorola 68040 processor provided a 96-bit decimal floating-point storage format in 1990.

Some computer languages have implementations of decimal floating-point arithmetic, including PL/I, C#, Java with BigDecimal, Emacs with Calc, and Python's decimal module. Microsoft's .NET platform (and hence C#) provides the System.Decimal type.

In 1987, the IEEE released IEEE 854, a standard for computing with decimal floating point, which lacked a specification for how floating-point data should be encoded for interchange with other systems. This was subsequently addressed in IEEE 754-2008, which standardized the encoding of decimal floating-point data, albeit with two different alternative methods.

IBM POWER6 and newer POWER processors include DFP in hardware, as does the IBM System z9 (and later IBM Z machines). SilMinds offers SilAx, a configurable vector DFP coprocessor. IEEE 754-2008 defines this in more detail. Fujitsu also has 64-bit SPARC processors with DFP in hardware.
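
As a small usage sketch of one of the software implementations named above (assuming Python 3 and its standard decimal module, which follows the General Decimal Arithmetic specification rather than implementing the IEEE 754-2008 interchange encodings directly), the working precision and rounding mode are properties of a context:

 from decimal import Decimal, getcontext, ROUND_HALF_EVEN

 ctx = getcontext()
 ctx.prec = 34                   # 34 significant digits, as in the decimal128 format
 ctx.rounding = ROUND_HALF_EVEN  # IEEE 754 default rounding ("banker's rounding")
 print(Decimal(1) / Decimal(3))  # 0.3333333333333333333333333333333333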


IEEE 754-2008 encoding

The IEEE 754-2008 standard defines 32-, 64- and 128-bit decimal floating-point representations. Like the binary floating-point formats, the number is divided into a sign, an exponent, and a significand. Unlike binary floating-point, numbers are not necessarily normalized; values with few significant digits have multiple possible representations: 1×10^2 = 0.1×10^3 = 0.01×10^4, etc. When the significand is zero, the exponent can be any value at all.

The exponent ranges were chosen so that the range available to normalized values is approximately symmetrical. Since this cannot be done exactly with an even number of possible exponent values, the extra value was given to Emax.

Two different representations are defined:
* One with a binary integer significand field encodes the significand as a large binary integer between 0 and 10^p−1. This is expected to be more convenient for software implementations using a binary ALU.
* Another with a densely packed decimal significand field encodes decimal digits more directly. This makes conversion to and from binary floating-point form faster, but requires specialized hardware to manipulate efficiently. This is expected to be more convenient for hardware implementations.

Both alternatives provide exactly the same range of representable values.

The most significant two bits of the exponent are limited to the range 0 to 2, and the most significant 4 bits of the significand are limited to the range 0 to 9. The 30 possible combinations are encoded in a 5-bit field, along with special forms for infinity and NaN. If the most significant 4 bits of the significand are between 0 and 7, the encoded value begins as follows:

 s 00mmm xxx   Exponent begins with 00, significand with 0mmm
 s 01mmm xxx   Exponent begins with 01, significand with 0mmm
 s 10mmm xxx   Exponent begins with 10, significand with 0mmm

If the leading 4 bits of the significand are binary 1000 or 1001 (decimal 8 or 9), the number begins as follows:

 s 1100m xxx   Exponent begins with 00, significand with 100m
 s 1101m xxx   Exponent begins with 01, significand with 100m
 s 1110m xxx   Exponent begins with 10, significand with 100m

The leading bit (s in the above) is a sign bit, and the following bits (xxx in the above) encode the additional exponent bits and the remainder of the most significant digit, but the details vary depending on the encoding alternative used. The final combinations are used for infinities and NaNs, and are the same for both alternative encodings:

 s 11110 x   ±Infinity (see Extended real number line)
 s 11111 0   quiet NaN (sign bit ignored)
 s 11111 1   signaling NaN (sign bit ignored)

In the latter cases, all other bits of the encoding are ignored. Thus, it is possible to initialize an array to NaNs by filling it with a single byte value.
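
The combination logic above can be written out mechanically. Here is a short Python sketch of just the 5-bit field that follows the sign bit (the function name and return convention are invented here for illustration; the rule applies to both encoding alternatives):

 def classify_combination(bits5):
     """Interpret the 5 bits after the sign bit of an IEEE 754-2008 decimal format."""
     assert 0 <= bits5 <= 0b11111
     if (bits5 >> 3) != 0b11:                  # s 00mmm, s 01mmm, s 10mmm
         return ('finite', bits5 >> 3, bits5 & 0b111)                   # leading digit 0..7
     if (bits5 >> 1) != 0b1111:                # s 1100m, s 1101m, s 1110m
         return ('finite', (bits5 >> 1) & 0b11, 0b1000 | (bits5 & 1))   # leading digit 8 or 9
     if (bits5 & 1) == 0:                      # s 11110 x
         return ('infinity', None, None)
     return ('nan', None, None)                # s 11111 x (quiet or signaling)

 print(classify_combination(0b10011))   # ('finite', 2, 3): exponent MSBs 10, leading digit 3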


Binary integer significand field

This format uses a binary significand from 0 to 10^p−1. For example, the Decimal32 significand can be up to 10^7−1 = 9999999 = 98967F₁₆ = 1001 1000 1001 0110 0111 1111₂. While the encoding can represent larger significands, they are illegal and the standard requires implementations to treat them as 0, if encountered on input.

As described above, the encoding varies depending on whether the most significant 4 bits of the significand are in the range 0 to 7 (0000₂ to 0111₂), or higher (1000₂ or 1001₂).

If the 2 bits after the sign bit are "00", "01", or "10", then the exponent field consists of the 8 bits following the sign bit (the 2 bits mentioned plus 6 bits of "exponent continuation field"), and the significand is the remaining 23 bits, with an implicit leading 0 bit, shown here in parentheses:
 s 00eeeeee   (0)ttt tttttttttt tttttttttt
 s 01eeeeee   (0)ttt tttttttttt tttttttttt
 s 10eeeeee   (0)ttt tttttttttt tttttttttt
This includes subnormal numbers where the leading significand digit is 0. If the 2 bits after the sign bit are "11", then the 8-bit exponent field is shifted 2 bits to the right (after both the sign bit and the "11" bits thereafter), and the represented significand is in the remaining 21 bits. In this case there is an implicit (that is, not stored) leading 3-bit sequence "100" in the true significand:
 s 1100eeeeee (100)t tttttttttt tttttttttt
 s 1101eeeeee (100)t tttttttttt tttttttttt
 s 1110eeeeee (100)t tttttttttt tttttttttt
The "11" 2-bit sequence after the sign bit indicates that there is an ''implicit'' "100" 3-bit prefix to the significand. Note that the leading bits of the significand field do ''not'' encode the most significant decimal digit; they are simply part of a larger pure-binary number. For example, a significand of is encoded as binary , with the leading 4 bits encoding 7; the first significand which requires a 24th bit (and thus the second encoding form) is 223 = . In the above cases, the value represented is: : (−1)sign × 10exponent−101 × significand Decimal64 and Decimal128 operate analogously, but with larger exponent continuation and significand fields. For Decimal128, the second encoding form is actually never used; the largest valid significand of 1034−1 = 1ED09BEAD87C0378D8E63FFFFFFFF16 can be represented in 113 bits.


Densely packed decimal significand field

In this version, the significand is stored as a series of decimal digits. The leading digit is between 0 and 9 (3 or 4 binary bits), and the rest of the significand uses the densely packed decimal (DPD) encoding. The leading 2 bits of the exponent and the leading digit (3 or 4 bits) of the significand are combined into the five bits that follow the sign bit. This is followed by a fixed-offset exponent continuation field. Finally, the significand continuation field is made of 2, 5, or 11 ten-bit ''declets'', each encoding 3 decimal digits. If the first two bits after the sign bit are "00", "01", or "10", then those are the leading bits of the exponent, and the three bits after that are interpreted as the leading decimal digit (0 to 7):
    Comb.  Exponent          Significand
 s 00 TTT (00)eeeeee (0TTT) tttttttttt tttttttttt
 s 01 TTT (01)eeeeee (0TTT) tttttttttt tttttttttt
 s 10 TTT (10)eeeeee (0TTT) tttttttttt tttttttttt
If the first two bits after the sign bit are "11", then the second two bits are the leading bits of the exponent, and the last bit is prefixed with "100" to form the leading decimal digit (8 or 9):
    Comb.  Exponent          Significand
 s 1100 T (00)eeeeee (100T) tttttttttt tttttttttt
 s 1101 T (01)eeeeee (100T) tttttttttt tttttttttt
 s 1110 T (10)eeeeee (100T) tttttttttt tttttttttt
The remaining two combinations (11110 and 11111) of the 5-bit field are used to represent ±infinity and NaNs, respectively.
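
Expanding a declet back into its three decimal digits follows a small fixed table. Below is a Python sketch of the decoding direction (the function name is invented here; the bit names p…y follow the usual presentation of the DPD tables):

 def decode_declet(declet):
     """Expand one 10-bit DPD declet into three decimal digits (d2, d1, d0)."""
     assert 0 <= declet <= 0b1111111111
     # Name the ten bits p q r  s t u  v  w x y, from most to least significant.
     p, q, r = (declet >> 9) & 1, (declet >> 8) & 1, (declet >> 7) & 1
     s, t, u = (declet >> 6) & 1, (declet >> 5) & 1, (declet >> 4) & 1
     v       = (declet >> 3) & 1
     w, x, y = (declet >> 2) & 1, (declet >> 1) & 1, declet & 1

     small = lambda a, b, c: 4*a + 2*b + c   # a digit in 0..7
     large = lambda c: 8 + c                 # a digit that is 8 or 9

     if v == 0:                              # three small digits
         return small(p, q, r), small(s, t, u), small(w, x, y)
     if (w, x) == (0, 0):                    # only the last digit is large
         return small(p, q, r), small(s, t, u), large(y)
     if (w, x) == (0, 1):                    # only the middle digit is large
         return small(p, q, r), large(u), small(s, t, y)
     if (w, x) == (1, 0):                    # only the first digit is large
         return large(r), small(s, t, u), small(p, q, y)
     if (s, t) == (0, 0):                    # w,x = 1,1: two or three large digits
         return large(r), large(u), small(p, q, y)
     if (s, t) == (0, 1):
         return large(r), small(p, q, u), large(y)
     if (s, t) == (1, 0):
         return small(p, q, r), large(u), large(y)
     return large(r), large(u), large(y)     # all three digits are 8 or 9

 print(decode_declet(0b0000111011))   # (0, 9, 3)
 print(decode_declet(0b0011111111))   # (9, 9, 9)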


Floating-point arithmetic operations

The usual rule for performing floating-point arithmetic is that the exact mathematical value is calculated (computer hardware doesn't necessarily compute the exact value; it simply has to produce the equivalent rounded result as though it had computed the infinitely precise result), and the result is then rounded to the nearest representable value in the specified precision. This is in fact the behavior mandated for IEEE-compliant computer hardware, under normal rounding behavior and in the absence of exceptional conditions.

For ease of presentation and understanding, 7-digit precision will be used in the examples. The fundamental principles are the same in any precision.
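
This behavior can be observed directly with Python's decimal module by limiting the working precision to the same 7 digits (a small sketch; the module computes the exact result internally and then rounds it to the context precision):

 from decimal import Decimal, getcontext

 getcontext().prec = 7                             # 7 significant digits
 print(Decimal('2') / Decimal('3'))                # 0.6666667  (exact quotient, rounded)
 print(Decimal('1.111111') * Decimal('1.111111'))  # 1.234568   (exact 1.234567654321, rounded)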


Addition

A simple method to add floating-point numbers is to first represent them with the same exponent. In the example below, the second number is shifted right by 3 digits. We proceed with the usual addition method. The following example is decimal, which simply means the base is 10.

 123456.7 = 1.234567 × 10^5
 101.7654 = 1.017654 × 10^2 = 0.001017654 × 10^5

Hence:

 123456.7 + 101.7654 = (1.234567 × 10^5) + (1.017654 × 10^2)
                     = (1.234567 × 10^5) + (0.001017654 × 10^5)
                     = 10^5 × (1.234567 + 0.001017654)
                     = 10^5 × 1.235584654

This is nothing other than converting to scientific notation. In detail:

   e=5;  s=1.234567     (123456.7)
 + e=2;  s=1.017654     (101.7654)

   e=5;  s=1.234567
 + e=5;  s=0.001017654  (after shifting)
 --------------------
   e=5;  s=1.235584654  (true sum: 123558.4654)

This is the true result, the exact sum of the operands. It will be rounded to 7 digits and then normalized if necessary. The final result is:

   e=5;  s=1.235585     (final sum: 123558.5)

Note that the low 3 digits of the second operand (654) are essentially lost. This is round-off error.
In extreme cases, the sum of two non-zero numbers may be equal to one of them:

   e=5;  s=1.234567
 + e=−3; s=9.876543

   e=5;  s=1.234567
 + e=5;  s=0.00000009876543 (after shifting)
 ----------------------
   e=5;  s=1.23456709876543 (true sum)
   e=5;  s=1.234567         (after rounding/normalization)

Another problem of loss of significance occurs when ''approximations'' to two nearly equal numbers are subtracted. In the following example ''e'' = 5; ''s'' = 1.234571 and ''e'' = 5; ''s'' = 1.234567 are approximations to the rationals 123457.1467 and 123456.659.

   e=5;  s=1.234571
 − e=5;  s=1.234567
 ----------------
   e=5;  s=0.000004
   e=−1; s=4.000000 (after rounding and normalization)

The floating-point difference is computed exactly because the numbers are close; the Sterbenz lemma guarantees this, even in case of underflow when gradual underflow is supported. Despite this, the difference of the original numbers is ''e'' = −1; ''s'' = 4.877000, which differs by more than 20% from the difference ''e'' = −1; ''s'' = 4.000000 of the approximations. In extreme cases, all significant digits of precision can be lost. This ''cancellation'' illustrates the danger in assuming that all of the digits of a computed result are meaningful. Dealing with the consequences of these errors is a topic in numerical analysis; see also Accuracy problems.
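
The same effects can be reproduced with Python's decimal module at the 7-digit precision used above (a small sketch restating the worked results; nothing here is specific to any hardware format):

 from decimal import Decimal, getcontext

 getcontext().prec = 7

 # The rounded sum from the first example:
 print(Decimal('123456.7') + Decimal('101.7654'))      # 123558.5

 # Absorption: the smaller operand disappears entirely after rounding.
 print(Decimal('123456.7') + Decimal('0.009876543'))   # 123456.7

 # Cancellation: subtracting nearly equal approximations leaves one significant digit.
 print(Decimal('123457.1') - Decimal('123456.7'))      # 0.4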


Multiplication

To multiply, the significands are multiplied while the exponents are added, and the result is rounded and normalized.

   e=3;  s=4.734612
 × e=5;  s=5.417242
 -----------------------
   e=8;  s=25.648538980104 (true product)
   e=8;  s=25.64854        (after rounding)
   e=9;  s=2.564854        (after normalization)

Division is done similarly, but is more complicated. There are no cancellation or absorption problems with multiplication or division, though small errors may accumulate as operations are performed repeatedly. In practice, the way these operations are carried out in digital logic can be quite complex.
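
The multiplication above can likewise be checked with Python's decimal module at 7-digit precision (a small sketch; the E notation in the comment is how the module prints the normalized result):

 from decimal import Decimal, getcontext

 getcontext().prec = 7
 a = Decimal('4.734612E+3')    # e=3; s=4.734612
 b = Decimal('5.417242E+5')    # e=5; s=5.417242
 print(a * b)                  # 2.564854E+9  (true product 2.5648538980104E+9, rounded)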


See also

* Binary-coded decimal (BCD)


Further reading


* Cowlishaw, Mike F. (2003). "Decimal Floating-Point: Algorism for Computers". Proceedings of the 16th IEEE Symposium on Computer Arithmetic.