In computing, octuple precision is a binary floating-point-based computer number format that occupies 32 bytes (256 bits) in computer memory. This 256-bit octuple precision is for applications requiring results in higher than quadruple precision. This format is rarely (if ever) used and very few environments support it.
IEEE 754 octuple-precision binary floating-point format: binary256
In its 2008 revision, the IEEE 754 standard specifies a binary256 format among the ''interchange formats'' (it is not a basic format), as having:
* Sign bit: 1 bit
* Exponent width: 19 bits
* Significand precision: 237 bits (236 explicitly stored)
The format is written with an implicit lead bit with value 1 unless the exponent is all zeros. Thus only 236 bits of the significand appear in the memory format, but the total precision is 237 bits (approximately 71 decimal digits: log₁₀(2^237) ≈ 71.34).
The bits are laid out as follows: 1 sign bit, then 19 exponent bits, then 236 explicitly stored significand bits, from the most significant to the least significant bit.
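As an illustration of this layout, the following Python sketch (an assumed helper for this article, not part of the standard) splits a 256-bit pattern, given as a hexadecimal string, into its sign, biased-exponent, and significand fields:

def split_binary256(hex_str):
    """Split a 256-bit pattern (64 hex digits) into its three fields."""
    bits = int(hex_str.replace(" ", ""), 16)
    sign        = bits >> 255               # 1 sign bit
    exponent    = (bits >> 236) & 0x7FFFF   # 19 exponent bits (biased)
    significand = bits & ((1 << 236) - 1)   # 236 explicitly stored bits
    return sign, exponent, significand

# Example: the encoding of one (see the examples section below)
print(split_binary256("3fff f000" + " 0000" * 14))
# -> (0, 262143, 0)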
Exponent encoding
The octuple-precision binary floating-point exponent is encoded using an
offset binary representation, with the zero offset being 262143; also known as exponent bias in the IEEE 754 standard.
* E_min = −262142
* E_max = 262143
* Exponent bias = 3FFFF₁₆ = 262143
Thus, as defined by the offset binary representation, in order to get the true exponent the offset of 262143 has to be subtracted from the stored exponent.
The stored exponents 00000₁₆ and 7FFFF₁₆ are interpreted specially.
The minimum strictly positive (subnormal) value is 2^−262378 ≈ 2.25 × 10^−78984 and has a precision of only one bit.
The minimum positive normal value is 2^−262142 ≈ 2.4824 × 10^−78913.
The maximum representable value is 2^262144 − 2^261907 ≈ 1.6113 × 10^78913.
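For concreteness, here is a minimal Python sketch (an illustrative decoder written for this article, not a reference implementation) that turns the three fields into an exact value using the bias and the special exponent encodings described above; it reproduces the minimum and maximum values quoted here.

from fractions import Fraction

BIAS, FRAC_BITS = 262143, 236

def decode_binary256(sign, exponent, significand):
    """Return the exact value encoded by the three binary256 fields."""
    s = -1 if sign else 1
    if exponent == 0x7FFFF:        # all ones: infinity or NaN
        return s * float("inf") if significand == 0 else float("nan")
    if exponent == 0:              # all zeros: zero or subnormal (no implicit 1)
        return s * Fraction(significand, 2**FRAC_BITS) * Fraction(2)**(1 - BIAS)
    frac = 1 + Fraction(significand, 2**FRAC_BITS)   # implicit leading 1
    return s * frac * Fraction(2)**(exponent - BIAS)

# Smallest positive subnormal: 2^-262378
assert decode_binary256(0, 0, 1) == Fraction(1, 2**262378)
# Smallest positive normal: 2^-262142
assert decode_binary256(0, 1, 0) == Fraction(1, 2**262142)
# Largest finite value: 2^262144 - 2^261907
assert decode_binary256(0, 0x7FFFE, 2**236 - 1) == 2**262144 - 2**261907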
Octuple-precision examples
These examples are given in bit ''representation'', in hexadecimal, of the floating-point value. This includes the sign, (biased) exponent, and significand.
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000₁₆ = +0
8000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000₁₆ = −0

7fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000₁₆ = +infinity
ffff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000₁₆ = −infinity

0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001₁₆
= 2^−262142 × 2^−236 = 2^−262378
≈ 2.24800708647703657297018614776265182597360918266100276294348974547709294462 × 10^−78984
(smallest positive subnormal number)

0000 0fff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff₁₆
= 2^−262142 × (1 − 2^−236)
≈ 2.4824279514643497882993282229138717236776877060796468692709532979137875392 × 10^−78913
(largest subnormal number)

0000 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000₁₆
= 2^−262142
≈ 2.48242795146434978829932822291387172367768770607964686927095329791378756168 × 10^−78913
(smallest positive normal number)

7fff efff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff₁₆
= 2^262143 × (2 − 2^−236)
≈ 1.61132571748576047361957211845200501064402387454966951747637125049607182699 × 10^78913
(largest normal number)

3fff efff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff₁₆
= 1 − 2^−237
≈ 0.999999999999999999999999999999999999999999999999999999999999999999999995472
(largest number less than one)

3fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000₁₆
= 1 (one)

3fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001₁₆
= 1 + 2^−236
≈ 1.00000000000000000000000000000000000000000000000000000000000000000000000906
(smallest number larger than one)
By default, 1/3 rounds down like double precision, because of the odd number of bits in the significand.
So the bits beyond the rounding point are 0101..., which is less than 1/2 of a unit in the last place.
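This claim can be checked with exact integer arithmetic; the short Python sketch below (an illustrative calculation written for this article, not library code) scales the 237-bit significand of 1/3 to an integer and shows that the discarded tail is worth only 1/3 of a unit in the last place, so round-to-nearest rounds down.

from fractions import Fraction

PREC = 237                       # total significand bits in binary256

x = Fraction(1, 3)               # 1/3 = 2^-2 * 1.0101... (binary)
e = -2                           # exponent that puts the significand in [1, 2)
scaled = x * 2**(PREC - 1 - e)   # significand scaled so stored bits become an integer
tail = scaled - int(scaled)      # discarded bits 0101... as a fraction of one ulp

print(tail)                      # 1/3
print(tail < Fraction(1, 2))     # True -> round-to-nearest rounds down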
Implementations
Octuple precision is rarely implemented, since the need for it is extremely rare.
Apple Inc. had an implementation of addition, subtraction and multiplication of octuple-precision numbers with a 224-bit two's complement significand and a 32-bit exponent.
One can use general arbitrary-precision arithmetic libraries to obtain octuple (or higher) precision, but specialized octuple-precision implementations may achieve higher performance.
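As one example of the library route, the Python mpmath package can be configured for a 237-bit working significand, which matches binary256's precision (though not its exponent range or subnormal behaviour); this is only a sketch of one possible approach.

from mpmath import mp, mpf

mp.prec = 237          # significand precision of binary256, in bits
x = mpf(1) / 3         # 1/3 correctly rounded to 237 bits
print(mp.dps)          # corresponding decimal working precision (roughly 70 digits)
print(x)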
Hardware support
There is no known hardware implementation of octuple precision.
See also
* IEEE 754
* ISO/IEC 10967, Language-independent arithmetic
* Primitive data type
* Scientific notation