Octuple-precision floating-point format

In computing, octuple precision is a binary floating-point-based computer number format that occupies 32 bytes (256 bits) in computer memory. This 256-bit format is intended for applications requiring results in higher than quadruple precision. Its range greatly exceeds what is needed to describe all known physical limitations within the observable universe or precisions better than Planck units.


IEEE 754 octuple-precision binary floating-point format: binary256

In its 2008 revision, the IEEE 754 standard specifies a binary256 format among the ''interchange formats'' (it is not a basic format), as having:

* Sign bit: 1 bit
* Exponent width: 19 bits
* Significand precision: 237 bits (236 explicitly stored)

The format is written with an implicit lead bit with value 1 unless the exponent is all zeros. Thus only 236 bits of the significand appear in the memory format, but the total precision is 237 bits (approximately 71 decimal digits: log10(2^237) ≈ 71.34). The bits are laid out, from most significant to least significant, as 1 sign bit, the 19-bit biased exponent, and the 236-bit fraction.
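As an illustrative sketch (not normative pseudocode from the standard), the following Python routine shows how a 256-bit pattern splits into these three fields and how the represented value is reconstructed with exact rational arithmetic; the helper name decode_binary256 is chosen here only for clarity.

    from fractions import Fraction

    EXP_BITS = 19                       # binary256 exponent width
    FRAC_BITS = 236                     # explicitly stored significand bits
    BIAS = (1 << (EXP_BITS - 1)) - 1    # 262143

    def decode_binary256(bits):
        """Decode a 256-bit integer holding a binary256 encoding into an exact value."""
        sign = (bits >> 255) & 1
        exponent = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
        fraction = bits & ((1 << FRAC_BITS) - 1)

        if exponent == (1 << EXP_BITS) - 1:      # all ones: infinity or NaN
            if fraction == 0:
                return float("-inf") if sign else float("inf")
            return float("nan")
        if exponent == 0:                        # all zeros: zero or subnormal, no implicit 1
            value = Fraction(fraction, 1 << FRAC_BITS) * Fraction(2) ** (1 - BIAS)
        else:                                    # normal number: implicit leading 1
            value = (1 + Fraction(fraction, 1 << FRAC_BITS)) * Fraction(2) ** (exponent - BIAS)
        return -value if sign else value

    # The pattern 3fff f000...0000 (sign 0, biased exponent 3FFFF, fraction 0) encodes 1.
    assert decode_binary256(0x3FFFF << 236) == 1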


Exponent encoding

The octuple-precision binary floating-point exponent is encoded using an offset-binary representation, with the zero offset being 262143; this offset is also known as the exponent bias in the IEEE 754 standard.

* Emin = −262142
* Emax = 262143
* Exponent bias = 3FFFF₁₆ = 262143

Thus, as defined by the offset-binary representation, in order to get the true exponent, the offset of 262143 has to be subtracted from the stored exponent. The stored exponents 00000₁₆ and 7FFFF₁₆ are interpreted specially. The minimum strictly positive (subnormal) value is 2^−262378 ≈ 10^−78984 and has a precision of only one bit. The minimum positive normal value is 2^−262142 ≈ 2.4824 × 10^−78913. The maximum representable value is 2^262144 − 2^261907 ≈ 1.6113 × 10^78913.
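The bias and extreme exponents above follow directly from the 19-bit exponent width; a small Python check (a sketch for illustration, not tied to any library) recovers them together with the approximate decimal exponents quoted in the text:

    from math import log10

    EXP_BITS, FRAC_BITS = 19, 236
    bias = (1 << (EXP_BITS - 1)) - 1           # 262143
    e_min, e_max = 1 - bias, bias              # -262142 and 262143

    # Approximate decimal exponents of the extreme values (ordinary floats suffice
    # for the logarithms even though the values themselves are far out of range).
    print((e_min - FRAC_BITS) * log10(2))      # ~ -78983.6  (smallest subnormal)
    print(e_min * log10(2))                    # ~ -78912.6  (smallest normal)
    print((e_max + 1) * log10(2))              # ~  78913.2  (log of 2^262144, marginally above the largest finite value)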


Octuple-precision examples

These examples are given in bit ''representation'', in hexadecimal, of the floating-point value. This includes the sign, (biased) exponent, and significand.

0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000₁₆ = +0
8000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000₁₆ = −0

7fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000₁₆ = +infinity
ffff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000₁₆ = −infinity

0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001₁₆ = 2^−262142 × 2^−236 = 2^−262378
  ≈ 2.24800708647703657297018614776265182597360918266100276294348974547709294462 × 10^−78984
  (smallest positive subnormal number)

0000 0fff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff₁₆ = 2^−262142 × (1 − 2^−236)
  ≈ 2.4824279514643497882993282229138717236776877060796468692709532979137875392 × 10^−78913
  (largest subnormal number)

0000 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000₁₆ = 2^−262142
  ≈ 2.48242795146434978829932822291387172367768770607964686927095329791378756168 × 10^−78913
  (smallest positive normal number)

7fff efff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff₁₆ = 2^262143 × (2 − 2^−236)
  ≈ 1.61132571748576047361957211845200501064402387454966951747637125049607182699 × 10^78913
  (largest normal number)

3fff efff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff₁₆ = 1 − 2^−237
  ≈ 0.999999999999999999999999999999999999999999999999999999999999999999999995472
  (largest number less than one)

3fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000₁₆ = 1 (one)

3fff f000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001₁₆ = 1 + 2^−236
  ≈ 1.00000000000000000000000000000000000000000000000000000000000000000000000906
  (smallest number larger than one)

By default, 1/3 rounds down like double precision, because of the odd number of bits in the significand. So the bits beyond the rounding point are 0101..., which is less than 1/2 of a unit in the last place.
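The rounding of 1/3 can be checked with exact rational arithmetic. The following Python sketch (assuming round-to-nearest, ties to even, and ignoring the subnormal and overflow cases) rounds a positive rational to a 237-bit significand and confirms that the result falls below 1/3:

    from fractions import Fraction

    PREC = 237                      # total significand bits, including the implicit leading 1

    def round_to_binary256(x):
        """Round a positive rational in the normal range to the nearest 237-bit significand."""
        e = 0                       # find e with 1 <= x / 2**e < 2
        while x >= 2:
            x /= 2
            e += 1
        while x < 1:
            x *= 2
            e -= 1
        scaled = x * (1 << (PREC - 1))          # significand scaled to a PREC-bit integer
        sig, rem = int(scaled), scaled - int(scaled)
        if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and sig % 2 == 1):
            sig += 1                            # round half to even
        return Fraction(sig, 1 << (PREC - 1)) * Fraction(2) ** e

    third = round_to_binary256(Fraction(1, 3))
    print(third < Fraction(1, 3))   # True: the discarded bits are 0101..., under half a ulp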


Implementations

Octuple precision is rarely implemented, since the need for it is extremely rare. Apple Inc. had an implementation of addition, subtraction and multiplication of octuple-precision numbers with a 224-bit two's-complement significand and a 32-bit exponent. One can use general arbitrary-precision arithmetic libraries to obtain octuple (or higher) precision, but specialized octuple-precision implementations may achieve higher performance.
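As a sketch of the library route mentioned above, an arbitrary-precision package such as mpmath (an assumption; any comparable library would do) can be set to a 237-bit working precision, reproducing the significand precision of binary256, though not its bounded 19-bit exponent range or its subnormal behaviour:

    from mpmath import mp, mpf

    mp.prec = 237          # significand precision in bits, matching binary256
    third = mpf(1) / 3
    print(third)           # printed with roughly 71 significant decimal digits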


Hardware support

There is no known hardware implementation of octuple precision.


See also

* IEEE 754
* ISO/IEC 10967, Language-independent arithmetic
* Primitive data type
* Scientific notation

