The bfloat16 (brain floating point) floating-point format is a computer number format occupying 16 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. This format is a shortened (16-bit) version of the 32-bit IEEE 754 single-precision floating-point format (binary32) with the intent of accelerating machine learning and near-sensor computing.

It preserves the approximate dynamic range of 32-bit floating-point numbers by retaining 8 exponent bits, but supports only an 8-bit precision rather than the 24-bit significand of the binary32 format. More so than single-precision 32-bit floating-point numbers, bfloat16 numbers are unsuitable for integer calculations, but this is not their intended use. Bfloat16 is used to reduce the storage requirements and increase the calculation speed of machine learning algorithms.

The bfloat16 format was developed by Google Brain, an artificial intelligence research group at Google. It is utilized in many CPUs, GPUs, and AI processors, such as Intel Xeon processors (AVX-512 BF16 extensions), Intel Data Center GPU, Intel Nervana NNP-L1000, Intel FPGAs, AMD Zen, AMD Instinct, NVIDIA GPUs, Google Cloud TPUs, AWS Inferentia, AWS Trainium, ARMv8.6-A, and Apple's M2 (and therefore A15) chips and later. Many libraries support bfloat16, such as CUDA, Intel oneAPI Math Kernel Library, AMD ROCm, AMD Optimizing CPU Libraries, PyTorch, and TensorFlow. On these platforms, bfloat16 may also be used in mixed-precision arithmetic, where bfloat16 numbers may be operated on and expanded to wider data types.

bfloat16 floating-point format

bfloat16 has the following format:

* Sign bit: 1 bit
* Exponent width: 8 bits
* Significand precision: 8 bits (7 explicitly stored, with an implicit leading bit), as opposed to 24 bits in a classical single-precision floating-point format

The bfloat16 format, being a shortened IEEE 754 single-precision 32-bit float, allows for fast conversion to and from an IEEE 754 single-precision 32-bit float; in conversion to the bfloat16 format, the exponent bits are preserved while the significand field can be reduced by truncation (thus corresponding to round toward 0) or other rounding mechanisms, ignoring the NaN special case. Preserving the exponent bits maintains the 32-bit float's range of ≈ 10⁻³⁸ to ≈ 3 × 10³⁸. The bits are laid out as follows:

sign (bit 15) | exponent (bits 14 to 7) | fraction (bits 6 to 0)
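As a rough illustration of this layout, the following Python sketch splits a 16-bit pattern into its three fields. The function name `split_fields` is hypothetical and chosen only for this example; it assumes the sign occupies the most significant bit, followed by the 8 exponent bits and the 7 stored significand bits.

```python
def split_fields(bits: int) -> tuple[int, int, int]:
    """Split a 16-bit bfloat16 pattern into (sign, biased exponent, fraction)."""
    if not 0 <= bits <= 0xFFFF:
        raise ValueError("expected a 16-bit pattern")
    sign = bits >> 15               # 1 bit
    exponent = (bits >> 7) & 0xFF   # 8 bits, still biased
    fraction = bits & 0x7F          # 7 explicitly stored significand bits
    return sign, exponent, fraction


print(split_fields(0x3F80))  # (0, 127, 0)
print(split_fields(0xC000))  # (1, 128, 0)
```

The implicit leading significand bit is not stored, so it does not appear in the returned fraction.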

Exponent encoding

The bfloat16 binary floating-point exponent is encoded using an offset-binary representation, with the zero offset being 127; also known as exponent bias in the IEEE 754 standard.

* E_min = 01_H − 7F_H = −126
* E_max = FE_H − 7F_H = 127
* Exponent bias = 7F_H = 127

Thus, in order to get the true exponent as defined by the offset-binary representation, the offset of 127 has to be subtracted from the value of the exponent field. The minimum and maximum values of the exponent field (00_H and FF_H) are interpreted specially, like in the IEEE 754 standard formats. The minimum positive normal value is 2⁻¹²⁶ ≈ 1.18 × 10⁻³⁸ and the minimum positive (subnormal) value is 2⁻¹²⁶⁻⁷ = 2⁻¹³³ ≈ 9.2 × 10⁻⁴¹.
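The exponent rules above can be sketched in Python as a small decoder. This is a minimal illustration rather than a reference implementation: the name `bf16_to_value` is hypothetical, and the sketch assumes subnormals are handled as in IEEE 754 (exponent field 00_H means no implicit leading bit and a fixed exponent of −126).

```python
BIAS = 127  # exponent bias, 7F_H


def bf16_to_value(bits: int) -> float:
    """Decode a 16-bit bfloat16 pattern into a Python float."""
    sign = -1.0 if bits >> 15 else 1.0
    exponent = (bits >> 7) & 0xFF
    fraction = bits & 0x7F
    if exponent == 0xFF:
        # FF_H is reserved for infinities and NaNs.
        return sign * float("inf") if fraction == 0 else float("nan")
    if exponent == 0x00:
        # Zero or subnormal: no implicit leading bit, exponent fixed at E_min.
        return sign * fraction * 2.0 ** (-126 - 7)
    # Normal number: implicit leading 1 (128 = 2**7) plus the 7 stored bits.
    return sign * (128 + fraction) * 2.0 ** (exponent - BIAS - 7)


print(bf16_to_value(0x0080) == 2.0 ** -126)  # True: minimum positive normal
print(bf16_to_value(0x0001) == 2.0 ** -133)  # True: minimum positive subnormal
```

Because every bfloat16 value is exactly representable as a Python float (binary64), the decoder introduces no rounding.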

Rounding and conversion

The most common use case is conversion between IEEE 754 binary32 and bfloat16. This section describes that conversion and its rounding schemes. Other format conversions to or from bfloat16 are possible, for example between int16 and bfloat16.

* From binary32 to bfloat16. When bfloat16 was first introduced as a storage format, the conversion from IEEE 754 binary32 (32-bit floating point) to bfloat16 was truncation (round toward 0). Later, when bfloat16 became an input format for matrix multiplication units, the conversion came to use various rounding mechanisms depending on the hardware platform. For example, Google TPU uses round-to-nearest-even; ARM uses the non-IEEE round-to-odd mode; NVIDIA supports converting float numbers to bfloat16 precision in round-to-nearest-even mode.
* From bfloat16 to binary32. Since binary32 can represent every bfloat16 value exactly, the conversion simply appends 16 zero bits to the significand.
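The two conversion directions can be sketched in Python with the standard `struct` module. All function names here are hypothetical, and the NaN handling in the rounding path (forcing the top fraction bit so the result stays a NaN) is an assumption made for this sketch, not a behaviour documented for any particular hardware platform.

```python
import struct


def f32_bits(x: float) -> int:
    """Bit pattern of x rounded to binary32 (raises OverflowError if out of range)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def to_bf16_truncate(x: float) -> int:
    """binary32 -> bfloat16 by dropping the low 16 bits (round toward 0).

    NaNs are not special-cased: a NaN whose payload sits only in the
    dropped bits would turn into an infinity.
    """
    return f32_bits(x) >> 16


def to_bf16_rne(x: float) -> int:
    """binary32 -> bfloat16 with round-to-nearest-even."""
    b = f32_bits(x)
    if (b & 0x7F800000) == 0x7F800000 and (b & 0x007FFFFF):
        return (b >> 16) | 0x0040  # assumption: keep NaN a NaN by setting a fraction bit
    # Add just under half an ulp, plus one more if the kept LSB is odd (ties to even).
    b += 0x7FFF + ((b >> 16) & 1)
    return b >> 16


def from_bf16(h: int) -> float:
    """bfloat16 -> binary32 by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", h << 16))[0]


print(hex(to_bf16_truncate(1 / 3)))  # 0x3eaa
print(hex(to_bf16_rne(1 / 3)))       # 0x3eab
print(from_bf16(0x3EAB))             # 0.333984375
```

For 1/3 the two schemes differ in the last stored bit, which shows why the choice of rounding mode matters for a format with so few significand bits.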

Encoding of special values

Positive and negative infinity

Just as in IEEE 754, positive and negative infinity are represented with their corresponding sign bits, all 8 exponent bits set (FF_hex) and all significand bits zero. Explicitly,

val    s_exponent_signcnd
+inf = 0_11111111_0000000
-inf = 1_11111111_0000000

Not a Number

Just as in IEEE 754, NaN values are represented with either sign bit, all 8 exponent bits set (FF_hex) and not all significand bits zero. Explicitly,

val    s_exponent_signcnd
+NaN = 0_11111111_klmnopq
-NaN = 1_11111111_klmnopq

where at least one of k, l, m, n, o, p, or q is 1. As with IEEE 754, NaN values can be quiet or signaling, although there are no known uses of signaling bfloat16 NaNs as of September 2018.
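The special-value encodings can be summarised in a small Python classifier. The name `classify` and the returned labels are hypothetical; the quiet/signaling distinction assumes the common IEEE 754 convention that the most significant fraction bit marks a quiet NaN.

```python
def classify(bits: int) -> str:
    """Return a label for the kind of value a bfloat16 bit pattern encodes."""
    sign = "-" if bits >> 15 else "+"
    exponent = (bits >> 7) & 0xFF
    fraction = bits & 0x7F
    if exponent == 0xFF:
        if fraction == 0:
            return sign + "inf"
        # Assumption: top fraction bit set means quiet, clear means signaling.
        return "qNaN" if fraction & 0x40 else "sNaN"
    if exponent == 0x00:
        return sign + ("zero" if fraction == 0 else "subnormal")
    return sign + "normal"


print(classify(0x7F80))  # +inf
print(classify(0xFF80))  # -inf
print(classify(0xFFC1))  # qNaN
print(classify(0xFF81))  # sNaN
```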

Range and precision

Bfloat16 is designed to maintain the number range from the 32-bit IEEE 754 single-precision floating-point format (binary32), while reducing the precision from 24 bits to 8 bits. This means that the precision is between two and three decimal digits, and bfloat16 can represent finite values up to about 3.4 × 10³⁸.
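These figures follow directly from the field widths, as the short Python calculation below suggests. The variable names are illustrative only; the decimal-digit estimate uses the usual rule of thumb of significand bits multiplied by log10(2).

```python
import math

SIGNIFICAND_BITS = 8   # 7 stored bits plus the implicit leading bit
E_MAX = 127            # largest unbiased exponent of a normal number

# Approximate number of decimal digits carried by an 8-bit significand.
decimal_digits = SIGNIFICAND_BITS * math.log10(2)

# Spacing between 1.0 and the next larger bfloat16 value: 2**-7.
epsilon = 2.0 ** -(SIGNIFICAND_BITS - 1)

# Largest finite value: all significand bits set at the maximum exponent.
max_finite = (2.0 - epsilon) * 2.0 ** E_MAX

print(decimal_digits)  # between 2 and 3
print(epsilon)         # 0.0078125
print(max_finite)      # about 3.39e38
```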

Examples

These examples are given in bit representation, in hexadecimal and binary, of the floating-point value. This includes the sign, (biased) exponent, and significand.

3f80 = 0 01111111 0000000 = 1
c000 = 1 10000000 0000000 = −2

7f7f = 0 11111110 1111111 = (2⁸ − 1) × 2⁻⁷ × 2¹²⁷ ≈ 3.38953139 × 10³⁸ (max finite positive value in bfloat16 precision)
0080 = 0 00000001 0000000 = 2⁻¹²⁶ ≈ 1.175494351 × 10⁻³⁸ (min normalized positive value in bfloat16 precision and single-precision floating point)

The maximum positive finite value of a normal bfloat16 number is 3.38953139 × 10³⁸, slightly below (2²⁴ − 1) × 2⁻²³ × 2¹²⁷ = 3.402823466 × 10³⁸, the max finite positive value representable in single precision.

Zeros and infinities

0000 = 0 00000000 0000000 = 0
8000 = 1 00000000 0000000 = −0

7f80 = 0 11111111 0000000 = infinity
ff80 = 1 11111111 0000000 = −infinity

Special values

4049 = 0 10000000 1001001 = 3.140625 ≈ π (pi)
3eab = 0 01111101 0101011 = 0.333984375 ≈ 1/3

NaNs

ffc1 = x 11111111 1000001 => qNaN
ff81 = x 11111111 0000001 => sNaN
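The hexadecimal-to-binary notation used in these examples can be reproduced with a short Python helper. The name `bit_string` is hypothetical; the helper only regroups the 16 bits into sign, exponent, and significand fields and does not interpret the value.

```python
def bit_string(hex_text: str) -> str:
    """Format a 4-digit hex bfloat16 pattern as 'sign exponent significand' bits."""
    bits = int(hex_text, 16)
    if not 0 <= bits <= 0xFFFF:
        raise ValueError("expected a 16-bit pattern")
    b = format(bits, "016b")
    # 1 sign bit, 8 exponent bits, 7 stored significand bits.
    return f"{b[0]} {b[1:9]} {b[9:]}"


print(bit_string("3f80"))  # 0 01111111 0000000
print(bit_string("4049"))  # 0 10000000 1001001
print(bit_string("7f7f"))  # 0 11111110 1111111
```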
