HOME

TheInfoList



OR:

The FMA instruction set is an extension to the 128- and 256-bit
Streaming SIMD Extensions In computing, Streaming SIMD Extensions (SSE) is a single instruction, multiple data ( SIMD) instruction set extension to the x86 architecture, designed by Intel and introduced in 1999 in its Pentium III series of central processing units (CPU ...
instructions in the
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel, based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. Th ...
microprocessor A microprocessor is a computer processor (computing), processor for which the data processing logic and control is included on a single integrated circuit (IC), or a small number of ICs. The microprocessor contains the arithmetic, logic, a ...
instruction set In computer science, an instruction set architecture (ISA) is an abstract model that generally defines how software controls the CPU in a computer or a family of computers. A device or program that executes instructions described by that ISA, s ...
to perform fused multiply–add (FMA) operations. There are two variants: * FMA4 is supported in
AMD Advanced Micro Devices, Inc. (AMD) is an American multinational corporation and technology company headquartered in Santa Clara, California and maintains significant operations in Austin, Texas. AMD is a hardware and fabless company that de ...
processors starting with the
Bulldozer A bulldozer or dozer (also called a crawler) is a large tractor equipped with a metal #Blade, blade at the front for pushing material (soil, sand, snow, rubble, or rock) during construction work. It travels most commonly on continuous tracks, ...
architecture. FMA4 was performed in hardware before FMA3 was. Support for FMA4 has been removed since Zen 1. * FMA3 is supported in AMD processors starting with the
Piledriver Piledriver or pile driver may refer to: *Pile driver, a person trained to use the diesel hammer that drives piles into the ground for foundations and bridges *Piledriver (professional wrestling), a move used in professional wrestling Entertainme ...
architecture and
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California, and Delaware General Corporation Law, incorporated in Delaware. Intel designs, manufactures, and sells computer compo ...
starting with Haswell processors and Broadwell processors since 2014.


Instructions

FMA3 and FMA4 instructions have almost identical functionality, but are not compatible. Both contain fused multiply–add (FMA) instructions for
floating-point In computing, floating-point arithmetic (FP) is arithmetic on subsets of real numbers formed by a ''significand'' (a Sign (mathematics), signed sequence of a fixed number of digits in some Radix, base) multiplied by an integer power of that ba ...
scalar and
SIMD Single instruction, multiple data (SIMD) is a type of parallel computer, parallel processing in Flynn's taxonomy. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneousl ...
operations, but FMA3 instructions have three operands, while FMA4 ones have four. The FMA operation has the form ''d'' = round(''a'' · ''b'' + ''c''), where the round function performs a
rounding Rounding or rounding off is the process of adjusting a number to an approximate, more convenient value, often with a shorter or simpler representation. For example, replacing $ with $, the fraction 312/937 with 1/3, or the expression √2 with ...
to allow the result to fit within the destination register if there are too many significant bits to fit within the destination. The four-operand form (FMA4) allows ''a'', ''b'', ''c'' and ''d'' to be four different registers, while the three-operand form (FMA3) requires that ''d'' be the same register as ''a'', ''b'' or ''c''. The three-operand form makes the code shorter and the hardware implementation slightly simpler, while the four-operand form provides more programming flexibility. See
XOP instruction set The XOP (''eXtended Operations'') instruction set, announced by AMD on May 1, 2009, is an extension to the 128-bit SSE core instructions in the x86 and AMD64 instruction set for the Bulldozer processor core, which was released on October 12, 201 ...
for more discussion of compatibility issues between Intel and AMD.


FMA3 instruction set


CPUs with FMA3

* AMD **
Piledriver Piledriver or pile driver may refer to: *Pile driver, a person trained to use the diesel hammer that drives piles into the ground for foundations and bridges *Piledriver (professional wrestling), a move used in professional wrestling Entertainme ...
(2012) and newer microarchitectures *** 2nd gen APUs, "Trinity" (32nm), May 15, 2012 *** 2nd gen "Bulldozer" (bdver2) with Piledriver cores, October 23, 2012 * Intel ** Haswell (2013) and newer processors, except
Pentium Pentium is a series of x86 architecture-compatible microprocessors produced by Intel from 1993 to 2023. The Pentium (original), original Pentium was Intel's fifth generation processor, succeeding the i486; Pentium was Intel's flagship proce ...
s and
Celeron Celeron is a series of IA-32 and x86-64 computer microprocessor, microprocessors targeted at low-cost Personal computer, personal computers, manufactured by Intel from 1998 until 2023. The first Celeron-branded CPU was introduced on April 15, ...
s


Excerpt from FMA3

Supported commands include ;Note: * VFNMADD is result = − a · b + c, not result = − (a · b + c). * VFNMSUB generates a −0 when all inputs are zero. Explicit order of operands is included in the mnemonic using numbers "132", "213", and "231": as well as operand format (packed or scalar) and size (single or double). This results in


FMA4 instruction set


CPUs with FMA4

* AMD ** "Heavy Equipment" processors *** Bulldozer-based processors, October 12, 2011 *** Piledriver-based processors *** Steamroller-based processors *** Excavator-based processors (including "v2") **
Zen Zen (; from Chinese: ''Chán''; in Korean: ''Sŏn'', and Vietnamese: ''Thiền'') is a Mahayana Buddhist tradition that developed in China during the Tang dynasty by blending Indian Mahayana Buddhism, particularly Yogacara and Madhyamaka phil ...
: WikiChip's testing shows FMA4 still appears to work (under the conditions of the tests) despite not being officially supported and not even reported by CPUID. This has also been confirmed by Agner Fog. But other tests gave wrong results. AMD Official Web Site FMA4 Support Note ZEN CPUs = AMD ThreadRipper 1900x, R7 Pro 1800, 1700, R5 Pro 1600, 1500, R3 Pro 1300, 1200, R3 2200G, R5 2400G. * Intel ** Intel has not released CPUs with support for FMA4.


Excerpt from FMA4


History

The incompatibility between Intel's FMA3 and AMD's FMA4 is due to both companies changing plans without coordinating coding details with each other. AMD changed their plans from FMA3 to FMA4 while Intel changed their plans from FMA4 to FMA3 almost at the same time. The history can be summarized as follows: * August 2007:
AMD Advanced Micro Devices, Inc. (AMD) is an American multinational corporation and technology company headquartered in Santa Clara, California and maintains significant operations in Austin, Texas. AMD is a hardware and fabless company that de ...
announces the SSE5 instruction set, which includes 3-operand FMA instructions. A new coding scheme (DREX) is introduced for allowing instructions to have three operands. * April 2008:
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California, and Delaware General Corporation Law, incorporated in Delaware. Intel designs, manufactures, and sells computer compo ...
announces their AVX and FMA instruction sets, including 4-operand FMA instructions. The coding of these instructions uses the new VEX coding scheme, which is more flexible than AMD's DREX scheme. * December 2008: Intel changes the specification for their FMA instructions from 4-operand to 3-operand instructions. The VEX coding scheme is still used. * May 2009: AMD changes the specification of their FMA instructions from the 3-operand DREX form to the 4-operand VEX form, compatible with the April 2008 Intel specification rather than the December 2008 Intel specification. * October 2011: AMD
Bulldozer A bulldozer or dozer (also called a crawler) is a large tractor equipped with a metal #Blade, blade at the front for pushing material (soil, sand, snow, rubble, or rock) during construction work. It travels most commonly on continuous tracks, ...
processor supports FMA4. * January 2012: AMD announces FMA3 support in future processors codenamed Trinity and Vishera; they are based on the
Piledriver Piledriver or pile driver may refer to: *Pile driver, a person trained to use the diesel hammer that drives piles into the ground for foundations and bridges *Piledriver (professional wrestling), a move used in professional wrestling Entertainme ...
architecture. * May 2012: AMD Piledriver processor supports both FMA3 and FMA4. * June 2013: Intel Haswell processor supports FMA3. * February 2017: The first generation of AMD
Ryzen Ryzen ( ) is a brand of multi-core x86-64 microprocessors, designed and marketed by AMD for desktop, mobile, server, and embedded platforms, based on the Zen microarchitecture. It consists of central processing units (CPUs) marketed for mai ...
processors officially supports FMA3, but not FMA4 according to the
CPUID In the x86 architecture, the CPUID instruction (identified by a CPUID opcode) is a processor supplementary instruction (its name derived from " CPU Identification") allowing software to discover details of the processor. It was introduced by Int ...
instruction. There has been confusion regarding whether FMA4 was implemented or not on this processor due to errata in the initial patch to the GNU Binutils package that has since been rectified. One unconfirmed report of wrong results led to some doubt, but Mysticial (Alexander Yee, developer of y-cruncher) debunked it: FMA4 worked for bit-exact bignum calculations on his Zen 1 system for years, and the one report on Reddit never had any followup investigation to rule out mistakes in the testing software before being widely repeated. The initial Ryzen CPUs could be crashed by a particular sequence of FMA3 instructions, but updated CPU microcode fixes the problem. * July 2019: AMD Zen 2 and later Ryzen processors don't support FMA4 at all. They continue to support FMA3. Only Zen 1 and Zen+ have unofficial FMA4 support.


Compiler and assembler support

Different compilers provide different levels of support for FMA: * GCC supports FMA4 with -mfma4 since version 4.5.0 and FMA3 with -mfma since version 4.7.0. *
Microsoft Visual C++ Microsoft Visual C++ (MSVC) is a compiler for the C, C++, C++/CLI and C++/CX programming languages by Microsoft. MSVC is proprietary software; it was originally a standalone product but later became a part of Visual Studio and made available i ...
2010 SP1 supports FMA4 instructions. *
Microsoft Visual C++ Microsoft Visual C++ (MSVC) is a compiler for the C, C++, C++/CLI and C++/CX programming languages by Microsoft. MSVC is proprietary software; it was originally a standalone product but later became a part of Visual Studio and made available i ...
2012 supports FMA3 instructions (if the processor also supports AVX2 instruction set extension). *
Microsoft Visual C++ Microsoft Visual C++ (MSVC) is a compiler for the C, C++, C++/CLI and C++/CX programming languages by Microsoft. MSVC is proprietary software; it was originally a standalone product but later became a part of Visual Studio and made available i ...
since VC 2013 * PathScale supports FMA4 with -mfma. *
LLVM LLVM, also called LLVM Core, is a target-independent optimizer and code generator. It can be used to develop a Compiler#Front end, frontend for any programming language and a Compiler#Back end, backend for any instruction set architecture. LLVM i ...
3.1 adds FMA4 support, along with preliminary FMA3 support. * Open64 5.0 adds "limited support". * Intel compilers support only FMA3 instructions. * NASM supports FMA3 instructions since version 2.03 and FMA4 instructions since 2.06. *
FASM FASM (''flat assembler'') is an assembler for x86 processors. It supports Intel-style assembly language on the IA-32 and x86-64 computer architectures. It claims high speed, size optimizations, operating system (OS) portability, and macro abi ...
supports both FMA3 and FMA4 instructions.


References

{{DEFAULTSORT:Fma Instruction Set X86 instructions SIMD computing AMD technologies