
The Pentium FDIV bug is a hardware bug affecting the floating-point unit (FPU) of the early Intel Pentium processors. Because of the bug, the processor would return incorrect binary floating-point results when dividing certain pairs of high-precision numbers. The bug was discovered in 1994 by Thomas R. Nicely, a professor of mathematics at Lynchburg College. Missing values in a lookup table used by the FPU's floating-point division algorithm led to calculations acquiring small errors. In certain circumstances the errors can occur frequently and lead to significant deviations.

The severity of the FDIV bug is debated. Though rarely encountered by most users (''Byte'' magazine estimated that 1 in 9 billion floating-point divides with random parameters would produce inaccurate results), both the flaw and Intel's initial handling of the matter were heavily criticized by the tech community. In December 1994, Intel recalled the defective processors in what was the first full recall of a computer chip. In its 1994 annual report, Intel said it incurred "a $475 million pre-tax charge ... to recover replacement and write-off of these microprocessors."


Description

To improve the speed of floating-point division on the Pentium chip over the 486DX, Intel opted to replace the shift-and-subtract division algorithm with the Sweeney, Robertson, and Tocher (SRT) algorithm. The SRT algorithm can generate two bits of the division result per clock cycle, whereas the 486's algorithm could generate only one. It is implemented using a programmable logic array with 2,048 cells, of which 1,066 cells should have been populated with one of five values: −2, −1, 0, +1, +2. When the original array for the Pentium was compiled, five values were not correctly sent to the equipment that etches the arrays into the chips, so five of the array cells contained zero when they should have contained +2. As a result, calculations that rely on these five cells acquire errors; these errors can accumulate repeatedly owing to the recursive nature of the SRT algorithm. In pathological cases the error can reach the fourth significant digit of the result, although this is rare. The error is usually confined to the ninth or tenth significant digit.

Only certain combinations of numerator and denominator trigger the bug. One commonly reported example is dividing 4,195,835 by 3,145,727. Performing this calculation in any software that used the floating-point coprocessor, such as Windows Calculator, would allow users to discover whether their Pentium chip was affected. The correct value of the calculation is:

4,195,835 / 3,145,727 = 1.333820449136241002

When converted to the hexadecimal values used by the processor, 4,195,835 = 0x4005FB and 3,145,727 = 0x2FFFFF. The "5" in 0x4005FB triggers the access to the "empty" array cells. As a result, the value returned by a flawed Pentium processor is incorrect at or beyond four digits:

4,195,835 / 3,145,727 = 1.333739068902037589

which is actually the value of 4,195,579 / 3,145,727 = (4,195,835 − 256) / 3,145,727.
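The digit-by-digit recurrence behind a radix-4 SRT divider can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a model of Intel's circuit: the names `select_digit` and `srt_radix4_divide` are hypothetical, exact rational arithmetic replaces the hardware datapath, and exact rounding stands in for the lookup table (the real table chose each digit from only a few leading bits of the remainder and divisor). The sketch then applies the recurrence to the operands quoted above and builds the flawed result from the relationship given in the text.

```python
from fractions import Fraction


def select_digit(shifted_remainder, divisor):
    """Choose a quotient digit in {-2, -1, 0, +1, +2}.

    Assumption: exact rounding stands in for the Pentium's lookup table.
    """
    return max(-2, min(2, round(shifted_remainder / divisor)))


def srt_radix4_divide(x, d, steps=32):
    """Approximate x / d for 1 <= x < 2 and 1 <= d < 2, two bits per step."""
    x, d = Fraction(x), Fraction(d)
    remainder = x / 4              # pre-scale so that |remainder| <= d / 2
    quotient = Fraction(0)
    for step in range(1, steps + 1):
        shifted = 4 * remainder    # radix 4: shift left by two bits
        digit = select_digit(shifted, d)
        remainder = shifted - digit * d
        quotient += Fraction(digit, 4 ** step)
    return 4 * quotient            # undo the pre-scaling


# Operands from the text, normalised into [1, 2) like FPU significands.
x = Fraction(4_195_835, 2 ** 22)
d = Fraction(3_145_727, 2 ** 21)
approx = srt_radix4_divide(x, d) * 2   # restore the exponent difference

exact = Fraction(4_195_835, 3_145_727)
# Value the text attributes to flawed chips: (4,195,835 - 256) / 3,145,727.
flawed = Fraction(4_195_835 - 256, 3_145_727)

print("SRT sketch:", float(approx))
print("flawed    :", float(flawed))
```

Because each step commits to a digit and carries only the remainder forward, a wrong digit (such as 0 where +2 was required) is never revisited, which is consistent with the description of errors accumulating through the recursion.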


Discovery and response

Thomas Nicely, a professor of mathematics at Lynchburg College, had written code to enumerate primes, twin primes, prime triplets, and prime quadruplets. Nicely noticed some inconsistencies in the calculations on June 13, 1994, shortly after adding a Pentium system to his group of computers, but was unable to eliminate other factors (such as programming errors, motherboard chipsets, etc.) until October 19, 1994. On October 24, 1994, he reported the issue to Intel. Intel had reportedly become aware of the issue independently by June 1994 and had begun fixing it at that point, but chose not to publicly disclose any details or recall affected CPUs.

On October 30, 1994, Nicely sent an email describing the bug to various academic contacts, requesting reports of testing for the flaw on 486-DX4s, Pentiums and Pentium clones. The bug was quickly verified by others, and news of it spread rapidly on the Internet. The bug acquired the name "Pentium FDIV bug" from the x86 assembly language mnemonic for floating-point division, the most frequently used instruction affected.

The story first appeared in the press on November 7, 1994, in an article in ''Electronic Engineering Times'', "Intel fixes a Pentium FPU glitch" by Alexander Wolfe, and was subsequently picked up by CNN in a segment aired on November 22, 1994. It was also reported on by the ''New York Times'' and the ''Boston Globe'', making the front page in the latter. At this point, Intel acknowledged the floating-point flaw, but claimed that it was not serious and would not affect most users. Intel offered to replace processors for users who could prove that they were affected. However, although most independent estimates found that the bug would have a very limited impact on most users, it caused significant negative press for the company.

During a 2019 talk, while reflecting on the development of ''Quake'', John Romero described how frequently and persistently this bug could be reproduced by Michael Abrash. Abrash spent hours tracking down the exact conditions needed to produce the bug, which would result in parts of a game level appearing unexpectedly when viewed from certain camera angles.
IBM paused the sale of PCs containing Intel CPUs, and Intel's stock price decreased significantly. The motive behind IBM's decision was questioned by some in the industry; IBM produced the PowerPC CPUs at the time, and potentially stood to benefit from any reputational damage to the Pentium or Intel as a company. However, the decision led to corporate buyers of PC equipment demanding replacements of existing Pentium CPUs, and soon afterwards other PC manufacturers began offering "no questions asked" replacements of flawed Pentium chips. The growing dissatisfaction with Intel's response led to the company offering, on December 20, 1994, to replace all flawed Pentium processors on request. On January 17, 1995, Intel announced a pre-tax charge of $475 million against earnings, ostensibly the total cost associated with replacement of the flawed processors. Intel was criticised for barring resellers and OEMs from participating in the recall program, requiring end-users to replace chips themselves. Intel's justification for this, posted on its support web page, was that "it is the individual decision of the end user to determine if the flaw is affecting their application accuracy".

A 1995 article in ''Science'' describes the value of number theory problems in discovering computer bugs and gives the mathematical background and history of Brun's constant, the problem Nicely was working on when he discovered the bug.

Intel's response to the FDIV bug has been cited as a case of the public relations impact of a problem eclipsing the practical impact of said problem on customers. While most users were unlikely to encounter the flaw in their day-to-day computing, the company's initial decision not to replace chips unless customers could prove they were affected caused pushback from a vocal minority of industry experts. The subsequent publicity shook consumer confidence in the CPUs, and led to a demand for action even from people unlikely to be affected by the issue.
Andy Grove, Intel's CEO at the time, was quoted in ''The Wall Street Journal'' as saying "I think the kernel of the issue we missed ... was that we presumed to tell somebody what they should or shouldn't worry about, or should or shouldn't do".

In the aftermath of the bug and subsequent recall, there was a marked increase in the use of formal verification of hardware floating-point operations across the semiconductor industry. Prompted by the discovery of the bug, a technique applicable to the SRT algorithm called "word-level model checking" was developed in 1996. Intel went on to use formal verification extensively in the development of later CPU architectures. In the development of the Pentium 4, symbolic trajectory evaluation and theorem proving were used to find a number of bugs that could have led to a similar recall incident had they gone undetected. The first Intel microarchitecture to use formal verification as the primary method of validation was Nehalem, developed in 2008.


Affected models

The FDIV bug affects the 60 and 66 MHz Pentium P5 800 in stepping levels prior to D1, and the 75, 90, and 100 MHz Pentium P54C 600 in steppings prior to B5. The 120 MHz P54C and P54CQS CPUs are unaffected.


Software patches

Various software patches were produced by manufacturers to work around the bug. One specific algorithm, outlined in a paper in ''IEEE Computational Science & Engineering'', is to check for divisors that can trigger access to the programmable logic array cells that erroneously contain zero and, if one is found, to multiply both numerator and denominator by 15/16. This takes them out of the "buggy" range. The fix carries a measurable speed penalty: in the worst case, a program doing nothing but FDIV operations with bad divisors would take twice as long to run, since each FDIV would take about 80 clock cycles instead of 40. With more random divisors, the average time per FDIV was approximately 50 clock cycles, i.e. 10 cycles added to check the divisor, since only 5 out of 1,024 random divisors would trigger the scaling fixup. Since FDIV is a rare operation in most programs, the normal slowdown with the fix installed was typically one percent or less.

The main challenge faced by software companies was implementing the fix in pre-existing software, much of which relied on libraries outside their control. Some companies, such as Wolfram Research, opted to directly patch the machine code of existing executables to replace the FDIV opcode with an illegal instruction. This would then trigger an exception that an exception handler (also patched in) would catch. From there, arbitrary code could be executed to work around the bug.

Microsoft offered operating-system-level workarounds in versions of Windows up to Windows XP. Utilities were included with the operating system to check for the presence of the bug and disable the FPU if it was found.
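The divisor check and 15/16 scaling described above can be sketched as follows. The specific at-risk bit patterns are an assumption drawn from public analyses of the flaw rather than from the text here: the four significand bits after the leading 1 are taken to be one of five values, followed by six 1 bits, which is at least consistent with the "5 out of 1,024" figure. The function names `divisor_at_risk` and `safe_divide` are hypothetical, and this is an illustration of the idea, not the published patch.

```python
import math

# Assumption: at-risk divisors have one of these four-bit patterns directly
# after the leading 1 of the significand, followed by six 1 bits.
AT_RISK_PREFIXES = {0b0001, 0b0100, 0b0111, 0b1010, 0b1101}


def divisor_at_risk(divisor: float) -> bool:
    """Return True if the divisor's leading significand bits look at-risk."""
    if divisor == 0 or not math.isfinite(divisor):
        return False
    mantissa, _ = math.frexp(abs(divisor))          # mantissa in [0.5, 1)
    # Rescale to 1.xxxxxxxxxx and keep the ten bits after the leading 1.
    fraction_bits = int((mantissa * 2 - 1) * 1024)
    prefix = fraction_bits >> 6                     # first four bits
    next_six = fraction_bits & 0b111111             # following six bits
    return prefix in AT_RISK_PREFIXES and next_six == 0b111111


def safe_divide(numerator: float, divisor: float) -> float:
    """Divide, scaling both operands by 15/16 when the divisor is at-risk."""
    if divisor_at_risk(divisor):
        numerator *= 15 / 16
        divisor *= 15 / 16
    return numerator / divisor


print(divisor_at_risk(3_145_727.0))            # divisor from the known test case
print(safe_divide(4_195_835.0, 3_145_727.0))
```

Scaling both operands by the same factor leaves the mathematical quotient unchanged while moving the divisor's leading bits away from the flagged patterns; on a correct FPU the result can differ from the unscaled quotient only by rounding.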


See also

* Pentium F00F bug
* MOS Technology 6502 bugs and quirks
* Accuracy problems in floating point operations
* MaverickCrunch


External links


* Personal website of Dr. Nicely, who discovered the bug
* ZIP file containing more details (see ZIP file format for details on the file)
* Archive of Intel's official information page about the bug
* Unopened Intel CPU box from the FDIV replacement program