The 801 was an experimental central processing unit (CPU) design developed by IBM during the 1970s. It is considered to be the first modern RISC design, relying on processor registers for all computations and eliminating the many variant addressing modes found in CISC designs. Originally developed as the processor for a telephone switch, it was later used as the basis for a minicomputer and a number of products for IBM's mainframe line. The initial design was a 24-bit processor; it was soon replaced by 32-bit implementations of the same concepts, and the original 24-bit 801 was used only into the early 1980s.

During the initial design the system was conceived as a simple processor with limited functionality that would not compete with IBM's more complex systems such as the System/370. But as the team explored the concept using the huge amounts of performance data IBM had collected from its customers, they were able to demonstrate that the simple design could easily outperform even the most powerful classic CPU designs. Applying the same techniques to existing machines like the S/370, that is, ignoring the many complex opcodes in favour of doing everything possible in registers, doubled the performance of those systems as well. The complexity of the older designs had actually produced slower and much more costly machines, with considerable effort spent implementing instructions that were better left unused. This demonstrated the value of the RISC concept, and all of IBM's future systems were based on the principles developed during the 801 project.

For his work on the 801, John Cocke was recognized with several awards and medals, including the Turing Award in 1987, the National Medal of Technology in 1991, and the National Medal of Science in 1994.


History


Original concept

In 1974, IBM began examining the possibility of constructing a telephone switch to handle one million calls an hour, or about 300 calls per second. They calculated that each call would require 20,000 instructions to complete; 300 calls per second at 20,000 instructions per call works out to about 6 million instructions per second, and when timing overhead and other considerations were added, such a machine required a performance of about 12 MIPS. This would require a significant advance in performance; their current top-of-the-line machine, the IBM System/370 Model 168 of late 1972, offered about 3 MIPS.

The group working on this project at the Thomas J. Watson Research Center, including John Cocke, designed a processor for this purpose. To reach the required performance, they considered the sorts of operations such a machine required and removed any that were not appropriate. This led, for instance, to the removal of the floating-point unit, which would not be needed in this application. More critically, they also removed many of the instructions that worked on data in main memory, leaving only those instructions that worked on the internal processor registers; these were much faster to use, and the simple code in a telephone switch could be written to use only those types of instructions. The result of this work was a conceptual design for a simplified processor with the required performance.

The telephone switch project was canceled in 1975, but the team had made considerable progress on the concept, and in October IBM decided to continue it as a general-purpose design. With no obvious project to attach it to, the team decided to call it the "801" after the building they worked in. For the general-purpose role, as opposed to the dedicated telephone system, the team began to consider real-world programs that would be run on a typical minicomputer. IBM had collected enormous amounts of statistical data on the performance of real-world workloads on its machines, and this data demonstrated that over half the time in a typical program was spent performing only five instructions: load a value from memory, store a value to memory, add fixed-point numbers, compare fixed-point numbers, and branch based on the result of a comparison. This suggested that the same simplified processor design would work just as well for a general-purpose minicomputer as for a special-purpose switch.
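
The following C fragment is a purely illustrative sketch (not from IBM's study) of why those five operations dominate: the body of a simple loop such as this one compiles almost entirely into loads, stores, fixed-point adds, comparisons, and conditional branches, assuming a straightforward compiler.

    /* Illustrative only: sum an array of fixed-point values.            */
    /* Each step corresponds to one of the five dominant instructions.   */
    int sum(const int *data, int n)
    {
        int total = 0;                 /* value held in a register        */
        for (int i = 0; i < n; i++) {  /* compare i with n, branch        */
            total += data[i];          /* load data[i], fixed-point add   */
        }
        return total;                  /* accumulated sum left in a register */
    }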


Rationale against use of microcode

This conclusion flew in the face of contemporary processor design, which was based on the concept of using microcode. IBM had been among the first to make widespread use of this technique as part of its System/360 series. The 360s, and later the 370s, came in a variety of performance levels that all ran the same machine language code. On the high-end machines, many of these instructions were implemented directly in hardware, such as a floating-point unit, while low-end machines could instead simulate those instructions using a sequence of other instructions encoded in microcode. This allowed a single application binary interface to run across the entire lineup, and allowed customers to feel confident that if more performance was ever needed they could move up to a faster machine without any other changes.

Microcode allowed a simple processor to offer many instructions, which the designers had used to implement a wide variety of addressing modes. For instance, an instruction like ADD might have a dozen variations: one that adds two numbers in internal registers, one that adds a register to a value in memory, one that adds two values from memory, and so on. This allowed the programmer to select the exact variation needed for any particular task. The processor would read that instruction and use microcode to break it into a series of internal instructions. For instance, adding two numbers in memory might be implemented by loading those two numbers from memory into registers, adding them, and then storing the sum back to memory. The idea of offering all possible addressing modes for all instructions became a goal of processor designers, a concept that became known as an orthogonal instruction set.

The 801 team noticed a side-effect of this concept: when faced with the plethora of possible versions of a given instruction, compiler authors would usually pick a single version, typically the one that was implemented in hardware on the low-end machines. That ensured that the machine code generated by the compiler would run as fast as possible on the entire lineup. While other versions of an instruction might run even faster on a machine that implemented them in hardware, the complexity of knowing which version to pick on an ever-changing list of machines made this extremely unattractive, and compiler authors largely ignored these possibilities. As a result, the majority of the instructions available in the instruction set were never used in compiled programs. And it was here that the team made the key realization of the 801 project:
Imposing microcode between a computer and its users imposes an expensive overhead in performing the most frequently executed instructions.
Microcode takes a non-zero time to examine the instruction before it is performed. The same underlying processor with the microcode removed would eliminate this overhead and run those instructions faster. Since microcode essentially ran small subroutines dedicated to a particular hardware implementation, it was ultimately performing the same basic task as the compiler: implementing higher-level instructions as a sequence of machine-specific instructions. Simply removing the microcode and implementing that sequence in the compiler could result in a faster machine.

One concern was that programs written for such a machine would take up more memory; some tasks that could be accomplished with a single instruction on the 370 would have to be expressed as multiple instructions on the 801. For instance, adding two numbers from memory would require two load-to-register instructions, a register-to-register add, and then a store-to-memory. This could potentially slow the system overall if it had to spend more time reading instructions from memory than it formerly took to decode them. As the team continued work on the design and improved their compilers, they found that overall program length continued to fall, eventually becoming roughly the same length as programs written for the 370.
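
As an illustrative sketch of this trade-off (not code from the project), the C function below makes the decomposition explicit. A 370-style machine could encode the statement *c = *a + *b as a single memory-to-memory instruction, with microcode internally performing the loads, the add, and the store; an 801-style machine requires the compiler to emit each of those steps as its own instruction, as suggested by the local variables standing in for registers.

    /* Illustrative only: "registers" are modelled as local variables.    */
    /* CISC (370-style): one memory-to-memory add, expanded by microcode. */
    /* RISC (801-style): the compiler emits each step explicitly.         */
    void add_in_memory(const int *a, const int *b, int *c)
    {
        int r1 = *a;       /* load the first operand into a register  */
        int r2 = *b;       /* load the second operand into a register */
        int r3 = r1 + r2;  /* register-to-register add                */
        *c = r3;           /* store the sum back to memory            */
    }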


First implementations

The initially proposed architecture was a machine with sixteen 24-bit registers and without virtual memory. It used a two-operand format, so that instructions were generally of the form A = A + B, as opposed to the three-operand format A = B + C. The resulting CPU was operational by the summer of 1980. It was implemented using Motorola MECL-10K discrete component technology on large wire-wrapped custom boards, was clocked with a 66 ns cycle time (approximately 15.15 MHz), and computed at approximately 15 MIPS.

The 801 architecture was used in a variety of IBM devices, including channel controllers for their S/370 mainframes (such as the IBM 3090), various networking devices, and as a vertical microcode execution unit in the 9373 and 9375 processors of the IBM 9370 mainframe family. The original version of the 801 architecture was the basis for the architecture of the IBM ROMP microprocessor used in the IBM RT PC workstation and several experimental computers from IBM Research. A derivative of the 801 architecture with 32-bit addressing, named ''Iliad'', was intended to serve as the primary processor of the unsuccessful Fort Knox midrange system project.


Later modifications

Having been originally designed for a limited-function system, the 801 design lacked a number of features seen on larger machines. Notable among these was the lack of hardware support for virtual memory, which was not needed for the controller role and had been implemented in software on early 801 systems that needed it. For more widespread use, hardware support was a must-have feature. Additionally, by the 1980s the computer world as a whole was moving towards 32-bit systems, and there was a desire to do the same with the 801.

Moving to a 32-bit format had another significant advantage. In practice, the two-operand format had proven difficult to use in typical math code. Ideally, both input operands would remain in registers where they could be reused in subsequent operations, but in the two-operand format one of the two values was overwritten with the result, and it was often the case that one of the values had to be re-loaded from memory. By moving to a 32-bit format, the extra bits in the instruction words allowed an additional register to be specified, so that the output of such operations could be directed to a separate register. The larger instruction word also allowed the number of registers to be increased from sixteen to thirty-two, a change that had been clearly suggested by examination of 801 code. Despite the expansion of the instruction words from 24 to 32 bits, programs did not grow by the corresponding 33%, because the loads and stores avoided by these two changes offset the larger instructions. Other desirable additions included instructions for working with string data encoded in a "packed" format with several characters in a single memory word, and instructions for working with binary-coded decimal, including an adder that could carry across four-bit decimal digits.

When the new version of the 801 was run as a simulator on the 370, the team was surprised to find that code compiled for the 801 and run in the simulator would often run faster than the same source code compiled directly to 370 machine code using the 370's PL/I compiler. When they ported their experimental "PL.8" language back to the 370 and compiled applications with it, those applications ran as much as three times as fast as the PL/I versions. This was due to the compiler making RISC-like decisions about how the generated code used the processor registers, optimizing out as many memory accesses as possible. These accesses were just as expensive on the 370 as on the 801, but their cost was normally hidden by the apparent simplicity of a single line of CISC code. The PL.8 compiler was much more aggressive about avoiding loads and stores, and the result was higher performance even on a CISC processor.
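
A minimal sketch (illustrative only, with registers modelled as C locals and names such as t1 and t2 invented for the example) of why the destructive two-operand format forced extra memory traffic: when an expression reuses an input value, a two-operand machine overwrites that input with the result and may have to reload it, while the three-operand form of the 32-bit 801 keeps both inputs live in registers.

    /* Illustrative only: compute d = (a + b) * (a + c).                     */
    /* Two-operand style:   a <- a + b   destroys a, so a must be reloaded   */
    /*                      from memory before a + c can be computed.        */
    /* Three-operand style: t1 <- a + b; t2 <- a + c; both reuse a directly. */
    int compute(int a, int b, int c)
    {
        int t1 = a + b;   /* result goes to a new register; a is preserved */
        int t2 = a + c;   /* a is still available, no reload needed        */
        return t1 * t2;   /* no extra memory traffic                       */
    }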


The Cheetah, Panther, and America projects

In the early 1980s, the lessons learned on the 801 were combined with those from the IBM Advanced Computer Systems project, resulting in an experimental processor called "Cheetah". Cheetah was a 2-way superscalar processor, which evolved into a processor called "Panther" in 1985, and finally into a 4-way superscalar design called "America" in 1986. This was a three-chip processor set including an instruction processor that fetches and decodes instructions, a fixed-point processor that shares duty with the instruction processor, and a floating-point processor for those systems that require it. Designed by the 801 team, the final design was sent to IBM's Austin office in 1986, where it was developed into the IBM RS/6000 system. The RS/6000 running at 25 MHz was one of the fastest machines of its era, outperforming other RISC machines by two to three times on common tests and easily outperforming older CISC systems.

After the RS/6000, the company turned its attention to a version of the 801 concepts that could be efficiently fabricated at various scales. The result was the IBM POWER instruction set architecture and the PowerPC offshoot.


Recognition

For his work on the 801, John Cocke received several awards and medals:

* 1985: Eckert–Mauchly Award
* 1987: A.M. Turing Award
* 1989: Computer Pioneer Award
* 1991: National Medal of Technology
* 1994: IEEE John von Neumann Medal
* 1994: National Medal of Science
* 2000: Benjamin Franklin Medal (The Franklin Institute)

Michael J. Flynn views the 801 as the first RISC.


External links


* The 801 Minicomputer - An Overview
* IBM System 801 Principles of Operation, Version 2
* 801 I/O Subsystem Definition
* IBM Archives: