The Alpha 21464 is an unfinished microprocessor that implements the Alpha instruction set architecture (ISA) developed by Digital Equipment Corporation and later by Compaq after it acquired Digital. The microprocessor was also known as EV8 (codenamed Araña). Slated for a 2004 release, it was canceled on 25 June 2001 when Compaq announced that Alpha would be phased out in favor of Itanium by 2004. When it was canceled, the Alpha 21464 was at a late stage of development but had not been taped out.
The 21464 originated in the mid-1990s when computer scientist Joel Emer was inspired by Dean Tullsen's research into simultaneous multithreading (SMT) at the University of Washington. Emer had researched the technology in the late 1990s and began to promote it once he was convinced of its value. In October 1999, at Microprocessor Forum 1999, Compaq announced that the next Alpha microprocessor would use SMT. At that time, systems using the Alpha 21464 were expected to ship in 2003.
Description
The microprocessor was an eight-issue superscalar design with out-of-order execution, four-way SMT and a deep pipeline. It fetched 16 instructions from a 64 KB two-way set-associative instruction cache. The branch predictor then selected the "good" instructions and entered them into a collapsing buffer. (This allowed for a fetch bandwidth of up to 16 instructions per cycle, depending on the taken branch density.) The front-end had significantly more stages than previous Alpha implementations and, as a result, the 21464 had a significant minimum branch misprediction penalty of 14 cycles. The microprocessor used an advanced branch prediction algorithm to minimize these costly penalties.
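The selection of "good" instructions into a collapsing buffer can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the 16 fetched instructions arrive as a list of blocks and that everything after the first predicted-taken branch in a block is off the predicted path; the names `Slot`, `select_good` and `collapse` are hypothetical and do not describe the actual 21464 logic.

```python
from dataclasses import dataclass

FETCH_WIDTH = 16  # instructions fetched per cycle, per the text


@dataclass
class Slot:
    pc: int
    predicted_taken: bool = False  # True if this slot is a branch predicted taken


def select_good(block):
    """Keep slots up to and including the first predicted-taken branch.

    Assumption: later slots in the same block are off the predicted path.
    """
    good = []
    for slot in block:
        good.append(slot)
        if slot.predicted_taken:
            break
    return good


def collapse(blocks):
    """Pack the good slots of every fetched block into one contiguous buffer."""
    buffer = []
    for block in blocks:
        buffer.extend(select_good(block))
    return buffer[:FETCH_WIDTH]
```

With two eight-instruction blocks and a predicted-taken branch in the third slot of the first block, the buffer holds fewer than 16 instructions, which is the sense in which fetch bandwidth depends on taken branch density.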
Implementing SMT required the replication of certain resources such as the program counter. Instead of one program counter, there were four program counters, one for each thread. However, very little logic after the front-end needed to be expanded for SMT support. The register file contained 512 entries, but its size was determined by the maximum number of in-flight instructions, not SMT. Access to the register file required three pipeline stages due to the physical size of the circuit. Up to eight instructions from four threads could be dispatched to eight integer and four floating-point execution units every cycle. The 21464 had a 64 KB data cache (Dcache), organized as eight banks to support dual-porting. This was backed by an on-die 3 MB, six-way set-associative unified secondary cache (Scache).
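As a rough illustration of per-thread state feeding a shared dispatch stage, the following Python sketch keeps one program counter per thread and picks up to eight ready instructions per cycle. The round-robin selection policy, the `ThreadContext` class and the `dispatch` function are assumptions made for this example; the text does not say how the 21464 chose among threads.

```python
from collections import deque
from dataclasses import dataclass, field

NUM_THREADS = 4     # four-way SMT, per the text
DISPATCH_WIDTH = 8  # up to eight instructions dispatched per cycle


@dataclass
class ThreadContext:
    pc: int                                       # replicated per thread
    ready: deque = field(default_factory=deque)   # instructions ready to dispatch


def dispatch(threads, start_thread=0):
    """Pick up to DISPATCH_WIDTH instructions, visiting threads round-robin.

    Round-robin is an illustrative assumption, not the 21464's documented policy.
    """
    picked = []
    progress = True
    while len(picked) < DISPATCH_WIDTH and progress:
        progress = False
        for i in range(NUM_THREADS):
            t = (start_thread + i) % NUM_THREADS
            if threads[t].ready and len(picked) < DISPATCH_WIDTH:
                picked.append((t, threads[t].ready.popleft()))
                progress = True
    return picked
```

Only the `ThreadContext` objects are duplicated here; the dispatch logic is shared, loosely mirroring the point that little back-end logic needed to grow for SMT.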
The integer execution unit made use of a new structure: the register cache. The register cache was not meant to mitigate the three-cycle register file latency (as some reports have claimed), but to reduce the complexity of operand bypass management. The register cache held all the results produced by the ALU and load pipes for the previous N cycles (N was approximately 8). The register cache structure was an architectural relabeling of what previous processors had implemented as a distributed mux.
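The idea of holding recent results for a fixed window of cycles can be sketched as follows. This is a toy Python model, not the actual circuit: the class name `RegisterCache`, its methods and the exact window boundary are assumptions, and the default of 8 cycles simply reuses the approximate figure given above.

```python
class RegisterCache:
    """Toy model: results stay visible for `window` cycles after production."""

    def __init__(self, window=8):
        self.window = window
        self.entries = {}  # physical register number -> (cycle produced, value)

    def write(self, cycle, reg, value):
        """Record a result produced by an ALU or load pipe in `cycle`."""
        self.entries[reg] = (cycle, value)

    def lookup(self, cycle, reg):
        """Return the value if it is still inside the window, else None.

        None means the consumer would read the main register file instead.
        """
        entry = self.entries.get(reg)
        if entry is None:
            return None
        produced, value = entry
        if cycle - produced > self.window:
            return None
        return value
```

A consumer that issues within the window finds its operand in one place instead of selecting among many bypass paths, which is the simplification the paragraph describes. Eviction of stale entries is omitted for brevity.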
The system interface was similar to that of the Alpha 21364. There were integrated memory controllers that provided ten RDRAM channels. Multiprocessing was facilitated by a router that provided links to other 21464s, and it architecturally supported 512-way multiprocessing without glue logic.
It was to be implemented in a 0.125 μm (sometimes referred to as 0.13 μm) complementary metal–oxide–semiconductor (CMOS) process with seven layers of copper interconnect, partially depleted silicon-on-insulator (PD-SOI), and low-K dielectric. The transistor count was estimated to be 250 million and die size was estimated to be 420 mm².
Tarantula
Tarantula was the code-name for an extension of the Alpha architecture under consideration and for a derivative of the Alpha 21464 that would have implemented that extension. It was canceled while still in development, before any implementation work had started, and before the 21464 was finished. The extension was to provide Alpha with a vector processing capability. It specified thirty-two 64 by 128-bit (8,192-bit or 1 KB) vector registers, approximately 50 vector instructions, and an unspecified number of instructions for moving data to and from the vector registers. Other EV8 follow-up candidates included a multicore design with two EV8 cores and a 4.0 GHz operating frequency.
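The vector register sizes quoted above can be checked with a few lines of Python. The constant names are chosen for this example, and the total across all registers is derived arithmetic rather than a figure stated in the text.

```python
NUM_VECTOR_REGISTERS = 32  # "thirty-two" registers, per the text
ELEMENTS_PER_REGISTER = 64
BITS_PER_ELEMENT = 128

# Size of one vector register
bits_per_register = ELEMENTS_PER_REGISTER * BITS_PER_ELEMENT
bytes_per_register = bits_per_register // 8

# Derived total for the whole vector register set (not stated in the text)
total_bytes = NUM_VECTOR_REGISTERS * bytes_per_register

print(bits_per_register)    # 8192 bits per register
print(bytes_per_register)   # 1024 bytes, i.e. 1 KB
print(total_bytes // 1024)  # 32 KB across all registers
```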