iWarp was an experimental parallel supercomputer architecture developed as a joint project by Intel and Carnegie Mellon University. The project started in 1988, as a follow-up to CMU's previous WARP research project, in order to explore building an entire parallel-computing "node" in a single microprocessor, complete with memory and communications links. In this respect the iWarp is very similar to the INMOS transputer and nCUBE.
Intel announced iWarp in 1989. The first iWarp prototype was delivered to Carnegie Mellon in the summer of 1990, and in the fall the university received the first 64-cell production systems, followed by two more in 1991. With the creation of the Intel Supercomputing Systems Division in the summer of 1992, the iWarp was merged into the iPSC product line. Intel kept iWarp as a product but stopped actively marketing it.
Each iWarp CPU included a 32-bit ALU with a 64-bit FPU running at 20 MHz. It was purely scalar and completed one instruction per cycle, so the performance was 20 MIPS, or 20 MFLOPS for single precision and 10 MFLOPS for double precision. The communications were handled by a separate unit on the CPU that drove four serial channels at 40 MB/s, and included networking support in hardware that allowed for up to 20 "virtual channels" (similar to the system added to the INMOS T9000).
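The per-node peak figures above follow directly from the clock rate. The short sketch below reproduces that arithmetic; the function names are illustrative, and the assumption that a double-precision operation takes two cycles is inferred from the quoted 20 and 10 MFLOPS figures rather than stated in the text.

```python
CLOCK_HZ = 20_000_000          # 20 MHz clock, from the text
INSTRUCTIONS_PER_CYCLE = 1     # purely scalar, one instruction per cycle


def peak_mips(clock_hz=CLOCK_HZ, ipc=INSTRUCTIONS_PER_CYCLE):
    """Peak millions of instructions per second for one node."""
    return clock_hz * ipc / 1e6


def peak_mflops(precision, clock_hz=CLOCK_HZ):
    """Peak MFLOPS for one node.

    Assumption: one single-precision result per cycle and one
    double-precision result every two cycles, chosen to match the
    quoted 20 and 10 MFLOPS figures.
    """
    cycles_per_op = {"single": 1, "double": 2}[precision]
    return clock_hz / cycles_per_op / 1e6


if __name__ == "__main__":
    print(peak_mips())
    print(peak_mflops("single"))
    print(peak_mflops("double"))
```

These are peak rates only; sustained performance on real workloads would be lower.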
iWarp processors were combined onto boards along with memory, but, unlike other systems, Intel chose the faster but more expensive static RAM (SRAM) for the iWarp. Boards typically included four CPUs and anywhere from 512 kB to 4 MB of SRAM.
Another difference in the iWarp was that the systems were connected together as an n-by-m torus, instead of the more common hypercube. A typical system included 64 CPUs connected as an 8×8 torus, which could deliver 1.2 gigaflops peak.
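To make the torus layout concrete, the sketch below computes the four neighbours of a node in an n-by-m torus using wrap-around (modular) indexing, and the aggregate peak rate of the array. The function names and the north/south/west/east labels are illustrative assumptions, not iWarp terminology.

```python
def torus_neighbors(row, col, n_rows, n_cols):
    """Return the four neighbours of (row, col) in an n_rows x n_cols torus.

    Indices wrap around at the edges, so every node has exactly four
    neighbours, unlike a plain mesh where edge nodes have fewer.
    """
    return {
        "north": ((row - 1) % n_rows, col),
        "south": ((row + 1) % n_rows, col),
        "west": (row, (col - 1) % n_cols),
        "east": (row, (col + 1) % n_cols),
    }


def peak_gflops(n_rows, n_cols, mflops_per_node=20):
    """Aggregate peak GFLOPS, assuming 20 single-precision MFLOPS per node."""
    return n_rows * n_cols * mflops_per_node / 1000


if __name__ == "__main__":
    # The corner node of an 8x8 torus wraps to the opposite edges.
    print(torus_neighbors(0, 0, 8, 8))
    print(peak_gflops(8, 8))
```

For an 8×8 array, 64 nodes at 20 MFLOPS each gives 1.28 GFLOPS, which the text quotes as 1.2 gigaflops.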
George Cox was the lead architect of the iWarp project.
Steven McGeady (later an Intel vice president and witness in the Microsoft antitrust case) wrote an innovative development environment that allowed software to be written for the array before it was completed. Each node of the array was represented by a different Sun workstation on a LAN, with the iWarp's unique inter-node communication protocol simulated over sockets. Unlike the chip-level simulator, which could not simulate a multi-node array and which ran very slowly, this environment allowed in-depth development of array software to begin.
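The general idea of standing in for a hardware link with a socket can be sketched in a few lines. This is not a reconstruction of the original environment, whose protocol is not described here: the function name is hypothetical, and a local `socket.socketpair()` replaces two workstations on a LAN so that the example runs in a single process.

```python
import socket


def simulate_link(message: bytes) -> bytes:
    """Send a message from one simulated node to another over a socket.

    Assumption: a connected socket pair in one process stands in for
    two networked workstations, each representing one array node.
    """
    node_a, node_b = socket.socketpair()
    try:
        node_a.sendall(message)
        # Signal end of transmission so the receiver sees EOF.
        node_a.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            chunk = node_b.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        node_a.close()
        node_b.close()


if __name__ == "__main__":
    print(simulate_link(b"data for neighbouring node"))
```

A fuller simulation would run one process per node and open one connection per torus link, but the send and receive pattern would stay the same.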
The production compiler for iWarp was a C and Fortran compiler based on the AT&T pcc compiler for UNIX, ported under contract for Intel by the Canadian firm HCR Corporation and then extensively modified and extended by Intel.
[Ali-Reza Adl-Tabatabai, Thomas Gross, Guei-Yuan Lueh and James Reinders. Modeling Instruction-Level Parallelism for Software Pipelining. In Proceedings of the IFIP WG10.3 Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism, Orlando, FL, pages 321-330.]
See also
* Systolic array
Notes
External links
iWarp Project at CMU