The Cray-3 was a vector supercomputer, Seymour Cray's successor to the Cray-2. The system was one of the first major applications of gallium arsenide (GaAs) semiconductors in computing, using hundreds of custom-built ICs packed into a CPU. The design goal was performance around 16 GFLOPS, about 12 times that of the Cray-2.
Work started on the Cray-3 in 1988 at Cray Research's (CRI) development labs in Chippewa Falls, Wisconsin. Other teams at the lab were working on designs with similar performance. To focus the teams, the Cray-3 effort was moved to a new lab in Colorado Springs, Colorado, later that year. Shortly thereafter, the corporate headquarters in Minneapolis decided to end work on the Cray-3 in favor of another design, the Cray C90. In 1989 the Cray-3 effort was spun off to a newly formed company, Cray Computer Corporation (CCC).
The launch customer, Lawrence Livermore National Laboratory, cancelled its order in 1991 and a number of company executives left shortly thereafter. The first machine was finally ready in 1993, but with no launch customer, it was instead loaned as a demonstration unit to the nearby National Center for Atmospheric Research in Boulder. The company went bankrupt in March 1995, and the machine was officially decommissioned.
With the delivery of the first Cray-3, Seymour Cray moved on to the Cray-4 design, but the company went bankrupt before it was completely tested. The Cray-3 was Cray's last completed design; with CCC's bankruptcy, he formed SRC Computers to concentrate on parallel designs, but died in a car accident in 1996 before this work was delivered.
History
Background
Seymour Cray began the design of the Cray-3 in 1985, as soon as the Cray-2 reached production. Cray generally set himself the goal of producing new machines with ten times the performance of the previous models. Although the machines did not always meet this goal, this was a useful technique in defining the project and clarifying what sort of process improvements would be needed to meet it. For the Cray-3, he decided to set an even higher performance improvement goal, an increase of 12x over the Cray-2.
Cray had always attacked the problem of increased speed with three simultaneous advances: more execution units to give the system higher parallelism, tighter packaging to decrease signal delays, and faster components to allow for a higher clock speed. Of the three, Cray was normally least aggressive on the last; his designs tended to use components that were already in widespread use, as opposed to leading-edge designs.
For the Cray-2, he introduced a novel 3D-packaging system for its integrated circuits to allow higher densities, and it appeared that there was some room for improvement in this process. For the new design, he stated that all wires would be limited to a maximum length of . This would demand the processor be able to fit into a block, about that of the Cray-2 CPU. This would not only increase performance but make the system 27 times smaller.
For a 12x performance increase, the packaging alone would not be enough; the circuits on the chips themselves would also have to speed up. The Cray-2 appeared to be pushing the limits of the speed of silicon-based transistors at 4.1 ns (244 MHz), and it did not appear that anything more than another 2x would be possible. If the goal of 12x was to be met, more radical changes would be needed, and a "high tech" approach would have to be used.
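The cycle-time and clock-frequency figures quoted above are two views of the same quantity. The following is a minimal sketch of that conversion; the function names are illustrative and do not come from any Cray documentation.

```python
def cycle_ns_to_mhz(cycle_ns: float) -> float:
    """Convert a clock cycle time in nanoseconds to a frequency in MHz."""
    # 1 ns corresponds to 1000 MHz, so frequency scales as 1000 / cycle time.
    return 1_000.0 / cycle_ns


def faster_cycle_ns(cycle_ns: float, speedup: float) -> float:
    """Cycle time needed to run `speedup` times faster than `cycle_ns`."""
    return cycle_ns / speedup


CRAY2_CYCLE_NS = 4.1  # Cray-2 cycle time quoted in the text

if __name__ == "__main__":
    print(f"Cray-2 clock: {cycle_ns_to_mhz(CRAY2_CYCLE_NS):.0f} MHz")
    # A further 2x from silicon alone would mean roughly this cycle time:
    print(f"2x faster cycle: {faster_cycle_ns(CRAY2_CYCLE_NS, 2.0):.2f} ns")
```

Under these figures, a 2x gain from faster circuits alone leaves a factor of six still to be found elsewhere if the 12x goal is to be met.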
Cray had intended to use gallium arsenide circuitry in the Cray-2, which would not only offer much higher switching speeds but also use less energy and thus run cooler as well. At the time the Cray-2 was being designed, the state of GaAs manufacturing simply was not up to the task of supplying a supercomputer. By the mid-1980s, things had changed and Cray decided it was the only way forward. Given a lack of investment on the part of large chip makers, Cray decided to invest in a GaAs chipmaking startup, GigaBit Logic, and use them as an internal supplier.
Describing the system in November 1988, Cray stated that the 12 times performance increase would be made up of a three times increase due to GaAs circuits, and four times due to the use of more processors. One of the problems with the Cray-2 had been poor multiprocessing performance due to limited bandwidth between the processors, and to address this the Cray-3 would adopt the much faster architecture used in the Cray Y-MP. This would provide a design performance of 8000 MIPS, or 16 GFLOPS.
Development
The Cray-3 was originally slated for delivery in 1991. This was during a time when growth in the supercomputer market was slowing rapidly, from 50% annual growth in 1980 to 10% in 1988. At the same time, Cray Research was also working on the Y-MP, a faster multi-processor version of the system architecture tracing its ancestry to the original Cray-1. In order to focus the Y-MP and Cray-3 groups, and with Cray's personal support, the Cray-3 project moved to a new research center in Colorado Springs.
By 1989, the Y-MP was starting deliveries, and the main CRI lab in Chippewa Falls, Wisconsin, moved on to the C90, a further improvement in the Y-MP series. With only 25 Cray-2s sold, management decided that the Cray-3 should be put on "low priority" development. In November 1989, the Colorado Springs lab was spun off as Cray Computer Corporation (CCC), with CRI retaining 10% of the new company's stock and providing an $85 million promissory note to fund development. Cray himself was not a shareholder in the new company, and worked under contract. As CRI retained the lease on the original building, the new company had to move once again, introducing further delays.
By 1991, development was behind schedule. Development slowed even more when Lawrence Livermore National Laboratory cancelled its order for the first machine, in favor of the C90. Several executives, including the CEO, left the company. The company then announced it would be looking for a customer that needed a smaller version of the machine, with four to eight processors.
The first (and only) production model (serial number S5, named ''Graywolf'') was loaned to NCAR as a demonstration system in May 1993. NCAR's version was configured with 4 processors and a 128 MWord (64-bit words, 1 GB) common memory. In service, the static RAM proved to be problematic. It was also discovered that the square root code contained a bug that resulted in 1 in 60 million calculations being wrong. Additionally, one of the four CPUs was not running reliably.
CCC declared bankruptcy in March 1995, after spending about $300 million of financing. NCAR's machine was officially decommissioned the next day. Seven system cabinets, or "tanks", serial numbers S1 to S7, were built for Cray-3 machines. Most were for smaller two-CPU machines. Three of the smaller tanks were used on the Cray-4 project, essentially a Cray-3 with 64 faster CPUs running at 1 ns (1 GHz) and packed into an even smaller space. Another was used for the Cray-3/SSS project.
The failure of the Cray-3 was in large part due to the changing political and technical climate. The machine was being designed during the collapse of the Warsaw Pact and the ending of the Cold War, which led to a massive downsizing in supercomputer purchases. At the same time, the market was increasingly investing in massively parallel (MP or MPP) designs. Cray was critical of this approach, and was quoted by ''The Wall Street Journal'' as saying that MPP systems had not yet proven their supremacy over vector computers, noting the difficulty many users have had programming for large parallel machines. "I don't think they'll ever be universally successful, at least not in my lifetime".
Architecture
Logical design
The Cray-3 system architecture comprised a ''foreground processing system'', up to 16 ''background processors'' and up to 2 gigawords (16 GB) of ''common memory''. The foreground system was dedicated to input/output and system management. It included a 32-bit processor and four synchronous data channels for mass storage and network devices, primarily via HiPPI channels.
Each background processor consisted of a ''computation section'', a ''control section'' and ''local memory''. The computation section performed 64-bit scalar, floating point and vector arithmetic. The control section provided instruction buffers, memory management functions, and a real-time clock. Each background processor incorporated 16 kilowords (128 kbytes) of high-speed local memory for use as temporary scratch memory.
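The word-based memory sizes quoted in this section convert to bytes by multiplying by the 64-bit word size. The following is a minimal sketch of that arithmetic; it assumes binary prefixes (1 kiloword = 1024 words), which the text does not state explicitly, and the names are illustrative.

```python
BYTES_PER_WORD = 8  # a 64-bit word is 8 bytes

# Assumption: binary prefixes, as was conventional for memory sizes.
KI = 2**10
MI = 2**20
GI = 2**30


def words_to_bytes(words: int) -> int:
    """Convert a count of 64-bit words to bytes."""
    return words * BYTES_PER_WORD


local_memory_bytes = words_to_bytes(16 * KI)        # per background processor
ncar_common_memory_bytes = words_to_bytes(128 * MI)  # NCAR's configuration
max_common_memory_bytes = words_to_bytes(2 * GI)     # architectural maximum

if __name__ == "__main__":
    print(local_memory_bytes // KI, "KB local memory")
    print(ncar_common_memory_bytes // GI, "GB common memory (NCAR)")
    print(max_common_memory_bytes // GI, "GB common memory (maximum)")
```

Under this assumption the results agree with the byte figures given in the text: 128 KB, 1 GB and 16 GB.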
Common memory consisted of silicon CMOS SRAM, organized into ''octants'' of 64 banks each, with up to eight octants possible. The word size was 64 bits plus eight error-correction bits, and total memory bandwidth was rated at 128 gigabytes per second.
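The organization described above implies a total bank count and a fixed storage overhead for error correction. The following is a small sketch that derives both from the quoted figures; the function and field names are hypothetical, and it does not model how addresses were actually interleaved across banks.

```python
BANKS_PER_OCTANT = 64
MAX_OCTANTS = 8
DATA_BITS = 64  # data bits per word
ECC_BITS = 8    # error-correction bits stored alongside each word


def memory_layout(octants: int) -> dict:
    """Summarize a common-memory configuration with the given octant count."""
    if not 1 <= octants <= MAX_OCTANTS:
        raise ValueError(f"octants must be between 1 and {MAX_OCTANTS}")
    return {
        "banks": octants * BANKS_PER_OCTANT,
        "stored_bits_per_word": DATA_BITS + ECC_BITS,
        "ecc_overhead": ECC_BITS / DATA_BITS,
    }


if __name__ == "__main__":
    layout = memory_layout(MAX_OCTANTS)
    print(layout["banks"], "banks in a full configuration")
    print(layout["stored_bits_per_word"], "bits stored per word")
    print(f"{layout['ecc_overhead']:.1%} extra storage for error correction")
```

A fully populated system therefore has 512 banks, and every 64-bit word occupies 72 bits of SRAM.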
CPU design
As with previous designs, the core of the Cray-3 consisted of a number of modules, each containing several circuit boards packed with parts. In order to increase density, the individual GaAs chips were not packaged, and instead several were mounted directly with ultrasonic gold bonding to a board approximately square. The boards were then turned over and mated to a second board carrying the electrical wiring, with wires on this card running through holes to the "bottom" (opposite the chips) side of the chip carrier where they were bonded, hence sandwiching the chip between the two layers of board. These ''submodules'' were then stacked four-deep and, as in the Cray-2, wired to each other to make a 3D circuit.
Unlike the Cray-2, the Cray-3 modules also included edge connectors. Sixteen such submodules were connected together in a 4×4 array to make a single module measuring . Even with this advanced packaging, the circuit density was low even by 1990s standards, at about 96,000 gates per cubic inch. Modern CPUs offer gate counts of millions per square inch, and the move to 3D circuits was still just being considered.
Thirty-two such modules were then stacked and wired together with a mass of twisted-pair wires into a single processor. The basic cycle time was 2.11 ns, or 474 MHz, allowing each processor to reach about 0.948 GFLOPS, and a 16-processor machine a theoretical 15.17 GFLOPS. Key to the high performance was the high-speed access to main memory, which allowed each processor to burst up to 8 GB/s.
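The peak figures above follow from the cycle time. The following is a minimal sketch of that arithmetic; it assumes two floating-point results per clock cycle per processor, which is inferred from the ratio of 0.948 GFLOPS to 474 MHz rather than stated in the text, and the names are illustrative.

```python
CYCLE_NS = 2.11          # basic cycle time quoted in the text
FLOPS_PER_CYCLE = 2      # assumption: inferred from 0.948 GFLOPS / 0.474 GHz
BURST_GB_PER_S = 8       # per-processor memory burst rate quoted in the text


def clock_ghz(cycle_ns: float = CYCLE_NS) -> float:
    """Clock frequency in GHz for a cycle time given in nanoseconds."""
    return 1.0 / cycle_ns


def peak_gflops(processors: int) -> float:
    """Theoretical peak GFLOPS for a machine with `processors` CPUs."""
    return processors * FLOPS_PER_CYCLE * clock_ghz()


def aggregate_burst_gb_per_s(processors: int) -> int:
    """Combined memory burst rate if every processor bursts at once."""
    return processors * BURST_GB_PER_S


if __name__ == "__main__":
    print(f"clock: {clock_ghz() * 1000:.0f} MHz")
    print(f"1 processor: {peak_gflops(1):.3f} GFLOPS")
    print(f"16 processors: {peak_gflops(16):.2f} GFLOPS")
    print(f"16 processors: {aggregate_burst_gb_per_s(16)} GB/s")
```

With 16 processors the combined burst rate comes to 128 GB/s, which matches the total memory bandwidth rating given for common memory.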
Mechanical design
The modules were held together in an aluminum chassis known as a "brick". The bricks were immersed in liquid Fluorinert for cooling, as in the Cray-2. A four-processor system with 64 memory modules dissipated about 88 kW of power. The entire four-processor system was about tall and front-to-back, and a little over wide.
For systems with up to four processors, the processor assembly sat under a translucent bronzed acrylic cover at the top of a cabinet wide, deep and high, with the memory below it, and then the power supplies and cooling systems on the bottom. Eight- and 16-processor systems would have been housed in a larger octagonal cabinet. All in all, the Cray-3 was considerably smaller than the Cray-2, itself relatively small compared to other supercomputers.
In addition to the system cabinet, a Cray-3 system also needed one or two (depending on number of processors) ''system control pods'' (or "C-Pods"), square and high, containing power and cooling control equipment.
System configurations
The following possible Cray-3 configurations were officially specified:
Software
The Cray-3 ran the Colorado Springs Operating System (''CSOS''), which was based upon Cray Research's UNICOS operating system version 5.0.
A major difference between CSOS and UNICOS was that CSOS was ported to standard C with all PCC extensions that were used in UNICOS removed.
Much of the software available under the Cray-3 was derived from Cray Research and included for instance the
X Window System
TCP/IP