Intel Teraflops Research Chip (codenamed ''Polaris'') is a research manycore processor containing 80 cores, using a network-on-chip architecture, developed by Intel's Tera-Scale Computing Research Program.
It was manufactured using a 65 nm CMOS process with eight layers of copper interconnect and contains 100 million transistors on a 275 mm² die. Its design goal was to demonstrate a modular architecture capable of a sustained performance of 1.0 TFLOPS while dissipating less than 100 W.
Research from the project was later incorporated into Xeon Phi. The technical lead of the project was Sriram R. Vangal.
The processor was initially presented at the Intel Developer Forum on September 26, 2006 and officially announced on February 11, 2007. A working chip was presented at the 2007 IEEE International Solid-State Circuits Conference, alongside technical specifications.
Architecture
The chip consists of a 10×8 2D mesh network of cores and nominally operates at 4 GHz. (Intel later showed the chip running as high as 5.67 GHz.) Each core, called a ''tile'' (3 mm²), contains a processing engine and a 5-port wormhole-switched router (0.34 mm²) with mesochronous interfaces. The router has a bandwidth of 80 GB/s and a latency of 1.25 ns at 4 GHz.
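The router figures above can be turned into per-cycle quantities with a few lines of arithmetic. The following is a minimal sketch, not a description of the actual router design: it assumes the 80 GB/s is shared evenly across the five ports, and all constant and function names are illustrative.

```python
# Figures quoted in the text for one tile's router at the nominal clock.
CLOCK_HZ = 4e9              # nominal 4 GHz clock
ROUTER_PORTS = 5            # 5-port router
ROUTER_BW_BYTES_S = 80e9    # 80 GB/s router bandwidth
ROUTER_LATENCY_S = 1.25e-9  # 1.25 ns router latency
TILE_AREA_MM2 = 3.0         # area of one tile
ROUTER_AREA_MM2 = 0.34      # area of the router inside the tile


def bytes_per_port_per_cycle() -> float:
    """Bytes moved per port per clock cycle, ASSUMING an even split across ports."""
    return ROUTER_BW_BYTES_S / ROUTER_PORTS / CLOCK_HZ


def latency_in_cycles() -> float:
    """Router latency expressed in clock cycles at the nominal clock."""
    return ROUTER_LATENCY_S * CLOCK_HZ


def router_area_fraction() -> float:
    """Share of the tile area occupied by the router."""
    return ROUTER_AREA_MM2 / TILE_AREA_MM2


if __name__ == "__main__":
    print(f"bytes per port per cycle: {bytes_per_port_per_cycle():.1f}")
    print(f"latency in cycles:        {latency_in_cycles():.1f}")
    print(f"router share of tile:     {router_area_fraction():.1%}")
```

Under these assumptions the quoted numbers correspond to 4 bytes per port per cycle, a latency of 5 cycles, and a router that takes up roughly 11% of the tile.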
The processing engine in each tile contains two independent, 9-stage pipelined single-precision floating-point multiply-accumulator (FPMAC) units, 3 KB of single-cycle instruction memory and 2 KB of data memory.
Each FPMAC unit is capable of performing 2 single-precision floating-point operations per cycle. Each tile thus has an estimated peak performance of 16 GFLOPS at the standard configuration of 4 GHz. A 96-bit very long instruction word (VLIW) encodes up to eight operations per cycle.
The custom instruction set includes instructions to send packets into and receive packets from the chip's network, as well as instructions for sleeping and waking a particular tile.
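The peak-throughput figure follows directly from the FPMAC count and the clock rate. The sketch below reproduces that arithmetic and scales it to the whole chip. The function names are illustrative, and the results are theoretical peaks, not the sustained performance named in the design goal.

```python
TILES = 80                     # cores on the chip
FPMACS_PER_TILE = 2            # independent FPMAC units per tile
FLOPS_PER_FPMAC_PER_CYCLE = 2  # a multiply and an accumulate per cycle


def tile_peak_gflops(clock_ghz: float) -> float:
    """Theoretical peak of one tile in GFLOPS at the given clock."""
    return FPMACS_PER_TILE * FLOPS_PER_FPMAC_PER_CYCLE * clock_ghz


def chip_peak_tflops(clock_ghz: float) -> float:
    """Theoretical peak of all 80 tiles in TFLOPS at the given clock."""
    return TILES * tile_peak_gflops(clock_ghz) / 1000.0


def min_clock_ghz_for(target_tflops: float) -> float:
    """Lowest clock at which the theoretical peak reaches the target."""
    flops_per_cycle = TILES * FPMACS_PER_TILE * FLOPS_PER_FPMAC_PER_CYCLE
    return target_tflops * 1000.0 / flops_per_cycle


if __name__ == "__main__":
    print(tile_peak_gflops(4.0))    # per-tile peak at the nominal clock
    print(chip_peak_tflops(4.0))    # whole-chip peak at the nominal clock
    print(min_clock_ghz_for(1.0))   # clock needed for a 1 TFLOPS peak
```

At 4 GHz this gives 16 GFLOPS per tile and 1.28 TFLOPS for the chip, so a 1.0 TFLOPS peak needs a clock of at least 3.125 GHz.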
Underneath each tile, a 256 KB SRAM module (codenamed ''Freya'') was 3D stacked, thus bringing memory nearer to the processor to increase overall memory bandwidth to 1 TB/s, at the expense of higher cost, thermal stress and latency, and a small total capacity of 20 MB. The network of Polaris was shown to have a bisection bandwidth of 1.6 Tbit/s at 3.16 GHz and 2.92 Tbit/s at 5.67 GHz.
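The capacity and bandwidth figures above can be cross-checked with simple arithmetic. This is a sketch under stated assumptions: binary units for capacity (1 MB = 1024 KB), decimal units for bandwidth (1 TB/s = 1000 GB/s), and memory bandwidth spread evenly across tiles. None of these conventions is stated in the text, and the names are illustrative.

```python
TILES = 80
SRAM_PER_TILE_KB = 256
TOTAL_MEM_BW_GB_S = 1000.0  # 1 TB/s, ASSUMED to mean 1000 GB/s


def total_sram_mb() -> float:
    """Total stacked SRAM in MB, ASSUMING 1 MB = 1024 KB."""
    return TILES * SRAM_PER_TILE_KB / 1024


def mem_bw_per_tile_gb_s() -> float:
    """Memory bandwidth per tile, ASSUMING an even split across tiles."""
    return TOTAL_MEM_BW_GB_S / TILES


def bisection_bits_per_cycle(tbit_per_s: float, clock_ghz: float) -> float:
    """Bits crossing the bisection per clock cycle."""
    return tbit_per_s * 1e12 / (clock_ghz * 1e9)


if __name__ == "__main__":
    print(total_sram_mb())
    print(mem_bw_per_tile_gb_s())
    print(round(bisection_bits_per_cycle(1.6, 3.16), 1))
    print(round(bisection_bits_per_cycle(2.92, 5.67), 1))
```

Both quoted bisection figures work out to a little over 500 bits per cycle, which is consistent with network bandwidth scaling roughly linearly with clock frequency.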

Other prominent features of the Teraflops Research Chip include fine-grained power management, with 21 independent sleep regions per tile and dynamic tile sleep, and very high energy efficiency: a theoretical peak of 27 GFLOPS/W at 0.6 V and a measured 19.4 GFLOPS/W for a stencil workload at 0.75 V.
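An efficiency in GFLOPS/W can also be read as energy per floating-point operation, which is sometimes easier to compare across designs. The helper below is an illustrative unit conversion only; its name is not taken from any Intel material.

```python
def picojoules_per_flop(gflops_per_watt: float) -> float:
    """Convert an efficiency in GFLOPS/W into energy per operation in pJ.

    1 GFLOPS/W equals 1e9 operations per joule, and 1 J equals 1e12 pJ.
    """
    return 1e12 / (gflops_per_watt * 1e9)


if __name__ == "__main__":
    print(f"{picojoules_per_flop(27.0):.1f} pJ/flop at 27 GFLOPS/W")
    print(f"{picojoules_per_flop(19.4):.1f} pJ/flop at 19.4 GFLOPS/W")
```

With the quoted efficiencies this comes to roughly 37 pJ per operation at the theoretical peak and about 52 pJ per operation for the stencil measurement.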
Issues
Intel aimed to help software development for the new, exotic architecture by creating a new programming model called Ct, designed especially for the chip. The model never gained the following Intel hoped for and was eventually incorporated into Intel Array Building Blocks, a now defunct C++ library.
See also
* Single-chip Cloud Computer
* Xeon Phi