POWER7 is a family of superscalar multi-core microprocessors based on the Power ISA 2.06 instruction set architecture, released in 2010, that succeeded the POWER6 and POWER6+. POWER7 was developed by IBM at several sites including IBM's Rochester, MN; Austin, TX; Essex Junction, VT; T. J. Watson Research Center, NY; Bromont, QC; and IBM Deutschland Research & Development GmbH, Böblingen, Germany laboratories. IBM announced servers based on POWER7 on 8 February 2010.
History
IBM won a $244 million DARPA contract in November 2006 to develop a petascale supercomputer architecture before the end of 2010 in the HPCS project. The contract also states that the architecture shall be available commercially. IBM's proposal, PERCS (Productive, Easy-to-use, Reliable Computer System), which won them the contract, is based on the POWER7 processor, AIX operating system and General Parallel File System.
One feature that IBM and DARPA collaborated on is modifying the addressing and page table hardware to support global shared memory space for POWER7 clusters. This enables research scientists to program a cluster as if it were a single system, without using message passing. From a productivity standpoint, this is essential since some scientists are not conversant with MPI or other parallel programming techniques used in clusters.
Design
The POWER7 superscalar multi-core architecture was a substantial evolution from the POWER6 design, focusing more on power efficiency through multiple cores and simultaneous multithreading (SMT). The POWER6 architecture was built from the ground up to maximize processor frequency at the cost of power efficiency, and it reached 5 GHz. While the POWER6 is a dual-core processor, with each core capable of two-way simultaneous multithreading (SMT), the IBM POWER7 processor has up to eight cores and four threads per core, for a total capacity of 32 simultaneous threads.
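The thread counts quoted above follow from multiplying the number of cores by the SMT ways per core. A minimal sketch of that arithmetic in Python; the function name `hardware_threads` is illustrative and not taken from any IBM tool:

```python
def hardware_threads(cores: int, smt_ways: int) -> int:
    """Simultaneous hardware threads on one chip: cores times SMT ways per core."""
    return cores * smt_ways


# Figures from the text: POWER6 is dual-core with 2-way SMT,
# POWER7 has up to eight cores with 4-way SMT.
power6_threads = hardware_threads(cores=2, smt_ways=2)
power7_threads = hardware_threads(cores=8, smt_ways=4)

print(power6_threads, power7_threads)  # 4 32
```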
IBM stated at ISCA 29 that peak performance was achieved by high frequency designs with 10–20 FO4 delays per pipeline stage at the cost of power efficiency. However, the POWER6 binary floating-point unit achieves a "6-cycle, 13-FO4 pipeline".
Therefore, the pipeline for the POWER7 CPU has been changed again, just as it was for the POWER5 and POWER6 designs. In some respects, this rework is similar to Intel's change of direction in 2005, when it moved away from the P4 7th-generation x86 microarchitecture.
Specifications
The POWER7 is available with 4, 6, or 8 physical cores per chip, in a 1- to 32-way design, with up to 1024 SMT threads and a slightly different microarchitecture and interfaces for supporting extended specifications or sub-specifications of the Power ISA and/or different system architectures. For example, in the supercomputing (HPC) system Power 775 it is packaged as a 32-way quad-chip module (QCM) with 256 physical cores and 1024 SMT threads. There is also a special TurboCore mode that turns off half of the cores of an eight-core processor; the remaining four cores have access to all the memory controllers and L3 cache and run at increased clock speeds. This raises per-core performance, which is important for workloads that require the fastest sequential performance, at the cost of reduced parallel performance. TurboCore mode can reduce "software costs in half for those applications that are licensed per core, while increasing per core performance from that software."
The IBM Power 780 scalable, high-end servers feature the TurboCore workload-optimizing mode and deliver up to double the per-core performance of POWER6-based systems.
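The Power 775 totals and the TurboCore licensing claim can be checked with simple arithmetic. The sketch below assumes every chip in the 32-way system has all eight cores enabled, and it models per-core licensing as a flat price per active core. The price of 100 units is a made-up placeholder, and both function names are illustrative.

```python
CORES_PER_CHIP = 8  # eight-core POWER7 part
SMT_WAYS = 4        # four-way SMT per core


def system_totals(chips, cores_per_chip=CORES_PER_CHIP, smt_ways=SMT_WAYS):
    """Return (physical cores, SMT threads) for a system built from `chips` chips."""
    cores = chips * cores_per_chip
    return cores, cores * smt_ways


def per_core_license_cost(active_cores, price_per_core):
    """Hypothetical flat per-core licensing model: cost scales with active cores."""
    return active_cores * price_per_core


cores, threads = system_totals(chips=32)
print(cores, threads)  # 256 1024

# TurboCore leaves 4 of the 8 cores active, so a per-core license bill halves.
full_cost = per_core_license_cost(active_cores=8, price_per_core=100)
turbo_cost = per_core_license_cost(active_cores=4, price_per_core=100)
print(turbo_cost / full_cost)  # 0.5
```

The model ignores the clock speed increase in TurboCore mode, which affects performance but not the number of licensed cores.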
Each core is capable of four-way simultaneous multithreading (SMT). The POWER7 has approximately 1.2 billion transistors and a die area of 567 mm², and it is fabricated on a 45 nm process. A notable difference from POWER6 is that the POWER7 executes instructions out-of-order instead of in-order. Despite the decrease in maximum frequency compared to POWER6 (4.25 GHz vs 5.0 GHz), each core has higher performance than a POWER6 core, while each processor has up to 4 times the number of cores.
POWER7 has these specifications:
* 45 nm SOI process, 567 mm²
* 1.2 billion transistors
* 3.0–4.25 GHz clock speed
* max 4 chips per quad-chip module
** 4, 6 or 8 C1 cores per chip
*** 4 SMT threads per C1 core (available in AIX 6.1 TL05 (released in April 2010) and above)
*** 12 execution units per C1 core:
**** 2 fixed-point units
**** 2 load/store units
**** 4 double-precision floating-point units
**** 1 vector unit supporting VSX
**** 1 decimal floating-point unit
**** 1 branch unit
**** 1 condition register unit
** 32+32 KB L1 instruction and data cache (per core)
** 256 KB L2 cache (per C1 core)
** 4 MB L3 cache per C1 core with maximum up to 32 MB supported. The cache is implemented in eDRAM, which does not require as many transistors per cell as a standard SRAM so it allows for a larger cache while using the same area as SRAM.
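The per-core counts in the list above can be cross-checked against the stated totals. The sketch below sums the listed execution units and multiplies out the L3 capacity. The dictionary keys and function names are illustrative, and the assumption that L3 capacity scales linearly with core count is only checked here against the eight-core maximum of 32 MB.

```python
# Execution units per C1 core, as enumerated in the specification list.
EXECUTION_UNITS = {
    "fixed-point": 2,
    "load/store": 2,
    "double-precision floating-point": 4,
    "vector (VSX)": 1,
    "decimal floating-point": 1,
    "branch": 1,
    "condition register": 1,
}

L3_MB_PER_CORE = 4  # 4 MB of eDRAM L3 per core


def total_execution_units(units=EXECUTION_UNITS):
    """Sum the per-core execution units."""
    return sum(units.values())


def l3_total_mb(cores, mb_per_core=L3_MB_PER_CORE):
    """Total on-chip L3, assuming capacity scales with the number of cores."""
    return cores * mb_per_core


print(total_execution_units())  # 12
print(l3_total_mb(8))           # 32
```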
The technical specification further specifies:
Each POWER7 processor core implements aggressive out-of-order (OoO) instruction execution to drive high efficiency in the use of available execution paths. The POWER7 processor has an Instruction Sequence Unit that is capable of dispatching up to six instructions per cycle to a set of queues. Up to eight instructions per cycle can be issued to the Instruction Execution units.
This gives the following theoretical single-precision (SP) performance figures (based on a 4.14 GHz 8-core implementation):
* max 99.36 GFLOPS per core
* max 794.88 GFLOPS per chip
Four 64-bit SIMD units per core, together with one 128-bit SIMD VMX unit per core, can perform 12 multiply-adds per cycle, which is 24 SP floating-point operations per cycle. At 4.14 GHz, that gives 4.14 billion × 24 = 99.36 SP GFLOPS per core and, with 8 cores, 794.88 SP GFLOPS per chip.
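The peak figures above are the product of clock rate, floating-point operations per cycle and core count. A short sketch of that calculation, counting each multiply-add as two operations; `peak_gflops` and the constant names are illustrative:

```python
def peak_gflops(clock_ghz, flops_per_cycle, cores=1):
    """Theoretical peak in GFLOPS: clock (GHz) x FLOPs per cycle x cores."""
    return clock_ghz * flops_per_cycle * cores


MULTIPLY_ADDS_PER_CYCLE = 12   # SP multiply-adds per core per cycle (from the text)
FLOPS_PER_MULTIPLY_ADD = 2     # one multiply plus one add

sp_flops_per_cycle = MULTIPLY_ADDS_PER_CYCLE * FLOPS_PER_MULTIPLY_ADD  # 24

per_core = peak_gflops(4.14, sp_flops_per_cycle)
per_chip = peak_gflops(4.14, sp_flops_per_cycle, cores=8)

print(f"{per_core:.2f} {per_chip:.2f}")  # 99.36 794.88
```

These are upper bounds: they assume every unit issues a multiply-add on every cycle, which real workloads rarely sustain.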
Peak double-precision (DP) performance is roughly half of peak SP performance.
For comparison, Intel's 2013 Haswell architecture CPUs can do 16 DP FLOPs or 32 SP FLOPs per cycle (8/16 DP/SP fused multiply-add spread across 2× 256-bit AVX2 FP vector units). At 3.4 GHz (i7-4770) this translates into 108.8 SP GFLOPS per core and 435.2 SP GFLOPS peak performance across the 4-core chip, giving roughly similar levels of performance per core, without taking into account the effects or benefits of Intel's Turbo Boost technology.
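The per-core comparison can be reproduced with the same clock × FLOPs-per-cycle arithmetic. The sketch below uses only the figures quoted in the text (4.14 GHz and 24 SP FLOPs per cycle for POWER7; 3.4 GHz and 32 SP FLOPs per cycle for the i7-4770) and, like the text, ignores Turbo Boost. The helper name `peak_gflops` is illustrative.

```python
def peak_gflops(clock_ghz, flops_per_cycle, cores=1):
    """Theoretical peak in GFLOPS: clock (GHz) x FLOPs per cycle x cores."""
    return clock_ghz * flops_per_cycle * cores


haswell_core = peak_gflops(3.4, 32)           # i7-4770, one core
haswell_chip = peak_gflops(3.4, 32, cores=4)  # i7-4770, four cores
power7_core = peak_gflops(4.14, 24)           # POWER7, one core

# Ratio of POWER7 to Haswell peak SP throughput per core.
ratio = power7_core / haswell_core

print(f"{haswell_core:.1f} {haswell_chip:.1f} {ratio:.2f}")  # 108.8 435.2 0.91
```

A per-core ratio of about 0.91 is what the text means by "roughly similar"; at chip level the eight-core POWER7 has the higher theoretical peak because it has twice as many cores.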
This theoretical peak performance comparison holds in practice too, with the POWER7 and the i7-4770 obtaining similar scores in the SPEC CPU2006 floating-point benchmarks (single-threaded): 71.5 for POWER7 versus 74.0 for i7-4770.
Note that the POWER7 chip significantly outperformed the i7 (by 2× to 5×) in some benchmarks (bwaves, cactusADM, lbm) while being significantly slower (by 2× to 3×) in most others. This indicates major architectural differences between the two chips, their mainboards and their memory systems: they were designed with different workloads in mind.
Overall, in a broad sense, the floating-point performance of the POWER7 is similar to that of the Haswell i7.
POWER7+
IBM introduced the POWER7+ processor at the Hot Chips 24 conference in August 2012. It is an updated version with higher speeds, more cache and integrated accelerators. It is manufactured on a 32 nm fabrication process.
The first systems to ship with the POWER7+ processors were IBM Power 770 and 780 servers. The chips have up to 80 MB of L3 cache (10 MB/core), improved clock speeds (up to 4.4 GHz) and 20 LPARs per core.
Products
The range of POWER7-based systems includes IBM Power Systems "Express" models (710, 720, 730, 740 and 750), Enterprise models (770, 780 and 795) and high-performance computing models (755 and 775). Enterprise models differ in having Capacity on Demand capabilities. Maximum specifications are shown in the table below.
IBM also offers five POWER7-based BladeCenters.
Specifications are shown in the table below.
The following are supercomputer projects that use the POWER7 processor:
* PERCS
* Watson
See also
* POWER6
* IBM Power microprocessors
External links
* IBM POWER7 Systems - IBM POWER7 product page
* IBM POWER7 Technology and Systems - IBM Journal of Research and Development (published by IEEE Xplore)
* IBM Won DARPA HPCS Phase-III
* IBM Won DARPA HPCS Phase-II
* IBM Has Its PERCS