
POWER8 is a family of superscalar multi-core microprocessors based on the Power ISA, announced in August 2013 at the Hot Chips conference. The designs are available for licensing under the OpenPOWER Foundation, the first time IBM's highest-end processors have been made available in this way.
Systems based on POWER8 became available from IBM in June 2014.
Systems and POWER8 processor designs made by other OpenPOWER members were available in early 2015.
Design
POWER8 is designed to be a massively multithreaded chip, with each of its cores capable of handling eight hardware threads simultaneously, for a total of 96 threads executed simultaneously on a 12-core chip. The processor makes use of very large amounts of on- and off-chip eDRAM caches, and on-chip memory controllers enable very high bandwidth to memory and system I/O. For most workloads, the chip is said to perform two to three times as fast as its predecessor, the POWER7.
POWER8 chips come in 6- or 12-core variants; each version is fabricated in a 22 nm silicon on insulator (SOI) process using 15 metal layers. The 12-core version consists of 4.2 billion transistors and measures 650 mm², while the 6-core version measures only 362 mm². However, the 6- and 12-core variants can have all or just some cores active, so POWER8 processors come with 4, 6, 8, 10 or 12 cores activated.
CAPI
Where previous POWER processors use the GX++ bus for external communication, POWER8 removes this from the design and replaces it with the CAPI port (Coherent Accelerator Processor Interface), which is layered on top of PCI Express 3.0. The CAPI port is used to connect auxiliary specialized processors such as GPUs, ASICs and FPGAs. Units attached to the CAPI bus can use the same memory address space as the CPU, thereby reducing the computing path length. At the 2013 ACM/IEEE Supercomputing Conference, IBM and Nvidia announced an engineering partnership to closely couple POWER8 with Nvidia GPUs in future HPC systems, with the first of them announced as the Power Systems S824L.
On October 14, 2016, IBM announced the formation of OpenCAPI, a new organization to spread adoption of CAPI to other platforms. Initial members are Google, AMD, Xilinx, Micron and Mellanox.
OCC
POWER8 also contains a so-called ''on-chip controller'' (OCC), which is a power and thermal management microcontroller based on a PowerPC 405 processor. It has two general-purpose offload engines (GPEs) and 512 KB of embedded static RAM (SRAM) (1 KB = 1024 bytes), can access the main memory directly, and runs open-source firmware. The OCC manages POWER8's operating frequency, voltage, memory bandwidth, and thermal control for both the processor and memory; it can regulate voltages through 1,764 integrated voltage regulators (IVRs) on the fly. The OCC can also be programmed to overclock the POWER8 processor, or to lower its power consumption by reducing the operating frequency (which is similar to the configurable TDP found in some Intel and AMD processors).
Memory Buffer chip
POWER8 splits the memory controller functions by moving some of them away from the processor and closer to the memory. The scheduling logic, the memory energy management, and the RAS decision point are moved to a so-called Memory Buffer chip (a.k.a. Centaur). Offloading certain memory processes to the Memory Buffer chip enables memory access optimizations, saving bandwidth and allowing for faster processor to memory communication. It also contains caching structures for an additional 16 MB of L4 cache per chip (up to 128 MB per processor) (1 MB = 1024 KB). Depending on the system architecture the Memory Buffer chips are placed either on the memory modules (Custom DIMM/CDIMM, for example in S824 and E880 models), or on the memory riser card holding standard DIMMs (for example in S822LC models).
The Memory Buffer chip is connected to the processor using a high-speed multi-lane serial link. The memory channel connecting each buffer chip is capable of writing 2 bytes and reading 1 byte at a time. It runs at 8 GB/s in the early Entry models, later increased in the high-end and the HPC models to 9.6 GB/s with a 40-ns latency, for a sustained bandwidth of 24 GB/s and 28.8 GB/s per channel respectively. Each processor has two memory controllers with four memory channels each, and the maximum processor to memory buffer bandwidth is 230.4 GB/s per processor. Depending on the model, only one controller might be enabled, or only two channels per controller could be in use. For increased availability, the link provides "on-the-fly" lane isolation and repair.
Each Memory Buffer chip has four interfaces, allowing the use of either DDR3 or DDR4 memory at 1600 MHz with no change to the processor link interface. The resulting 32 memory channels per processor allow a peak access rate of 409.6 GB/s between the Memory Buffer chips and the DRAM banks. Initially, support was limited to 16 GB, 32 GB and 64 GB DIMMs, allowing up to 1 TB to be addressed by the processor. Later, support for 128 GB and 256 GB DIMMs was announced, allowing up to 4 TB per processor.
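The bandwidth and L4 figures in this section follow from a few multiplications. The following is a minimal sketch of that arithmetic, not an IBM tool: all names are illustrative, the quoted "8 GB/s" and "9.6 GB/s" link rates are interpreted here as mega-transfers per second, one Memory Buffer chip per memory channel is inferred from the 16 MB and 128 MB L4 figures, and the 8-byte DDR data bus width is an assumption that happens to reproduce the quoted 409.6 GB/s.

```python
"""Back-of-the-envelope check of the POWER8 memory figures quoted above."""

# Figures taken from the text. Rates are kept as integers to avoid
# floating-point rounding. All names are illustrative, not IBM's.
LINK_RATES_MT_S = {"entry": 8_000, "high-end/HPC": 9_600}
BYTES_PER_TRANSFER = 2 + 1       # 2 bytes written + 1 byte read at a time
CONTROLLERS_PER_PROCESSOR = 2
CHANNELS_PER_CONTROLLER = 4
DDR_INTERFACES_PER_BUFFER = 4
DDR_RATE_MT_S = 1_600            # the text's "1600 MHz", read as transfers/s
DDR_BUS_BYTES = 8                # assumption: 64-bit data bus per interface
L4_PER_BUFFER_MB = 16


def buffer_chips_per_processor() -> int:
    """Inferred: one Memory Buffer chip per memory channel."""
    return CONTROLLERS_PER_PROCESSOR * CHANNELS_PER_CONTROLLER


def channel_bandwidth_mb_s(link_rate_mt_s: int) -> int:
    """Sustained bandwidth of one processor-to-buffer channel, in MB/s."""
    return link_rate_mt_s * BYTES_PER_TRANSFER


def processor_link_bandwidth_mb_s(link_rate_mt_s: int) -> int:
    """Processor-to-buffer bandwidth across all channels, in MB/s."""
    return buffer_chips_per_processor() * channel_bandwidth_mb_s(link_rate_mt_s)


def dram_peak_bandwidth_mb_s() -> int:
    """Peak buffer-to-DRAM bandwidth across all DDR interfaces, in MB/s."""
    interfaces = buffer_chips_per_processor() * DDR_INTERFACES_PER_BUFFER
    return interfaces * DDR_RATE_MT_S * DDR_BUS_BYTES


def l4_total_mb() -> int:
    """Total L4 cache per processor, in MB."""
    return buffer_chips_per_processor() * L4_PER_BUFFER_MB


if __name__ == "__main__":
    for model, rate in LINK_RATES_MT_S.items():
        print(model, channel_bandwidth_mb_s(rate) / 1000, "GB/s per channel")
    print(processor_link_bandwidth_mb_s(9_600) / 1000, "GB/s per processor")
    print(dram_peak_bandwidth_mb_s() / 1000, "GB/s to DRAM")
    print(l4_total_mb(), "MB of L4")
```

Working in integer MB/s keeps the results exact; the division by 1000 happens only when printing.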
Specifications
The POWER8 core has 64 KB L1 data cache contained in the load-store unit and 32 KB L1 instruction cache contained in the instruction fetch unit, along with a tightly integrated 512 KB L2 cache. In a single cycle each core can fetch up to eight instructions, decode and dispatch up to eight instructions, issue and execute up to ten instructions and commit up to eight instructions.
Each POWER8 core consists primarily of the following six execution units:
* Instruction fetch unit (IFU)
* Instruction sequencing unit (ISU)
* Load–store unit
* Fixed-point unit (FXU)
* Vector and scalar unit (VSU)
* Decimal floating-point unit (DFU)
Each core has sixteen execution pipelines:
* Two fixed-point pipelines
* Two load-store pipelines
* Two load pipelines
* Four double-precision floating-point pipelines, which can also act as eight single-precision pipelines
* Two fully symmetric vector pipelines with support for VMX and VSX AltiVec instructions.
* One cryptographic pipeline (AES, Galois Counter Mode, SHA-2)
* One branch execution pipeline
* One condition register logical pipeline
* One decimal floating-point pipeline
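As a quick consistency check, the pipeline list above can be tallied to confirm that it adds up to the stated sixteen. This is only a sketch: the dictionary keys are illustrative labels chosen for this example, not IBM terminology.

```python
# Pipeline counts per POWER8 core, copied from the list above.
# The keys are illustrative labels, not official unit names.
PIPELINES_PER_CORE = {
    "fixed-point": 2,
    "load-store": 2,
    "load": 2,
    "double-precision floating-point": 4,
    "vector (VMX/VSX)": 2,
    "cryptographic": 1,
    "branch execution": 1,
    "condition register logical": 1,
    "decimal floating-point": 1,
}


def total_pipelines() -> int:
    """Sum of all execution pipelines in one core."""
    return sum(PIPELINES_PER_CORE.values())


def single_precision_pipelines() -> int:
    """Each double-precision pipeline can act as two single-precision ones."""
    return 2 * PIPELINES_PER_CORE["double-precision floating-point"]


if __name__ == "__main__":
    print(total_pipelines(), "pipelines per core")
    print(single_precision_pipelines(), "single-precision pipelines")
```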
It has a larger issue queue with 4×16 entries and improved branch predictors, and can handle twice as many cache misses. Each core is eight-way hardware multithreaded and can be dynamically and automatically partitioned to have either one, two, four or all eight threads active. POWER8 also added support for hardware transactional memory. IBM estimates that each core is 1.6 times as fast as the POWER7 in single-threaded operations.
A POWER8 processor is a 6- or 12-chiplet design with variants of either 4, 6, 8, 10 or 12 activated chiplets, in which one chiplet consists of one processing core, 512 KB of SRAM L2 cache on a 64-byte wide bus (which is twice as wide as on its predecessor), and 8 MB of L3 eDRAM cache per chiplet shareable among all chiplets. Thus, a six-chiplet processor would have 48 MB of L3 eDRAM cache, while a 12-chiplet processor would have a total of 96 MB of L3 eDRAM cache. The chip can also use up to 128 MB of off-chip eDRAM L4 cache using Centaur companion chips. The on-chip memory controllers can handle 1 TB of RAM and 230 GB/s sustained memory bandwidth. The on-board PCI Express controllers can handle 48 GB/s of I/O to other parts of the system. The cores are designed to operate at clock rates between 2.5 and 5 GHz.
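The per-chip totals quoted above follow directly from the per-chiplet figures. The sketch below reproduces that arithmetic under the text's own numbers; the function and constant names are illustrative, and the text does not say whether the L3 of a deactivated chiplet remains usable, so the cache helpers take the physical chiplet count while the thread helper takes the activated core count.

```python
# Per-chiplet figures from the text. Names are illustrative only.
THREADS_PER_CORE = 8         # eight-way hardware multithreading
L2_PER_CHIPLET_KB = 512
L3_PER_CHIPLET_MB = 8
ACTIVE_CORE_VARIANTS = (4, 6, 8, 10, 12)


def hardware_threads(active_cores: int) -> int:
    """Simultaneous hardware threads for a given number of active cores."""
    return active_cores * THREADS_PER_CORE


def l2_total_kb(chiplets: int) -> int:
    """Aggregate L2 cache across all chiplets, in KB."""
    return chiplets * L2_PER_CHIPLET_KB


def l3_total_mb(chiplets: int) -> int:
    """Aggregate shared L3 eDRAM cache across all chiplets, in MB."""
    return chiplets * L3_PER_CHIPLET_MB


if __name__ == "__main__":
    for chiplets in (6, 12):
        print(chiplets, "chiplets:", l3_total_mb(chiplets), "MB L3,",
              l2_total_kb(chiplets), "KB L2")
    for cores in ACTIVE_CORE_VARIANTS:
        print(cores, "active cores:", hardware_threads(cores), "threads")
```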
The six-core chips are mounted in pairs on dual-chip modules (DCM) in IBM's scale-out servers. In most configurations not all cores are active, resulting in a variety of configurations where the actual core count differs. The 12-core version is used in the high-end E880 and E880C models.
IBM's single-chip POWER8 module is called Turismo and the dual-chip variant is called Murano. PowerCore's modified version is called CP1.
POWER8 with NVLink
This is a revised version of the original 12-core POWER8 from IBM, and used to be called ''POWER8+''. The main new feature is support for Nvidia's bus technology NVLink, connecting up to four NVLink devices directly to the chip. IBM removed the ''A Bus'' and PCI interfaces for SMP connections to other POWER8 sockets and replaced them with NVLink interfaces. Connections to a second CPU socket are now provided via the ''X Bus''. Besides that and a slight size increase to 659 mm², the differences seem minimal compared to previous POWER8 processors.
Licensees
On 19 January 2014, the Suzhou PowerCore Technology Company announced that it would join the OpenPOWER Foundation and license the POWER8 core to design custom-made processors for use in big data and cloud computing applications.
Variants
* IBM ''Murano'', a 12-core processor with two six-core chips. The scale-out processor is available in configurations with disabled cores.
* IBM ''Turismo'', a single-chip 12-core processor. The scale-up processor is commercially available for licensing and purchase in configurations with disabled cores.
* PowerCore ''CP1'', a POWER8 variant with revised security features due to export restrictions between the United States and China that will be manufactured at the GlobalFoundries (formerly IBM's) plant in East Fishkill, New York. Released in 2015.
Systems
; IBM
: Scale Out servers, supporting one or two sockets each carrying a dual-chip module with two six-core POWER8 processors. They come in either 2U or 4U form factors, and one tower configuration. The "L" versions run only Linux, while the others run AIX, IBM i and Linux. The "LC" versions are built by OpenPOWER partners.
:* ''Power Systems S812L'' 1× POWER8 DCM (4, 6 or 8 cores), 2U
:* ''Power Systems S814'' 1× POWER8 DCM (6 or 8 cores), 4U or tower
:* ''Power Systems S822'' and ''S822L'' 1× or 2× POWER8 DCM (6, 10, 12 or 20 cores), 2U
:* ''Power Systems S824'' and ''S824L'' 1× or 2× POWER8 DCM (6, 8, 12, 16 or 24 cores), 4U
:* ''Power Systems S821LC "Stratton"'' 2× POWER8 SCM (8 or 10 cores), 1U. Up to 512 GB DDR4 RAM buffered by four Centaur L4 chips. Manufactured by Supermicro.
:* ''Power Systems S822LC for Big Data "Briggs"'' 2× POWER8 SCM (8 or 10 cores), 2U. Up to 512 GB DDR4 RAM buffered by four Centaur L4 chips. Manufactured by Supermicro.
: Enterprise servers, supporting nodes with four sockets, each carrying 8-, 10- or 12-core modules, for a maximum of 16 sockets, 128 cores and 16 TB of RAM. These machines can run AIX, IBM i, or Linux.
:* ''Power Systems E850'' 2×, 3× or 4× POWER8 DCM (8, 10 or 12 cores), 4U
:* ''Power Systems E870'' 1× or 2× 5U nodes, each with four sockets with 8- or 10-core POWER8 single-chip modules, for up to a total of 80 cores
:* ''Power Systems E880'' 1×, 2×, 3× or 4× 5U nodes, each with four sockets with 8- or 12-core POWER8 single-chip modules, for up to a total of 192 cores
: High performance computing:
:* ''Power Systems S812LC'' 1× POWER8 SCM (8 or 10 cores), 2U. Manufactured by Tyan.
:* ''Power Systems S822LC "Firestone"'' 2× POWER8 SCM (8 or 10 cores), 2U. Two Nvidia Tesla K80 GPUs and up to 1 TB commodity DDR3 RAM. Manufactured by Wistron.
:* ''Power Systems S822LC for HPC "Minsky"'' 2× POWER8+ SCM (8 or 10 cores), 2U. Up to four NVLinked Nvidia Tesla P100 GPUs and up to 1 TB commodity DDR4 RAM. Manufactured by Wistron.
: Hardware Management Console
:* ''7063-CR1 HMC'' 1× POWER8 SCM (6 cores), 1U. Based on the SuperMicro "Stratton" design.
; Tyan
:* An ATX motherboard with one single-chip POWER8 socket called the SP010GM2NR.
:* ''Palmetto GN70-BP010'', OpenPower reference system. 2U server, with one four-core POWER8 SCM, four RAM sockets, based on a Tyan motherboard.
:* ''Habanero TN-71-BP012''. 2U, with one 8-core POWER8 SCM, 32 RAM sockets
:* ''GT75-BP012''. 1U, with a single 8- or 10-core POWER8 SCM and 32 sockets for RAM modules
; Google
: Google has shown a motherboard with two sockets, intended for internal use only.
; StackVelocity
: StackVelocity has designed a high-performance reference platform, Saba.
; Inspur
: Inspur has made a deal with IBM to develop server hardware based on POWER8 and related technologies.
:* 4U server, two POWER8 sockets.
; Cirrascale
: ''RM4950'' 4U, 4-core POWER8 SCM with four Nvidia Tesla K40 accelerators. Based on Tyan's motherboard.
; Zoom Netcom
: ''RedPOWER C210'' and ''C220'' 2U and 4U servers with two POWER8 sockets and 64 sockets for RAM modules.
: ''RedPOWER C310'' and ''C320'' 2U and 4U servers with two CP1 sockets.
; ChuangHe
: ''OP-1X'' 1U, single socket, 32 RAM slots.
; Rackspace
: ''Barreleye'' 1U, 2 socket, 32 RAM slots. Based on the Open Compute Project platform for use in their OnMetal service.
; Raptor Computing Systems / Raptor Engineering
: ''Talos I'' unreleased 4U server or workstation, 1 socket, 8 RAM slots.
; Penguin Computing
: ''Magna'' product series
:* ''Magna 2001'' (software development)
:* ''Magna 1015'' (virtualisation)
:* ''Magna 2002'' and ''Magna 2002S'' (machine learning)
See also
* IBM Power microprocessors
* OpenPOWER Foundation
* POWER7
* POWER9
* IBM A2
External links
* POWER8 Overview, IBM Power Systems (PDF)