HOME

TheInfoList



OR:

Xeon Phi is a discontinued series of
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel, based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. Th ...
manycore processor Manycore processors are special kinds of multi-core processors designed for a high degree of parallel processing, containing numerous simpler, independent processor cores (from a few tens of cores to thousands or more). Manycore processors are use ...
s designed and made by
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California, and Delaware General Corporation Law, incorporated in Delaware. Intel designs, manufactures, and sells computer compo ...
. It was intended for use in supercomputers, servers, and high-end workstations. Its architecture allowed use of standard programming languages and
application programming interface An application programming interface (API) is a connection between computers or between computer programs. It is a type of software Interface (computing), interface, offering a service to other pieces of software. A document or standard that des ...
s (APIs) such as
OpenMP OpenMP is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran, on many platforms, instruction-set architectures and operating systems, including Solaris, ...
. Xeon Phi launched in 2010. Since it was originally based on an earlier GPU design ( codenamed "Larrabee") by Intel that was cancelled in 2009, it shared application areas with GPUs. The main difference between Xeon Phi and a
GPGPU General-purpose computing on graphics processing units (GPGPU, or less often GPGP) is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditiona ...
like
Nvidia Tesla Nvidia Tesla is the former name for a line of products developed by Nvidia targeted at stream processing or GPGPU, general-purpose graphics processing units (GPGPU), named after pioneering electrical engineer Nikola Tesla. Its products began us ...
was that Xeon Phi, with an x86-compatible core, could, with less modification, run software that was originally targeted to a standard x86 CPU. Initially in the form of
PCI Express PCI Express (Peripheral Component Interconnect Express), officially abbreviated as PCIe, is a high-speed standard used to connect hardware components inside computers. It is designed to replace older expansion bus standards such as Peripher ...
-based add-on cards, a second-generation product, codenamed ''Knights Landing'', was announced in June 2013. These second-generation chips could be used as a standalone CPU, rather than just as an add-in card. In June 2013, the
Tianhe-2 Tianhe-2 or TH-2 (, i.e. 'Milky Way 2') is a 33.86- petaflop supercomputer located in the National Supercomputer Center in Guangzhou, China. It was developed by a team of 1,300 scientists and engineers. It was the world's fastest supercomputer ...
supercomputer at the
National Supercomputer Center in Guangzhou The National Supercomputer Center in Guangzhou houses Tianhe-2, which is currently the seventh fastest supercomputer in the world, with a measured 33.86 petaflop/s (quadrillions of calculations per second). Tianhe-2 is operated by the National ...
(NSCC-GZ) was announced as the world's fastest supercomputer (, it is ). It used Intel Xeon Phi coprocessors and Ivy Bridge-EP Xeon E5 v2 processors to achieve 33.86 petaFLOPS. The Xeon Phi product line directly competed with
Nvidia Nvidia Corporation ( ) is an American multinational corporation and technology company headquartered in Santa Clara, California, and incorporated in Delaware. Founded in 1993 by Jensen Huang (president and CEO), Chris Malachowsky, and Curti ...
's Tesla and AMD Radeon Instinct lines of deep learning and GPGPU cards. It was discontinued due to a lack of demand and Intel's problems with its 10 nm node.


History


Background

The Larrabee microarchitecture (in development since 2006) introduced very wide (512-bit)
SIMD Single instruction, multiple data (SIMD) is a type of parallel computer, parallel processing in Flynn's taxonomy. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneousl ...
units to an
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel, based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. Th ...
architecture based processor design, extended to a cache-coherent multiprocessor system connected via a ring bus to memory; each core was capable of four-way multithreading. Due to the design being intended for GPU as well as general purpose computing, the Larrabee chips also included specialised hardware for texture sampling. The project to produce a retail GPU product directly from the Larrabee research project was terminated in May 2010. Another contemporary Intel research project implementing x86 architecture on a many-multicore processor was the ' Single-chip Cloud Computer' (prototype introduced 2009), a design mimicking a
cloud computing Cloud computing is "a paradigm for enabling network access to a scalable and elastic pool of shareable physical or virtual resources with self-service provisioning and administration on-demand," according to International Organization for ...
computer datacentre on a single chip with multiple independent cores: the prototype design included 48 cores per chip with hardware support for selective frequency and voltage control of cores to maximize energy efficiency, and incorporated a mesh network for inter-chip messaging. The design lacked cache-coherent cores and focused on principles that would allow the design to scale to many more cores. The Teraflops Research Chip (prototype unveiled 2007) is an experimental 80-core chip with two
floating-point In computing, floating-point arithmetic (FP) is arithmetic on subsets of real numbers formed by a ''significand'' (a Sign (mathematics), signed sequence of a fixed number of digits in some Radix, base) multiplied by an integer power of that ba ...
units per core, implementing a 96-bit
VLIW Very long instruction word (VLIW) refers to instruction set architectures that are designed to exploit instruction-level parallelism (ILP). A VLIW processor allows programs to explicitly specify instructions to execute in parallel computing, para ...
architecture instead of the x86 architecture. The project investigated intercore communication methods, per-chip power management, and achieved 1.01 
TFLOPS Floating point operations per second (FLOPS, flops or flop/s) is a measure of computer performance in computing, useful in fields of scientific computations that require floating-point calculations. For such cases, it is a more accurate measur ...
at 3.16 GHz consuming 62 W of power.


Knights Ferry

Intel's Many Integrated Core (MIC) prototype board, named ''Knights Ferry'', incorporating a processor codenamed ''Aubrey Isle'' was announced 31 May 2010. The product was stated to be a derivative of the ''Larrabee'' project and other Intel research including the ''Single-chip Cloud Computer''. The development product was offered as a PCIe card with 32 in-order cores at up to 1.2 GHz with four threads per core, 2 GB GDDR5 memory, and 8 MB coherent L2 cache (256 KB per core with 32 KB L1 cache), and a power requirement of ≈300 W, built at a 45 nm process. In the ''Aubrey Isle'' core a 1,024-bit ring bus (512-bit bi-directional) connects processors to main memory. Single-board performance has exceeded 750 GFLOPS. The prototype boards only support
single-precision Single-precision floating-point format (sometimes called FP32 or float32) is a computer number format, usually occupying 32 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. A floati ...
floating-point instructions. Initial developers included
CERN The European Organization for Nuclear Research, known as CERN (; ; ), is an intergovernmental organization that operates the largest particle physics laboratory in the world. Established in 1954, it is based in Meyrin, western suburb of Gene ...
,
Korea Institute of Science and Technology Information The Korea Institute of Science and Technology Information (KISTI; ) is a government-funded research institute under the Ministry of Science and ICT of South Korea. Established in 1962 and headquartered in Daedeok Science Town, Daejeon. KISTI ...
(KISTI) and Leibniz Supercomputing Centre. Hardware vendors for prototype boards included IBM, SGI, HP, Dell and others.


Knights Corner

The ''Knights Corner'' product line is made at a 22 nm process size, using Intel's Tri-gate technology with more than 50 cores per chip, and is Intel's first many-cores commercial product. In June 2011, SGI announced a partnership with Intel to use the MIC architecture in its high-performance computing products. In September 2011, it was announced that the Texas Advanced Computing Center (TACC) will use Knights Corner cards in their 10-petaFLOPS "Stampede" supercomputer, providing 8 petaFLOPS of compute power. According to "Stampede: A Comprehensive Petascale Computing Environment" the "second-generation Intel (Knights Landing) MICs will be added when they become available, increasing Stampede's aggregate peak performance to at least 15 PetaFLOPS." On 15 November 2011, Intel showed an early silicon version of a Knights Corner processor. On 5 June 2012, Intel released open source software and documentation regarding Knights Corner. On 18 June 2012, Intel announced at the 2012 Hamburg
International Supercomputing Conference The ISC High Performance, formerly known as the International Supercomputing Conference, is a yearly conference on supercomputing which has been held in Europe since 1986. It stands as the oldest supercomputing conference in the world. History ...
that ''Xeon Phi'' will be the
brand name A brand is a name, term, design, symbol or any other feature that distinguishes one seller's goods or service from those of other sellers. Brands are used in business, marketing, and advertising for recognition and, importantly, to create and ...
used for all products based on their Many Integrated Core architecture. In June 2012,
Cray Cray Inc., a subsidiary of Hewlett Packard Enterprise, is an American supercomputer manufacturer headquartered in Seattle, Washington. It also manufactures systems for data storage and analytics. Several Cray supercomputer systems are listed ...
announced it would be offering 22 nm 'Knight's Corner' chips (branded as 'Xeon Phi') as a co-processor in its 'Cascade' systems. In June 2012, ScaleMP announced a virtualization update allowing Xeon Phi as a transparent processor extension, allowing legacy MMX/ SSE code to run without code changes. An important component of the Intel Xeon Phi coprocessor's core is its vector processing unit (VPU). The VPU features a novel 512-bit SIMD instruction set, officially known as Intel Initial Many Core Instructions (Intel IMCI). Thus, the VPU can execute 16
single-precision Single-precision floating-point format (sometimes called FP32 or float32) is a computer number format, usually occupying 32 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. A floati ...
(SP) or 8
double-precision Double-precision floating-point format (sometimes called FP64 or float64) is a floating-point number format, usually occupying 64 bits in computer memory; it represents a wide range of numeric values by using a floating radix point. Double prec ...
(DP) operations per cycle. The VPU also supports Fused Multiply-Add (FMA) instructions and hence can execute 32 SP or 16 DP floating point operations per cycle. It also provides support for integers. The VPU also features an Extended Math Unit (EMU) that can execute operations such as reciprocal, square root, and logarithm, thereby allowing these operations to be executed in a vector fashion with high bandwidth. The EMU operates by calculating polynomial approximations of these functions. On 12 November 2012, Intel announced two Xeon Phi coprocessor families using the 22 nm process size: the Xeon Phi 3100 and the Xeon Phi 5110P. The Xeon Phi 3100 will be capable of more than 1 teraFLOPS of
double-precision Double-precision floating-point format (sometimes called FP64 or float64) is a floating-point number format, usually occupying 64 bits in computer memory; it represents a wide range of numeric values by using a floating radix point. Double prec ...
floating-point instructions with 240 GB/s memory bandwidth at 300 W. The Xeon Phi 5110P will be capable of 1.01 teraFLOPS of double-precision floating-point instructions with 320 GB/s memory bandwidth at 225 W. The Xeon Phi 7120P will be capable of 1.2 teraFLOPS of double-precision floating-point instructions with 352 GB/s memory bandwidth at 300 W. On 17 June 2013, the
Tianhe-2 Tianhe-2 or TH-2 (, i.e. 'Milky Way 2') is a 33.86- petaflop supercomputer located in the National Supercomputer Center in Guangzhou, China. It was developed by a team of 1,300 scientists and engineers. It was the world's fastest supercomputer ...
supercomputer was announced by
TOP500 The TOP500 project ranks and details the 500 most powerful non-distributed computing, distributed computer systems in the world. The project was started in 1993 and publishes an updated list of the supercomputers twice a year. The first of these ...
as the world's fastest. Tianhe-2 used Intel Ivy Bridge Xeon and Xeon Phi processors to achieve 33.86 petaFLOPS. It was the fastest on the list for two and a half years, lastly in November 2015.


Design and programming

The cores of Knights Corner are based on a modified version of P54C design, used in the original Pentium. The basis of the Intel MIC architecture is to leverage x86 legacy by creating an x86-compatible multiprocessor architecture that can use existing parallelization software tools. Programming tools include
OpenMP OpenMP is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran, on many platforms, instruction-set architectures and operating systems, including Solaris, ...
,
OpenCL OpenCL (Open Computing Language) is a software framework, framework for writing programs that execute across heterogeneous computing, heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), di ...
,
Cilk Cilk, Cilk++, Cilk Plus and OpenCilk are general-purpose programming languages designed for multithreaded parallel computing. They are based on the C and C++ programming languages, which they extend with constructs to express parallel loop ...
/ Cilk Plus and specialised versions of Intel's Fortran, C++ and math libraries. Design elements inherited from the Larrabee project include x86 ISA, 4-way SMT per core, 512-bit SIMD units, 32 KB L1 instruction cache, 32 KB L1 data cache, coherent L2 cache (512 KB per core), and ultra-wide ring bus connecting processors and memory. The Knights Corner 512-bit SIMD instructions share many intrinsic functions with AVX-512 extension . The instruction set documentation is available from Intel under the extension name of KNC.


Knights Landing

Code name for the second-generation MIC architecture product from Intel. Intel officially first revealed details of its second-generation Intel Xeon Phi products on 17 June 2013. Intel said that the next generation of Intel MIC Architecture-based products will be available in two forms, as a coprocessor or a host processor (CPU), and be manufactured using Intel's 14 nm process technology. Knights Landing products will include integrated on-package memory for significantly higher memory bandwidth. Knights Landing contains up to 72 Airmont (Atom) cores with four threads per core, using
LGA 3647 LGA 3647 is an Intel microprocessor compatible socket Socket may refer to: Mechanics * Socket wrench, a type of wrench that uses separate, removable sockets to fit different sizes of nuts and bolts * Socket head screw, a screw (or bolt) with a ...
socketTom's Hardware: Intel Xeon Phi Knights Landing Now Shipping; Omni Path Update, Too
20 June 2016
supporting up to 384 GB of "far" DDR4 2133 RAM and 8–16 GB of stacked "near" 3D  MCDRAM, a version of the Hybrid Memory Cube. Each core has two 512-bit vector units and supports
AVX-512 AVX-512 are 512-bit extensions to the 256-bit Advanced Vector Extensions SIMD instructions for x86 instruction set architecture (ISA) proposed by Intel in July 2013, and first implemented in the 2016 Intel Xeon Phi x200 (Knights Landing), and then ...
SIMD instructions, specifically the Intel AVX-512 Foundational Instructions (AVX-512F) with Intel AVX-512 Conflict Detection Instructions (AVX-512CD), Intel AVX-512 Exponential and Reciprocal Instructions (AVX-512ER), and Intel AVX-512 Prefetch Instructions (AVX-512PF). Support for IMCI has been removed in favor of AVX-512. The
National Energy Research Scientific Computing Center The National Energy Research Scientific Computing Center (NERSC) is a high-performance computing (supercomputer) research facility that was founded in 1974. The National User Facility is operated by Lawrence Berkeley National Laboratory for th ...
announced that Phase 2 of its newest supercomputing system "Cori" would use Knights Landing Xeon Phi coprocessors. On 20 June 2016, Intel launched the Intel Xeon Phi product family x200 based on the Knights Landing architecture, stressing its applicability to not just traditional simulation workloads, but also to
machine learning Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
. The model lineup announced at launch included only Xeon Phi of bootable form-factor, but two versions of it: standard processors and processors with integrated Intel Omni-Path architecture fabric. The latter is denoted by the suffix F in the model number. Integrated fabric is expected to provide better latency at a lower cost than discrete high-performance network cards. On 14 November 2016, the 48th list of
TOP500 The TOP500 project ranks and details the 500 most powerful non-distributed computing, distributed computer systems in the world. The project was started in 1993 and publishes an updated list of the supercomputers twice a year. The first of these ...
contained two systems using Knights Landing in the Top 10. The
PCIe PCI Express (Peripheral Component Interconnect Express), officially abbreviated as PCIe, is a high-speed standard used to connect hardware components inside computers. It is designed to replace older expansion bus standards such as Peripher ...
based co-processor variant of Knight's Landing was never offered to the general market and was discontinued by August 2017. This included the 7220A, 7240P and 7220P coprocessor cards. Intel announced they were discontinuing Knights Landing in summer 2018.


Models

All models can boost to their peak speeds, adding 200 MHz to their base frequency when running just one or two cores. When running from three to the maximum number of cores, the chips can only boost 100 MHz above the base frequency. All chips run high-AVX code at a frequency reduced by 200 MHz.


Knights Mill

Knights Mill is Intel's codename for a Xeon Phi product specialized in
deep learning Deep learning is a subset of machine learning that focuses on utilizing multilayered neural networks to perform tasks such as classification, regression, and representation learning. The field takes inspiration from biological neuroscience a ...
, initially released in December 2017. Nearly identical in specifications to Knights Landing, Knights Mill includes optimizations for better utilization of AVX-512 instructions. Single-precision and variable-precision floating-point performance increased, at the expense of double-precision floating-point performance.


Models


Knights Hill

Knights Hill was the codename for the third-generation MIC architecture, for which Intel announced the first details at SC14. It was to be manufactured in a 10 nm process. Knights Hill was expected to be used in the
United States Department of Energy The United States Department of Energy (DOE) is an executive department of the U.S. federal government that oversees U.S. national energy policy and energy production, the research and development of nuclear power, the military's nuclear w ...
Aurora supercomputer, to be deployed at
Argonne National Laboratory Argonne National Laboratory is a Federally funded research and development centers, federally funded research and development center in Lemont, Illinois, Lemont, Illinois, United States. Founded in 1946, the laboratory is owned by the United Sta ...
. However, Aurora was delayed in favor of using an "advanced architecture" with a focus on machine learning. In 2017, Intel announced that Knights Hill had been canceled in favor of another architecture built from the ground up to enable
Exascale computing Exascale computing refers to Computer system, computing systems capable of calculating at least 1018 IEEE 754 Double Precision (64-bit) operations (multiplications and/or additions) per second (exaFLOPS)"; it is a measure of supercomputer perform ...
in the future. This new architecture was expected for 2020–2021; however, this was also cancelled due to the discontinuation of the Xeon Phi.


Programming

One performance and programmability study reported that achieving high performance with Xeon Phi still needs help from programmers and that merely relying on compilers with traditional programming models is insufficient. Other studies in various domains, such as life sciences and deep learning, have shown that exploiting the thread- and SIMD-parallelism of Xeon Phi achieves significant speed-ups.


Competitors

*
Nvidia Tesla Nvidia Tesla is the former name for a line of products developed by Nvidia targeted at stream processing or GPGPU, general-purpose graphics processing units (GPGPU), named after pioneering electrical engineer Nikola Tesla. Its products began us ...
, a direct competitor in the HPC market * AMD Radeon Pro and AMD Radeon Instinct direct competitors in the HPC market


See also

* Texas Advanced Computing Center "Stampede" supercomputer incorporates Xeon Phi chips. Stampede is capable of 10 petaFLOPS. *
AVX-512 AVX-512 are 512-bit extensions to the 256-bit Advanced Vector Extensions SIMD instructions for x86 instruction set architecture (ISA) proposed by Intel in July 2013, and first implemented in the 2016 Intel Xeon Phi x200 (Knights Landing), and then ...
*
Cell (microprocessor) The Cell Broadband Engine (Cell/B.E.) is a 64-bit multi-core processor and microarchitecture developed by Sony, Toshiba, and IBM—an alliance known as "STI". It combines a general-purpose PowerPC core, called the Power Processing Element (PPE) ...
* Intel Tera-Scale *
Massively parallel Massively parallel is the term for using a large number of computer processors (or separate computers) to simultaneously perform a set of coordinated computations in parallel. GPUs are massively parallel architecture with tens of thousands of ...
*
Xeon Xeon (; ) is a brand of x86 microprocessors designed, manufactured, and marketed by Intel, targeted at the non-consumer workstation, server, and embedded markets. It was introduced in June 1998. Xeon processors are based on the same archite ...


References


External links

* Intel pages
Intel Xeon Phi Processors
* Chips and Cheese, December 8, 2022
Knight's Landing: Atom with AVX-512
* {{Intel processors Coprocessors Intel Intel microprocessors Parallel computing Manycore processors X86 microarchitectures Intel x86 microprocessors