Manycore processors are special kinds of multi-core processors designed for a high degree of parallel processing, containing numerous simpler, independent processor cores (from a few tens of cores to thousands or more). Manycore processors are used extensively in embedded computers and high-performance computing.

Contrast with multicore architecture

Manycore processors are distinct from multi-core processors in being optimized from the outset for a higher degree of explicit parallelism, and for higher throughput (or lower power consumption) at the expense of higher latency and lower single-thread performance. The broader category of multi-core processors, by contrast, is usually designed to run ''both'' parallel ''and'' serial code efficiently, and therefore places more emphasis on high single-thread performance (e.g. devoting more silicon to out-of-order execution, deeper pipelines, more superscalar execution units, and larger, more general caches) and on shared memory. These techniques devote runtime resources to extracting implicit parallelism from a single thread. Such processors are used in systems that have evolved continuously (with backward compatibility) from single-core processors. They usually have a 'few' cores (e.g. 2, 4, 8) and may be complemented by a manycore accelerator (such as a GPU) in a heterogeneous system.
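The trade-off between a few fast cores and many slow cores can be illustrated with Amdahl's law. The following is a rough sketch rather than a measurement: the two designs (`few_fast`, `many_slow`), their core counts and their relative per-core speeds are hypothetical values chosen for illustration, and the model ignores memory and communication costs.

```python
def run_time(parallel_fraction, cores, core_speed):
    """Time to finish one unit of work under Amdahl's law.

    The serial part runs on a single core; the parallel part is split
    evenly across all cores. core_speed is relative single-thread speed.
    """
    serial = (1.0 - parallel_fraction) / core_speed
    parallel = parallel_fraction / (cores * core_speed)
    return serial + parallel


# Hypothetical designs (assumed numbers, not taken from any real chip).
few_fast = {"cores": 8, "core_speed": 1.0}       # multicore-style design
many_slow = {"cores": 256, "core_speed": 0.25}   # manycore-style design


def faster_design(parallel_fraction):
    """Name of the design that finishes the workload first."""
    t_few = run_time(parallel_fraction, **few_fast)
    t_many = run_time(parallel_fraction, **many_slow)
    return "many_slow" if t_many < t_few else "few_fast"


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99, 0.999):
        print(f"parallel fraction {p}: {faster_design(p)} wins")
```

Under these assumed numbers the manycore-style design only pulls ahead once nearly all of the work is parallel, which is consistent with the emphasis on explicit parallelism described above.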

Motivation

Cache coherency is an issue limiting the scaling of multicore processors. Manycore processors may bypass this with methods such as message passing, scratchpad memory, DMA, partitioned global address space, or read-only/non-coherent caches. A manycore processor using a network on a chip and local memories gives software the opportunity to explicitly optimise the spatial layout of tasks (e.g. as seen in tooling developed for TrueNorth). Manycore processors may have more in common (conceptually) with technologies originating in high-performance computing, such as clusters and vector processors. GPUs may be considered a form of manycore processor having multiple shader processing units, and being suitable only for highly parallel code (high throughput, but extremely poor single-thread performance).
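The idea of replacing coherent shared caches with message passing and local memories can be sketched in a few lines. This is a toy model under stated assumptions: Python threads and `queue.Queue` stand in for cores and on-chip links, the names `core` and `parallel_sum` are hypothetical, and the example shows only the communication pattern, not real hardware behaviour or performance.

```python
import queue
import threading


def core(inbox, outbox):
    """One simulated core with a private scratchpad and no shared mutable state."""
    scratchpad = []                   # local memory, touched only by this core
    while True:
        msg = inbox.get()             # blocking receive
        if msg is None:               # sentinel: no more work for this core
            break
        scratchpad.extend(msg)        # copy the incoming data into local memory
    outbox.put(sum(scratchpad))       # send the partial result back as a message


def parallel_sum(data, n_cores=4):
    """Sum data by explicitly sending one slice to each simulated core."""
    inboxes = [queue.Queue() for _ in range(n_cores)]
    results = queue.Queue()
    workers = [threading.Thread(target=core, args=(ib, results)) for ib in inboxes]
    for w in workers:
        w.start()
    for i, ib in enumerate(inboxes):
        ib.put(data[i::n_cores])      # explicit data placement, one slice per core
        ib.put(None)
    for w in workers:
        w.join()
    return sum(results.get() for _ in range(n_cores))


if __name__ == "__main__":
    print(parallel_sum(list(range(100))))
```

Because each simulated core only ever reads its own scratchpad, no two cores hold copies of the same mutable data, so there is nothing for a coherence protocol to keep consistent. The cost is that data movement must be written out explicitly.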

Programming models

* Message passing interface
* OpenCL or other APIs supporting compute kernels
* Partitioned global address space
* Actor model
* OpenMP
* Dataflow
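Of the models listed above, partitioned global address space (PGAS) can be illustrated with a small toy: one global index space whose elements are each owned by exactly one core. The sketch below is plain Python under stated assumptions. The class `PartitionedArray` and its methods are hypothetical and do not correspond to any real PGAS language or library, and a block distribution is only one possible partitioning.

```python
class PartitionedArray:
    """Toy PGAS-style array: a global index space split into per-core segments."""

    def __init__(self, size, n_cores):
        self.size = size
        self.block = -(-size // n_cores)   # ceiling division: elements per core
        # One local segment per core; the last segments may be shorter.
        self.segments = [
            [0] * max(0, min(self.block, size - c * self.block))
            for c in range(n_cores)
        ]
        self.remote_accesses = 0           # accesses that crossed a partition

    def owner(self, i):
        """Core whose local segment holds global index i."""
        return i // self.block

    def _locate(self, i, from_core):
        if not 0 <= i < self.size:
            raise IndexError(i)
        core = self.owner(i)
        if core != from_core:
            self.remote_accesses += 1      # would be a network transfer on real hardware
        return core, i % self.block

    def read(self, i, from_core):
        core, offset = self._locate(i, from_core)
        return self.segments[core][offset]

    def write(self, i, value, from_core):
        core, offset = self._locate(i, from_core)
        self.segments[core][offset] = value


if __name__ == "__main__":
    a = PartitionedArray(10, 4)
    a.write(9, 42, from_core=3)            # local write: core 3 owns index 9
    print(a.read(9, from_core=0), a.remote_accesses)
```

Any core can address any element, but the counter makes the distinction between local and remote accesses visible, which is the property a PGAS program is expected to exploit.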

Classes of manycore systems

* GPUs, which can be described as manycore processors
* Massively parallel processor array
* Asynchronous array of simple processors

Specific manycore architectures

* ZettaScaler, Japanese PEZY Computing 2,048-core modules
* Xeon Phi coprocessor, which has MIC (''Many Integrated Cores'') architecture
* Tilera
* Adapteva Epiphany Architecture, a manycore chip using PGAS
* Coherent Logix hx3100 Processor, a 100-core DSP/GPP processor based on HyperX Architecture
* Movidius Myriad 2, a manycore vision processing unit (VPU)
* Kalray, a manycore PCI-e accelerator for data-intensive tasks
* Teraflops Research Chip, a manycore processor using message passing
* TrueNorth, an AI accelerator with a manycore network on a chip architecture
* Green arrays, a manycore processor using message passing aimed at low power applications
* Sunway SW26010, a 260-core manycore processor used in the then top-ranked supercomputer Sunway TaihuLight
** SW52020, an improved 520-core variant of SW26010, with 512-bit SIMD (also adding support for half-precision), used in a prototype for a planned exascale system (and, in the future, a 10-exascale system); according to datacenterdynamics, China is rumored to already have two separate exascale systems operating in secret
* Eyeriss, a manycore processor designed for running convolutional neural nets for embedded vision applications
* Graphcore, a manycore AI accelerator

Specific manycore computers with 1M+ CPU cores

A number of computers built from multicore processors have one million or more individual CPU cores. Examples include:
* Gyoukou (Japanese: 暁光 Hepburn: ''gyōkō'', dawn light), a supercomputer developed by ExaScaler and PEZY Computing, with 20,480,000 processing elements in total plus 1,250 Intel Xeon D host processors.
* SpiNNaker, a massively parallel (1 million CPU cores) manycore processor (ARM-based) built as part of the Human Brain Project

Specific computers with 5 million or more CPU cores

Quite a few supercomputers have over 5 million CPU cores. Cores in coprocessors (e.g. attached GPUs) are not included in these core counts; if they were, quite a few more computers would reach this threshold.
* Frontier
* Fugaku, a Japanese supercomputer using Fujitsu A64FX ARM-based cores, 7,630,848 in total.
* Sunway TaihuLight, a massively parallel (10 million CPU cores) Chinese supercomputer, once one of the fastest supercomputers in the world, using a custom manycore architecture. As of November 2018, it was the world's third fastest supercomputer (as ranked by the TOP500 list), obtaining its performance from 40,960 SW26010 manycore processors, each containing 256 compute cores plus 4 management cores.
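The core counts quoted for these machines can be checked with simple arithmetic. The sketch below uses only figures that appear in this article; the helper `total_cores` and the constant `THRESHOLD` are hypothetical names introduced for illustration. The article gives two per-chip figures for the SW26010 (260 cores and 256 cores), so both products are computed.

```python
def total_cores(processors, cores_per_processor):
    """Total core count of a machine built from identical processors."""
    return processors * cores_per_processor


THRESHOLD = 5_000_000  # the "5 million or more" cut-off used in this section

# Sunway TaihuLight: 40,960 SW26010 chips, counted with either per-chip figure.
taihulight_256 = total_cores(40_960, 256)
taihulight_260 = total_cores(40_960, 260)

# Fugaku: the total is quoted directly in the text.
fugaku = 7_630_848

if __name__ == "__main__":
    for name, cores in [
        ("Sunway TaihuLight (256 per chip)", taihulight_256),
        ("Sunway TaihuLight (260 per chip)", taihulight_260),
        ("Fugaku", fugaku),
    ]:
        print(f"{name}: {cores:,} cores, over threshold: {cores >= THRESHOLD}")
```

Either per-chip figure puts Sunway TaihuLight at roughly 10 million cores, in line with the rounded number given in the list.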

External links

* Architecting solutions for the Manycore future, published on Feb 19, 2010 (more than one dead link in the slide)
* Eyeriss architecture