In computing, single program, multiple data (SPMD) is a term that has been used to refer to computational models for exploiting
parallelism whereby multiple processors cooperate in the execution of a program in order to obtain results faster.
The term SPMD was introduced in 1983 and was used to denote two different computational models:
# by Michel Auguin (University of Nice Sophia-Antipolis) and François Larbey (Thomson/Sintra),
as a "''fork-and-join''" and data-parallel approach where the ''parallel tasks ("single program")'' are split-up and run simultaneously ''in lockstep'' on multiple ''SIMD processors'' with different inputs, and
# by Frederica Darema (IBM), where "''all processes begin executing the same program ... but through synchronization directives ... self-schedule themselves to execute different instructions and act on different data''", enabling MIMD parallelization of a given program; this approach is more general than data-parallel and more efficient than fork-and-join for parallel execution on general-purpose multiprocessors.
The (IBM) SPMD is the most common style of parallel programming and can be considered a subcategory of MIMD in that it refers to MIMD execution of a given ("single") program.
It is also a prerequisite for research concepts such as active messages and distributed shared memory.
SPMD vs SIMD

In SPMD parallel execution, multiple autonomous processors simultaneously execute the same program at independent points, rather than in the lockstep that SIMD or SIMT imposes on different data. With SPMD, tasks can be executed on general-purpose CPUs. In SIMD the same operation (instruction) is applied to multiple data elements to manipulate data streams (one form of SIMD is vector processing, where the data are organized as vectors). Another class of processors, GPUs, encompasses processing of multiple SIMD streams. SPMD and SIMD are not mutually exclusive; SPMD parallel execution can include SIMD, vector, or GPU sub-processing. SPMD has been used for parallel programming of both message-passing and shared-memory machine architectures.
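As an illustration of the contrast, here is a minimal sketch (not from the original sources) of SPMD-style execution in C with MPI: every process runs the same program, but each uses its rank to proceed independently rather than in lockstep. The work assigned to each rank is an arbitrary choice for the example.

<syntaxhighlight lang="c">
/* Minimal SPMD sketch: every process runs this same program,
 * but branches to different work based on its rank.
 * Compile with an MPI wrapper, e.g.: mpicc spmd.c -o spmd
 * Run with, e.g.:                    mpirun -np 4 ./spmd
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total process count */

    if (rank == 0) {
        /* one process takes a coordinator path through the program */
        printf("Process %d of %d: coordinator work\n", rank, size);
    } else {
        /* the others take a worker path, at their own pace */
        printf("Process %d of %d: worker work\n", rank, size);
    }

    MPI_Finalize();
    return 0;
}
</syntaxhighlight>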
Distributed memory
On distributed memory computer architectures, SPMD implementations usually employ message passing programming. A distributed memory computer consists of a collection of interconnected, independent computers, called nodes. For parallel execution, each node starts its own program and communicates with other nodes by sending and receiving messages, calling send/receive routines for that purpose. Other ''parallelization directives'' such as
barrier synchronization may also be implemented by messages. The messages can be sent by a number of communication mechanisms, such as
TCP/IP over Ethernet, or specialized high-speed interconnects such as
Myrinet and Supercomputer Interconnect. For distributed memory environments, serial sections of the program can be implemented by identical computation of the serial section on all nodes rather than computing the result on one node and sending it to the others, if that improves performance by reducing communication overhead.
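A minimal sketch of the send/receive style described above, in C with MPI; the transferred value and the ranks involved are hypothetical example choices, and the barrier shows a ''parallelization directive'' that the message passing layer implements with messages. It assumes a run with at least two processes.

<syntaxhighlight lang="c">
/* Sketch of explicit message passing between nodes (hypothetical values).
 * Rank 0 sends a locally computed value to rank 1; a barrier then
 * synchronizes all processes. Run with at least 2 processes.
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double boundary = 0.0;
    if (rank == 0) {
        boundary = 3.14;  /* value produced locally on node 0 */
        MPI_Send(&boundary, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&boundary, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %f\n", boundary);
    }

    /* barrier synchronization, implemented via messages underneath */
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
</syntaxhighlight>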
Nowadays, the programmer is isolated from the details of the message passing by standard interfaces, such as PVM and MPI.
Distributed memory is the programming style used on parallel supercomputers, from homegrown Beowulf clusters to the largest clusters on the TeraGrid, as well as on present-day GPU-based supercomputers.
Shared memory
On a shared memory machine (a computer with several interconnected CPUs that access the same memory space), the sharing can be implemented in the context of either physically shared memory or logically shared (but physically distributed) memory; in addition to the shared memory, the CPUs in the computer system can also have local (or private) memory. For either of these contexts, synchronization can be enabled with hardware-enabled primitives (such as compare-and-swap, or fetch-and-add). For machines that do not have such hardware support, locks can be used and data can be "exchanged" across processors (or, more generally, ''processes'' or threads) by depositing the sharable data in a shared memory area. When the hardware does not support shared memory, packing the data as a "message" is often the most efficient way to program (logically) shared memory computers with a large number of processors, where the physical memory is local to processors and accessing the memory of another processor takes longer. SPMD on a shared memory machine can be implemented by standard (heavyweight) processes or by (lightweight) threads.
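The self-scheduling idea can be sketched with a hardware-supported fetch-and-add primitive. The following illustrative example, assuming a C11 toolchain that provides <stdatomic.h> and <threads.h> (the chunk size and thread count are arbitrary choices), has threads claim loop chunks by atomically advancing a shared counter.

<syntaxhighlight lang="c">
/* Sketch: fetch-and-add as a synchronization primitive. Threads
 * (lightweight processes) self-schedule chunks of a loop by atomically
 * incrementing a shared counter. Uses C11 threads and atomics.
 */
#include <stdatomic.h>
#include <stdio.h>
#include <threads.h>

#define N 100
#define CHUNK 10

atomic_int next_index = 0;   /* shared synchronization variable */
int data[N];                 /* shared data */

int worker(void *arg) {
    (void)arg;
    for (;;) {
        /* fetch-and-add: atomically claim the next chunk of work */
        int start = atomic_fetch_add(&next_index, CHUNK);
        if (start >= N) break;
        for (int i = start; i < start + CHUNK && i < N; i++)
            data[i] = i * i;  /* work on the claimed slice */
    }
    return 0;
}

int main(void) {
    thrd_t t[4];
    for (int i = 0; i < 4; i++) thrd_create(&t[i], worker, NULL);
    for (int i = 0; i < 4; i++) thrd_join(t[i], NULL);
    printf("data[99] = %d\n", data[99]);
    return 0;
}
</syntaxhighlight>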
Shared memory multiprocessing (both symmetric multiprocessing, SMP, and non-uniform memory access, NUMA) presents the programmer with a common memory space and the possibility to parallelize execution. With the (IBM) SPMD model the cooperating processors (or processes) take different paths through the program, using parallel directives (''parallelization and synchronization directives'', which can utilize compare-and-swap and fetch-and-add operations on shared memory synchronization variables), and perform operations on data in the shared memory ("shared data"); the processors (or processes) can also access and perform operations on data in their local memory ("private data"). In contrast, with fork-and-join approaches the program starts executing on one processor and the execution splits in a parallel region, which is started when parallel directives are encountered; in a parallel region, the processors execute a parallel task on different data. A typical example is the parallel DO loop, where different processors work on separate parts of the arrays involved in the loop. At the end of the loop, execution is synchronized (with soft or hard barriers), and processors (processes) continue to the next available section of the program to execute. The (IBM) SPMD has been implemented in the current standard interface for shared memory multiprocessing, OpenMP, which uses multithreading, usually implemented by lightweight processes, called threads.
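A minimal OpenMP sketch of the parallel loop pattern just described (the array size and the SAXPY-style computation are arbitrary example choices): the threads split the iteration space among themselves, and the implicit barrier at the end of the loop synchronizes execution before the program continues serially.

<syntaxhighlight lang="c">
/* Sketch of an (IBM) SPMD-style parallel loop in OpenMP.
 * Each thread works on a separate part of the arrays; the implicit
 * barrier at the end of the loop synchronizes execution.
 * Compile with, e.g.: gcc -fopenmp saxpy.c -o saxpy
 */
#include <stdio.h>

#define N 1000000

int main(void) {
    static double x[N], y[N];  /* static to avoid a huge stack frame */
    double a = 2.0;

    for (int i = 0; i < N; i++) { x[i] = i; y[i] = 1.0; }

    /* parallel region: threads divide the iteration space */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];
    /* implicit barrier here; execution continues serially */

    printf("y[N-1] = %f\n", y[N - 1]);
    return 0;
}
</syntaxhighlight>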
Combination of levels of parallelism
Current computers allow exploiting many parallel modes at the same time for maximum combined effect. A distributed memory program using MPI may run on a collection of nodes. Each node may be a shared memory computer and execute in parallel on multiple CPUs using OpenMP. Within each CPU, SIMD vector instructions (usually generated automatically by the compiler) and superscalar instruction execution (usually handled transparently by the CPU itself), such as pipelining and the use of multiple parallel functional units, are used for maximum single-CPU speed.
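A brief sketch of such a combination, assuming an MPI library built with threading support; the array size, the sum reduction, and the assumption that the process count divides the array size are simplifications for the example. MPI distributes slices across nodes, OpenMP threads split each node's slice, and the inner loop is a candidate for compiler-generated SIMD vectorization.

<syntaxhighlight lang="c">
/* Sketch combining two levels of parallelism: MPI across nodes and
 * OpenMP threads within each node; the inner arithmetic may
 * additionally be vectorized by the compiler with SIMD instructions.
 * Compile with, e.g.: mpicc -fopenmp hybrid.c -o hybrid
 */
#include <mpi.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* each MPI process owns a slice; assumes size divides N */
    int chunk = N / size;
    static double local[N];
    double local_sum = 0.0, global_sum = 0.0;

    /* OpenMP threads split the node-local slice */
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < chunk; i++) {
        local[i] = (double)(rank * chunk + i);
        local_sum += local[i];
    }

    /* message passing combines the per-node partial results */
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);
    if (rank == 0) printf("sum = %f\n", global_sum);

    MPI_Finalize();
    return 0;
}
</syntaxhighlight>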
History
The acronym SPMD for "single program, multiple data" has been used to describe two different computational models for exploiting parallel computing, because the term is a natural extension of Flynn's taxonomy.
The two groups of researchers were unaware of each other's use of the term SPMD to independently describe different models of parallel programming.
The term SPMD was first proposed in 1983 by Michel Auguin (University of Nice Sophia-Antipolis) and François Larbey (Thomson/Sintra), in the context of the OPSILA parallel computer and of a fork-and-join, data-parallel computational model. This computer consisted of a master (controller) processor and SIMD processors (or vector-processor mode, as proposed by Flynn). In Auguin's SPMD model, the same (parallel) task ("''same program''") is executed on different (SIMD) processors ("''operating in lock-step mode''"), each acting on a part ("slice") of the data vector. Specifically, their 1985 paper
and others
stated:
We consider the SPMD (Single Program, Multiple Data) operating mode. This mode allows simultaneous execution of the same task (one per processor) but prevents data exchange between processors. Data exchanges are only performed under SIMD mode by means of vector assignments. We assume synchronizations are summed-up to switchings (sic) between SIMD and SPMD operatings (sic) modes using global fork-join primitives.
Around the same timeframe (late 1983 – early 1984), the term SPMD was proposed by Frederica Darema (then at IBM, and part of the RP3 group) to define a different computational model, as a programming model which in the intervening years has been applied to a wide range of general-purpose high-performance computers (including RP3, the 512-processor IBM Research Parallel Processor Prototype) and has led to the current parallel computing standards. The (IBM) SPMD programming model assumes a multiplicity of processors which operate cooperatively, all executing the same program but able to take different paths through the program based on parallelization directives embedded in the program:
All processes participating in the parallel computation are created at the beginning of the execution and remain in existence until the end ... [t]he processors/processes execute different instructions and act on different data ... the job (work) to be done by each process is allocated dynamically ... [i.e. the processes] self-schedule themselves to execute different instructions and act on different data [and thus self-assign themselves] to cooperate in execution of serial and parallel tasks (as well as replicate tasks) in the program.

The notion of ''process'' generalized the term ''processor'' in the sense that multiple processes can execute on a processor (for example, to exploit larger degrees of parallelism for more efficiency and load-balancing). The (IBM) SPMD model was proposed by Darema as an approach different from, and more efficient than, the fork-and-join that was pursued by all others in the community at that time; it is also more general than just the "data-parallel" computational model and can encompass fork-and-join (as a subcategory implementation). The original context of the (IBM) SPMD was the RP3 computer (the 512-processor IBM Research Parallel Processor Prototype), which supported general-purpose computing with both distributed and (logically) shared memory. The (IBM) SPMD model was implemented by Darema and IBM colleagues in EPEX (Environment for Parallel Execution), one of the first prototype programming environments. The effectiveness of the (IBM) SPMD was demonstrated for a wide class of applications; it was implemented in IBM FORTRAN in 1988, the first vendor product in parallel programming, and in MPI (1991 and on), OpenMP (1997 and on), and other environments which have adopted and cite the (IBM) SPMD computational model.
By the late 1980s, there were many distributed computers with proprietary message passing libraries. The first SPMD standard was PVM. The current de facto standard is MPI.
The Cray parallel directives were a direct predecessor of OpenMP.
References
External links
Parallel job management and message passing
SPMD