The Message Passing Interface (MPI) is a portable message-passing standard designed to function on parallel computing architectures. The MPI standard defines the syntax and semantics of library routines that are useful to a wide range of users writing portable message-passing programs in C, C++, and Fortran. There are several open-source MPI implementations, which fostered the development of a parallel software industry, and encouraged development of portable and scalable large-scale parallel applications.
History
The message passing interface effort began in the summer of 1991 when a small group of researchers started discussions at a mountain retreat in Austria. Out of that discussion came a Workshop on Standards for Message Passing in a Distributed Memory Environment, held on April 29–30, 1992 in Williamsburg, Virginia. Attendees at Williamsburg discussed the basic features essential to a standard message-passing interface and established a working group to continue the standardization process. Jack Dongarra, Tony Hey, and David W. Walker put forward a preliminary draft proposal, "MPI1", in November 1992. In November 1992 a meeting of the MPI working group took place in Minneapolis and decided to place the standardization process on a more formal footing. The MPI working group met every six weeks throughout the first nine months of 1993. The draft MPI standard was presented at the Supercomputing '93 conference in November 1993. After a period of public comments, which resulted in some changes, version 1.0 of MPI was released in June 1994. These meetings and the email discussion together constituted the MPI Forum, membership of which has been open to all members of the high-performance-computing community.
The MPI effort involved about 80 people from 40 organizations, mainly in the United States and Europe. Most of the major vendors of concurrent computers were involved in the MPI effort, collaborating with researchers from universities, government laboratories, and industry.
MPI provides parallel hardware vendors with a clearly defined base set of routines that can be efficiently implemented. As a result, hardware vendors can build upon this collection of standard
low-level routines to create
higher-level routines for the distributed-memory communication environment supplied with their
parallel machines. MPI provides a simple-to-use portable interface for the basic user, yet one powerful enough to allow programmers to use the high-performance message passing operations available on advanced machines.
In an effort to create a universal standard for message passing, researchers did not base it on a single system; instead, they incorporated the most useful features of several systems, including those designed by IBM, Intel, nCUBE, PVM, Express, P4 and PARMACS. The message-passing paradigm is attractive because of its wide portability: it can be used in communication for distributed-memory and shared-memory multiprocessors, networks of workstations, and a combination of these elements. The paradigm applies in multiple settings, independent of network speed or memory architecture.
Support for MPI meetings came in part from DARPA and from the U.S. National Science Foundation (NSF) under grant ASC-9310330, NSF Science and Technology Center Cooperative agreement number CCR-8809615, and from the European Commission through Esprit Project P6643. The University of Tennessee also made financial contributions to the MPI Forum.
Overview
MPI is a communication protocol for programming parallel computers. Both point-to-point and collective communication are supported. MPI "is a message-passing application programmer interface, together with protocol and semantic specifications for how its features must behave in any implementation." MPI's goals are high performance, scalability, and portability. MPI remains the dominant model used in high-performance computing as of 2006.
MPI is not sanctioned by any major standards body; nevertheless, it has become a ''de facto'' standard for communication among processes that model a parallel program running on a distributed memory system. Actual distributed memory supercomputers such as computer clusters often run such programs.
The principal MPI-1 model has no shared memory concept, and MPI-2 has only a limited distributed shared memory concept. Nonetheless, MPI programs are regularly run on shared memory computers, and both MPICH and Open MPI can use shared memory for message transfer if it is available. Designing programs around the MPI model (contrary to explicit shared memory models) has advantages when running on NUMA architectures since MPI encourages memory locality. Explicit shared memory programming was introduced in MPI-3.
Although MPI belongs in layers 5 and higher of the OSI Reference Model, implementations may cover most layers, with sockets and Transmission Control Protocol (TCP) used in the transport layer.
Most MPI implementations consist of a specific set of routines directly callable from C, C++, Fortran (i.e., an API) and any language able to interface with such libraries, including C#, Java or Python. The advantages of MPI over older message passing libraries are portability (because MPI has been implemented for almost every distributed memory architecture) and speed (because each implementation is in principle optimized for the hardware on which it runs).
MPI uses Language Independent Specifications (LIS) for calls and language bindings. The first MPI standard specified ANSI C and Fortran-77 bindings together with the LIS. The draft was presented at Supercomputing 1994 (November 1994) (Table of Contents — September 1994, 8 (3-4). Hpc.sagepub.com. Retrieved on 2014-03-24.) and finalized soon thereafter. About 128 functions constitute the MPI-1.3 standard, which was released as the final end of the MPI-1 series in 2008. (MPI Documents. Mpi-forum.org. Retrieved on 2014-03-24.)
At present, the standard has several versions: version 1.3 (commonly abbreviated ''MPI-1''), which emphasizes message passing and has a static runtime environment; MPI-2.2 (MPI-2), which includes new features such as parallel I/O, dynamic process management and remote memory operations; and MPI-3.1 (MPI-3), which includes extensions to the collective operations with non-blocking versions and extensions to the one-sided operations. (MPI: A Message-Passing Interface Standard, Version 3.1, Message Passing Interface Forum, June 4, 2015. http://www.mpi-forum.org. Retrieved on 2015-06-16.)
MPI-2's LIS specifies over 500 functions and provides language bindings for ISO C, ISO C++, and Fortran 90. Object interoperability was also added to allow easier mixed-language message passing programming. A side-effect of standardizing MPI-2, completed in 1996, was clarifying the MPI-1 standard, creating MPI-1.2.
''MPI-2'' is mostly a superset of MPI-1, although some functions have been deprecated. MPI-1.3 programs still work under MPI implementations compliant with the MPI-2 standard.
''MPI-3.0'' introduces significant updates to the MPI standard, including nonblocking versions of collective operations, enhancements to one-sided operations, and a Fortran 2008 binding. It removes deprecated C++ bindings and various obsolete routines and objects. Importantly, any valid MPI-2.2 program that avoids the removed elements is also valid in MPI-3.0.
''MPI-3.1'' is a minor update focused on corrections and clarifications, particularly for Fortran bindings. It introduces new functions for manipulating MPI_Aint values, nonblocking collective I/O routines, and methods for retrieving index values by name for MPI_T performance variables. Additionally, a general index was added. All valid MPI-3.0 programs are also valid in MPI-3.1.
''MPI-4.0'' is a major update that introduces large-count versions of many routines, persistent collective operations, partitioned communications, and a new MPI initialization method. It also adds application info assertions and improves error handling definitions, along with various smaller enhancements. Any valid MPI-3.1 program is compatible with MPI-4.0.
MPI-4.1 is a minor update focused on corrections and clarifications to the MPI-4.0 standard. It deprecates several routines, the MPI_HOST attribute key, and the mpif.h Fortran include file. A new routine has been added to inquire about the hardware running the MPI program. Any valid MPI-4.0 program remains valid in MPI-4.1.
MPI is often compared with Parallel Virtual Machine (PVM), which is a popular distributed environment and message passing system developed in 1989, and which was one of the systems that motivated the need for standard parallel message passing. Threaded shared memory programming models (such as Pthreads and OpenMP) and message passing programming (MPI/PVM) can be considered complementary and have been used together on occasion in, for example, servers with multiple large shared-memory nodes.
Functionality
The MPI interface is meant to provide essential virtual topology, synchronization, and communication functionality between a set of processes (that have been mapped to nodes/servers/computer instances) in a language-independent way, with language-specific syntax (bindings), plus a few language-specific features. MPI programs always work with processes, but programmers commonly refer to the processes as processors. Typically, for maximum performance, each CPU (or core in a multi-core machine) will be assigned just a single process. This assignment happens at runtime through the agent that starts the MPI program, normally called mpirun or mpiexec.
MPI library functions include, but are not limited to, point-to-point rendezvous-type send/receive operations, choosing between a Cartesian or graph-like logical process topology, exchanging data between process pairs (send/receive operations), combining partial results of computations (gather and reduce operations), synchronizing nodes (barrier operation) as well as obtaining network-related information such as the number of processes in the computing session, current processor identity that a process is mapped to, neighboring processes accessible in a logical topology, and so on. Point-to-point operations come in synchronous, asynchronous, buffered, and ''ready'' forms, to allow both relatively stronger and weaker semantics for the synchronization aspects of a rendezvous-send. Many pending operations are possible in asynchronous mode, in most implementations.
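As an illustration of the asynchronous forms, the following minimal sketch posts a non-blocking receive and send between ranks 0 and 1 and completes both with MPI_Waitall; the tag and payload values are illustrative, not taken from the text above.

/* Sketch: non-blocking point-to-point exchange between ranks 0 and 1. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, other, sendbuf = 0, recvbuf = 0;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank < 2) {                 /* only ranks 0 and 1 take part */
        other = 1 - rank;
        sendbuf = rank + 100;       /* illustrative payload */

        /* Post the receive first, then the send; neither call blocks. */
        MPI_Irecv(&recvbuf, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&sendbuf, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &reqs[1]);

        /* Useful computation could overlap the communication here. */

        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        printf("Rank %d received %d\n", rank, recvbuf);
    }

    MPI_Finalize();
    return 0;
}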
MPI-1 and MPI-2 both enable implementations that overlap communication and computation, but practice and theory differ. MPI also specifies ''thread safe'' interfaces, which have cohesion and coupling strategies that help avoid hidden state within the interface. It is relatively easy to write multithreaded point-to-point MPI code, and some implementations support such code. Multithreaded collective communication is best accomplished with multiple copies of communicators, as described below.
Concepts
MPI provides several features. The following concepts provide context for all of those abilities and help the programmer to decide what functionality to use in their application programs. Four of MPI's eight basic concepts are unique to MPI-2.
Communicator
Communicator objects connect groups of processes in the MPI session. Each communicator gives each contained process an independent identifier and arranges its contained processes in an ordered topology. MPI also has explicit groups, but these are mainly good for organizing and reorganizing groups of processes before another communicator is made. MPI understands single group intracommunicator operations, and bilateral intercommunicator communication. In MPI-1, single group operations are most prevalent. Bilateral operations mostly appear in MPI-2 where they include collective communication and dynamic in-process management.
Communicators can be partitioned using several MPI commands. These commands include
MPI_COMM_SPLIT
, where each process joins one of several colored sub-communicators by declaring itself to have that color.
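For illustration, the following minimal sketch splits MPI_COMM_WORLD by rank parity, so even-ranked and odd-ranked processes end up in separate sub-communicators; the choice of color and key here is only an example.

/* Sketch: split MPI_COMM_WORLD into "even" and "odd" sub-communicators. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int world_rank, sub_rank;
    MPI_Comm subcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Processes declaring the same color end up in the same sub-communicator;
       the key (here the world rank) orders processes within it. */
    int color = world_rank % 2;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &subcomm);

    MPI_Comm_rank(subcomm, &sub_rank);
    printf("World rank %d has rank %d in sub-communicator %d\n",
           world_rank, sub_rank, color);

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}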
Point-to-point basics
A number of important MPI functions involve communication between two specific processes. A popular example is
MPI_Send
, which allows one specified process to send a message to a second specified process. Point-to-point operations, as these are called, are particularly useful in patterned or irregular communication, for example, a
data-parallel architecture in which each processor routinely swaps regions of data with specific other processors between calculation steps, or a
master–slave architecture in which the master sends new task data to a slave whenever the prior task is completed.
MPI-1 specifies mechanisms for both
blocking and non-blocking point-to-point communication mechanisms, as well as the so-called 'ready-send' mechanism whereby a send request can be made only when the matching receive request has already been made.
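A minimal sketch of blocking point-to-point communication, assuming at least two processes; the payload value and message tag are illustrative.

/* Sketch: rank 0 sends one integer to rank 1 with blocking calls. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                               /* illustrative payload */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}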
Collective basics
Collective functions involve communication among all processes in a process group (which can mean the entire process pool or a program-defined subset). A typical function is the
MPI_Bcast
call (short for "
broadcast
Broadcasting is the data distribution, distribution of sound, audio audiovisual content to dispersed audiences via a electronic medium (communication), mass communications medium, typically one using the electromagnetic spectrum (radio waves), ...
"). This function takes data from one node and sends it to all processes in the process group. A reverse operation is the
MPI_Reduce
call, which takes data from all processes in a group, performs an operation (such as summing), and stores the results on one node.
MPI_Reduce
is often useful at the start or end of a large distributed calculation, where each processor operates on a part of the data and then combines it into a result.
Other operations perform more sophisticated tasks, such as
MPI_Alltoall
which rearranges ''n'' items of data such that the ''n''th node gets the ''n''th item of data from each.
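The following minimal sketch combines the two collectives just described: the root broadcasts a parameter, every process computes a partial result, and MPI_Reduce sums the partial results back on the root. The particular values computed are illustrative.

/* Sketch: broadcast a parameter from rank 0, then reduce partial results onto rank 0. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, param = 0, partial, total = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        param = 10;                      /* illustrative value chosen by the root */

    /* Every process receives the root's value of param. */
    MPI_Bcast(&param, 1, MPI_INT, 0, MPI_COMM_WORLD);

    partial = param * rank;              /* each process computes its share */

    /* Sum all partial results onto rank 0. */
    MPI_Reduce(&partial, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Sum over %d processes: %d\n", size, total);

    MPI_Finalize();
    return 0;
}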
Derived data types
Many MPI functions require specifying the type of data which is sent between processes. This is because MPI aims to support heterogeneous environments where types might be represented differently on the different nodes (for example, they might be running different CPU architectures that have different endianness), in which case MPI implementations can perform ''data conversion''.
Since the C language does not allow a type itself to be passed as a parameter, MPI predefines the constants
MPI_INT
,
MPI_CHAR
,
MPI_DOUBLE
to correspond with
int
,
char
,
double
, etc.
Here is an example in C that passes arrays of
int
s from all processes to one. The one receiving process is called the "root" process, and it can be any designated process but normally it will be process 0. All the processes ask to send their arrays to the root with
MPI_Gather
, which is equivalent to having each process (including the root itself) call
MPI_Send
and the root make the corresponding number of ordered
MPI_Recv
calls to assemble all of these arrays into a larger one:
int send_array[100];
int root = 0; /* or whatever */
int num_procs, *recv_array;
MPI_Comm_size(comm, &num_procs);
recv_array = malloc(num_procs * sizeof(send_array));
MPI_Gather(send_array, sizeof(send_array) / sizeof(*send_array), MPI_INT,
recv_array, sizeof(send_array) / sizeof(*send_array), MPI_INT,
root, comm);
However, it may instead be desirable to send data as one block as opposed to 100 ints. To do this, define a "contiguous block" derived data type:
MPI_Datatype newtype;
MPI_Type_contiguous(100, MPI_INT, &newtype);
MPI_Type_commit(&newtype);
MPI_Gather(array, 1, newtype, receive_array, 1, newtype, root, comm);
For passing a class or a data structure, MPI_Type_create_struct creates an MPI derived data type from MPI predefined data types, as follows:
int MPI_Type_create_struct(int count,
int *blocklen,
MPI_Aint *disp,
MPI_Datatype *type,
MPI_Datatype *newtype)
where:
* count is a number of blocks, and specifies the length (in elements) of the arrays blocklen, disp, and type.
* blocklen contains numbers of elements in each block,
* disp contains byte displacements of each block,
* type contains types of element in each block.
* newtype (an output) contains the new derived type created by this function
The disp (displacements) array is needed for data structure alignment, since the compiler may pad the variables in a class or data structure. The safest way to find the distance between different fields is by obtaining their addresses in memory. This is done with MPI_Get_address, which is normally the same as C's & operator but that might not be true when dealing with memory segmentation.
Passing a data structure as one block is significantly faster than passing one item at a time, especially if the operation is to be repeated. This is because fixed-size blocks do not require serialization during transfer.
Given the following data structures (the particular fields shown are illustrative):

struct A { int f; short p; };
struct B { struct A a; int pp, vp; };

Here's the C code for building an MPI-derived data type:

/* offsetof requires <stddef.h> */
static const int blocklen[] = {1, 1, 1, 1};
static const MPI_Aint disp[] = {
    offsetof(struct B, a) + offsetof(struct A, f),
    offsetof(struct B, a) + offsetof(struct A, p),
    offsetof(struct B, pp),
    offsetof(struct B, vp)
};
static MPI_Datatype type[] = {MPI_INT, MPI_SHORT, MPI_INT, MPI_INT};
MPI_Datatype newtype;
MPI_Type_create_struct(sizeof(type) / sizeof(*type), blocklen, disp, type, &newtype);
MPI_Type_commit(&newtype);
MPI-2 concepts
One-sided communication
MPI-2 defines three one-sided communications operations, MPI_Put, MPI_Get, and MPI_Accumulate, being a write to remote memory, a read from remote memory, and a reduction operation on the same memory across a number of tasks, respectively. Also defined are three different methods to synchronize this communication (global, pairwise, and remote locks) as the specification does not guarantee that these operations have taken place until a synchronization point.
These types of call can often be useful for algorithms in which synchronization would be inconvenient (e.g. distributed matrix multiplication), or where it is desirable for tasks to be able to balance their load while other processors are operating on data.
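As a rough sketch of the one-sided model with global (fence) synchronization, the fragment below has every non-root process deposit one integer into a window exposed by rank 0; the window layout and payload are illustrative.

/* Sketch: each non-zero rank puts one integer into rank 0's window. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    int *winbuf = NULL;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Rank 0 exposes 'size' integers; the other ranks expose no memory. */
    if (rank == 0)
        winbuf = calloc(size, sizeof(int));
    MPI_Aint winsize = (rank == 0) ? (MPI_Aint)(size * sizeof(int)) : 0;
    MPI_Win_create(winbuf, winsize, sizeof(int), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);                  /* open an access epoch */
    if (rank != 0) {
        int value = rank * rank;            /* illustrative payload */
        MPI_Put(&value, 1, MPI_INT, 0 /* target rank */,
                rank /* displacement */, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);                  /* close the epoch; puts are now visible */

    if (rank == 0)
        for (int i = 1; i < size; i++)
            printf("Slot %d holds %d\n", i, winbuf[i]);

    MPI_Win_free(&win);
    free(winbuf);
    MPI_Finalize();
    return 0;
}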
Dynamic process management
The key aspect is "the ability of an MPI process to participate in the creation of new MPI processes or to establish communication with MPI processes that have been started separately." The MPI-2 specification describes three main interfaces by which MPI processes can dynamically establish communications: MPI_Comm_spawn, MPI_Comm_accept/MPI_Comm_connect and MPI_Comm_join. The MPI_Comm_spawn interface allows an MPI process to spawn a number of instances of the named MPI process. The newly spawned set of MPI processes forms a new MPI_COMM_WORLD intracommunicator but can communicate with the parent through the intercommunicator the function returns. MPI_Comm_spawn_multiple is an alternate interface that allows the different instances spawned to be different binaries with different arguments.
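A minimal sketch of spawning workers from a parent job; the executable name "worker" and the number of spawned processes are hypothetical placeholders.

/* Sketch: a parent job spawns worker processes and gets an intercommunicator back. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, err_codes[4];
    MPI_Comm intercomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* All parent processes call the spawn collectively; rank 0 (the root)
       supplies the command and arguments. "worker" is a hypothetical binary. */
    MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0 /* root */, MPI_COMM_WORLD, &intercomm, err_codes);

    if (rank == 0)
        printf("Spawned 4 workers; they are reachable through 'intercomm'.\n");

    /* The spawned children can reach the parent group via MPI_Comm_get_parent(). */
    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}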
I/O
The parallel I/O feature is sometimes called MPI-IO,
and refers to a set of functions designed to abstract I/O management on distributed systems to MPI, and allow files to be easily accessed in a patterned way using the existing derived datatype functionality.
The little research that has been done on this feature indicates that it may not be trivial to get high performance gains by using MPI-IO. For example, an implementation of sparse matrix-vector multiplications using the MPI I/O library shows a general behavior of minor performance gain, but these results are inconclusive. It was not until the idea of collective I/O was implemented in MPI-IO that MPI-IO started to reach widespread adoption. Collective I/O substantially boosts applications' I/O bandwidth by having processes collectively transform the small and noncontiguous I/O operations into large and contiguous ones, thereby reducing the locking and disk seek overhead. Due to its vast performance benefits, MPI-IO also became the underlying I/O layer for many state-of-the-art I/O libraries, such as HDF5 and Parallel NetCDF. Its popularity also triggered research on collective I/O optimizations, such as layout-aware I/O and cross-file aggregation.
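As a minimal sketch of MPI-IO with a collective write, each process below writes its own contiguous block of a shared file at a rank-dependent offset; the file name and block size are illustrative.

/* Sketch: each rank writes one block of integers to a shared file collectively. */
#include <mpi.h>

#define BLOCK 4   /* illustrative number of integers per process */

int main(int argc, char **argv)
{
    int rank, buf[BLOCK];
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int i = 0; i < BLOCK; i++)
        buf[i] = rank * BLOCK + i;       /* illustrative data */

    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Collective write: all processes participate, each at its own offset. */
    MPI_Offset offset = (MPI_Offset)rank * BLOCK * sizeof(int);
    MPI_File_write_at_all(fh, offset, buf, BLOCK, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}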
Official implementations
* The initial implementation of the MPI 1.x standard was MPICH, from Argonne National Laboratory (ANL) and Mississippi State University. IBM also was an early implementor, and most early 90s supercomputer companies either commercialized MPICH, or built their own implementation. LAM/MPI from Ohio Supercomputer Center was another early open implementation. ANL has continued developing MPICH for over a decade, and now offers MPICH-4.3.0, implementing the MPI-4.1 standard.
* Open MPI (not to be confused with OpenMP) was formed by merging FT-MPI, LA-MPI, LAM/MPI, and PACX-MPI, and is found in many TOP-500 supercomputers.
Many other efforts are derivatives of MPICH, LAM, and other works, including, but not limited to, commercial implementations from HPE, Intel, Microsoft, and NEC.
While the specifications mandate a C and Fortran interface, the language used to implement MPI is not constrained to match the language or languages it seeks to support at runtime. Most implementations combine C, C++ and assembly language, and target C, C++, and Fortran programmers. Bindings are available for many other languages, including Perl, Python, R, Ruby, Java, and
CL (see
#Language bindings).
The ABIs of MPI implementations are roughly split between MPICH and Open MPI derivatives, so that a library from one family works as a drop-in replacement of one from the same family, but direct replacement across families is impossible. The French CEA maintains a wrapper interface to facilitate such switches.
Hardware
MPI hardware research focuses on implementing MPI directly in hardware, for example via processor-in-memory, building MPI operations into the microcircuitry of the RAM chips in each node. By implication, this approach is independent of language, operating system, and CPU, but cannot be readily updated or removed.
Another approach has been to add hardware acceleration to one or more parts of the operation, including hardware processing of MPI queues and using RDMA to directly transfer data between memory and the network interface controller without CPU or OS kernel intervention.
Compiler wrappers
mpicc (and similarly mpic++, mpif90, etc.) is a program that wraps over an existing compiler to set the necessary command-line flags when compiling code that uses MPI. Typically, it adds a few flags that enable the code to be compiled and linked against the MPI library.
Language bindings
Bindings are libraries that extend MPI support to other languages by wrapping an existing MPI implementation such as MPICH or Open MPI.
Common Language Infrastructure
The two managed Common Language Infrastructure .NET implementations are Pure Mpi.NET and MPI.NET, a research effort at Indiana University licensed under a BSD-style license. It is compatible with Mono, and can make full use of underlying low-latency MPI network fabrics.
Java
Although Java does not have an official MPI binding, several groups attempt to bridge the two, with different degrees of success and compatibility. One of the first attempts was Bryan Carpenter's mpiJava, essentially a set of Java Native Interface (JNI) wrappers to a local C MPI library, resulting in a hybrid implementation with limited portability, which also has to be compiled against the specific MPI library being used.
However, this original project also defined the mpiJava API (a de facto MPI API for Java that closely followed the equivalent C++ bindings) which other subsequent Java MPI projects adopted. One less-used API is MPJ API, which was designed to be more object-oriented and closer to Sun Microsystems' coding conventions. Beyond the API, Java MPI libraries can be either dependent on a local MPI library, or implement the message passing functions in Java, while some like P2P-MPI also provide peer-to-peer functionality and allow mixed-platform operation.
Some of the most challenging parts of Java/MPI arise from Java characteristics such as the lack of explicit pointers and the linear memory address space for its objects, which make transferring multidimensional arrays and complex objects inefficient. Workarounds usually involve transferring one line at a time and/or performing explicit de-serialization and casting at both the sending and receiving ends, simulating C or Fortran-like arrays by the use of a one-dimensional array, and pointers to primitive types by the use of single-element arrays, thus resulting in programming styles quite far from Java conventions.
Another Java message passing system is MPJ Express. Recent versions can be executed in cluster and multicore configurations. In the cluster configuration, it can execute parallel Java applications on clusters and clouds. Here Java sockets or specialized I/O interconnects like Myrinet can support messaging between MPJ Express processes. It can also utilize a native C implementation of MPI using its native device. In the multicore configuration, a parallel Java application is executed on multicore processors. In this mode, MPJ Express processes are represented by Java threads.
Julia
There is a Julia language wrapper for MPI.
MATLAB
There are a few academic implementations of MPI using MATLAB. MATLAB has its own parallel extension library implemented using MPI and PVM.
OCaml
The OCamlMPI module implements a large subset of MPI functions and is in active use in scientific computing. An 11,000-line OCaml program was "MPI-ified" using the module, with an additional 500 lines of code and slight restructuring, and ran with excellent results on up to 170 nodes in a supercomputer.
PARI/GP
PARI/GP can be built to use MPI as its multi-thread engine, allowing parallel PARI and GP programs to run on MPI clusters unmodified.
Python
Actively maintained MPI wrappers for Python include: mpi4py, numba-mpi and numba-jax. Discontinued developments include: pyMPI, pypar, MYMPI and the MPI submodule in ScientificPython.
R
R bindings of MPI include Rmpi and pbdMPI, where Rmpi focuses on manager-workers parallelism while pbdMPI focuses on SPMD parallelism. Both implementations fully support Open MPI or MPICH2.
Example program
Here is a "Hello, World!" program in MPI written in C. In this example, we send a "hello" message to each processor, manipulate it trivially, return the results to the main process, and print the messages.
/*
  "Hello World" MPI Test Program
*/
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char **argv)
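{
    /* The body below is a sketch reconstructed to match the output shown
       after this listing; variable names and the buffer size are illustrative. */
    char buf[256];
    int my_rank, num_procs;

    /* Initialize the infrastructure necessary for communication */
    MPI_Init(&argc, &argv);

    /* Identify this process */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* Find out how many total processes are active */
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);

    /* Up to this point, all programs have been doing exactly the same.
       Here, we check the rank to distinguish the roles of the programs. */
    if (my_rank == 0) {
        int other_rank;
        printf("We have %i processes.\n", num_procs);

        /* Send a greeting to every other process */
        for (other_rank = 1; other_rank < num_procs; other_rank++) {
            sprintf(buf, "Hello %i!", other_rank);
            MPI_Send(buf, 256, MPI_CHAR, other_rank, 0, MPI_COMM_WORLD);
        }

        /* Receive a reply from every other process and print it */
        for (other_rank = 1; other_rank < num_procs; other_rank++) {
            MPI_Recv(buf, 256, MPI_CHAR, other_rank, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("%s\n", buf);
        }
    } else {
        /* Receive the greeting from process #0 */
        MPI_Recv(buf, 256, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        assert(memcmp(buf, "Hello ", 6) == 0);

        /* Send a reply back to process #0 */
        sprintf(buf, "Process %i reporting for duty.", my_rank);
        MPI_Send(buf, 256, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }

    /* Tear down the communication infrastructure */
    MPI_Finalize();
    return 0;
}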
When run with 4 processes, it should produce the following output:
$ mpicc example.c && mpiexec -n 4 ./a.out
We have 4 processes.
Process 1 reporting for duty.
Process 2 reporting for duty.
Process 3 reporting for duty.
Here, mpiexec is a command used to execute the example program with 4 processes, each of which is an independent instance of the program at run time and assigned ranks (i.e. numeric IDs) 0, 1, 2, and 3. The name mpiexec is recommended by the MPI standard, although some implementations provide a similar command under the name mpirun. MPI_COMM_WORLD is the communicator that consists of all the processes.
A single program, multiple data (SPMD) programming model is thereby facilitated, but not required; many MPI implementations allow multiple, different, executables to be started in the same MPI job. Each process has its own rank, the total number of processes in the world, and the ability to communicate between them either with point-to-point (send/receive) communication, or by collective communication among the group. It is enough for MPI to provide an SPMD-style program with MPI_COMM_WORLD, its own rank, and the size of the world to allow algorithms to decide what to do. In more realistic situations, I/O is more carefully managed than in this example. MPI does not stipulate how standard I/O (stdin, stdout, stderr) should work on a given system. It generally works as expected on the rank-0 process, and some implementations also capture and funnel the output from other processes.
MPI uses the notion of process rather than processor. Program copies are ''mapped'' to processors by the MPI
runtime. In that sense, the parallel machine can map to one physical processor, or to ''N'' processors, where ''N'' is the number of available processors, or even something in between. For maximum parallel speedup, more physical processors are used. This example adjusts its behavior to the size of the world ''N'', so it also seeks to scale to the runtime configuration without compilation for each size variation, although runtime decisions might vary depending on that absolute amount of concurrency available.
MPI-2 adoption
Adoption of MPI-1.2 has been universal, particularly in cluster computing, but acceptance of MPI-2.1 has been more limited. Issues include:
# MPI-2 implementations include I/O and dynamic process management, and the size of the middleware is substantially larger. Most sites that use batch scheduling systems cannot support dynamic process management. MPI-2's parallel I/O is well accepted.
# Many MPI-1.2 programs were developed before MPI-2. Portability concerns initially slowed adoption, although wider support has lessened this.
# Many MPI-1.2 applications use only a subset of that standard (16–25 functions) with no real need for MPI-2 functionality.
Future
Some aspects of the MPI's future appear solid; others less so. The MPI Forum reconvened in 2007 to clarify some MPI-2 issues and explore developments for a possible MPI-3, which resulted in versions MPI-3.0 (September 2012) and MPI-3.1 (June 2015). The development continued with the approval of MPI-4.0 on June 9, 2021, and most recently, MPI-4.1 was approved on November 2, 2023.
Architectures are changing, with greater internal concurrency (multi-core), better fine-grained concurrency control (threading, affinity), and more levels of memory hierarchy.
Multithreaded programs can take advantage of these developments more easily than single-threaded applications. This has already yielded separate, complementary standards for symmetric multiprocessing, namely OpenMP. MPI-2 defines how standard-conforming implementations should deal with multithreaded issues, but does not require that implementations be multithreaded, or even thread-safe. MPI-3 adds the ability to use shared-memory parallelism within a node. Implementations of MPI such as Adaptive MPI, Hybrid MPI, Fine-Grained MPI, MPC and others offer extensions to the MPI standard that address different challenges in MPI.
Astrophysicist Jonathan Dursi wrote an opinion piece calling MPI obsolescent, pointing to newer technologies like the Chapel language, Unified Parallel C, Hadoop, Spark and Flink.
At the same time, nearly all of the projects in the
Exascale Computing Project build explicitly on MPI; MPI has been shown to scale to the largest machines as of the early 2020s and is widely considered to stay relevant for a long time to come.
See also
* Actor model
* Bulk synchronous parallel
* Caltech Cosmic Cube
* Charm++
* Co-array Fortran
* Global Arrays
* Microsoft Messaging Passing Interface
* MVAPICH
* OpenHMPP
* Parallel Virtual Machine (PVM)
* Partitioned global address space
* Unified Parallel C
* X10 (programming language)
References
Further reading
*
* Aoyama, Yukiya; Nakano, Jun (1999) ''RS/6000 SP: Practical MPI Programming'', ITSO
* Foster, Ian (1995) ''Designing and Building Parallel Programs (Online)'', Addison-Wesley, chapter 8
* Wijesuriya, Viraj Brian (2010-12-29
* ''Using MPI'' series:
**
**
**
**
*
* Pacheco, Peter S. (1997) ''Parallel Programming with MPI''. 500 pp. Morgan Kaufmann.
* ''MPI—The Complete Reference'' series:
** Snir, Marc; Otto, Steve W.; Huss-Lederman, Steven; Walker, David W.; Dongarra, Jack J. (1995) ''MPI: The Complete Reference''. MIT Press, Cambridge, MA, USA.
** Snir, Marc; Otto, Steve W.; Huss-Lederman, Steven; Walker, David W.; Dongarra, Jack J. (1998) ''MPI—The Complete Reference: Volume 1, The MPI Core''. MIT Press, Cambridge, MA.
** Gropp, William; Huss-Lederman, Steven; Lumsdaine, Andrew; Lusk, Ewing; Nitzberg, Bill; Saphir, William; and Snir, Marc (1998) ''MPI—The Complete Reference: Volume 2, The MPI-2 Extensions''. MIT Press, Cambridge, MA.
* Firuziaan, Mohammad; Nommensen, O. (2002) ''Parallel Processing via MPI & OpenMP'', Linux Enterprise, 10/2002
* Vanneschi, Marco (1999) ''Parallel paradigms for scientific computing''. In Proceedings of the European School on Computational Chemistry (1999, Perugia, Italy), number 75 in ''Lecture Notes in Chemistry'', pages 170–183. Springer, 2000.
* Bala, Bruck, Cypher, Elustondo, A. Ho, C.T. Ho, Kipnis, Snir (1995) "A portable and tunable collective communication library for scalable parallel computers", ''IEEE Transactions on Parallel and Distributed Systems'', vol. 6, no. 2, pp. 154–164, Feb 1995.
External links
* Official MPI-3.1 standard
* Tutorial on MPI: The Message-Passing Interface
* A User's Guide to MPI
* Tutorial: Introduction to MPI (self-paced, includes self-tests and exercises)