The LINPACK benchmarks are a measure of a system's floating-point computing power. Introduced by Jack Dongarra, they measure how fast a computer solves a dense ''n'' by ''n'' system of linear equations ''Ax'' = ''b'', which is a common task in engineering. The latest version of these benchmarks is used to build the TOP500 list, ranking the world's most powerful supercomputers.

The aim is to approximate how fast a computer will perform when solving real problems. It is a simplification, since no single computational task can reflect the overall performance of a computer system. Nevertheless, the LINPACK benchmark performance can provide a good correction to the peak performance quoted by the manufacturer. The peak performance is the maximal theoretical performance a computer can achieve, calculated as the machine's frequency, in cycles per second, times the number of operations per cycle it can perform. The actual performance will always be lower than the peak performance. The performance of a computer is a complex issue that depends on many interconnected variables.

The performance measured by the LINPACK benchmark consists of the number of 64-bit floating-point operations, generally additions and multiplications, a computer can perform per second, also known as FLOPS. However, a computer's performance when running actual applications is likely to be far below the maximal performance it achieves running the appropriate LINPACK benchmark.

The name of these benchmarks comes from the LINPACK package, a collection of Fortran subroutines for linear algebra widely used in the 1980s and initially tightly linked to the LINPACK benchmark. The LINPACK package has since been replaced by other libraries.
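For a sense of scale (the figures here are illustrative, not taken from any particular machine): a processor running at 2 GHz whose cores can each complete 16 double-precision operations per cycle has a theoretical peak of 2 × 10⁹ × 16 = 32 GFLOPS per core, or 256 GFLOPS across 8 cores; the rate a LINPACK run reports on such a machine would be some fraction of that figure.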


History

The LINPACK benchmark report first appeared in 1979 as an appendix to the LINPACK users' manual. LINPACK was designed to help users estimate the time required by their systems to solve a problem using the LINPACK package, by extrapolating the performance results obtained by 23 different computers solving a matrix problem of size 100. This matrix size was chosen due to memory and CPU limitations at that time:
* 10,000 floating-point entries from -1 to 1 are randomly generated to fill a general, dense matrix,
* then, LU decomposition with partial pivoting is used for the timing (a simplified sketch of this procedure appears at the end of this section).

Over the years, additional versions with different problem sizes, like matrices of order 300 and 1000, and additional constraints were released, allowing new optimization opportunities as hardware architectures started to implement matrix-vector and matrix-matrix operations. Parallel processing was also introduced, in the LINPACK Parallel benchmark, in the late 1980s. In 1991, the LINPACK benchmark was modified to solve problems of arbitrary size, enabling high-performance computers (HPC) to approach their asymptotic performance. Two years later, this benchmark was used to measure the performance of the first TOP500 list.
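To make the procedure concrete, the following is a minimal sketch in C of a LINPACK-100-style run. It is not the official benchmark driver: the seed, array layout, and output format are illustrative choices, and real drivers repeat the run many times to get a measurable elapsed time.

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 100

static double a[N][N], b[N];
static int piv[N];

int main(void) {
    srand(12345);  /* fixed seed, for repeatability of this sketch */
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++)
            a[i][j] = 2.0 * rand() / RAND_MAX - 1.0;  /* uniform in [-1, 1] */
        b[i] = 2.0 * rand() / RAND_MAX - 1.0;
    }

    clock_t start = clock();

    /* factor A = LU with partial pivoting, storing multipliers in place */
    for (int k = 0; k < N - 1; k++) {
        int p = k;  /* find the largest pivot in column k */
        for (int i = k + 1; i < N; i++)
            if (fabs(a[i][k]) > fabs(a[p][k])) p = i;
        piv[k] = p;
        if (p != k)  /* swap rows k and p */
            for (int j = 0; j < N; j++) {
                double t = a[k][j]; a[k][j] = a[p][j]; a[p][j] = t;
            }
        for (int i = k + 1; i < N; i++) {  /* eliminate below the pivot */
            double m = a[i][k] / a[k][k];
            a[i][k] = m;
            for (int j = k + 1; j < N; j++)
                a[i][j] -= m * a[k][j];
        }
    }

    /* apply the recorded row swaps to b, then solve Ly = b and Ux = y */
    for (int k = 0; k < N - 1; k++) {
        double t = b[k]; b[k] = b[piv[k]]; b[piv[k]] = t;
    }
    for (int i = 1; i < N; i++)
        for (int j = 0; j < i; j++) b[i] -= a[i][j] * b[j];
    for (int i = N - 1; i >= 0; i--) {
        for (int j = i + 1; j < N; j++) b[i] -= a[i][j] * b[j];
        b[i] /= a[i][i];
    }

    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0) seconds = 1e-9;  /* clock() is coarse at n = 100 */
    double flops = 2.0 / 3.0 * N * N * N + 2.0 * N * N;  /* conventional count */
    printf("solved n=%d in %.6f s: %.1f MFLOPS\n", N, seconds, flops / seconds / 1e6);
    return 0;
}
```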


The benchmarks


LINPACK 100

LINPACK 100 is very similar to the original benchmark published in 1979 along with the LINPACK users' manual. The solution is obtained by Gaussian elimination with partial pivoting, with 2/3n³ + 2n² floating-point operations, where ''n'' is 100, the order of the dense matrix ''A'' that defines the problem. Its small size and the lack of software flexibility do not allow most modern computers to reach their performance limits. However, it can still be useful for predicting the performance of numerically intensive user-written code that relies on compiler optimization.
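As an illustrative calculation: with n = 100 the conventional count is 2/3 × 100³ + 2 × 100² ≈ 6.87 × 10⁵ floating-point operations, so a solution time of one millisecond would correspond to roughly 687 MFLOPS.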


LINPACK 1000

LINPACK 1000 can provide a performance nearer to the machine's limit because, in addition to offering a bigger problem size (a matrix of order 1000), changes in the algorithm are possible. The only constraints are that the relative accuracy cannot be reduced and that the number of operations is always considered to be 2/3n³ + 2n², with n = 1000.
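Because the operation count is fixed by convention, the reported rate follows directly from the measured wall-clock time, regardless of how many operations the chosen algorithm actually performed. A minimal sketch of that bookkeeping (the function name and the sample timing are illustrative assumptions):

```c
#include <stdio.h>

/* Reported LINPACK rate in GFLOPS for a problem of order n solved in
 * `seconds`, using the conventional count 2/3*n^3 + 2*n^2 independently
 * of the operations the algorithm actually executed. */
double linpack_gflops(double n, double seconds) {
    double flops = (2.0 / 3.0) * n * n * n + 2.0 * n * n;
    return flops / seconds / 1e9;
}

int main(void) {
    /* a hypothetical 0.05 s solve at n = 1000 reports about 13.4 GFLOPS */
    printf("%.1f GFLOPS\n", linpack_gflops(1000.0, 0.05));
    return 0;
}
```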


HPLinpack

The previous benchmarks are not suitable for testing parallel computers, so the so-called Linpack Highly Parallel Computing benchmark, or HPLinpack benchmark, was introduced. In HPLinpack the size ''n'' of the problem can be made as large as needed to optimize the performance results of the machine. Once again, 2/3n³ + 2n² is taken as the operation count, independently of the algorithm used. Use of the Strassen algorithm is not allowed because it distorts the real execution rate.

The accuracy must be such that the following expression is satisfied:

‖Ax − b‖ / (ε ‖A‖ ‖x‖ n) ≤ O(1),

where ε is the machine's precision, ''n'' is the size of the problem, ‖·‖ is the matrix norm, and O(1) corresponds to big O notation.

For each computer system, the following quantities are reported:
* Rmax: the performance in GFLOPS for the largest problem run on a machine.
* Nmax: the size of the largest problem run on a machine.
* N1/2: the size where half the Rmax execution rate is achieved.
* Rpeak: the theoretical peak performance in GFLOPS for the machine.

These results are used to compile the TOP500 list of the world's most powerful computers twice a year. TOP500 measures these in double-precision floating-point format (FP64).
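A sketch of this accuracy check in C, assuming the infinity norm and row-major storage (the function name is our own; the acceptance threshold mentioned in the comment is an assumption, not mandated by the benchmark definition):

```c
#include <float.h>
#include <math.h>
#include <stddef.h>

/* Scaled residual ||Ax - b|| / (eps * ||A|| * ||x|| * n) with infinity
 * norms; A is n-by-n in row-major order. The benchmark requires this
 * quantity to be O(1). */
double scaled_residual(size_t n, const double *A, const double *x, const double *b) {
    double norm_r = 0.0, norm_A = 0.0, norm_x = 0.0;
    for (size_t i = 0; i < n; i++) {
        double ri = -b[i], row = 0.0;
        for (size_t j = 0; j < n; j++) {
            ri  += A[i * n + j] * x[j];   /* (Ax - b)_i */
            row += fabs(A[i * n + j]);    /* row sum, for ||A||_inf */
        }
        if (fabs(ri) > norm_r)   norm_r = fabs(ri);
        if (row > norm_A)        norm_A = row;
        if (fabs(x[i]) > norm_x) norm_x = fabs(x[i]);
    }
    return norm_r / (DBL_EPSILON * norm_A * norm_x * (double)n);
}
/* A run would then be accepted when scaled_residual(...) stays below a
 * small constant, e.g. 16.0 in HPL-like drivers (an assumption here). */
```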


LINPACK benchmark implementations

The previous section describes the ground rules for the benchmarks. The actual implementation of the program can diverge, with some examples being available in Fortran, C or Java.


HPL

HPL is a portable implementation of HPLinpack that was written in C, originally as a guideline, but it is now widely used to provide data for the TOP500 list, though other technologies and packages can be used. HPL generates a linear system of equations of order ''n'' and solves it using LU decomposition with partial row pivoting. It requires installed implementations of MPI and of either BLAS or VSIPL to run.

Coarsely, the algorithm has the following characteristics:
* cyclic data distribution in 2D blocks (see the sketch after this list)
* LU factorization using the right-looking variant with various depths of look-ahead
* recursive panel factorization
* six different panel broadcasting variants
* bandwidth-reducing swap-broadcast algorithm
* backward substitution with look-ahead of depth 1
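To illustrate the first characteristic: a 2D block-cyclic layout cuts the matrix into ''nb'' × ''nb'' blocks and assigns block (''i'', ''j'') to process (''i'' mod ''P'', ''j'' mod ''Q'') of a ''P'' × ''Q'' process grid, so that work stays balanced as the factorization shrinks the active submatrix. A minimal sketch, with names of our own choosing rather than HPL's:

```c
#include <stdio.h>

typedef struct { int row, col; } Proc;

/* Owner of matrix block (bi, bj) under a 2D block-cyclic distribution
 * over a P-by-Q process grid. */
Proc owner_of_block(int bi, int bj, int P, int Q) {
    Proc p = { bi % P, bj % Q };
    return p;
}

int main(void) {
    int nb = 64, P = 2, Q = 3;   /* block size and process grid (examples) */
    int i = 300, j = 500;        /* a global matrix entry (i, j) */
    Proc p = owner_of_block(i / nb, j / nb, P, Q);
    printf("entry (%d,%d) lives on process (%d,%d)\n", i, j, p.row, p.col);
    return 0;
}
```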


Criticism

The LINPACK benchmark is said to have succeeded because of the scalability of HPLinpack, the fact that it generates a single number, making the results easily comparable, and the extensive historical database associated with it. However, soon after its release, the LINPACK benchmark was criticized for providing performance levels "generally unobtainable by all but a very few programmers who tediously optimize their code for that machine and that machine alone", because it tests only the solution of dense linear systems, which is not representative of all the operations usually performed in scientific computing. Jack Dongarra, the main driving force behind the LINPACK benchmarks, said that, while they emphasize only "peak" CPU speed and number of CPUs, not enough stress is given to local bandwidth and the network.

Thom Dunning, Jr., director of the National Center for Supercomputing Applications, had this to say about the LINPACK benchmark: "The Linpack benchmark is one of those interesting phenomena -- almost anyone who knows about it will deride its utility. They understand its limitations but it has mindshare because it's the one number we've all bought into over the years."

According to Dongarra, "the organizers of the Top500 are actively looking to expand the scope of the benchmark reporting" because "it is important to include more performance characteristic and signatures for a given system". One of the possibilities being considered to extend the benchmark for the TOP500 is the HPC Challenge Benchmark Suite. With the advent of petascale computers, traversed edges per second have started to emerge as a complementary metric to the FLOPS measured by LINPACK. Another such metric is the HPCG benchmark, proposed by Dongarra.


The running time issue

According to Jack Dongarra, the running time required to obtain good performance results with HPLinpack is expected to increase. At a conference held in 2010, he said he expects running times of 2.5 days in "a few years".


See also

* LAPACK




External links


* TOP500 LINPACK
* A web-based LINPACK benchmark
* Intel® Optimized LINPACK Benchmark