The HPC Challenge Benchmark combines several benchmarks to test a number of independent attributes of the performance of high-performance computer (HPC) systems. The project has been co-sponsored by the DARPA High Productivity Computing Systems program, the United States Department of Energy, and the National Science Foundation.

Context

The performance of complex applications on HPC systems can depend on a variety of independent performance attributes of the hardware. The HPC Challenge Benchmark is an effort to improve visibility into this multidimensional space by combining the measurement of several of these attributes into a single program. Although the performance attributes of interest are not specific to any particular computer architecture, the reference implementation of the HPC Challenge Benchmark in C and MPI assumes that the system under test is a cluster of shared memory multiprocessor systems connected by a network. Because of this assumed hierarchical system structure, most of the tests are run in several different modes of operation. Following the notation used by the benchmark reports, results labeled "single" mean that the test was run on one randomly chosen processor in the system, results labeled "star" mean that an independent copy of the test was run concurrently on each processor in the system, and results labeled "global" mean that all the processors were working in coordination to solve a single problem (with data distributed across the nodes of the system).
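The three modes can be illustrated with a small sequential simulation. This is only a sketch: the reference implementation uses C and MPI with real concurrent processes, whereas here the processes are simulated one after another, and the names `kernel`, `run_single`, `run_star`, and `run_global` are hypothetical, chosen for illustration.

```python
import random

def kernel(values):
    """Stand-in workload (an assumption): sum of squares of the given values."""
    return sum(v * v for v in values)

def run_single(num_procs, local_data, rng):
    """'single': one randomly chosen process runs the test; the others idle."""
    rank = rng.randrange(num_procs)
    return rank, kernel(local_data)

def run_star(num_procs, local_data):
    """'star': every process runs its own independent copy of the test.
    (Simulated sequentially here; in a real run the copies are concurrent.)"""
    return [kernel(local_data) for _ in range(num_procs)]

def run_global(num_procs, problem):
    """'global': one problem is distributed across the processes and the
    partial results are combined into a single answer."""
    chunks = [problem[r::num_procs] for r in range(num_procs)]
    partials = [kernel(chunk) for chunk in chunks]
    return sum(partials)

rng = random.Random(0)
data = list(range(8))
rank, single_result = run_single(4, data, rng)
star_results = run_star(4, data)
global_result = run_global(4, data)
```

In "star" mode the copies do not communicate but still compete for shared resources such as memory bandwidth, which is why "single" and "star" results for the same test can differ on real hardware. This simulation cannot show that effect.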

Components

The benchmark currently consists of 7 tests (with the modes of operation indicated for each):

1. HPL (High Performance LINPACK) – measures performance of a solver for a dense system of linear equations (global).
2. DGEMM – measures performance for matrix-matrix multiplication (single, star).
3. STREAM – measures sustained memory bandwidth to/from memory (single, star).
4. PTRANS – measures the rate at which the system can transpose a large array (global).
5. RandomAccess – measures the rate of 64-bit updates to randomly selected elements of a large table (single, star, global).
6. FFT – performs a Fast Fourier Transform on a large one-dimensional vector using the generalized Cooley–Tukey algorithm (single, star, global).
7. Communication Bandwidth and Latency – MPI-centric performance measurements based on the b_eff bandwidth/latency benchmark.
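As an illustration of the RandomAccess idea, the following minimal sketch XORs pseudo-random 64-bit values into randomly selected slots of a power-of-two-sized table and reports the rate in giga-updates per second (GUPS). It is not the benchmark's own code: the random stream comes from Python's `random` module rather than the benchmark's generator, and the function names, the table size, and the `updates_per_entry` parameter are assumptions made for this example.

```python
import random
import time

def apply_updates(table, num_updates, seed=1):
    """XOR num_updates pseudo-random 64-bit values into randomly selected slots.
    The table length must be a power of two so that a bit mask selects the slot."""
    mask = len(table) - 1
    rng = random.Random(seed)  # assumption: stand-in for the benchmark's generator
    for _ in range(num_updates):
        value = rng.getrandbits(64)
        table[value & mask] ^= value

def measure_gups(table_bits=12, updates_per_entry=4):
    """Time a batch of updates and return (table, update count, GUPS)."""
    table = list(range(1 << table_bits))
    num_updates = updates_per_entry * len(table)
    start = time.perf_counter()
    apply_updates(table, num_updates)
    elapsed = max(time.perf_counter() - start, 1e-12)  # guard against a zero reading
    return table, num_updates, num_updates / elapsed / 1e9

table, num_updates, gups = measure_gups()
```

Because XOR is its own inverse, replaying the same update stream with `apply_updates(table, num_updates)` restores the table to its initial contents, which gives a simple self-check for the sketch. The access pattern has almost no locality, so the measured rate is dominated by memory (and, in global mode, network) latency rather than by arithmetic.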

Performance attributes

At a high level, the tests are intended to provide coverage of four important attributes of performance: double-precision floating-point arithmetic (DGEMM and HPL), local memory bandwidth (STREAM), network bandwidth for "large" messages (PTRANS, RandomAccess, FFT, b_eff), and network bandwidth for "small" messages (RandomAccess, b_eff). Some of the codes are more complex than others and can have additional performance sensitivities. For example, in some systems HPL performance can be limited by network bandwidth and/or network latency.
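The first two attributes are reported as rates: floating-point operations per second and bytes per second. The sketch below shows how such rates can be derived from a timed kernel, assuming the conventional operation counts of 2n³ floating-point operations for an n×n matrix multiplication and three 8-byte words moved per element for the STREAM Triad kernel `a[i] = b[i] + q * c[i]`. The helper names are hypothetical, and pure Python is far too slow to yield meaningful hardware numbers; only the bookkeeping is of interest.

```python
import time

def triad(a, b, c, q):
    """STREAM-style Triad kernel: a[i] = b[i] + q * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]

def triad_bytes(n, word_bytes=8):
    """Bytes moved by one Triad pass: two reads and one write per element."""
    return 3 * word_bytes * n

def dgemm_flops(n):
    """Floating-point operations in an n x n matrix-matrix multiplication:
    n**3 multiplications plus n**3 additions."""
    return 2 * n ** 3

def rate(work, seconds):
    """Work units (bytes or flops) per second."""
    return work / seconds

def timed(fn, *args):
    """Run fn(*args) and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    fn(*args)
    return max(time.perf_counter() - start, 1e-12)  # guard against a zero reading

n = 1000
a, b, c = [0.0] * n, [1.0] * n, [2.0] * n
seconds = timed(triad, a, b, c, 3.0)
bandwidth = rate(triad_bytes(n), seconds)  # bytes per second for this run
```

The same `rate` helper applies to DGEMM by dividing `dgemm_flops(n)` by the measured multiplication time.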

Competition

The annual HPC Challenge Award Competition at the Supercomputing Conference focuses on four of the most challenging benchmarks in the suite:

* Global HPL
* Global RandomAccess (or BSS Random Access Benchmark)
* EP STREAM (Triad) per system
* Global FFT

There are two classes of awards:

* Class 1: Best performance on a base or optimized run submitted to the HPC Challenge website.
* Class 2: Most "elegant" implementation of four or five computational kernels including three or more of the HPC Challenge benchmarks.
