Benchmark (computing)

In computing, a benchmark is the act of running a computer program, a set of programs, or other operations, in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it. The term ''benchmark'' is also commonly applied to the elaborately designed benchmarking programs themselves. Benchmarking is usually associated with assessing performance characteristics of computer hardware, for example the floating-point operation performance of a CPU, but the technique is also applicable to software. Software benchmarks are run, for example, against compilers or database management systems (DBMS). Benchmarks provide a method of comparing the performance of various subsystems across different chip/system architectures.


Purpose

As computer architecture advanced, it became more difficult to compare the performance of various computer systems simply by looking at their specifications. Therefore, tests were developed that allowed comparison of different architectures. For example, Pentium 4 processors generally operated at a higher clock frequency than Athlon XP or PowerPC processors, which did not necessarily translate to more computational power; a processor with a slower clock frequency might perform as well as or even better than a processor operating at a higher frequency. See BogoMips and the megahertz myth.

Benchmarks are designed to mimic a particular type of workload on a component or system. Synthetic benchmarks do this by running specially created programs that impose the workload on the component. Application benchmarks run real-world programs on the system. While application benchmarks usually give a much better measure of real-world performance on a given system, synthetic benchmarks are useful for testing individual components, like a hard disk or networking device.

Benchmarks are particularly important in CPU design, giving processor architects the ability to measure and make trade-offs in microarchitectural decisions. For example, if a benchmark extracts the key algorithms of an application, it will contain the performance-sensitive aspects of that application. Running this much smaller snippet on a cycle-accurate simulator can give clues on how to improve performance. Prior to 2000, computer and microprocessor architects used SPEC for this purpose, although SPEC's Unix-based benchmarks were quite lengthy and thus unwieldy to use intact.

Computer manufacturers are known to configure their systems to give unrealistically high performance on benchmark tests that are not replicated in real usage. For instance, during the 1980s some compilers could detect a specific mathematical operation used in a well-known floating-point benchmark and replace it with a faster, mathematically equivalent operation. However, such a transformation was rarely useful outside the benchmark until the mid-1990s, when RISC and VLIW architectures emphasized the importance of compiler technology as it related to performance. Benchmarks are now regularly used by compiler companies to improve not only their own benchmark scores, but real application performance.

CPUs that have many execution units (such as superscalar, VLIW, or reconfigurable computing CPUs) typically have slower clock rates than a sequential CPU with one or two execution units when built from transistors that are just as fast. Nevertheless, CPUs with many execution units often complete real-world and benchmark tasks in less time than the supposedly faster high-clock-rate CPU.

Given the large number of benchmarks available, a manufacturer can usually find at least one benchmark that shows its system outperforming another system; the other systems can be shown to excel with a different benchmark. Manufacturers commonly report only those benchmarks (or aspects of benchmarks) that show their products in the best light. They have also been known to misrepresent the significance of benchmarks, again to show their products in the best possible light. Taken together, these practices are called ''bench-marketing''.

Ideally, benchmarks should only substitute for real applications if the application is unavailable, or too difficult or costly to port to a specific processor or computer system. If performance is critical, the only benchmark that matters is the target environment's application suite.
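The distinction between synthetic and application benchmarks can be illustrated with a minimal synthetic micro-benchmark. The sketch below is illustrative only; the workload and iteration counts are arbitrary choices, not part of any standard suite. It uses Python's standard `timeit` module to impose a small floating-point workload and time it:

```python
import timeit

def synthetic_workload(n=100_000):
    """Specially created loop that imposes a floating-point workload
    (summing an inverse-square series), standing in for a real program."""
    total = 0.0
    for i in range(1, n):
        total += 1.0 / (i * i)
    return total

# Run the workload repeatedly and take the best time, which reduces
# noise from other processes (a common convention, not a requirement).
best = min(timeit.repeat(synthetic_workload, number=10, repeat=5))
print(f"best of 5 runs (10 iterations each): {best:.4f} s")
```

The key property of a synthetic benchmark is visible here: the program exists only to impose a controlled workload, so its result is comparable across machines but says nothing about any particular application.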


Functionality

Features of benchmarking software may include recording/exporting the course of performance to a spreadsheet file, visualization such as drawing line graphs or color-coded tiles, and pausing the process so it can be resumed without having to start over. Software can have additional features specific to its purpose; for example, disk benchmarking software may be able to measure the disk speed within a specified range of the disk rather than the full disk, measure random-access read speed and latency, offer a "quick scan" feature which measures the speed through samples of specified intervals and sizes, and allow specifying a data block size, meaning the number of requested bytes per read request.
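As a sketch of the "data block size" option described above, the following hypothetical disk-read benchmark reads a file sequentially in blocks of a configurable size and reports throughput. It is a rough illustration, not a substitute for dedicated tools; in particular, operating-system caching will inflate the numbers.

```python
import os
import tempfile
import time

def disk_read_benchmark(path, block_size=4096):
    """Sequentially read `path` in blocks of `block_size` bytes and
    return throughput in MB/s (rough sketch; caches are not flushed)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while chunk := f.read(block_size):
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / elapsed / 1e6

# Demo against a temporary 8 MB file of random bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(8 * 1024 * 1024))
for bs in (4096, 65536):
    print(f"block size {bs:6d}: {disk_read_benchmark(tmp.name, bs):8.1f} MB/s")
os.unlink(tmp.name)
```

Varying `block_size` shows why real disk benchmarks expose it as a parameter: per-request overhead dominates at small block sizes, so the reported throughput depends heavily on this setting.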


Challenges

Benchmarking is not easy and often involves several iterative rounds in order to arrive at predictable, useful conclusions. Interpretation of benchmarking data is also extraordinarily difficult. Here is a partial list of common challenges:
* Vendors tend to tune their products specifically for industry-standard benchmarks. Norton SysInfo (SI) is particularly easy to tune for, since it is mainly biased toward the speed of multiple operations. Use extreme caution in interpreting such results.
* Some vendors have been accused of "cheating" at benchmarks: doing things that give much higher benchmark numbers but make things worse on the actual likely workload.
* Many benchmarks focus entirely on the speed of computational performance, neglecting other important features of a computer system, such as:
** Qualities of service, aside from raw performance. Examples of unmeasured qualities of service include security, availability, reliability, execution integrity, serviceability, and scalability (especially the ability to quickly and nondisruptively add or reallocate capacity). There are often real trade-offs between and among these qualities of service, and all are important in business computing. Transaction Processing Performance Council benchmark specifications partially address these concerns by specifying ACID property tests, database scalability rules, and service level requirements.
** In general, benchmarks do not measure total cost of ownership (TCO). Transaction Processing Performance Council benchmark specifications partially address this concern by specifying that a price/performance metric must be reported in addition to a raw performance metric, using a simplified TCO formula. However, the costs are necessarily only partial, and vendors have been known to price specifically (and only) for the benchmark, designing a highly specific "benchmark special" configuration with an artificially low price. Even a tiny deviation from the benchmark package results in a much higher price in real-world experience.
** Facilities burden (space, power, and cooling). When more power is used, a portable system will have a shorter battery life and require recharging more often. A server that consumes more power and/or space may not fit within existing data center resource constraints, including cooling limitations. There are real trade-offs, as most semiconductors require more power to switch faster. See also performance per watt.
** In some embedded systems, where memory is a significant cost, better code density can significantly reduce costs.
* Vendor benchmarks tend to ignore requirements for development, test, and disaster recovery computing capacity. Vendors prefer to report only what might be narrowly required for production capacity, in order to make their initial acquisition price seem as low as possible.
* Benchmarks are having trouble adapting to widely distributed servers, particularly those with extra sensitivity to network topologies. The emergence of grid computing, in particular, complicates benchmarking, since some workloads are "grid friendly" while others are not.
* Users can have very different perceptions of performance than benchmarks may suggest. In particular, users appreciate predictability: servers that always meet or exceed service level agreements. Benchmarks tend to emphasize mean scores (IT perspective) rather than maximum worst-case response times (real-time computing perspective) or low standard deviations (user perspective).
* Many server architectures degrade dramatically at high (near 100%) levels of usage ("fall off a cliff"), and benchmarks should (but often do not) take that factor into account. Vendors, in particular, tend to publish server benchmarks at continuous usage of about 80%, an unrealistic situation, and do not document what happens to the overall system when demand spikes beyond that level.
* Many benchmarks focus on one application, or even one application tier, to the exclusion of other applications. Most data centers are now implementing virtualization extensively for a variety of reasons, and benchmarking is still catching up to the reality where multiple applications and application tiers run concurrently on consolidated servers.
* There are few (if any) high-quality benchmarks that help measure the performance of batch computing, especially high-volume concurrent batch and online computing. Batch computing tends to be much more focused on the predictability of completing long-running tasks correctly before deadlines, such as end of month or end of fiscal year. Many important core business processes are batch-oriented and probably always will be, such as billing.
* Benchmarking institutions often disregard or do not follow basic scientific method. This includes, but is not limited to: small sample size, lack of variable control, and the limited repeatability of results.
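The point about mean scores versus worst-case response times can be made concrete with a few lines of Python. The latency values below are invented for illustration: a single 250 ms outlier barely registers in the mean but dominates the worst case and the standard deviation.

```python
import statistics

# Hypothetical response times (ms) for the same server workload.
latencies = [12, 11, 13, 12, 11, 14, 12, 250, 11, 12]

mean = statistics.mean(latencies)    # what benchmarks usually report
worst = max(latencies)               # real-time computing perspective
stdev = statistics.stdev(latencies)  # user-perceived (un)predictability

print(f"mean {mean:.1f} ms, worst case {worst} ms, stdev {stdev:.1f} ms")
```

A benchmark reporting only the mean (35.8 ms here) would suggest a much better experience than users actually get when the 250 ms request is theirs.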


Benchmarking Principles

There are seven vital characteristics for benchmarks. These key properties are:
# Relevance: Benchmarks should measure relatively vital features.
# Representativeness: Benchmark performance metrics should be broadly accepted by industry and academia.
# Equity: All systems should be fairly compared.
# Repeatability: Benchmark results can be verified.
# Cost-effectiveness: Benchmark tests are economical.
# Scalability: Benchmark tests should work across systems possessing a range of resources from low to high.
# Transparency: Benchmark metrics should be easy to understand.
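Repeatability in particular can be checked mechanically. The sketch below is an illustrative harness, not part of any standard; the 10% threshold is an arbitrary choice. It re-runs a workload several times and reports the run-to-run spread:

```python
import statistics
import time

def measure(fn, runs=5):
    """Re-run `fn` several times; a result is only considered repeatable
    if the run-to-run spread (coefficient of variation) is small."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    mean = statistics.mean(times)
    cv = statistics.stdev(times) / mean  # relative spread across runs
    return mean, cv

mean, cv = measure(lambda: sum(i * i for i in range(200_000)))
verdict = "repeatable" if cv < 0.10 else "noisy: rerun or control the environment"
print(f"mean {mean * 1e3:.1f} ms, spread {cv:.1%} ({verdict})")
```

Reporting the spread alongside the mean is what allows a benchmark result to be verified: a reader who cannot reproduce the number within the stated spread has grounds to question it.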


Types of benchmark

# Real program
#* word processing software
#* tool software of CAD
#* user's application software (e.g. MIS)
#* video games
#* compilers building a large project, for example the Chromium browser or the Linux kernel
# Component benchmark / microbenchmark
#* core routine consists of a relatively small and specific piece of code
#* measures performance of a computer's basic components
#* may be used for automatic detection of a computer's hardware parameters, like number of registers, cache size, memory latency, etc.
# Kernel
#* contains key codes
#* normally abstracted from an actual program
#* popular kernel: Livermore loop
#* LINPACK benchmark (contains basic linear algebra subroutines written in FORTRAN)
#* results are represented in Mflop/s
# Synthetic benchmark
#* Procedure for programming a synthetic benchmark:
#** take statistics of all types of operations from many application programs
#** get the proportion of each operation
#** write a program based on the proportions above
#* Types of synthetic benchmark:
#** Whetstone
#** Dhrystone
#* These were the first general-purpose industry-standard computer benchmarks. They do not necessarily obtain high scores on modern pipelined computers.
# I/O benchmarks
# Database benchmarks
#* measure the throughput and response times of database management systems (DBMS)
# Parallel benchmarks
#* used on machines with multiple cores and/or processors, or systems consisting of multiple machines
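As an illustration of how a kernel benchmark arrives at an Mflop/s figure, the sketch below times a DAXPY-style loop. This is a pure-Python stand-in for the FORTRAN linear-algebra subroutines mentioned above; real LINPACK numbers come from the actual suite, not from a loop like this.

```python
import time

def daxpy(a, x, y):
    """Kernel: y <- a*x + y, the kind of basic linear-algebra routine
    that LINPACK-style benchmarks time (pure-Python stand-in)."""
    for i in range(len(x)):
        y[i] += a * x[i]

n = 200_000
x = [1.0] * n
y = [2.0] * n

start = time.perf_counter()
daxpy(3.0, x, y)
elapsed = time.perf_counter() - start

flops = 2 * n  # one multiply and one add per element
print(f"{flops / elapsed / 1e6:.1f} Mflop/s")
```

The convention is the same at every scale: count the floating-point operations the kernel performs, divide by the measured wall-clock time, and report the rate.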


Common benchmarks


Industry standard (audited and verifiable)

* Business Applications Performance Corporation (BAPCo)
* Embedded Microprocessor Benchmark Consortium (EEMBC)
* Standard Performance Evaluation Corporation (SPEC), in particular their SPECint and SPECfp
* Transaction Processing Performance Council (TPC): DBMS benchmarks


Open source benchmarks

* AIM Multiuser Benchmark – composed of a list of tests that could be mixed to create a 'load mix' that would simulate a specific computer function on any UNIX-type OS.
* Bonnie++ – filesystem and hard drive benchmark
* BRL-CAD – cross-platform architecture-agnostic benchmark suite based on multithreaded ray tracing performance; baselined against a VAX-11/780; and used since 1984 for evaluating relative CPU performance, compiler differences, optimization levels, coherency, architecture differences, and operating system differences.
* Collective Knowledge – customizable, cross-platform framework to crowdsource benchmarking and optimization of user workloads (such as deep learning) across hardware provided by volunteers
* Coremark – embedded computing benchmark
* DEISA Benchmark Suite – scientific HPC applications benchmark
* Dhrystone – integer arithmetic performance, often reported in DMIPS (Dhrystone millions of instructions per second)
* DiskSpd – command-line tool for storage benchmarking that generates a variety of requests against computer files, partitions or storage devices
* Fhourstones – an integer benchmark
* HINT – designed to measure overall CPU and memory performance
* Iometer – I/O subsystem measurement and characterization tool for single and clustered systems
* IOzone – filesystem benchmark
* LINPACK benchmarks – traditionally used to measure FLOPS
* Livermore loops
* NAS parallel benchmarks
* NBench – synthetic benchmark suite measuring performance of integer arithmetic, memory operations, and floating-point arithmetic
* PAL – a benchmark for realtime physics engines
* PerfKitBenchmarker – a set of benchmarks to measure and compare cloud offerings
* Phoronix Test Suite – open-source cross-platform benchmarking suite for Linux, OpenSolaris, FreeBSD, OSX and Windows. It includes a number of other benchmarks included on this page to simplify execution.
* POV-Ray – 3D render
* Tak (function) – a simple benchmark used to test recursion performance
* TATP Benchmark – Telecommunication Application Transaction Processing Benchmark
* TPoX – an XML transaction processing benchmark for XML databases
* VUP (VAX unit of performance) – also called VAX MIPS
* Whetstone – floating-point arithmetic performance, often reported in millions of Whetstone instructions per second (MWIPS)


Microsoft Windows benchmarks

* BAPCo: MobileMark, SYSmark, WebMark
* CrystalDiskMark
* Futuremark: 3DMark, PCMark
* Heaven Benchmark
* PiFast
* Superposition Benchmark
* Super PI
* SuperPrime
* Valley Benchmark
* Whetstone
* Windows System Assessment Tool, included with Windows Vista and later releases, providing an index for consumers to rate their systems easily
* Worldbench (discontinued)


Others

* AnTuTu – commonly used on phones and ARM-based devices
* Geekbench – a cross-platform benchmark for Windows, Linux, macOS, iOS and Android
* iCOMP – the Intel comparative microprocessor performance, published by Intel
* Khornerstone
* Performance Rating – modeling scheme used by AMD and Cyrix to reflect relative performance, usually compared to competing products
* SunSpider – a browser speed test
* VMmark – a virtualization benchmark suite


See also

* Benchmarking (business perspective)
* Figure of merit
* Lossless compression benchmarks
* Performance Counter Monitor
* Test suite – a collection of test cases intended to show that a software program has some specified set of behaviors

