Algorithmic efficiency

In computer science, algorithmic efficiency is a property of an algorithm which relates to the amount of computational resources used by the algorithm. Algorithmic efficiency can be thought of as analogous to engineering productivity for a repeating or continuous process. For maximum efficiency it is desirable to minimize resource usage. However, different resources such as time and space complexity cannot be compared directly, so which of two algorithms is considered more efficient often depends on which measure of efficiency is considered most important.

For example, bubble sort and timsort are both algorithms to sort a list of items from smallest to largest. Bubble sort organizes the list in time proportional to the number of elements squared (O(n^2), see Big O notation), but only requires a small amount of extra memory which is constant with respect to the length of the list (O(1)). Timsort sorts the list in time linearithmic (proportional to a quantity times its logarithm) in the list's length (O(n\log n)), but has a space requirement linear in the length of the list (O(n)). If large lists must be sorted at high speed for a given application, timsort is a better choice; however, if minimizing the memory footprint of the sorting is more important, bubble sort is a better choice.
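
The trade-off can be made concrete in code. Below is a minimal sketch in Python, whose built-in sort is Timsort; the bubble_sort function, the input size, and the timing approach are illustrative choices for this comparison, not a definitive benchmark.

    import random
    import time

    def bubble_sort(items):
        # O(n^2) time, but only O(1) extra space: sorts the list in place
        n = len(items)
        for i in range(n):
            for j in range(n - 1 - i):
                if items[j] > items[j + 1]:
                    items[j], items[j + 1] = items[j + 1], items[j]

    data = [random.random() for _ in range(5000)]

    start = time.perf_counter()
    bubble_sort(data.copy())           # copy so both sorts see the same input
    print("bubble sort:", time.perf_counter() - start, "seconds")

    start = time.perf_counter()
    sorted(data)                       # built-in Timsort: O(n log n) time, O(n) space
    print("timsort:    ", time.perf_counter() - start, "seconds")

On a typical machine the Timsort call finishes orders of magnitude sooner, while the bubble sort needs essentially no working storage beyond the list itself.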


Background

The importance of efficiency with respect to time was emphasized by Ada Lovelace in 1843 as applied to Charles Babbage's mechanical analytical engine:

"In almost every computation a great variety of arrangements for the succession of the processes is possible, and various considerations must influence the selections amongst them for the purposes of a calculating engine. One essential object is to choose that arrangement which shall tend to reduce to a minimum the time necessary for completing the calculation."

Early electronic computers had both limited speed and limited random-access memory. Therefore, a space–time trade-off occurred: a task could use a fast algorithm using a lot of memory, or it could use a slow algorithm using little memory. The engineering trade-off was therefore to use the fastest algorithm that could fit in the available memory. Modern computers are significantly faster than early computers and have a much larger amount of memory available (gigabytes instead of kilobytes). Nevertheless, Donald Knuth emphasized that efficiency is still an important consideration:

"In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal and I believe the same viewpoint should prevail in software engineering."


Overview

An algorithm is considered efficient if its resource consumption, also known as computational cost, is at or below some acceptable level. Roughly speaking, 'acceptable' means: it will run in a reasonable amount of time or space on an available computer, typically as a function of the size of the input. Since the 1950s computers have seen dramatic increases in both the available computational power and the available amount of memory, so currently acceptable levels would have been unacceptable even 10 years ago. In fact, thanks to the approximate doubling of computer power every 2 years, tasks that are acceptably efficient on modern smartphones and embedded systems may have been unacceptably inefficient for industrial servers 10 years ago.

Computer manufacturers frequently bring out new models, often with higher performance. Software costs can be quite high, so in some cases the simplest and cheapest way of getting higher performance might be to just buy a faster computer, provided it is compatible with an existing computer.

There are many ways in which the resources used by an algorithm can be measured: the two most common measures are speed and memory usage; other measures could include transmission speed, temporary disk usage, long-term disk usage, power consumption, total cost of ownership, response time to external stimuli, etc. Many of these measures depend on the size of the input to the algorithm, i.e. the amount of data to be processed. They might also depend on the way in which the data is arranged; for example, some sorting algorithms perform poorly on data which is already sorted, or which is sorted in reverse order.

In practice, there are other factors which can affect the efficiency of an algorithm, such as requirements for accuracy and/or reliability. As detailed below, the way in which an algorithm is implemented can also have a significant effect on actual efficiency, though many aspects of this relate to optimization issues.


Theoretical analysis

In the theoretical analysis of algorithms, the normal practice is to estimate their complexity in the asymptotic sense. The most commonly used notation to describe resource consumption or "complexity" is Donald Knuth's Big O notation, representing the complexity of an algorithm as a function of the size of the input n. Big O notation is an asymptotic measure of function complexity, where f(n) = O\bigl(g(n)\bigr) roughly means the time requirement for an algorithm is proportional to g(n), omitting lower-order terms that contribute less than g(n) to the growth of the function as n grows arbitrarily large. This estimate may be misleading when n is small, but is generally sufficiently accurate when n is large as the notation is asymptotic. For example, bubble sort may be faster than merge sort when only a few items are to be sorted; however, either implementation is likely to meet performance requirements for a small list. Typically, programmers are interested in algorithms that scale efficiently to large input sizes, and merge sort is preferred over bubble sort for lists of length encountered in most data-intensive programs. Some examples of Big O notation applied to algorithms' asymptotic time complexity include:

* O(1): constant time, e.g. looking up an element of an array by index
* O(\log n): logarithmic time, e.g. binary search of a sorted list
* O(n): linear time, e.g. finding an item in an unsorted list
* O(n\log n): linearithmic time, e.g. merge sort
* O(n^2): quadratic time, e.g. bubble sort
* O(2^n): exponential time, e.g. brute-force search over every subset of the input
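
To see asymptotic growth empirically, the following sketch (the function names and input sizes are invented for illustration) counts the comparisons made by an O(n) linear search and an O(\log n) binary search over a sorted list:

    def linear_search(items, target):
        comparisons = 0
        for x in items:
            comparisons += 1
            if x == target:
                break
        return comparisons

    def binary_search(items, target):
        # requires that items is sorted
        comparisons, lo, hi = 0, 0, len(items) - 1
        while lo <= hi:
            comparisons += 1             # count each probe of the list
            mid = (lo + hi) // 2
            if items[mid] == target:
                break
            elif items[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return comparisons

    for n in (10, 1000, 100000):
        data = list(range(n))
        print(n, linear_search(data, n - 1), binary_search(data, n - 1))

For n = 10 the gap is only a handful of comparisons, echoing the point that asymptotic estimates can mislead for small inputs; by n = 100000 the linear search needs 100000 comparisons while the binary search needs about 17.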


Measuring performance

For new versions of software, or to provide comparisons with competitive systems, benchmarks are sometimes used, which assist with gauging an algorithm's relative performance. If a new sorting algorithm is produced, for example, it can be compared with its predecessors to ensure that it is at least as efficient as before with known data, taking into consideration any functional improvements. Benchmarks can be used by customers when comparing various products from alternative suppliers to estimate which product will best suit their specific requirements in terms of functionality and performance. For example, in the mainframe world certain proprietary sort products from independent software companies such as Syncsort compete with products from the major suppliers such as IBM for speed.

Some benchmarks provide opportunities for producing an analysis comparing the relative speed of various compiled and interpreted languages; for example, The Computer Language Benchmarks Game compares the performance of implementations of typical programming problems in several programming languages. Even creating "do it yourself" benchmarks can demonstrate the relative performance of different programming languages, using a variety of user-specified criteria, as Christopher W. Cowell-Shah's "Nine Language Performance Round-up" demonstrates by example.


Implementation concerns

Implementation issues can also have an effect on efficiency, such as the choice of programming language, the way in which the algorithm is actually coded, the choice of a compiler for a particular language, the compilation options used, or even the operating system being used. In many cases a language implemented by an interpreter may be much slower than a language implemented by a compiler. See the articles on just-in-time compilation and interpreted languages.

There are other factors which may affect time or space issues, but which may be outside of a programmer's control; these include data alignment, data granularity, cache locality, cache coherency, garbage collection, instruction-level parallelism, multi-threading (at either a hardware or software level), simultaneous multitasking, and subroutine calls (see Steele 1977).

Some processors have capabilities for vector processing, which allow a single instruction to operate on multiple operands; it may or may not be easy for a programmer or compiler to use these capabilities (see the sketch at the end of this section). Algorithms designed for sequential processing may need to be completely redesigned to make use of parallel processing, or they could be easily reconfigured. As parallel and distributed computing grew in importance in the late 2010s, more investment has been made in efficient high-level APIs for parallel and distributed computing systems such as CUDA, TensorFlow, Hadoop, OpenMP and MPI.

Another problem which can arise in programming is that processors compatible with the same instruction set (such as x86-64 or ARM) may implement an instruction in different ways, so that instructions which are relatively fast on some models may be relatively slow on others. This often presents challenges to optimizing compilers, which must have extensive knowledge of the specific CPU and other hardware available on the compilation target to best optimize a program for performance. In the extreme case, a compiler may be forced to emulate instructions not supported on a compilation target platform, forcing it to generate code or link an external library call to produce a result that is otherwise incomputable on that platform, even if it is natively supported and more efficient in hardware on other platforms. This is often the case in embedded systems with respect to floating-point arithmetic, where small and low-power microcontrollers often lack hardware support for floating-point arithmetic and thus require computationally expensive software routines to produce floating-point calculations.
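
As a rough illustration of what vector processing buys, the sketch below compares an element-at-a-time Python loop with a single vectorized operation; it assumes the third-party NumPy library is installed, and the array size is arbitrary:

    import time
    import numpy as np

    n = 1_000_000
    a = np.random.rand(n)
    b = np.random.rand(n)

    start = time.perf_counter()
    c = [a[i] + b[i] for i in range(n)]    # one element per step
    t_loop = time.perf_counter() - start

    start = time.perf_counter()
    c = a + b                              # one operation applied across all elements
    t_vec = time.perf_counter() - start

    print(f"loop: {t_loop:.3f} s, vectorized: {t_vec:.4f} s")

The vectorized form is typically tens to hundreds of times faster, because the work is dispatched to compiled code that can exploit SIMD instructions where the hardware provides them.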


Measures of resource usage

Measures are normally expressed as a function of the size of the input n. The two most common measures are:

* ''Time'': how long does the algorithm take to complete?
* ''Space'': how much working memory (typically RAM) is needed by the algorithm? This has two aspects: the amount of memory needed by the code (auxiliary space usage), and the amount of memory needed for the data on which the code operates (intrinsic space usage).

For computers whose power is supplied by a battery (e.g. laptops and smartphones), or for very long/large calculations (e.g. supercomputers), other measures of interest are:

* ''Direct power consumption'': power needed directly to operate the computer.
* ''Indirect power consumption'': power needed for cooling, lighting, etc.

Power consumption is growing in importance as a metric for computational tasks of all types and at all scales, ranging from embedded Internet of things devices to system-on-chip devices to server farms. This trend is often referred to as green computing.

Less common measures of computational efficiency may also be relevant in some cases:

* ''Transmission size'': bandwidth could be a limiting factor. Data compression can be used to reduce the amount of data to be transmitted (see the sketch after this list). Displaying a picture or image (e.g. the Google logo) can result in transmitting tens of thousands of bytes (48K in this case) compared with transmitting six bytes for the text "Google". This is important for I/O bound computing tasks.
* ''External space'': space needed on a disk or other external memory device; this could be for temporary storage while the algorithm is being carried out, or it could be long-term storage needed to be carried forward for future reference.
* ''Response time'' (latency): this is particularly relevant in a real-time application when the computer system must respond quickly to some external event.
* ''Total cost of ownership'': particularly if a computer is dedicated to one particular algorithm.
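
As an example of reducing transmission size, the following sketch uses Python's standard zlib module on some repetitive sample text (the input is illustrative):

    import zlib

    text = ("algorithmic efficiency " * 500).encode("utf-8")
    compressed = zlib.compress(text, level=9)

    print(len(text), "bytes raw ->", len(compressed), "bytes compressed")
    print(f"only {len(compressed) / len(text):.1%} of the original needs to be transmitted")

Highly repetitive data compresses dramatically; already-compressed formats such as JPEG images gain little.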


Time


Theory

Analysis of algorithms, typically using concepts like time complexity, can be used to get an estimate of the running time as a function of the size of the input data. The result is normally expressed using Big O notation. This is useful for comparing algorithms, especially when a large amount of data is to be processed. More detailed estimates are needed to compare algorithm performance when the amount of data is small, although this is likely to be of less importance. Parallel algorithms may be more difficult to analyze.


Practice

A benchmark can be used to assess the performance of an algorithm in practice. Many programming languages have an available function which provides CPU time usage. For long-running algorithms the elapsed time could also be of interest. Results should generally be averaged over several tests. Run-based profiling can be very sensitive to hardware configuration and to the possibility of other programs or tasks running at the same time in a multi-processing and multi-programming environment. This sort of test also depends heavily on the selection of a particular programming language, compiler, and compiler options, so algorithms being compared must all be implemented under the same conditions.
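
In Python, for instance, CPU time and elapsed time are reported by separate clocks; this minimal sketch (the workload function is a stand-in for the algorithm under test) reports both, averaged over several runs as suggested above:

    import time

    def workload():
        return sum(i * i for i in range(10**6))

    RUNS = 5
    cpu0, wall0 = time.process_time(), time.perf_counter()
    for _ in range(RUNS):
        workload()
    cpu1, wall1 = time.process_time(), time.perf_counter()

    print(f"CPU time:     {(cpu1 - cpu0) / RUNS:.3f} s per run")
    print(f"elapsed time: {(wall1 - wall0) / RUNS:.3f} s per run")

A large gap between the two figures usually indicates time spent waiting on I/O or on other processes rather than computing.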


Space

This section is concerned with the use of memory resources (registers, cache, RAM, virtual memory, secondary memory) while the algorithm is being executed. As for the time analysis above, the algorithm is typically analyzed using space complexity analysis to get an estimate of the run-time memory needed as a function of the size of the input data. The result is normally expressed using Big O notation.

There are up to four aspects of memory usage to consider:

* The amount of memory needed to hold the code for the algorithm.
* The amount of memory needed for the input data.
* The amount of memory needed for any output data. Some algorithms, such as sorting, often rearrange the input data and do not need any additional space for output data; this property is referred to as "in-place" operation (demonstrated in the sketch below).
* The amount of memory needed as working space during the calculation. This includes local variables and any stack space needed by routines called during a calculation; this stack space can be significant for algorithms which use recursive techniques.

Early electronic computers, and early home computers, had relatively small amounts of working memory. For example, the 1949 Electronic Delay Storage Automatic Calculator (EDSAC) had a maximum working memory of 1024 17-bit words, while the 1980 Sinclair ZX80 came initially with 1024 8-bit bytes of working memory. In the late 2010s, it is typical for personal computers to have between 4 and 32 GB of RAM, an increase of millions of times as much memory.
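
Returning to the in-place property noted in the list above, working-space usage can be observed empirically. This sketch uses Python's standard tracemalloc module; the list size is arbitrary, and Python's Timsort may still allocate a temporary merge buffer internally:

    import random
    import tracemalloc

    data = [random.random() for _ in range(100_000)]

    tracemalloc.start()
    result = sorted(data)              # builds a new list: O(n) output space
    _, peak_copy = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    tracemalloc.start()
    data.sort()                        # sorts in place: no separate output list
    _, peak_inplace = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    print(f"sorted():    peak {peak_copy / 1024:.0f} KiB")
    print(f"list.sort(): peak {peak_inplace / 1024:.0f} KiB")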


Caching and memory hierarchy

Modern computers can have relatively large amounts of memory (possibly gigabytes), so having to squeeze an algorithm into a confined amount of memory is not the kind of problem it used to be. However, the different types of memory and their relative access speeds can be significant:

* Processor registers are the fastest memory, with the least amount of space. Most direct computation on modern computers occurs with source and destination operands in registers before being updated to the cache, main memory and virtual memory if needed. On a processor core, there are typically on the order of hundreds of bytes or fewer of register availability, although a register file may contain more physical registers than the architectural registers defined in the instruction set architecture.
* Cache memory is the second fastest and second smallest memory in the memory hierarchy. Caches are present in processors such as CPUs or GPUs, where they are typically implemented in static RAM, though they can also be found in peripherals such as disk drives. Processor caches often have their own multi-level hierarchy; lower levels are larger, slower and typically shared between processor cores in multi-core processors. In order to process operands in cache memory, a processing unit must fetch the data from the cache, perform the operation in registers and write the data back to the cache. This operates at speeds comparable with (about 2-10 times slower than) the CPU or GPU's arithmetic logic unit or floating-point unit if the data is in the L1 cache. It is about 10 times slower if there is an L1 cache miss and the data must be retrieved from and written to the L2 cache, and a further 10 times slower if there is an L2 cache miss and it must be retrieved from an L3 cache, if present.
* Main physical memory is most often implemented in dynamic RAM (DRAM). Main memory is much larger (typically gigabytes, compared to ≈8 megabytes for an L3 CPU cache), with read and write latencies typically 10-100 times slower. RAM is increasingly implemented on-chip with processors, as CPU or GPU memory.
* Paged memory, often used for virtual memory management, is memory stored in secondary storage such as a hard disk, and is an extension to the memory hierarchy which allows use of a potentially larger storage space, at the cost of much higher latency, typically around 1000 times slower than a cache miss for a value in RAM. While originally motivated to create the impression of more memory being available than was truly available, virtual memory is more important in contemporary usage for its time-space trade-off and for enabling the use of virtual machines. Cache misses from main memory are called page faults, and incur huge performance penalties on programs.

An algorithm whose memory needs will fit in cache memory will be much faster than an algorithm which fits in main memory, which in turn will be very much faster than an algorithm which has to resort to paging. Because of this, cache replacement policies are extremely important to high-performance computing, as are cache-aware programming and data alignment. To further complicate the issue, some systems have up to three levels of cache memory, with varying effective speeds. Different systems will have different amounts of these various types of memory, so the effect of algorithm memory needs can vary greatly from one system to another.

In the early days of electronic computing, if an algorithm and its data would not fit in main memory then the algorithm could not be used. Nowadays the use of virtual memory appears to provide much more memory, but at the cost of performance. Much higher speed can be obtained if an algorithm and its data fit in cache memory; in this case minimizing space will also help minimize time. This is called the principle of locality, and can be subdivided into locality of reference, spatial locality, and temporal locality. An algorithm which will not fit completely in cache memory but which exhibits locality of reference may perform reasonably well.
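
The effect of spatial locality is visible even from a high-level language. The sketch below, assuming the third-party NumPy library is installed, traverses the same matrix along its contiguous (row-major) axis and then against it:

    import time
    import numpy as np

    m = np.random.rand(5000, 5000)     # NumPy stores arrays row-major by default

    start = time.perf_counter()
    row_sums = [m[i, :].sum() for i in range(5000)]   # contiguous reads
    t_rows = time.perf_counter() - start

    start = time.perf_counter()
    col_sums = [m[:, j].sum() for j in range(5000)]   # strided reads, poor locality
    t_cols = time.perf_counter() - start

    print(f"row-wise: {t_rows:.2f} s, column-wise: {t_cols:.2f} s")

The column-wise traversal touches memory with a large stride, defeating the cache lines and prefetching that make the row-wise traversal fast.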


See also

* Analysis of algorithms—how to determine the resources needed by an algorithm
* Benchmark—a method for measuring comparative execution times in defined cases
* Best, worst and average case—considerations for estimating execution times in three scenarios
* Compiler optimization—compiler-derived optimization
* Computational complexity theory
* Computer performance—computer hardware metrics
* Empirical algorithmics—the practice of using empirical methods to study the behavior of algorithms
* Program optimization
* Performance analysis—methods of measuring actual performance of an algorithm at run-time


References

* Steele, Guy Lewis Jr. (1977). "Debunking the 'Expensive Procedure Call' Myth, or, Procedure Call Implementations Considered Harmful, or, Lambda: The Ultimate GOTO". MIT AI Lab, AI Lab Memo AIM-443.