HOME

TheInfoList



OR:

In computing, external memory algorithms or out-of-core algorithms are
algorithms In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing c ...
that are designed to process data that are too large to fit into a computer's
main memory Computer data storage is a technology consisting of computer components and recording media that are used to retain digital data. It is a core function and fundamental component of computers. The central processing unit (CPU) of a compute ...
at once. Such algorithms must be optimized to efficiently fetch and access data stored in slow bulk memory (
auxiliary memory Computer data storage is a technology consisting of computer components and recording media that are used to retain digital data. It is a core function and fundamental component of computers. The central processing unit (CPU) of a comput ...
) such as
hard drives A hard disk drive (HDD), hard disk, hard drive, or fixed disk is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage with one or more rigid rapidly rotating platters coated with magn ...
or tape drives, or when memory is on a computer network. External memory algorithms are analyzed in the external memory model.


Model

External memory algorithms are analyzed in an idealized
model of computation In computer science, and more specifically in computability theory and computational complexity theory, a model of computation is a model which describes how an output of a mathematical function is computed given an input. A model describes how ...
called the external memory model (or I/O model, or disk access model). The external memory model is an
abstract machine An abstract machine is a computer science theoretical model that allows for a detailed and precise analysis of how a computer system functions. It is analogous to a mathematical function in that it receives inputs and produces outputs based on pr ...
similar to the RAM machine model, but with a
cache Cache, caching, or caché may refer to: Places United States * Cache, Idaho, an unincorporated community * Cache, Illinois, an unincorporated community * Cache, Oklahoma, a city in Comanche County * Cache, Utah, Cache County, Utah * Cache Count ...
in addition to
main memory Computer data storage is a technology consisting of computer components and recording media that are used to retain digital data. It is a core function and fundamental component of computers. The central processing unit (CPU) of a compute ...
. The model captures the fact that read and write operations are much faster in a
cache Cache, caching, or caché may refer to: Places United States * Cache, Idaho, an unincorporated community * Cache, Illinois, an unincorporated community * Cache, Oklahoma, a city in Comanche County * Cache, Utah, Cache County, Utah * Cache Count ...
than in
main memory Computer data storage is a technology consisting of computer components and recording media that are used to retain digital data. It is a core function and fundamental component of computers. The central processing unit (CPU) of a compute ...
, and that
reading Reading is the process of taking in the sense or meaning of letters, symbols, etc., especially by sight or touch. For educators and researchers, reading is a multifaceted process involving such areas as word recognition, orthography (spelling) ...
long contiguous blocks is faster than reading randomly using a
disk read-and-write head A disk read-and-write head is the small part of a disk drive which moves above the disk platter and transforms the platter's magnetic field into electrical current (reads the disk) or, vice versa, transforms electrical current into magnetic fi ...
. The
running time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by t ...
of an
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...
in the external memory model is defined by the number of reads and writes to memory required. The model was introduced by Alok Aggarwal and
Jeffrey Vitter Jeffrey Scott Vitter is a U.S. computer scientist and academic administrator. Born in 1955 in New Orleans, Vitter has served in several senior higher education administration posts. He is a former chancellor of the University of Mississippi ( ...
in 1988. The external memory model is related to the cache-oblivious model, but algorithms in the external memory model may know both the block size and the
cache Cache, caching, or caché may refer to: Places United States * Cache, Idaho, an unincorporated community * Cache, Illinois, an unincorporated community * Cache, Oklahoma, a city in Comanche County * Cache, Utah, Cache County, Utah * Cache Count ...
size. For this reason, the model is sometimes referred to as the cache-aware model. The model consists of a
processor Processor may refer to: Computing Hardware * Processor (computing) **Central processing unit (CPU), the hardware within a computer that executes a program *** Microprocessor, a central processing unit contained on a single integrated circuit (I ...
with an internal memory or
cache Cache, caching, or caché may refer to: Places United States * Cache, Idaho, an unincorporated community * Cache, Illinois, an unincorporated community * Cache, Oklahoma, a city in Comanche County * Cache, Utah, Cache County, Utah * Cache Count ...
of size , connected to an unbounded external memory. Both the internal and external memory are divided into blocks of size . One input/output or memory transfer operation consists of moving a block of contiguous elements from external to internal memory, and the
running time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by t ...
of an algorithm is determined by the number of these input/output operations.


Algorithms

Algorithms in the external memory model take advantage of the fact that retrieving one object from external memory retrieves an entire block of size B. This property is sometimes referred to as locality. Searching for an element among N objects is possible in the external memory model using a
B-tree In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for ...
with branching factor B. Using a B-tree, searching, insertion, and deletion can be achieved in O(\log _B N) time (in
Big O notation Big ''O'' notation is a mathematical notation that describes the limiting behavior of a function when the argument tends towards a particular value or infinity. Big O is a member of a family of notations invented by Paul Bachmann, Edmund Land ...
). Information theoretically, this is the minimum
running time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by t ...
possible for these operations, so using a B-tree is
asymptotically optimal In computer science, an algorithm is said to be asymptotically optimal if, roughly speaking, for large inputs it performs at worst a constant factor (independent of the input size) worse than the best possible algorithm. It is a term commonly e ...
. External sorting is sorting in an external memory setting. External sorting can be done via distribution sort, which is similar to quicksort, or via a \tfrac -way merge sort. Both variants achieve the
asymptotically optimal In computer science, an algorithm is said to be asymptotically optimal if, roughly speaking, for large inputs it performs at worst a constant factor (independent of the input size) worse than the best possible algorithm. It is a term commonly e ...
runtime of O(\tfrac \log _ \tfrac) to sort objects. This bound also applies to the fast Fourier transform in the external memory model. The permutation problem is to rearrange N elements into a specific
permutation In mathematics, a permutation of a set is, loosely speaking, an arrangement of its members into a sequence or linear order, or if the set is already ordered, a rearrangement of its elements. The word "permutation" also refers to the act or pr ...
. This can either be done either by sorting, which requires the above sorting runtime, or inserting each element in order and ignoring the benefit of locality. Thus, permutation can be done in O(\min (N, \tfrac \log _ \tfrac)) time.


Applications

The external memory model captures the
memory hierarchy In computer architecture, the memory hierarchy separates computer storage into a hierarchy based on response time. Since response time, complexity, and capacity are related, the levels may also be distinguished by their performance and controll ...
, which is not modeled in other common models used in analyzing
data structures In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, ...
, such as the
random-access machine In computer science, random-access machine (RAM) is an abstract machine in the general class of register machines. The RAM is very similar to the counter machine but with the added capability of 'indirect addressing' of its registers. Like the c ...
, and is useful for proving
lower bound In mathematics, particularly in order theory, an upper bound or majorant of a subset of some preordered set is an element of that is greater than or equal to every element of . Dually, a lower bound or minorant of is defined to be an elemen ...
s for data structures. The model is also useful for analyzing algorithms that work on datasets too big to fit in internal memory. A typical example is
geographic information system A geographic information system (GIS) is a type of database containing geographic data (that is, descriptions of phenomena for which location is relevant), combined with software tools for managing, analyzing, and visualizing those data. In a ...
s, especially
digital elevation model A digital elevation model (DEM) or digital surface model (DSM) is a 3D computer graphics representation of elevation data to represent terrain or overlaying objects, commonly of a planet, moon, or asteroid. A "global DEM" refers to a discrete gl ...
s, where the full data set easily exceeds several gigabytes or even
terabytes The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit ...
of data. This methodology extends beyond general purpose CPUs and also includes GPU computing as well as classical digital signal processing. In
general-purpose computing on graphics processing units General-purpose computing on graphics processing units (GPGPU, or less often GPGP) is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditiona ...
(GPGPU), powerful graphics cards (GPUs) with little memory (compared with the more familiar system memory, which is most often referred to simply as RAM) are utilized with relatively slow CPU-to-GPU memory transfer (when compared with computation bandwidth).


History

An early use of the term "out-of-core" as an adjective is in 1962 in reference to ''devices'' that are other than the
core memory Core or cores may refer to: Science and technology * Core (anatomy), everything except the appendages * Core (manufacturing), used in casting and molding * Core (optical fiber), the signal-carrying portion of an optical fiber * Core, the centr ...
of an IBM 360. An early use of the term "out-of-core" with respect to ''algorithms'' appears in 1971.


See also

* External sorting * Online algorithm * Streaming algorithm *
Cache-oblivious algorithm In computing, a cache-oblivious algorithm (or cache-transcendent algorithm) is an algorithm designed to take advantage of a processor cache without having the size of the cache (or the length of the cache lines, etc.) as an explicit parameter. An o ...
* Parallel external memory * External memory graph traversal


References

{{Reflist


External links


Out of Core SVD and QROut of core graphicsScalapack design
Algorithms Models of computation Cache (computing) Analysis of algorithms External memory algorithms