HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, array is a
data type In computer science and computer programming, a data type (or simply type) is a collection or grouping of data values, usually specified by a set of possible values, a set of allowed operations on these values, and/or a representation of these ...
that represents a collection of ''elements'' (
values In ethics and social sciences, value denotes the degree of importance of some thing or action, with the aim of determining which actions are best to do or what way is best to live ( normative ethics), or to describe the significance of different a ...
or variables), each selected by one or more indices (identifying keys) that can be computed at run time during program execution. Such a collection is usually called an array variable or array value.Robert W. Sebesta (2001) ''Concepts of Programming Languages''. Addison-Wesley. 4th edition (1998), 5th edition (2001), By analogy with the mathematical concepts
vector Vector most often refers to: * Euclidean vector, a quantity with a magnitude and a direction * Disease vector, an agent that carries and transmits an infectious pathogen into another living organism Vector may also refer to: Mathematics a ...
and
matrix Matrix (: matrices or matrixes) or MATRIX may refer to: Science and mathematics * Matrix (mathematics), a rectangular array of numbers, symbols or expressions * Matrix (logic), part of a formula in prenex normal form * Matrix (biology), the m ...
, array types with one and two indices are often called vector type and matrix type, respectively. More generally, a multidimensional array type can be called a tensor type, by analogy with the mathematical concept,
tensor In mathematics, a tensor is an algebraic object that describes a multilinear relationship between sets of algebraic objects associated with a vector space. Tensors may map between different objects such as vectors, scalars, and even other ...
. Language support for array types may include certain built-in array data types, some syntactic constructions (''array type constructors'') that the
programmer A programmer, computer programmer or coder is an author of computer source code someone with skill in computer programming. The professional titles Software development, ''software developer'' and Software engineering, ''software engineer' ...
may use to define such types and declare array variables, and special notation for indexing array elements. For example, in the
Pascal programming language Pascal is an imperative and procedural programming language, designed by Niklaus Wirth as a small, efficient language intended to encourage good programming practices using structured programming and data structuring. It is named after French ...
, the declaration , defines a new array data type called MyTable. The declaration var A: MyTable then defines a variable A of that type, which is an aggregate of eight elements, each being an integer variable identified by two indices. In the Pascal program, those elements are denoted A ,1/code>, A ,2/code>, A ,1/code>, …, A ,2/code>.K. Jensen and Niklaus Wirth, ''PASCAL User Manual and Report''. Springer. Paperback edition (2007) 184 pages, Special array types are often defined by the language's standard
libraries A library is a collection of Book, books, and possibly other Document, materials and Media (communication), media, that is accessible for use by its members and members of allied institutions. Libraries provide physical (hard copies) or electron ...
. Dynamic lists are also more common and easier to implement than
dynamic array In computer science, a dynamic array, growable array, resizable array, dynamic table, mutable array, or array list is a random access, variable-size list data structure that allows elements to be added or removed. It is supplied with standard l ...
s. Array types are distinguished from record types mainly because they allow the element indices to be computed at run time, as in the Pascal assignment . Among other things, this feature allows a single iterative
statement Statement or statements may refer to: Common uses *Statement (computer science), the smallest standalone element of an imperative programming language *Statement (logic and semantics), declarative sentence that is either true or false *Statement, ...
to process arbitrarily many elements of an array variable. In more theoretical contexts, especially in
type theory In mathematics and theoretical computer science, a type theory is the formal presentation of a specific type system. Type theory is the academic study of type systems. Some type theories serve as alternatives to set theory as a foundation of ...
and in the description of abstract
algorithm In mathematics and computer science, an algorithm () is a finite sequence of Rigour#Mathematics, mathematically rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algo ...
s, the terms "array" and "array type" sometimes refer to an
abstract data type In computer science, an abstract data type (ADT) is a mathematical model for data types, defined by its behavior (semantics) from the point of view of a '' user'' of the data, specifically in terms of possible values, possible operations on data ...
(ADT) also called ''abstract array'' or may refer to an ''
associative array In computer science, an associative array, key-value store, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In math ...
'', a
mathematical Mathematics is a field of study that discovers and organizes methods, Mathematical theory, theories and theorems that are developed and Mathematical proof, proved for the needs of empirical sciences and mathematics itself. There are many ar ...
model with the basic operations and behavior of a typical array type in most languages basically, a collection of elements that are selected by indices computed at run-time. Depending on the language, array types may overlap (or be identified with) other data types that describe aggregates of values, such as lists and strings. Array types are often implemented by
array data structure In computer science, an array is a data structure consisting of a collection of ''elements'' (value (computer science), values or variable (programming), variables), of same memory size, each identified by at least one ''array index'' or ''key' ...
s, but sometimes by other means, such as
hash table In computer science, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps Unique key, keys to Value (computer science), values. ...
s,
linked list In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes whi ...
s, or
search tree In computer science, a search tree is a tree data structure used for locating specific keys from within a set. In order for a tree to function as a search tree, the key for each node must be greater than any keys in subtrees on the left, and les ...
s.


History

Heinz Rutishauser Heinz Rutishauser (30 January 1918 – 10 November 1970) was a Swiss people, Swiss mathematician and a pioneer of modern numerical mathematics and computer science. Life Rutishauser's father died when he was 13 years old and his mother died t ...
's programming language Superplan (1949–1951) included multi-dimensional arrays. However, although Rutishauser described how a compiler for his language should be built, did not implement one. Assembly languages and low-level languages like BCPL generally have no syntactic support for arrays. Because of the importance of array structures for efficient computation, the earliest high-level programming languages, including FORTRAN (1957),
COBOL COBOL (; an acronym for "common business-oriented language") is a compiled English-like computer programming language designed for business use. It is an imperative, procedural, and, since 2002, object-oriented language. COBOL is primarily ...
(1960), and
Algol 60 ALGOL 60 (short for ''Algorithmic Language 1960'') is a member of the ALGOL family of computer programming languages. It followed on from ALGOL 58 which had introduced code blocks and the begin and end pairs for delimiting them, representing a ...
(1960), provided support for multi-dimensional arrays.


Abstract arrays

An array data structure can be mathematically modeled as an
abstract data structure In computer science, an abstract data type (ADT) is a mathematical model for data types, defined by its behavior (semantics) from the point of view of a ''user'' of the data, specifically in terms of possible values, possible operations on data ...
(an ''abstract array'') with two operations :: the data stored in the element of the array ''A'' whose indices are the integer
tuple In mathematics, a tuple is a finite sequence or ''ordered list'' of numbers or, more generally, mathematical objects, which are called the ''elements'' of the tuple. An -tuple is a tuple of elements, where is a non-negative integer. There is o ...
''I''. :: the array that results by setting the value of that element to ''V''. These operations are required to satisfy the
axiom An axiom, postulate, or assumption is a statement that is taken to be true, to serve as a premise or starting point for further reasoning and arguments. The word comes from the Ancient Greek word (), meaning 'that which is thought worthy or ...
s : : if for any array state ''A'', any value ''V'', and any tuples ''I'', ''J'' for which the operations are defined. The first axiom means that each element behaves like a variable. The second axiom means that elements with distinct indices behave as disjoint variables, so that storing a value in one element does not affect the value of any other element. These axioms do not place any constraints on the set of valid index tuples ''I'', therefore this abstract model can be used for triangular matrices and other oddly-shaped arrays.


Implementations

In order to effectively implement variables of such types as array structures (with indexing done by
pointer arithmetic In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer ''refe ...
), many languages restrict the indices to
integer An integer is the number zero (0), a positive natural number (1, 2, 3, ...), or the negation of a positive natural number (−1, −2, −3, ...). The negations or additive inverses of the positive natural numbers are referred to as negative in ...
data types (or other types that can be interpreted as integers, such as
byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable un ...
s and
enumerated type In computer programming, an enumerated type (also called enumeration, enum, or factor in the R (programming language), R programming language, a status variable in the JOVIAL programming language, and a categorical variable in statistics) is a data ...
s), and require that all elements have the same data type and storage size. Most of those languages also restrict each index to a finite interval of integers, that remains fixed throughout the lifetime of the array variable. In some compiled languages, in fact, the index ranges may have to be known at
compile time In computer science, compile time (or compile-time) describes the time window during which a language's statements are converted into binary instructions for the processor to execute. The term is used as an adjective to describe concepts relat ...
. On the other hand, some programming languages provide more liberal array types, that allow indexing by arbitrary values, such as
floating-point numbers In computing, floating-point arithmetic (FP) is arithmetic on subsets of real numbers formed by a ''significand'' (a signed sequence of a fixed number of digits in some base) multiplied by an integer power of that base. Numbers of this form ...
, strings, objects,
references A reference is a relationship between Object (philosophy), objects in which one object designates, or acts as a means by which to connect to or link to, another object. The first object in this relation is said to ''refer to'' the second object. ...
, etc.. Such index values cannot be restricted to an interval, much less a fixed interval. So, these languages usually allow arbitrary new elements to be created at any time. This choice precludes the implementation of array types as array data structures. That is, those languages use array-like syntax to implement a more general
associative array In computer science, an associative array, key-value store, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In math ...
semantics, and must therefore be implemented by a
hash table In computer science, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps Unique key, keys to Value (computer science), values. ...
or some other
search data structure In computer science, a search data structure is any data structure that allows the efficient retrieval of specific items from a set of items, such as a specific record from a database. The simplest, most general, and least efficient search struct ...
.


Language support


Multi-dimensional arrays

The number of indices needed to specify an element is called the ''dimension'', ''dimensionality'', or
rank A rank is a position in a hierarchy. It can be formally recognized—for example, cardinal, chief executive officer, general, professor—or unofficial. People Formal ranks * Academic rank * Corporate title * Diplomatic rank * Hierarchy ...
of the array type. (This nomenclature conflicts with the concept of dimension in linear algebra, which expresses the shape of a matrix. Thus, an array of numbers with 5 rows and 4 columns, hence 20 elements, is said to have dimension 2 in computing contexts, but represents a matrix that is said to be 4×5-dimensional. Also, the computer science meaning of "rank" conflicts with the notion of
tensor rank In mathematics, the modern component-free approach to the theory of a tensor views a tensor as an abstract object, expressing some definite type of multilinear concept. Their properties can be derived from their definitions, as linear maps or ...
, which is a generalization of the linear algebra concept of
rank of a matrix In linear algebra, the rank of a matrix is the dimension of the vector space generated (or spanned) by its columns. p. 48, § 1.16 This corresponds to the maximal number of linearly independent columns of . This, in turn, is identical to the dime ...
.) Many languages support only one-dimensional arrays. In those languages, a multi-dimensional array is typically represented by an
Iliffe vector In computer programming, an Iliffe vector, also known as a display, is a data structure used to implement multi-dimensional array data structure, arrays. Data structure An Iliffe vector for an ''n''-dimensional array (where ''n'' ≥ 2) ...
, a one-dimensional array of
references A reference is a relationship between Object (philosophy), objects in which one object designates, or acts as a means by which to connect to or link to, another object. The first object in this relation is said to ''refer to'' the second object. ...
to arrays of one dimension less. A two-dimensional array, in particular, would be implemented as a vector of pointers to its rows. Thus an element in row ''i'' and column ''j'' of an array ''A'' would be accessed by double indexing (''A'' 'i''''j''] in typical notation). This way of emulating multi-dimensional arrays allows the creation of
jagged array In computer science, a jagged array, also known as a ragged array or irregular array is an array data structure, array of arrays of which the member arrays can be of different lengths, producing rows of jagged edges when visualized as output. ...
s, where each row may have a different size or, in general, where the valid range of each index depends on the values of all preceding indices. This representation for multi-dimensional arrays is quite prevalent in C and C++ software. However, C and C++ will use a linear indexing formula for multi-dimensional arrays that are declared with compile time constant size, e.g. by or , instead of the traditional . The C99 standard introduced Variable Length Array types that let define array types with dimensions computed in run time. The dynamic 4D array can be constructed using a pointer to 4d array, e.g. . The individual elements are accessed by first de-referencing an array pointer followed by indexing, e.g. (*arr) j] l]. Alternatively, n-d arrays can be declared as pointers to its first element which is a (n-1) dimensional array, e.g. and accessed using more idiomatic syntax, e.g. arr j] l].


Indexing notation

Most programming languages that support arrays support the ''store'' and ''select'' operations, and have special syntax for indexing. Early languages used parentheses, e.g. A(i,j), as in FORTRAN; others choose square brackets, e.g. A ,j/code> or A j], as in Algol 60 and Pascal (to distinguish from the use of parentheses for
function call In computer programming, a function (also procedure, method, subroutine, routine, or subprogram) is a callable unit of software logic that has a well-defined interface and behavior and can be invoked multiple times. Callable units provide a p ...
s).


Index types

Array data types are most often implemented as array structures: with the indices restricted to integer (or totally ordered) values, index ranges fixed at array creation time, and multilinear element addressing. This was the case in most "third generation" languages, and is still the case of most systems programming languages such as Ada, C, and C++. In some languages, however, array data types have the semantics of associative arrays, with indices of arbitrary type and dynamic element creation. This is the case in some
scripting languages In computing, a script is a relatively short and simple set of instructions that typically automation, automate an otherwise manual process. The act of writing a script is called scripting. A scripting language or script language is a programming ...
such as Awk and Lua, and of some array types provided by standard C++ libraries.


Bounds checking

Some languages (like Pascal and Modula) perform
bounds checking In computer programming, bounds checking is any method of detecting whether a variable is within some bounds before it is used. It is usually used to ensure that a number fits into a given type (range checking), or that a variable being used as ...
on every access, raising an exception or aborting the program when any index is out of its valid range. Compilers may allow these checks to be turned off to trade safety for speed. Other languages (like FORTRAN and C) trust the programmer and perform no checks. Good compilers may also analyze the program to determine the range of possible values that the index may have, and this analysis may lead to
bounds-checking elimination In computer science, bounds-checking elimination is a compiler optimization useful in programming languages or runtime systems that enforce bounds checking, the practice of checking every index into an array to verify that the index is within the ...
.


Index origin

Some languages, such as C, provide only zero-based array types, for which the minimum valid value for any index is 0. This choice is convenient for array implementation and address computations. With a language such as C, a pointer to the interior of any array can be defined that will symbolically act as a pseudo-array that accommodates negative indices. This works only because C does not check an index against bounds when used. Other languages provide only ''one-based'' array types, where each index starts at 1; this is the traditional convention in mathematics for matrices and mathematical
sequence In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is cal ...
s. A few languages, such as Pascal and Lua, support ''n-based'' array types, whose minimum legal indices are chosen by the programmer. The relative merits of each choice have been the subject of heated debate. Zero-based indexing can avoid off-by-one or
fencepost error An off-by-one error or off-by-one bug (known by acronyms OBOE, OBOB, OBO and OB1) is a logic error that involves a number that differs from its intended value by 1. An off-by-one error can sometimes appear in a mathematical context. It often occ ...
s.
Edsger W. Dijkstra Edsger Wybe Dijkstra ( ; ; 11 May 1930 – 6 August 2002) was a Dutch computer scientist, programmer, software engineer, mathematician, and science essayist. Born in Rotterdam in the Netherlands, Dijkstra studied mathematics and physics and the ...
,
Why numbering should start at zero


Highest index

The relation between numbers appearing in an array declaration and the index of that array's last element also varies by language. In many languages (such as C), one should specify the number of elements contained in the array; whereas in others (such as Pascal and
Visual Basic .NET Visual Basic (VB), originally called Visual Basic .NET (VB.NET), is a multi-paradigm, object-oriented programming language developed by Microsoft and implemented on .NET, Mono, and the .NET Framework. Microsoft launched VB.NET in 2002 as the ...
) one should specify the numeric value of the index of the last element. This distinction is not present in languages where the indices start at 1, such as Lua.


Array algebra

Some programming languages support
array programming In computer science, array programming refers to solutions that allow the application of operations to an entire set of values at once. Such solutions are commonly used in computational science, scientific and engineering settings. Modern program ...
, where operations and functions defined for certain data types are implicitly extended to arrays of elements of those types. Thus one can write ''A''+''B'' to add corresponding elements of two arrays ''A'' and ''B''. Usually these languages provide both the element-by-element multiplication and the standard
matrix product In mathematics, specifically in linear algebra, matrix multiplication is a binary operation that produces a matrix from two matrices. For matrix multiplication, the number of columns in the first matrix must be equal to the number of rows in the s ...
of
linear algebra Linear algebra is the branch of mathematics concerning linear equations such as :a_1x_1+\cdots +a_nx_n=b, linear maps such as :(x_1, \ldots, x_n) \mapsto a_1x_1+\cdots +a_nx_n, and their representations in vector spaces and through matrix (mathemat ...
, and which of these is represented by the ''*'' operator varies by language. Languages providing array programming capabilities have proliferated since the innovations in this area of APL. These are core capabilities of
domain-specific language A domain-specific language (DSL) is a computer language specialized to a particular application domain. This is in contrast to a general-purpose language (GPL), which is broadly applicable across domains. There are a wide variety of DSLs, ranging ...
s such as
GAUSS Johann Carl Friedrich Gauss (; ; ; 30 April 177723 February 1855) was a German mathematician, astronomer, Geodesy, geodesist, and physicist, who contributed to many fields in mathematics and science. He was director of the Göttingen Observat ...
, IDL,
Matlab MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementat ...
, and
Mathematica Wolfram (previously known as Mathematica and Wolfram Mathematica) is a software system with built-in libraries for several areas of technical computing that allows machine learning, statistics, symbolic computation, data manipulation, network ...
. They are a core facility in newer languages, such as Julia and recent versions of Fortran. These capabilities are also provided via standard extension libraries for other general purpose programming languages (such as the widely used
NumPy NumPy (pronounced ) is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The predeces ...
library for Python).


String types and arrays

Many languages provide a built-in
string String or strings may refer to: *String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects Arts, entertainment, and media Films * ''Strings'' (1991 film), a Canadian anim ...
data type, with specialized notation ("
string literal string literal or anonymous string is a literal for a string value in the source code of a computer program. Modern programming languages commonly use a quoted sequence of characters, formally "bracketed delimiters", as in x = "foo", where , "foo ...
s") to build values of that type. In some languages (such as C), a string is just an array of characters, or is handled in much the same way. Other languages, like Pascal, may provide vastly different operations for strings and arrays.


Array index range queries

Some programming languages provide operations that return the size (number of elements) of a vector, or, more generally, range of each index of an array. In C and C++ arrays do not support the ''size'' function, so programmers often have to declare separate variable to hold the size, and pass it to procedures as a separate parameter. Elements of a newly created array may have undefined values (as in C), or may be defined to have a specific "default" value such as 0 or a
null pointer In computing, a null pointer (sometimes shortened to nullptr or null) or null reference is a value saved for indicating that the Pointer (computer programming), pointer or reference (computer science), reference does not refer to a valid Object (c ...
(as in Java). In C++ a object supports the ''store'', ''select'', and ''append'' operations with the performance characteristics discussed above. Vectors can be queried for their size and can be resized. Slower operations like inserting an element in the middle are also supported.


Slicing

An
array slicing In computer programming, array slicing is an operation that extracts a subset of elements from an array and packages them as another array, possibly in a different dimension from the original. Common examples of array slicing are extracting a ...
operation takes a subset of the elements of an array-typed entity (value or variable) and then assembles them as another array-typed entity, possibly with other indices. If array types are implemented as array structures, many useful slicing operations (such as selecting a sub-array, swapping indices, or reversing the direction of the indices) can be performed very efficiently by manipulating the
dope vector In computer programming, a dope vector is a data structure used to hold information about a data object, especially its memory layout. Purpose Dope vectors are most commonly used to describe arrays, which commonly store multiple instances of a part ...
of the structure. The possible slicings depend on the implementation details: for example, Fortran allows slicing off one column of a matrix variable, but not a row, and treat it as a vector. On the other hand, other slicing operations are possible when array types are implemented in other ways.


Resizing

Some languages allow
dynamic array In computer science, a dynamic array, growable array, resizable array, dynamic table, mutable array, or array list is a random access, variable-size list data structure that allows elements to be added or removed. It is supplied with standard l ...
s (also called resizable, growable, or extensible): array variables whose index ranges may be expanded at any time after creation, without changing the values of its current elements. For one-dimensional arrays, this facility may be provided as an operation append(''A'',''x'') that increases the size of the array ''A'' by one and then sets the value of the last element to ''x''. Other array types (such as Pascal strings) provide a concatenation operator, which can be used together with slicing to achieve that effect and more. In some languages, assigning a value to an element of an array automatically extends the array, if necessary, to include that element. In other array types, a slice can be replaced by an array of different size, with subsequent elements being renumbered accordingly as in Python's list assignment ''A'' :5= 0,20,30/code>, that inserts three new elements (10, 20, and 30) before element "''A'' . Resizable arrays are conceptually similar to lists, and the two concepts are synonymous in some languages. An extensible array can be implemented as a fixed-size array, with a counter that records how many elements are actually in use. The append operation merely increments the counter; until the whole array is used, when the append operation may be defined to fail. This is an implementation of a
dynamic array In computer science, a dynamic array, growable array, resizable array, dynamic table, mutable array, or array list is a random access, variable-size list data structure that allows elements to be added or removed. It is supplied with standard l ...
with a fixed capacity, as in the string type of Pascal. Alternatively, the append operation may re-allocate the underlying array with a larger size, and copy the old elements to the new area.


See also

* Array access analysis * Array database management system *
Bounds-checking elimination In computer science, bounds-checking elimination is a compiler optimization useful in programming languages or runtime systems that enforce bounds checking, the practice of checking every index into an array to verify that the index is within the ...
*
Delimiter-separated values Formats that use delimiter-separated values (also DSV)DSV stands for ''Delimiter Separated Values'' store two-dimensional arrays of data by separating the values in each row with specific delimiter character (computing), characters. Most database ...
*
Index checking In computer programming, bounds checking is any method of detecting whether a variable is within some bounds before it is used. It is usually used to ensure that a number fits into a given type (range checking), or that a variable being used as ...
*
Parallel array In computing, a group of parallel arrays (also known as structure of arrays or SoA) is a form of implicit data structure that uses multiple arrays to represent a singular array of records. It keeps a separate, homogeneous data array for each fi ...
*
Sparse array In numerical analysis and scientific computing, a sparse matrix or sparse array is a matrix in which most of the elements are zero. There is no strict definition regarding the proportion of zero-value elements for a matrix to qualify as sparse ...
*
Variable-length array In computer programming, a variable-length array (VLA), also called variable-sized or runtime-sized, is an array data structure whose length is determined at runtime, instead of at compile time. In the language C, the VLA is said to have a variab ...


References


External links


NIST's Dictionary of Algorithms and Data Structures: Array
{{Data types * Data types Composite data types