computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includi ...

, array programming refers to solutions which allow the application of operations to an entire set of values at once. Such solutions are commonly used in

scientific Science is a systematic endeavor that builds and organizes knowledge in the form of testable explanations and predictions about the universe. Science may be as old as the human species, and some of the earliest archeological evidence for ...

and

engineering Engineering is the use of scientific principles to design and build machines, structures, and other items, including bridges, tunnels, roads, vehicles, and buildings. The discipline of engineering encompasses a broad range of more speciali ...

settings. Modern programming languages that support array programming (also known as

vector Vector most often refers to: *Euclidean vector, a quantity with a magnitude and a direction *Vector (epidemiology), an agent that carries and transmits an infectious pathogen into another living organism Vector may also refer to: Mathematic ...

or multidimensional languages) have been engineered specifically to generalize operations on scalars to apply transparently to

matrices Matrix most commonly refers to: * ''The Matrix'' (franchise), an American media franchise ** ''The Matrix'', a 1999 science-fiction action film ** "The Matrix", a fictional setting, a virtual reality environment, within ''The Matrix'' (franchis ...

, and higher-dimensional arrays. These include APL, J, Fortran 90,

MATLAB MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementa ...

, Analytica, lists), Octave, R,

Cilk Plus Cilk, Cilk++, Cilk Plus and OpenCilk are general-purpose programming languages designed for multithreaded parallel computing. They are based on the C and C++ programming languages, which they extend with constructs to express parallel loop ...

Julia Julia is usually a feminine given name. It is a Latinate feminine form of the name Julio and Julius. (For further details on etymology, see the Wiktionary entry "Julius".) The given name ''Julia'' had been in use throughout Late Antiquity (e.g ...

, Perl Data Language (PDL). In these languages, an operation that operates on entire arrays can be called a ''vectorized'' operation, regardless of whether it is executed on a

vector processor In computing, a vector processor or array processor is a central processing unit (CPU) that implements an instruction set where its instructions are designed to operate efficiently and effectively on large one-dimensional arrays of data calle ...

, which implements vector instructions. Array programming primitives concisely express broad ideas about data manipulation. The level of concision can be dramatic in certain cases: it is not uncommon to find array programming language one-liners that require several pages of object-oriented code.

Concepts of array

The fundamental idea behind array programming is that operations apply at once to an entire set of values. This makes it a high-level programming model as it allows the programmer to think and operate on whole aggregates of data, without having to resort to explicit loops of individual scalar operations. Kenneth E. Iverson described the rationale behind array programming (actually referring to APL) as follows: The basis behind array programming and thinking is to find and exploit the properties of data where individual elements are similar or adjacent. Unlike object orientation which implicitly breaks down data to its constituent parts (or scalar quantities), array orientation looks to group data and apply a uniform handling.

Function rank Function or functionality may refer to: Computing * Function key, a type of key on computer keyboards * Function model, a structured representation of processes in a system * Function object or functor or functionoid, a concept of object-oriente ...

is an important concept to array programming languages in general, by analogy to

tensor In mathematics, a tensor is an algebraic object that describes a multilinear relationship between sets of algebraic objects related to a vector space. Tensors may map between different objects such as vectors, scalars, and even other tensor ...

rank in mathematics: functions that operate on data may be classified by the number of dimensions they act on. Ordinary multiplication, for example, is a scalar ranked function because it operates on zero-dimensional data (individual numbers). The cross product operation is an example of a vector rank function because it operates on vectors, not scalars.

Matrix multiplication In mathematics, particularly in linear algebra, matrix multiplication is a binary operation that produces a matrix from two matrices. For matrix multiplication, the number of columns in the first matrix must be equal to the number of rows in the s ...

is an example of a 2-rank function, because it operates on 2-dimensional objects (matrices). Collapse operators reduce the dimensionality of an input data array by one or more dimensions. For example, summing over elements collapses the input array by 1 dimension.

Uses

Array programming is very well suited to

implicit parallelization In computer science, implicit parallelism is a characteristic of a programming language that allows a compiler or interpreter to automatically exploit the parallelism inherent to the computations expressed by some of the language's constructs. ...

; a topic of much research nowadays. Further,

Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...

and compatible CPUs developed and produced after 1997 contained various instruction set extensions, starting from MMX and continuing through

SSSE3 Supplemental Streaming SIMD Extensions 3 (SSSE3 or SSE3S) is a SIMD instruction set created by Intel and is the fourth iteration of the SSE technology. History SSSE3 was first introduced with Intel processors based on the Core microarchitectu ...

and

3DNow! 3DNow! is a deprecated extension to the x86 instruction set developed by Advanced Micro Devices (AMD). It adds single instruction multiple data (SIMD) instructions to the base x86 instruction set, enabling it to perform vector processing of fl ...

, which include rudimentary

SIMD Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it shoul ...

array capabilities. Array processing is distinct from parallel processing in that one physical processor performs operations on a group of items simultaneously while parallel processing aims to split a larger problem into smaller ones (

MIMD In computing, multiple instruction, multiple data (MIMD) is a technique employed to achieve parallelism. Machines using MIMD have a number of processors that function asynchronously and independently. At any time, different processors may be exe ...

) to be solved piecemeal by numerous processors. Processors with two or more cores are increasingly common today.

Languages

The canonical examples of array programming languages are Fortran, APL, and J. Others include: A+, Analytica,

Chapel A chapel is a Christian place of prayer and worship that is usually relatively small. The term has several meanings. Firstly, smaller spaces inside a church that have their own altar are often called chapels; the Lady chapel is a common ty ...

, IDL,

, K, Klong, Q,

GNU Octave GNU Octave is a high-level programming language primarily intended for scientific computing and numerical computation. Octave helps in solving linear and nonlinear problems numerically, and for performing other numerical experiments using a langu ...

, PDL, R, S-Lang, SAC,

Nial Nial (from "Nested Interactive Array Language") is a high-level array programming language developed from about 1981 by Mike Jenkins of Queen's University, Kingston, Ontario, Canada. Jenkins co-created the Jenkins–Traub algorithm. Nial c ...

, ZPL and

TI-BASIC TI-BASIC is the official name of a BASIC-like language built into Texas Instruments (TI)'s graphing calculators. TI-BASIC is a language family of three different and incompatible versions, released on different products: * TI-BASIC 83 (on Z80 ...

Scalar languages

In scalar languages such as C and Pascal, operations apply only to single values, so ''a''+''b'' expresses the addition of two numbers. In such languages, adding one array to another requires indexing and looping, the coding of which is tedious. for (i = 0; i < n; i++) for (j = 0; j < n; j++) a j] += b j]; In array-based languages, for example in Fortran, the nested for-loop above can be written in array-format in one line, a = a + b or alternatively, to emphasize the array nature of the objects, a(:,:) = a(:,:) + b(:,:) While scalar languages like C do not have native array programming elements as part of the language proper, this does not mean programs written in these languages never take advantage of the underlying techniques of vectorization (i.e., utilizing a CPU's Single instruction, multiple data, vector-based instructions if it has them or by using multiple CPU cores). Some C compilers like GCC at some optimization levels detect and vectorize sections of code that its heuristics determine would benefit from it. Another approach is given by the

OpenMP OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran, on many platforms, instruction-set architectures and operating syst ...

API, which allows one to parallelize applicable sections of code by taking advantage of multiple CPU cores.

Array languages

In array languages, operations are generalized to apply to both scalars and arrays. Thus, ''a''+''b'' expresses the sum of two scalars if ''a'' and ''b'' are scalars, or the sum of two arrays if they are arrays. An array language simplifies programming but possibly at a cost known as the ''abstraction penalty''. Because the additions are performed in isolation from the rest of the coding, they may not produce the optimally most efficient code. (For example, additions of other elements of the same array may be subsequently encountered during the same execution, causing unnecessary repeated lookups.) Even the most sophisticated

optimizing compiler In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a program's execution time, memory footprint, storage size, and power cons ...

would have an extremely hard time amalgamating two or more apparently disparate functions which might appear in different program sections or sub-routines, even though a programmer could do this easily, aggregating sums on the same pass over the array to minimize overhead).

Ada

The previous C code would become the following in the

Ada Ada may refer to: Places Africa * Ada Foah, a town in Ghana * Ada (Ghana parliament constituency) * Ada, Osun, a town in Nigeria Asia * Ada, Urmia, a village in West Azerbaijan Province, Iran * Ada, Karaman, a village in Karaman Province, ...

language, which supports array-programming syntax. A := A + B;

APL

APL uses single character Unicode symbols with no syntactic sugar. A ← A + B This operation works on arrays of any rank (including rank 0), and on a scalar and an array. Dyalog APL extends the original language with

augmented assignment Augmented assignment (or compound assignment) is the name given to certain assignment operators in certain programming languages (especially those derived from C). An augmented assignment is generally used to replace a statement where an operat ...

s: A +← B

Analytica

Analytica provides the same economy of expression as Ada.

A := A + B;

BASIC

Dartmouth BASIC Dartmouth BASIC is the original version of the BASIC programming language. It was designed by two professors at Dartmouth College, John G. Kemeny and Thomas E. Kurtz. With the underlying Dartmouth Time Sharing System (DTSS), it offered an inte ...

had MAT statements for matrix and array manipulation in its third edition (1966). DIM A(4),B(4),C(4) MAT A = 1 MAT B = 2 * A MAT C = A + B MAT PRINT A,B,C

Mata

Stata's matrix programming language Mata supports array programming. Below, we illustrate addition, multiplication, addition of a matrix and a scalar, element by element multiplication, subscripting, and one of Mata's many inverse matrix functions. . mata: : A = (1,2,3) \(4,5,6) : A 1 2 3 +-------------+ 1 , 1 2 3 , 2 , 4 5 6 , +-------------+ : B = (2..4) \(1..3) : B 1 2 3 +-------------+ 1 , 2 3 4 , 2 , 1 2 3 , +-------------+ : C = J(3,2,1) // A 3 by 2 matrix of ones : C 1 2 +---------+ 1 , 1 1 , 2 , 1 1 , 3 , 1 1 , +---------+ : D = A + B : D 1 2 3 +-------------+ 1 , 3 5 7 , 2 , 5 7 9 , +-------------+ : E = A*C : E 1 2 +-----------+ 1 , 6 6 , 2 , 15 15 , +-----------+ : F = A:*B : F 1 2 3 +----------------+ 1 , 2 6 12 , 2 , 4 10 18 , +----------------+ : G = E :+ 3 : G 1 2 +-----------+ 1 , 9 9 , 2 , 18 18 , +-----------+ : H = F 2\1), (1, 2) // Subscripting to get a submatrix of F and : // switch row 1 and 2 : H 1 2 +-----------+ 1 , 4 10 , 2 , 2 6 , +-----------+ : I = invsym(F'*F) // Generalized inverse (F*F^(-1)F=F) of a : // symmetric positive semi-definite matrix : I ymmetric 1 2 3 +-------------------------------------------+ 1 , 0 , 2 , 0 3.25 , 3 , 0 -1.75 .9444444444 , +-------------------------------------------+ : end

MATLAB

The implementation in

allows the same economy allowed by using the Fortran language. A = A + B; A variant of the MATLAB language is the

language, which extends the original language with augmented assignments: A += B; Both MATLAB and GNU Octave natively support

linear algebra Linear algebra is the branch of mathematics concerning linear equations such as: :a_1x_1+\cdots +a_nx_n=b, linear maps such as: :(x_1, \ldots, x_n) \mapsto a_1x_1+\cdots +a_nx_n, and their representations in vector spaces and through matrices ...

operations such as matrix multiplication,

matrix inversion In linear algebra, an -by- square matrix is called invertible (also nonsingular or nondegenerate), if there exists an -by- square matrix such that :\mathbf = \mathbf = \mathbf_n \ where denotes the -by- identity matrix and the multiplicati ...

, and the numerical solution of system of linear equations, even using the Moore–Penrose pseudoinverse. The

example of the inner product of two arrays can be implemented using the native matrix multiplication operator. If a is a row vector of size nand b is a corresponding column vector of size 1 a * b; The inner product between two matrices having the same number of elements can be implemented with the auxiliary operator (:), which reshapes a given matrix into a column vector, and the

transpose In linear algebra, the transpose of a matrix is an operator which flips a matrix over its diagonal; that is, it switches the row and column indices of the matrix by producing another matrix, often denoted by (among other notations). The tr ...

operator ': A(:)' * B(:);

rasql

The rasdaman query language is a database-oriented array-programming language. For example, two arrays could be added with the following query: SELECT A + B FROM A, B

R

The R language supports array paradigm by default. The following example illustrates a process of multiplication of two matrices followed by an addition of a scalar (which is, in fact, a one-element vector) and a vector: > A <- matrix(1:6, nrow=2) # !!this has nrow=2 ... and A has 2 rows > A 1 2 3 , 1 3 5 , 2 4 6 > B <- t( matrix(6:1, nrow=2) ) # t() is a transpose operator !!this has nrow=2 ... and B has 3 rows --- a clear contradiction to the definition of A > B 1 2 , 6 5 , 4 3 , 2 1 > C <- A %*% B > C 1 2 , 28 19 , 40 28 > D <- C + 1 > D 1 2 , 29 20 , 41 29 > D + c(1, 1) # c() creates a vector 1 2 , 30 21 , 42 30

Mathematical reasoning and language notation

The matrix left-division operator concisely expresses some semantic properties of matrices. As in the scalar equivalent, if the (

determinant In mathematics, the determinant is a scalar value that is a function of the entries of a square matrix. It characterizes some properties of the matrix and the linear map represented by the matrix. In particular, the determinant is nonzero if a ...

of the) coefficient (matrix) A is not null then it is possible to solve the (vectorial) equation A * x = b by left-multiplying both sides by the inverse of A: A⁻¹ (in both MATLAB and GNU Octave languages: A^-1). The following mathematical statements hold when A is a full rank square matrix: :

A^-1  *(A * x)
A^-1 * (b)

(A^-1 * A)* x 
A^-1 *  b

(matrix-multiplication

associativity In mathematics, the associative property is a property of some binary operations, which means that rearranging the parentheses in an expression will not change the result. In propositional logic, associativity is a valid rule of replacement ...

) :x = A^-1 * b where is the equivalence

relational operator In computer science, a relational operator is a programming language construct or operator that tests or defines some kind of relation between two entities. These include numerical equality (''e.g.'', ) and inequalities (''e.g.'', ). In pr ...

. The previous statements are also valid MATLAB expressions if the third one is executed before the others (numerical comparisons may be false because of round-off errors). If the system is overdetermined – so that A has more rows than columns – the pseudoinverse A⁺ (in MATLAB and GNU Octave languages: pinv(A)) can replace the inverse A⁻¹, as follows: :

pinv(A)  *(A * x)
pinv(A) * (b)

(pinv(A) * A)* x 
pinv(A) * b

(matrix-multiplication associativity) :x = pinv(A) * b However, these solutions are neither the most concise ones (e.g. still remains the need to notationally differentiate overdetermined systems) nor the most computationally efficient. The latter point is easy to understand when considering again the scalar equivalent a * x = b, for which the solution x = a^-1 * b would require two operations instead of the more efficient x = b / a. The problem is that generally matrix multiplications are not

commutative In mathematics, a binary operation is commutative if changing the order of the operands does not change the result. It is a fundamental property of many binary operations, and many mathematical proofs depend on it. Most familiar as the name of ...

as the extension of the scalar solution to the matrix case would require: :

(a * x)/ a 
b / a

(x * a)/ a 
b / a

(commutativity does not hold for matrices!) :

x * (a / a)
b / a

(associativity also holds for matrices) :x = b / a The MATLAB language introduces the left-division operator \ to maintain the essential part of the analogy with the scalar case, therefore simplifying the mathematical reasoning and preserving the conciseness: :

A \ (A * x)
A \ b

(A \ A)* x 
A \ b

(associativity also holds for matrices, commutativity is no more required) :x = A \ b This is not only an example of terse array programming from the coding point of view but also from the computational efficiency perspective, which in several array programming languages benefits from quite efficient linear algebra libraries such as

ATLAS An atlas is a collection of maps; it is typically a bundle of maps of Earth or of a region of Earth. Atlases have traditionally been bound into book form, but today many atlases are in multimedia formats. In addition to presenting geograp ...

LAPACK LAPACK ("Linear Algebra Package") is a standard software library for numerical linear algebra. It provides routines for solving systems of linear equations and linear least squares, eigenvalue problems, and singular value decomposition. It als ...

. Returning to the previous quotation of Iverson, the rationale behind it should now be evident:

Third-party libraries

The use of specialized and efficient libraries to provide more terse abstractions is also common in other programming languages. In

C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...

several linear algebra libraries exploit the language's ability to overload operators. In some cases a very terse abstraction in those languages is explicitly influenced by the array programming paradigm, as the NumPy extension library to

Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...

, Armadillo and

Blitz++ Blitz++ is a high-performance vector mathematics library written in C++. This library is intended for use in scientific applications that might otherwise be implemented with Fortran or MATLAB MATLAB (an abbreviation of "MATrix LABoratory") ...

libraries do.

References

External links

"No stinking loops" programming
{{Types of programming languages Programming paradigms Articles with example MATLAB/Octave code Articles with example BASIC code Articles with example Ada code Articles with example R code