computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includin ...

, the longest increasing subsequence problem is to find a subsequence of a given

sequence In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is called ...

in which the subsequence's elements are in sorted order, lowest to highest, and in which the subsequence is as long as possible. This subsequence is not necessarily contiguous, or unique. Longest increasing subsequences are studied in the context of various disciplines related to mathematics, including

algorithmics Algorithmics is the systematic study of the design and analysis of algorithms. It is fundamental and one of the oldest fields of computer science. It includes algorithm design, the art of building a procedure which can solve efficiently a specific ...

, random matrix theory,

representation theory Representation theory is a branch of mathematics that studies abstract algebraic structures by ''representing'' their elements as linear transformations of vector spaces, and studies modules over these abstract algebraic structures. In essen ...

, and

physics Physics is the natural science that studies matter, its fundamental constituents, its motion and behavior through space and time, and the related entities of energy and force. "Physical science is that department of knowledge which rel ...

. The longest increasing subsequence problem is solvable in time

O(n \log n),

where

n

denotes the length of the input sequence.

Example

In the first 16 terms of the binary Van der Corput sequence :0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15 a longest increasing subsequence is :0, 2, 6, 9, 11, 15. This subsequence has length six; the input sequence has no seven-member increasing subsequences. The longest increasing subsequence in this example is not the only solution: for instance, :0, 4, 6, 9, 11, 15 :0, 2, 6, 9, 13, 15 :0, 4, 6, 9, 13, 15 are other increasing subsequences of equal length in the same input sequence.

Relations to other algorithmic problems

The longest increasing subsequence problem is closely related to the

longest common subsequence problem The longest common subsequence (LCS) problem is the problem of finding the longest subsequence common to all sequences in a set of sequences (often just two sequences). It differs from the longest common substring problem: unlike substrings, su ...

, which has a quadratic time

dynamic programming Dynamic programming is both a mathematical optimization method and a computer programming method. The method was developed by Richard Bellman in the 1950s and has found applications in numerous fields, from aerospace engineering to economics. I ...

solution: the longest increasing subsequence of a sequence

S

is the longest common subsequence of

S

and

T,

where

T

is the result of sorting

S.

However, for the special case in which the input is a permutation of the integers

1, 2, \ldots, n,

this approach can be made much more efficient, leading to time bounds of the form

O(n \log \log n).

The largest

clique A clique ( AusE, CanE, or ), in the social sciences, is a group of individuals who interact with one another and share similar interests. Interacting with cliques is part of normative social development regardless of gender, ethnicity, or popula ...

in a

permutation graph In the mathematical field of graph theory, a permutation graph is a graph whose vertices represent the elements of a permutation, and whose edges represent pairs of elements that are reversed by the permutation. Permutation graphs may also be de ...

corresponds to the longest decreasing subsequence of the permutation that defines the graph (assuming the original non-permuted sequence is sorted from lowest value to highest). Similarly, the maximum independent set in a permutation graph corresponds to the longest non-decreasing subsequence. Therefore, longest increasing subsequence algorithms can be used to solve the

clique problem In computer science, the clique problem is the computational problem of finding cliques (subsets of vertices, all adjacent to each other, also called complete subgraphs) in a graph. It has several different formulations depending on which cliq ...

efficiently in permutation graphs. In the Robinson–Schensted correspondence between

permutation In mathematics, a permutation of a set is, loosely speaking, an arrangement of its members into a sequence or linear order, or if the set is already ordered, a rearrangement of its elements. The word "permutation" also refers to the act or p ...

s and Young tableaux, the length of the first row of the tableau corresponding to a permutation equals the length of the longest increasing subsequence of the permutation, and the length of the first column equals the length of the longest decreasing subsequence..

Efficient algorithms

The algorithm outlined below solves the longest increasing subsequence problem efficiently with arrays and

binary search In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the m ...

ing. It processes the sequence elements in order, maintaining the longest increasing subsequence found so far. Denote the sequence values as

\ldots,

etc. Then, after processing

X

the algorithm will have stored an integer

L

and values in two arrays: *

L

— stores the length of the longest increasing subsequence found so far. *

M /math> — stores the index k of the smallest value X /math> such that there is an increasing subsequence of length l ending at X /math> in the range k \leq i. Explicitly, suppose that K_denotes the set of all indices j such that j \leq i and there exists an increasing subsequence of length l ending at X Then k = M /math> is the index in K_for which X [l is minimized; meaning that M[l">[l.html" ;"title="[l">[l is minimized; meaning that M[l\in K_and X [l = \min_ X[j] (or equivalently, M \in K_and for every j \in K_, X [l \leq X[j]); if multiple indices satisfy this condition then M /math> is the largest one. 
** To clarify, "there exists an increasing subsequence of length l ending at X /math>" means that there exist l indices i_1 < i_2 < \cdots < i_l = k ending at k such that X\left_1\right < X\left_2\right < \cdots < X ** Note that 1 \leq l \leq i + 1, because l \geq 1 represents the length of the increasing subsequence, and k \geq 0 represents the index of its termination.
** The length of M is 1 more than the length of X but it is possible that not all elements in this array are used by the algorithm (in fact, if the longest increasing sequence has length L then only M \ldots, M /math> are used by the algorithm). If however M /math> is used/defined then l - 1 \leq M /math> (and moreover, at iteration i, M \leq i will also hold). M /math> is undefined since sequences of length 0 have no ending index (M /math> can be any value).
* P /math> — stores the index of the predecessor of X /math> in the longest increasing subsequence ending at X ** The length of P is equal to that of X, ** If k > 0 then P < k while P /math> is undefined since X /math> has no predecessor (P /math> can be any value).
Because the algorithm below uses

zero-based numbering Zero-based numbering is a way of numbering in which the initial element of a sequence is assigned the index 0, rather than the index 1 as is typical in everyday ''non-mathematical'' or ''non-programming'' circumstances. Under zero-base ...

, for clarity

M

is padded with

M

which goes unused so that

M /math> corresponds to a subsequence of length l. A real implementation can skip M /math> and adjust the indices accordingly.

Note that, at any point in the algorithm, the sequence X [1, X[M[2">[1.html" ;"title="[1">[1, X[M[2, \ldots, X[M[L">[1.html"_;"title="[1">[1<_a>,_X[M[2.html" ;"title="[1.html" ;"title="[1">[1, X[M[2">[1.html" ;"title="[1">[1, X[M[2, \ldots, X[M[L is increasing.  For, if there is an increasing subsequence of length l \geq 2 ending at X [l, then there is also a subsequence of length l - 1 ending at a smaller value: namely the one ending at X[P [l]. Thus, we may do binary searches in this sequence in logarithmic time.

The algorithm, then, proceeds as follows:

P = array of length N M = array of length N + 1 M = -1 // undefined so can be set to any value L = 0 for i in range 0 to N-1: // Binary search for the smallest positive l ≤ L // such that X [l > X[i">[l.html" ;"title="[l">[l > X[i lo = 1 hi = L + 1 while lo < hi: mid = lo + floor((hi-lo)/2) // lo <= mid < hi if X[M[mid >= X hi = mid else: // if X[M[mid <= X lo = mid + 1 // After searching, lo

hi is 1 greater than the // length of the longest prefix of X newL = lo // The predecessor of X is the last index of // the subsequence of length newL-1 P = M ewL-1 M ewL= i if newL > L: // If we found a subsequence longer than any we've // found yet, update L L = newL // Reconstruct the longest increasing subsequence // It consists of the values of X at the L indices: // ..., P [M[L">[M[L.html" ;"title="[M[L">[M[L P[M[L">[M[L.html"_;"title="[M[L">[M[L<_a>.html" ;"title="[M[L.html" ;"title="[M[L">[M[L">[M[L.html" ;"title="[M[L">[M[L P[M[L, M[L] S = array of length L k = M[L] for j in range L-1 to 0: S[j] = X[k] k = P[k] return S Because the algorithm performs a single binary search per sequence element, its total time can be expressed using Big O notation as

O(n \log n).

discusses a variant of this algorithm, which he credits to

Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist, mathematician, and professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of computer ...

; in the variant that he studies, the algorithm tests whether each value

X /math> can be used to extend the current longest increasing sequence, in constant time, prior to doing the binary search. With this modification, the algorithm uses at most n \log_2 n - n \log_2 \log_2 n + O(n) comparisons in the worst case, which is optimal for a comparison-based algorithm up to the constant factor in the O(n). term.

Example run

Length bounds

According to the Erdős–Szekeres theorem, any sequence of

n^2 + 1

distinct integers has an increasing or a decreasing subsequence of length

n + 1.

For inputs in which each permutation of the input is equally likely, the expected length of the longest increasing subsequence is approximately

2 \sqrt.

. In the limit as

n

approaches infinity, the length of the longest increasing subsequence of a randomly permuted sequence of

n

items has a distribution approaching the Tracy–Widom distribution, the distribution of the largest eigenvalue of a random matrix in the Gaussian unitary ensemble.

Online algorithms

The longest increasing subsequence has also been studied in the setting of

online algorithm In computer science, an online algorithm is one that can process its input piece-by-piece in a serial fashion, i.e., in the order that the input is fed to the algorithm, without having the entire input available from the start. In contrast, an o ...

s, in which the elements of a sequence of independent random variables with continuous distribution

F

– or alternatively the elements of a

random permutation A random permutation is a random ordering of a set of objects, that is, a permutation-valued random variable. The use of random permutations is often fundamental to fields that use randomized algorithms such as coding theory, cryptography, and sim ...

– are presented one at a time to an algorithm that must decide whether to include or exclude each element, without knowledge of the later elements. In this variant of the problem, which allows for interesting applications in several contexts, it is possible to devise an optimal selection procedure that, given a random sample of size

n

as input, will generate an increasing sequence with maximal expected length of size approximately

\sqrt.

The length of the increasing subsequence selected by this optimal procedure has variance approximately equal to

\sqrt / 3,

and its limiting distribution is asymptotically normal after the usual centering and scaling. The same asymptotic results hold with more precise bounds for the corresponding problem in the setting of a Poisson arrival process. A further refinement in the Poisson process setting is given through the proof of a

central limit theorem In probability theory, the central limit theorem (CLT) establishes that, in many situations, when independent random variables are summed up, their properly normalized sum tends toward a normal distribution even if the original variables thems ...

for the optimal selection process which holds, with a suitable normalization, in a more complete sense than one would expect. The proof yields not only the "correct" functional limit theorem but also the (singular)

covariance matrix In probability theory and statistics, a covariance matrix (also known as auto-covariance matrix, dispersion matrix, variance matrix, or variance–covariance matrix) is a square matrix giving the covariance between each pair of elements o ...

of the three-dimensional process summarizing all interacting processes. .

Application

*Part of MUMmer (Maximum Unique Match finder) system for aligning entire genomes. *Used in version control systems like Git etc. *Used in Patience Diff, a diffing algorithm (computes and displays the differences between the content of files), which is used in the “Bazaar” (Bazaar is a version control system that helps you track project history over time and to collaborate easily with others..)

References

{{reflist

External links

Algorithmist's Longest Increasing Subsequence

Finding count of longest increased subsequences

Poldo's diet
Problems on strings Combinatorics Formal languages Dynamic programming