In
graph theory
In mathematics, graph theory is the study of ''graphs'', which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of '' vertices'' (also called ''nodes'' or ''points'') which are conne ...
and
computer science
Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, the lowest common ancestor (LCA) (also called least common ancestor) of two nodes and in a
tree
In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, including only woody plants with secondary growth, plants that are ...
or
directed acyclic graph
In mathematics, particularly graph theory, and computer science, a directed acyclic graph (DAG) is a directed graph with no directed cycles. That is, it consists of vertices and edges (also called ''arcs''), with each edge directed from one ve ...
(DAG) is the lowest (i.e. deepest) node that has both and as descendants, where we define each node to be a descendant of itself (so if has a direct connection from , is the lowest common ancestor).
The LCA of and in is the shared ancestor of and that is located farthest from the root. Computation of lowest common ancestors may be useful, for instance, as part of a procedure for determining the distance between pairs of nodes in a tree: the distance from to can be computed as the distance from the root to , plus the distance from the root to , minus twice the distance from the root to their lowest common ancestor . In
ontologies, the lowest common ancestor is also known as the least common ancestor.
In a
tree data structure
In computer science, a tree is a widely used abstract data type that represents a hierarchical tree structure with a set of connected nodes. Each node in the tree can be connected to many children (depending on the type of tree), but must be conn ...
where each node points to its parent, the lowest common ancestor can be easily determined by finding the first intersection of the paths from and to the root. In general, the computational time required for this algorithm is where is the height of the tree (length of longest path from a leaf to the root). However, there exist several algorithms for processing trees so that lowest common ancestors may be found more quickly.
Tarjan's off-line lowest common ancestors algorithm, for example, preprocesses a tree in linear time to provide constant-time LCA queries. In general DAGs, similar algorithms exist, but with super-linear complexity.
History
The lowest common ancestor problem was defined by , but were the first to develop an optimally efficient lowest common ancestor data structure. Their algorithm processes any tree in linear time, using a
heavy path decomposition In combinatorial mathematics and theoretical computer science, heavy path decomposition (also called heavy-light decomposition) is a technique for decomposing a rooted tree into a set of paths. In a heavy path decomposition, each non-leaf node sel ...
, so that subsequent lowest common ancestor queries may be answered in constant time per query. However, their data structure is complex and difficult to implement. Tarjan also found a simpler but less efficient algorithm, based on the
union-find
In computer science, a disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of disjoint (non-overlapping) sets. Equivalently, it stores a partition of a set ...
data structure, for
computing lowest common ancestors of an offline batch of pairs of nodes.
simplified the data structure of Harel and Tarjan, leading to an implementable structure with the same asymptotic preprocessing and query time bounds. Their simplification is based on the principle that, in two special kinds of trees, lowest common ancestors are easy to determine: if the tree is a path, then the lowest common ancestor can be computed simply from the minimum of the levels of the two queried nodes, while if the tree is a
complete binary tree
In computer science, a binary tree is a k-ary k = 2 tree data structure in which each node has at most two children, which are referred to as the ' and the '. A recursive definition using just set theory notions is that a (non-empty) binary tr ...
, the nodes may be indexed in such a way that lowest common ancestors reduce to simple binary operations on the indices. The structure of Schieber and Vishkin decomposes any tree into a collection of paths, such that the connections between the paths have the structure of a binary tree, and combines both of these two simpler indexing techniques.
discovered a completely new way to answer lowest common ancestor queries, again achieving linear preprocessing time with constant query time. Their method involves forming an
Euler tour
In graph theory, an Eulerian trail (or Eulerian path) is a trail in a finite graph that visits every edge exactly once (allowing for revisiting vertices). Similarly, an Eulerian circuit or Eulerian cycle is an Eulerian trail that starts and ends ...
of a graph formed from the input tree by doubling every edge, and using this tour to write a sequence of level numbers of the nodes in the order the tour visits them; a lowest common ancestor query can then be transformed into a query that seeks the minimum value occurring within some subinterval of this sequence of numbers. They then handle this
range minimum query problem (RMQ) by combining two techniques, one technique based on precomputing the answers to large intervals that have sizes that are powers of two, and the other based on table lookup for small-interval queries. This method was later presented in a simplified form by . As had been previously observed by , the range minimum problem can in turn be transformed back into a lowest common ancestor problem using the technique of
Cartesian tree
In computer science, a Cartesian tree is a binary tree derived from a sequence of numbers; it can be uniquely defined from the properties that it is heap-ordered and that a symmetric (in-order) traversal of the tree returns the original sequenc ...
s.
Further simplifications were made by and .
proposed the dynamic LCA variant of the problem in which the data structure should be prepared to handle LCA queries intermixed with operations that change the tree (that is, rearrange the tree by adding and removing edges). This variant can be solved in
time in the total size of the tree for all modifications and queries. This is done by maintaining the forest using the dynamic trees data structure with partitioning by size; this then maintains a heavy-light decomposition of each tree, and allows LCA queries to be carried out in logarithmic time in the size of the tree.
One can also improve the naïve online algorithm's computation time to
in the height of the tree by storing the paths through the tree using skew-binary random access lists, although edits are limited to extension at the leaves.
Linear space and constant search time solution to tree based LCA problem
As mentioned above, LCA can be reduced into RMQ first, then divided the sequence of numbers into intervals and apply two different techniques to handle range minimum query across different intervals, and handle range minimum query within an interval.
Reduction from LCA to RMQ
Reduction of LCA into RMQ started by walking the tree. When walking the tree, the order of the label and the depth of the node visited is recorded. Then a LCA question can be answered by answering a RMQ question which the input of a RMQ problem is the indices of two child nodes in the list of visited nodes.

Therefore, LCA can be solved by solving RMQ.
Linear space and constant search time algorithm for RMQ reduced from LCA
Despite that there exists
a constant time and linear space solution for general RMQ, but a simplified solution can be applied that make uses of LCA’s properties. This simplified solution can only be used for RMQ reduced from LCA.
Similar to the solution mentioned above, we divide the sequence into each block
, where each block
has size of
.

By splitting the sequence into blocks, the
query can be solved by solving two different cases:
Case 1: if i and j are in different blocks
To answer the
query in case one, there are 3 groups of variables precomputed to help reducing query time.
First, the minimum element with smallest index in each block
is precomputed and denoted as
. A set of
takes
space.
Second, given the set of
, the RMQ query for this set is precomputed using
the solution with constant time and linearithmic space. There are
blocks, so the lookup table in that solution takes
space. Because
,
=
space. Hence the precomputed RMQ query using
the solution with constant time and linearithmic space on these blocks only take
space.
Third, in each block
, let
be an index in
such that
. For all
from
until
, block
is divided into two intervals