In graph theory and computer science, the lowest common ancestor (LCA) (also called least common ancestor) of two nodes $v$ and $w$ in a tree or directed acyclic graph (DAG) $T$ is the lowest (i.e. deepest) node that has both $v$ and $w$ as descendants, where we define each node to be a descendant of itself (so if $v$ has a direct connection from $w$, then $w$ is the lowest common ancestor).

The LCA of $v$ and $w$ in $T$ is the shared ancestor of $v$ and $w$ that is located farthest from the root. Computation of lowest common ancestors may be useful, for instance, as part of a procedure for determining the distance between pairs of nodes in a tree: the distance from $v$ to $w$ can be computed as the distance from the root to $v$, plus the distance from the root to $w$, minus twice the distance from the root to their lowest common ancestor.

In a tree data structure where each node points to its parent, the lowest common ancestor can be easily determined by finding the first intersection of the paths from $v$ and $w$ to the root. In general, the computational time required for this algorithm is $O(h)$, where $h$ is the height of the tree (length of the longest path from a leaf to the root). However, there exist several algorithms for processing trees so that lowest common ancestors may be found more quickly. Tarjan's off-line lowest common ancestors algorithm, for example, preprocesses a tree in linear time to provide constant-time LCA queries. In general DAGs, similar algorithms exist, but with super-linear complexity.
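The parent-pointer method and the distance formula described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the dictionary `parent` (mapping each node to its parent, with the root mapped to `None`) and the function names are assumptions made for the example, and both nodes are assumed to lie in the same tree.

```python
def lca_by_parent_pointers(parent, v, w):
    """Return the LCA of v and w in a rooted tree given as a parent map."""
    # Collect every ancestor of v (v itself included) by walking to the root.
    ancestors = set()
    node = v
    while node is not None:
        ancestors.add(node)
        node = parent[node]
    # Walk up from w; the first node already seen is the lowest common ancestor.
    node = w
    while node not in ancestors:
        node = parent[node]
    return node


def depth(parent, v):
    """Number of edges between v and the root."""
    d = 0
    while parent[v] is not None:
        v = parent[v]
        d += 1
    return d


def distance(parent, v, w):
    """dist(v, w) = depth(v) + depth(w) - 2 * depth(LCA(v, w))."""
    a = lca_by_parent_pointers(parent, v, w)
    return depth(parent, v) + depth(parent, w) - 2 * depth(parent, a)


# Hypothetical example tree: r is the root, a and b are its children,
# c and d are children of a, and e is a child of d.
parent = {"r": None, "a": "r", "b": "r", "c": "a", "d": "a", "e": "d"}
print(lca_by_parent_pointers(parent, "c", "e"))  # a
print(distance(parent, "c", "e"))                # 3
```

Each query walks at most two root paths, which matches the $O(h)$ bound stated above.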

History

The lowest common ancestor problem was defined earlier, but Harel and Tarjan were the first to develop an optimally efficient lowest common ancestor data structure. Their algorithm processes any tree in linear time, using a heavy path decomposition, so that subsequent lowest common ancestor queries may be answered in constant time per query. However, their data structure is complex and difficult to implement. Tarjan also found a simpler but less efficient algorithm, based on the union-find data structure, for computing lowest common ancestors of an offline batch of pairs of nodes.

Schieber and Vishkin simplified the data structure of Harel and Tarjan, leading to an implementable structure with the same asymptotic preprocessing and query time bounds. Their simplification is based on the principle that, in two special kinds of trees, lowest common ancestors are easy to determine: if the tree is a path, then the lowest common ancestor can be computed simply from the minimum of the levels of the two queried nodes, while if the tree is a complete binary tree, the nodes may be indexed in such a way that lowest common ancestors reduce to simple binary operations on the indices. The structure of Schieber and Vishkin decomposes any tree into a collection of paths, such that the connections between the paths have the structure of a binary tree, and combines both of these simpler indexing techniques.

A completely new way to answer lowest common ancestor queries was discovered later, again achieving linear preprocessing time with constant query time. The method involves forming an Euler tour of a graph formed from the input tree by doubling every edge, and using this tour to write a sequence of level numbers of the nodes in the order the tour visits them; a lowest common ancestor query can then be transformed into a query that seeks the minimum value occurring within some subinterval of this sequence of numbers. The resulting range minimum query (RMQ) problem is handled by combining two techniques, one based on precomputing the answers to large intervals whose sizes are powers of two, and the other based on table lookup for small-interval queries. This method was later presented in a simplified form by Bender and Farach-Colton. As had been previously observed, the range minimum problem can in turn be transformed back into a lowest common ancestor problem using the technique of Cartesian trees. Further simplifications followed.

Sleator and Tarjan (1983) proposed the dynamic LCA variant of the problem, in which the data structure should be prepared to handle LCA queries intermixed with operations that change the tree (that is, rearrange the tree by adding and removing edges). This variant can be solved in $O(\log N)$ time in the total size of the tree for all modifications and queries. This is done by maintaining the forest using the dynamic trees data structure with partitioning by size; this then maintains a heavy-light decomposition of each tree, and allows LCA queries to be carried out in logarithmic time in the size of the tree.

Linear space and constant search time solution to LCA in trees

As mentioned above, LCA can be reduced to RMQ. An efficient solution to the resulting RMQ problem starts by partitioning the number sequence into blocks. Two different techniques are used for queries across blocks and within blocks.

Reduction from LCA to RMQ

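Reduction of LCA to RMQ starts by walking the tree along an Euler tour, in which a node is visited again each time the walk returns to it from one of its children. For each node visited, record in sequence its label and depth. Suppose nodes $x$ and $y$ occur in positions $i$ and $j$ in this sequence, respectively. Then the LCA of $x$ and $y$ will be found in position $RMQ(i, j)$, where the RMQ is taken over the depth values.

Figure: reduction from LCA to RMQ.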
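The reduction can be illustrated with a short sketch. It assumes the tree is given as a hypothetical `children` dictionary and records each node every time the walk reaches it (an Euler tour); the range minimum is found here by a plain linear scan, standing in for whichever RMQ structure is actually used.

```python
_DONE = object()  # sentinel marking an exhausted child iterator


def euler_tour(children, root):
    """Return (labels, depths, first): the tour, its depths, and first positions."""
    labels, depths, first = [root], [0], {root: 0}
    # Each stack entry is (node, depth, iterator over that node's children).
    stack = [(root, 0, iter(children.get(root, ())))]
    while stack:
        node, d, it = stack[-1]
        child = next(it, _DONE)
        if child is _DONE:
            stack.pop()
            if stack:
                # Returning to the parent counts as another visit of the parent.
                labels.append(stack[-1][0])
                depths.append(stack[-1][1])
        else:
            first[child] = len(labels)
            labels.append(child)
            depths.append(d + 1)
            stack.append((child, d + 1, iter(children.get(child, ()))))
    return labels, depths, first


def lca_via_rmq(labels, depths, first, x, y):
    """LCA of x and y = label at the position of the minimum depth between them."""
    i, j = sorted((first[x], first[y]))
    # Naive range minimum over depths[i..j]; a real RMQ structure replaces this scan.
    k = min(range(i, j + 1), key=depths.__getitem__)
    return labels[k]


# Hypothetical example tree rooted at r.
children = {"r": ["a", "b"], "a": ["c", "d"], "d": ["e"]}
labels, depths, first = euler_tour(children, "r")
print(depths)                                        # [0, 1, 2, 1, 2, 3, 2, 1, 0, 1, 0]
print(lca_via_rmq(labels, depths, first, "c", "e"))  # a
```

Consecutive entries of `depths` always differ by exactly one, which is the property the in-block technique later in this section relies on.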

Linear space and constant search time algorithm for RMQ reduced from LCA

Although there exists a constant-time and linear-space solution for general RMQ, a simplified solution can be applied that makes use of the LCA's properties. This simplified solution can only be used for RMQ reduced from LCA.

Similar to the solution mentioned above, we divide the sequence into blocks $B_i$, where each block $B_i$ has size $b = \frac{1}{2}\log n$.

By splitting the sequence into blocks, the $RMQ(i,j)$ query can be solved by solving two different cases:

Case 1: if i and j are in different blocks

To answer the $RMQ(i,j)$ query in case one, 3 groups of variables are precomputed to help reduce query time.

First, the minimum element with the smallest index in each block $B_i$ is precomputed and denoted as $y_i$. The set of $y_i$ takes $O(n/b)$ space.

Second, given the set of $y_i$, the RMQ query for this set is precomputed using the solution with constant time and linearithmic space. There are $n/b$ blocks, so the lookup table in that solution takes $O(\frac{n}{b}\log\frac{n}{b})$ space. Because $b = \frac{1}{2}\log n$, $O(\frac{n}{b}\log\frac{n}{b}) = O(n)$ space. Hence, the precomputed RMQ query using the solution with constant time and linearithmic space on these blocks only takes $O(n)$ space.

Third, in each block $B_i$, let $k_i$ be an index in $B_i$ such that $0 \leq k_i < b$. For all $k_i$ from $0$ until $b$, block $B_i$ is divided into two intervals $[0, k_i)$ and $[k_i, b)$, and the minimum of each interval is precomputed: the prefix min for $[0, k_i)$ and the suffix min for $[k_i, b)$. Each block stores $2b$ such values and there are $n/b$ blocks, so all prefix and suffix mins together take $O(2b \cdot \frac{n}{b}) = O(n)$ space.

Answering the $RMQ(i,j)$ query in case 1 then amounts to taking the minimum of the answers to the following three questions, where $B_i$ is the block containing index $i$ and $B_j$ is the block containing index $j$:

# The suffix min in $[i \mod b, b)$ in the block $B_i$
# The RMQ over the $y$ values of the blocks strictly between $B_i$ and $B_j$, answered using the solution with constant time and linearithmic space
# The prefix min in $[0, j \mod b)$ in the block $B_j$

All 3 questions can be answered in constant time. Hence, case 1 can be answered in linear space and constant time.
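The three precomputed groups for case 1 can be sketched as below. This is an illustrative sketch under stated assumptions rather than the structure from any particular paper: the class name `BlockRMQ` and its methods are hypothetical, the query range is taken as inclusive of both endpoints, positions (not values) are stored so that the answer can be mapped back to a node, and the "constant time and linearithmic space" solution is realized as a sparse table over the block minima.

```python
import math


class BlockRMQ:
    """Position of the minimum of depths[i..j] when i and j lie in different blocks."""

    def __init__(self, depths, b=None):
        self.depths = depths
        n = len(depths)
        # Block size b = (1/2) log2 n as in the text, clamped to at least 1.
        self.b = b or max(1, int(math.log2(max(n, 2)) / 2))
        b = self.b
        nblocks = (n + b - 1) // b
        # prefix[k]: position of the minimum from the start of k's block through k.
        self.prefix = list(range(n))
        for k in range(n):
            if k % b:
                self.prefix[k] = self._better(self.prefix[k - 1], k)
        # suffix[k]: position of the minimum from k through the end of k's block.
        self.suffix = list(range(n))
        for k in range(n - 2, -1, -1):
            if (k + 1) % b:
                self.suffix[k] = self._better(k, self.suffix[k + 1])
        # y[m]: position of the minimum of block m. Level p of the sparse table
        # covers 2**p consecutive blocks.
        y = [self.suffix[m * b] for m in range(nblocks)]
        self.table = [y]
        p = 1
        while (1 << p) <= nblocks:
            prev, half = self.table[-1], 1 << (p - 1)
            self.table.append([self._better(prev[m], prev[m + half])
                               for m in range(nblocks - (1 << p) + 1)])
            p += 1

    def _better(self, p, q):
        # Keep the left position p on ties so the smallest index wins.
        return q if self.depths[q] < self.depths[p] else p

    def query(self, i, j):
        """Combine suffix min of i's block, the blocks in between, prefix min of j's block."""
        b = self.b
        lo, hi = i // b + 1, j // b - 1      # blocks strictly between the two ends
        best = self.suffix[i]
        if lo <= hi:
            p = (hi - lo + 1).bit_length() - 1
            mid = self._better(self.table[p][lo], self.table[p][hi - (1 << p) + 1])
            best = self._better(best, mid)
        return self._better(best, self.prefix[j])


depths = [0, 1, 2, 1, 2, 3, 2, 1, 0, 1, 0]
rmq = BlockRMQ(depths, b=3)
print(rmq.query(2, 5))  # 3
print(rmq.query(1, 9))  # 8
```

Each query performs a fixed number of array lookups and comparisons, independent of the distance between `i` and `j`.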

Case 2: if i and j are in the same block

The RMQ sequence that is reduced from LCA has one property that a normal RMQ sequence doesn't have: the next element is always +1 or -1 from the current element. Therefore, each block $B_i$ can be encoded as a bitstring in which 0 represents a step to the current depth -1 and 1 represents a step to the current depth +1. For example, the depth sequence 2, 1, 2, 3 is encoded as 011. This transformation turns a block $B_i$ into a bitstring of size $b-1$. There are $2^{b-1}$ possible bitstrings of size $b-1$. Since $b = \frac{1}{2}\log n$, we have $2^{b-1} \leq 2^b = 2^{\frac{1}{2}\log n} = n^{\frac{1}{2}} = \sqrt{n}$. Hence, $B_i$ is always one of the $\sqrt{n}$ possible bitstrings of size $b-1$.

Then, for each possible bitstring, we apply the naïve quadratic space constant time solution. This takes up $\sqrt{n}\cdot b^2$ space, which is $O(\sqrt{n}\cdot(\log n)^2) \le O(\sqrt{n}\cdot\sqrt{n}) = O(n)$.

Therefore, answering the $RMQ(i,j)$ query in case 2 is simply finding the corresponding block (which is a bitstring) and performing a table lookup for that bitstring. Hence, case 2 can be solved using linear space with constant searching time.
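The in-block lookup can be sketched as follows. The function names are hypothetical, and two simplifications are assumptions of this sketch: tables are built only for the bit patterns that actually occur instead of for all $2^{b-1}$ of them, and a short final block is padded with +1 steps so that padding never introduces a new minimum.

```python
def block_signature(depths, start, b):
    """Encode the block starting at `start` as an integer; bit t is 1 for a +1 step."""
    sig = 0
    for t in range(b - 1):
        nxt = start + t + 1
        # Steps past the end of the sequence are treated as +1 (padding).
        if nxt >= len(depths) or depths[nxt] - depths[nxt - 1] == 1:
            sig |= 1 << t
    return sig


def table_for_signature(sig, b):
    """table[l][r] = offset of the leftmost minimum among offsets l..r of the pattern."""
    rel = [0]  # depths relative to the first element of the block
    for t in range(b - 1):
        rel.append(rel[-1] + (1 if (sig >> t) & 1 else -1))
    table = [[0] * b for _ in range(b)]
    for l in range(b):
        best = l
        for r in range(l, b):
            if rel[r] < rel[best]:
                best = r
            table[l][r] = best
    return table


def build_in_block_tables(depths, b):
    """Return the signature of every block and one shared table per distinct signature."""
    sigs, tables = [], {}
    for start in range(0, len(depths), b):
        sig = block_signature(depths, start, b)
        sigs.append(sig)
        if sig not in tables:
            tables[sig] = table_for_signature(sig, b)
    return sigs, tables


def query_in_block(sigs, tables, b, i, j):
    """Position of the minimum of depths[i..j] when i <= j lie in the same block."""
    m = i // b
    return m * b + tables[sigs[m]][i % b][j % b]


depths = [0, 1, 2, 1, 2, 3, 2, 1, 0, 1, 0]
sigs, tables = build_in_block_tables(depths, 3)
print(query_in_block(sigs, tables, 3, 6, 8))  # 8
```

Blocks with the same pattern of steps share one table, which is what keeps the total table space small even though each table is quadratic in $b$.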

Extension to directed acyclic graphs

While originally studied in the context of trees, the notion of lowest common ancestors can be defined for directed acyclic graphs (DAGs), using either of two possible definitions. In both, the edges of the DAG are assumed to point from parents to children.

* Given $G = (V, E)$, define a poset $(V, \leq)$ such that $x \leq y$ iff $x$ is reachable from $y$. The lowest common ancestors of $x$ and $y$ are then the minimal elements under ≤ of the common ancestor set $\{z \in V \mid x \leq z \text{ and } y \leq z\}$.
* An equivalent definition takes the lowest common ancestors of $x$ and $y$ to be the nodes of out-degree zero in the subgraph of $G$ induced by the set of common ancestors of $x$ and $y$.

In a tree, the lowest common ancestor is unique; in a DAG of $n$ nodes, each pair of nodes may have as many as $n-2$ LCAs, while the existence of an LCA for a pair of nodes is not even guaranteed in arbitrary connected DAGs.

A brute-force algorithm for finding lowest common ancestors proceeds as follows: find all ancestors of $x$ and $y$, then return the lowest elements (those minimal under ≤) of the intersection of the two sets. Better algorithms exist that, analogous to the LCA algorithms on trees, preprocess a graph to enable constant-time LCA queries. The problem of LCA existence can be solved optimally for sparse DAGs.

A unified framework has also been presented for preprocessing directed acyclic graphs to compute a representative lowest common ancestor in a rooted DAG in constant time. This framework can achieve near-linear preprocessing times for sparse graphs and is available for public use.
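The brute-force approach for DAGs can be sketched as follows. The `parents` dictionary (mapping a node to the list of its parents) and the function names are assumptions made for the example, and no attempt is made at efficiency.

```python
def ancestors(parents, v):
    """All ancestors of v in the DAG, with v counted as its own ancestor."""
    seen = {v}
    stack = [v]
    while stack:
        node = stack.pop()
        for p in parents.get(node, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def dag_lcas(parents, x, y):
    """Set of lowest common ancestors of x and y; empty if they share no ancestor."""
    common = ancestors(parents, x) & ancestors(parents, y)
    # A common ancestor z is lowest when no other common ancestor lies below it,
    # that is, when z is not an ancestor of any other common ancestor.
    return {z for z in common
            if not any(z in ancestors(parents, c) for c in common if c != z)}


# Hypothetical DAG: r is a parent of a and b; both a and b are parents of x and of y.
parents = {"x": ["a", "b"], "y": ["a", "b"], "a": ["r"], "b": ["r"]}
print(sorted(dag_lcas(parents, "x", "y")))  # ['a', 'b']
```

The example shows a pair with two lowest common ancestors, something that cannot happen in a tree.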

Applications

The problem of computing lowest common ancestors of classes in an inheritance hierarchy arises in the implementation of object-oriented programming systems. The LCA problem also finds applications in models of complex systems found in distributed computing.

References

* [...] A preliminary version appeared in SPAA 2002.
* Sleator, D. D.; Tarjan, R. E. (1983), "A Data Structure for Dynamic Trees", Proceedings of the thirteenth annual ACM symposium on Theory of computing - STOC '81, pp. 114–122, doi:10.1145/800076.802464, S2CID 15402750, https://www.cs.cmu.edu/~sleator/papers/dynamic-trees.pdf

External links

* Lowest Common Ancestor of a Binary Search Tree, by Kamal Rawat
* Python implementation of the algorithm of Bender and Farach-Colton for trees, by David Eppstein
* Python implementation for arbitrary directed acyclic graphs
* Lecture notes on LCAs from a 2003 MIT Data Structures course. Course by Erik Demaine, notes written by Loizos Michael and Christos Kapoutsis. Notes from 2007 offering of same course, written by Alison Cichowlas.
* in C. A simplified version of the Schieber–Vishkin technique that works only for balanced binary trees.
* Video of Donald Knuth explaining the Schieber–Vishkin technique
* Range Minimum Query and Lowest Common Ancestor article in Topcoder
* Documentation for the lca package for Haskell, by Edward Kmett, which includes the skew-binary random access list algorithm
* Purely functional data structures for on-line LCA, slides for the same package