In
graph theory, a branch of mathematics, graph canonization is the problem of finding a
canonical form of a given graph ''G''. A canonical form is a
labeled graph Canon(''G'') that is
isomorphic
In mathematics, an isomorphism is a structure-preserving mapping between two structures of the same type that can be reversed by an inverse mapping. Two mathematical structures are isomorphic if an isomorphism exists between them. The word is ...
to ''G'', such that every graph that is isomorphic to ''G'' has the same canonical form as ''G''. Thus, from a solution to the graph canonization problem, one could also solve the problem of
graph isomorphism: to test whether two graphs ''G'' and ''H'' are isomorphic, compute their canonical forms Canon(''G'') and Canon(''H''), and test whether these two canonical forms are identical.
The canonical form of a graph is an example of a
complete
Complete may refer to:
Logic
* Completeness (logic)
* Completeness of a theory, the property of a theory that every formula in the theory's language or its negation is provable
Mathematics
* The completeness of the real numbers, which implies t ...
graph invariant
Graph may refer to:
Mathematics
* Graph (discrete mathematics), a structure made of vertices and edges
** Graph theory, the study of such graphs and their properties
*Graph (topology), a topological space resembling a graph in the sense of disc ...
: every two isomorphic graphs have the same canonical form, and every two non-isomorphic graphs have different canonical forms.
[.][.] Conversely, every complete invariant of graphs may be used to construct a canonical form. The vertex set of an ''n''-vertex graph may be identified with the
integers from 1 to ''n'', and using such an identification a canonical form of a graph may also be described as a
permutation
In mathematics, a permutation of a set is, loosely speaking, an arrangement of its members into a sequence or linear order, or if the set is already ordered, a rearrangement of its elements. The word "permutation" also refers to the act or proc ...
of its vertices. Canonical forms of a graph are also called canonical labelings, and graph canonization is also sometimes known as graph canonicalization.
Computational complexity
Clearly, the graph canonization problem is at least as
computationally hard as the
graph isomorphism problem. In fact, graph isomorphism is even
AC0-
reducible to graph canonization. However it is still an open question whether the two problems are
polynomial time equivalent
In computational complexity theory, a polynomial-time reduction is a method for solving one problem using another. One shows that if a hypothetical subroutine solving the second problem exists, then the first problem can be solved by transforming ...
.
[
While the existence of (deterministic) polynomial algorithms for graph isomorphism is still an open problem in computational complexity theory, in 1977 László Babai reported that with probability at least 1 − exp(−O(''n'')), a simple vertex classification algorithm produces a canonical labeling of a graph chosen uniformly at random from the set of all ''n''-vertex graphs after only two refinement steps. Small modifications and an added ]depth-first search
Depth-first search (DFS) is an algorithm for traversing or searching tree or graph data structures. The algorithm starts at the root node (selecting some arbitrary node as the root node in the case of a graph) and explores as far as possible alon ...
step produce canonical labeling of such uniformly-chosen random graphs in linear expected time. This result sheds some light on the fact why many reported graph isomorphism algorithms behave well in practice. This was an important breakthrough in probabilistic complexity theory
Probability is the branch of mathematics concerning numerical descriptions of how likely an event is to occur, or how likely it is that a proposition is true. The probability of an event is a number between 0 and 1, where, roughly speaking, ...
which became widely known in its manuscript form and which was still cited as an "unpublished manuscript" long after it was reported at a symposium.
A commonly known canonical form is the lexicographically smallest graph within the isomorphism class, which is the graph of the class with lexicographically smallest adjacency matrix considered as a linear string.
However, the computation of the lexicographically smallest graph is NP-hard
In computational complexity theory, NP-hardness ( non-deterministic polynomial-time hardness) is the defining property of a class of problems that are informally "at least as hard as the hardest problems in NP". A simple example of an NP-hard pr ...
.
For trees, a concise polynomial canonization algorithm requiring O(n) space is presented by . Begin by labeling each vertex with the string 01. Iteratively for each non-leaf x remove the leading 0 and trailing 1 from x's label; then sort x's label along with the labels of all adjacent leaves in lexicographic order. Concatenate these sorted labels, add back a leading 0 and trailing 1, make this the new label of x, and delete the adjacent leaves. If there are two vertices remaining, concatenate their labels in lexicographic order.
Applications
Graph canonization is the essence of many graph isomorphism algorithms. One of the leading tools is Nauty.
A common application of graph canonization is in graphical data mining, in particular in chemical database applications.
A number of identifier
An identifier is a name that identifies (that is, labels the identity of) either a unique object or a unique ''class'' of objects, where the "object" or class may be an idea, physical countable object (or class thereof), or physical noncountable ...
s for chemical substances, such as SMILES and InChI use canonicalization steps in their computation, which is essentially the canonicalization of the graph which represents the molecule.
These identifiers are designed to provide a standard (and sometimes human-readable) way to encode molecular information and to facilitate the search for such information in databases and on the web.
References
{{reflist
Graph theory