HOME

TheInfoList



OR:

In
mathematics Mathematics is an area of knowledge that includes the topics of numbers, formulas and related structures, shapes and the spaces in which they are contained, and quantities and their changes. These topics are represented in modern mathematics ...
and
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, graph edit distance (GED) is a measure of similarity (or dissimilarity) between two graphs. The concept of graph edit distance was first formalized mathematically by Alberto Sanfeliu and King-Sun Fu in 1983. A major application of graph edit distance is in inexact graph matching, such as error-tolerant
pattern recognition Pattern recognition is the automated recognition of patterns and regularities in data. It has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics ...
in
machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...
. The graph edit distance between two graphs is related to the string edit distance between strings. With the interpretation of strings as connected,
directed acyclic graph In mathematics, particularly graph theory, and computer science, a directed acyclic graph (DAG) is a directed graph with no directed cycles. That is, it consists of vertices and edges (also called ''arcs''), with each edge directed from one ...
s of
maximum degree This is a glossary of graph theory. Graph theory is the study of graphs, systems of nodes or vertices connected in pairs by lines or edges. Symbols A B ...
one, classical definitions of edit distance such as Levenshtein distance,
Hamming distance In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of ''substitutions'' required to chan ...
and
Jaro–Winkler distance In computer science and statistics, the Jaro–Winkler distance is a string metric measuring an edit distance between two sequences. It is a variant proposed in 1990 by William E. Winkler of the Jaro distance metric (1989, Matthew A. Jaro). ...
may be interpreted as graph edit distances between suitably constrained graphs. Likewise, graph edit distance is also a generalization of tree edit distance between rooted trees.


Formal definitions and properties

The mathematical definition of graph edit distance is dependent upon the definitions of the graphs over which it is defined, i.e. whether and how the vertices and edges of the graph are labeled and whether the edges are
directed Director may refer to: Literature * ''Director'' (magazine), a British magazine * ''The Director'' (novel), a 1971 novel by Henry Denker * ''The Director'' (play), a 2000 play by Nancy Hasty Music * Director (band), an Irish rock band * ''D ...
. Generally, given a set of graph edit operations (also known as elementary
graph operations In the mathematical field of graph theory, graph operations are operations which produce new graphs from initial ones. They include both unary (one input) and binary (two input) operations. Unary operations Unary operations create a new grap ...
), the graph edit distance between two graphs g_ and g_, written as GED(g_,g_) can be defined as : GED(g_,g_) = \min_ \sum_^ c(e_) where \mathcal(g_,g_) denotes the set of edit paths transforming g_ into (a graph
isomorphic In mathematics, an isomorphism is a structure-preserving mapping between two structures of the same type that can be reversed by an inverse mapping. Two mathematical structures are isomorphic if an isomorphism exists between them. The word i ...
to) g_ and c(e) \ge 0 is the cost of each graph edit operation e. The set of elementary graph edit operators typically includes: :vertex insertion to introduce a single new labeled vertex to a graph. :vertex deletion to remove a single (often disconnected) vertex from a graph. :vertex substitution to change the label (or color) of a given vertex. :edge insertion to introduce a new colored edge between a pair of vertices. :edge deletion to remove a single edge between a pair of vertices. :edge substitution to change the label (or color) of a given edge. Additional, but less common operators, include operations such as edge splitting that introduces a new vertex into an edge (also creating a new edge), and edge contraction that eliminates vertices of degree two between edges (of the same color). Although such complex edit operators can be defined in terms of more elementary transformations, their use allows finer parameterization of the cost function c when the operator is cheaper than the sum of its constituents. A deep analysis of the elementary graph edit operators is presented in And some methods have been presented to automatically deduce these elementary graph edit operators. And some algorithms learn these costs online:


Applications

Graph edit distance finds applications in
handwriting recognition Handwriting recognition (HWR), also known as handwritten text recognition (HTR), is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other de ...
,
fingerprint recognition A fingerprint is an impression left by the friction ridges of a human finger. The recovery of partial fingerprints from a crime scene is an important method of forensic science. Moisture and grease on a finger result in fingerprints on surfac ...
and
cheminformatics Cheminformatics (also known as chemoinformatics) refers to use of physical chemistry theory with computer and information science techniques—so called "''in silico''" techniques—in application to a range of descriptive and prescriptive problem ...
.


Algorithms and complexity

Exact algorithms for computing the graph edit distance between a pair of graphs typically transform the problem into one of finding the minimum cost edit path between the two graphs. The computation of the optimal edit path is cast as a
pathfinding Pathfinding or pathing is the plotting, by a computer application, of the shortest route between two points. It is a more practical variant on solving mazes. This field of research is based heavily on Dijkstra's algorithm for finding the s ...
search or
shortest path problem In graph theory, the shortest path problem is the problem of finding a path between two vertices (or nodes) in a graph such that the sum of the weights of its constituent edges is minimized. The problem of finding the shortest path between ...
, often implemented as an A* search algorithm. In addition to exact algorithms, a number of efficient approximation algorithms are also known. Most of them have cubic computational time Moreover, there is an algorithm that deduces an approximation of the GED in linear time Despite the above algorithms sometimes working well in practice, in general the problem of computing graph edit distance is NP-hard (for a proof that's available online, see Section 2 o
Zeng et al.
, and is even hard to approximate (formally, it is APX-hard).


References

{{Reflist Graph theory Graph algorithms Computational problems in graph theory Distance