In graph theory and theoretical computer science, a maximum common induced subgraph of two graphs ''G'' and ''H'' is a graph that is an induced subgraph of both ''G'' and ''H'', and that has as many vertices as possible.

Finding this graph is NP-hard. In the associated decision problem, the input is two graphs ''G'' and ''H'' and a number ''k''. The problem is to decide whether ''G'' and ''H'' have a common induced subgraph with at least ''k'' vertices. This problem is NP-complete. It is a generalization of the induced subgraph isomorphism problem, which arises when ''k'' equals the number of vertices in the smaller of ''G'' and ''H'', so that this entire graph must appear as an induced subgraph of the other graph.

Based on hardness of approximation results for the maximum independent set problem, the maximum common induced subgraph problem is also hard to approximate. This implies that, unless P = NP, there is no approximation algorithm that, in polynomial time on ''n''-vertex graphs, always finds a solution within a factor of <math>n^{1-\epsilon}</math> of optimal, for any <math>\epsilon > 0</math>.

One way to solve this problem is to build the modular product graph of ''G'' and ''H''. In this graph, the largest clique corresponds to a maximum common induced subgraph of ''G'' and ''H''. Therefore, algorithms for finding maximum cliques can be used to find the maximum common induced subgraph. Moreover, a modified maximum-clique algorithm can be used to find a maximum common ''connected'' subgraph.

The McSplit algorithm (along with its McSplit↓ variant) is a forward checking algorithm that does not use the clique encoding, but uses a compact data structure to keep track of the vertices in graph ''H'' to which each vertex in graph ''G'' may be mapped. Both versions of the McSplit algorithm outperform the clique encoding for many graph classes. A more efficient implementation of McSplit is McSplitDAL+PR, which combines a reinforcement learning agent with heuristic scores computed with the PageRank algorithm.
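The clique encoding described above can be illustrated with a minimal Python sketch. It assumes undirected graphs without self-loops, stored as dictionaries mapping each vertex to its set of neighbours. The function names (<code>modular_product</code>, <code>max_clique</code>, <code>max_common_induced_subgraph</code>) are chosen here for illustration and do not come from any particular library. Two pairs (''u''<sub>1</sub>, ''v''<sub>1</sub>) and (''u''<sub>2</sub>, ''v''<sub>2</sub>) are joined in the modular product when ''u''<sub>1</sub> ≠ ''u''<sub>2</sub>, ''v''<sub>1</sub> ≠ ''v''<sub>2</sub>, and the two pairs are either adjacent in both graphs or non-adjacent in both. The clique search is a plain Bron–Kerbosch procedure with pivoting, used here as a stand-in for a tuned maximum-clique solver; its running time is exponential in the worst case.

```python
from itertools import product


def modular_product(g_adj, h_adj):
    """Adjacency sets of the modular product of G and H.

    g_adj and h_adj map each vertex to the set of its neighbours
    (undirected, no self-loops). This representation is an assumption
    of the sketch.
    """
    nodes = list(product(g_adj, h_adj))
    adj = {node: set() for node in nodes}
    for (u1, v1) in nodes:
        for (u2, v2) in nodes:
            if u1 == u2 or v1 == v2:
                continue
            # Join the pairs when adjacency agrees in G and in H.
            if (u2 in g_adj[u1]) == (v2 in h_adj[v1]):
                adj[(u1, v1)].add((u2, v2))
    return adj


def max_clique(adj):
    """Largest clique found by Bron-Kerbosch with pivoting."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        # Pivot on the vertex with the most neighbours among candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return best


def max_common_induced_subgraph(g_adj, h_adj):
    """Map vertices of G to vertices of H along a maximum clique."""
    return dict(max_clique(modular_product(g_adj, h_adj)))


# A three-vertex path and a triangle share at most a single edge.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
triangle = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
mapping = max_common_induced_subgraph(path, triangle)
print(len(mapping))  # 2
```

The returned dictionary is a partial injective map from vertices of ''G'' to vertices of ''H''. Which particular maximum solution is returned depends on iteration order and should not be relied upon.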

Applications

Maximum common induced subgraph algorithms form the basis for both graph differencing and graph alignment. Graph differencing identifies and highlights differences between two graphs by pinpointing changes, additions, or deletions. Graph alignment involves finding correspondences between the vertices and edges of two graphs to identify similar structures. Maximum common induced subgraph algorithms have a long tradition in bioinformatics, cheminformatics, pharmacophore mapping, pattern recognition, computer vision, code analysis, compilers, and model checking.

The problem is also particularly useful in software engineering and model-based systems engineering, where software code and engineering models (e.g., Simulink, UML diagrams) are represented as graph data structures. Graph differencing can be used to detect changes between different versions of software code and models for change auditing, debugging, version control and collaborative team development.
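As an illustration of graph differencing, the following Python sketch reports what changed between two versions of a graph once a vertex correspondence is known. It assumes the correspondence is supplied as a dictionary from old vertices to new vertices, for example one obtained from a maximum common induced subgraph computation, and that graphs are stored as dictionaries of neighbour sets. The function name <code>graph_diff</code> and the shape of its result are hypothetical and serve only this example.

```python
def graph_diff(old_adj, new_adj, mapping):
    """Report additions and deletions between two graph versions.

    old_adj and new_adj map each vertex to its set of neighbours.
    mapping sends matched old vertices to their new counterparts.
    Both conventions are assumptions of this sketch.
    """
    inverse = {new: old for old, new in mapping.items()}

    # Vertices with no counterpart in the other version.
    removed_vertices = set(old_adj) - set(mapping)
    added_vertices = set(new_adj) - set(inverse)

    # An old edge is removed unless both ends are matched and
    # the matched ends are still adjacent in the new version.
    removed_edges = {
        frozenset((u, w))
        for u in old_adj
        for w in old_adj[u]
        if u not in mapping
        or w not in mapping
        or mapping[w] not in new_adj[mapping[u]]
    }
    # Symmetric test for edges that exist only in the new version.
    added_edges = {
        frozenset((x, y))
        for x in new_adj
        for y in new_adj[x]
        if x not in inverse
        or y not in inverse
        or inverse[y] not in old_adj[inverse[x]]
    }
    return {
        "removed_vertices": removed_vertices,
        "added_vertices": added_vertices,
        "removed_edges": removed_edges,
        "added_edges": added_edges,
    }


old = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
new = {"x": {"y"}, "y": {"x", "z"}, "z": {"y", "w"}, "w": {"z"}}
diff = graph_diff(old, new, {"a": "x", "b": "y", "c": "z"})
print(diff["added_vertices"])  # {'w'}
```

When the correspondence comes from a common induced subgraph, edges between matched vertices agree in both versions, so every reported edge touches at least one unmatched vertex.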
