HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, a graph is an
abstract data type In computer science, an abstract data type (ADT) is a mathematical model for data types. An abstract data type is defined by its behavior (semantics) from the point of view of a ''user'', of the data, specifically in terms of possible values, pos ...
that is meant to implement the
undirected graph In discrete mathematics, and more specifically in graph theory, a graph is a structure amounting to a set of objects in which some pairs of the objects are in some sense "related". The objects correspond to mathematical abstractions called '' ve ...
and
directed graph In mathematics, and more specifically in graph theory, a directed graph (or digraph) is a graph that is made up of a set of vertices connected by directed edges, often called arcs. Definition In formal terms, a directed graph is an ordered pa ...
concepts from the field of
graph theory In mathematics, graph theory is the study of '' graphs'', which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of '' vertices'' (also called ''nodes'' or ''points'') which are conn ...
within
mathematics Mathematics is an area of knowledge that includes the topics of numbers, formulas and related structures, shapes and the spaces in which they are contained, and quantities and their changes. These topics are represented in modern mathematics ...
. A graph data structure consists of a finite (and possibly mutable)
set Set, The Set, SET or SETS may refer to: Science, technology, and mathematics Mathematics *Set (mathematics), a collection of elements *Category of sets, the category whose objects and morphisms are sets and total functions, respectively Electro ...
of ''vertices'' (also called ''nodes'' or ''points''), together with a set of unordered pairs of these vertices for an undirected graph or a set of ordered pairs for a directed graph. These pairs are known as ''edges'' (also called ''links'' or ''lines''), and for a directed graph are also known as ''edges'' but also sometimes ''arrows'' or ''arcs''. The vertices may be part of the graph structure, or may be external entities represented by integer indices or
references Reference is a relationship between objects in which one object designates, or acts as a means by which to connect to or link to, another object. The first object in this relation is said to ''refer to'' the second object. It is called a ''name'' ...
. A graph data structure may also associate to each edge some ''edge value'', such as a symbolic label or a numeric attribute (cost, capacity, length, etc.).


Operations

The basic operations provided by a graph data structure ''G'' usually include:See, e.g. , Section 13.1.2: Operations on graphs, p. 360. For a more detailed set of operations, see * : tests whether there is an edge from the vertex ''x'' to the vertex ''y''; * : lists all vertices ''y'' such that there is an edge from the vertex ''x'' to the vertex ''y''; * : adds the vertex ''x'', if it is not there; * : removes the vertex ''x'', if it is there; * : adds the edge ''z'' from the vertex ''x'' to the vertex ''y'', if it is not there; * : removes the edge from the vertex ''x'' to the vertex ''y'', if it is there; * : returns the value associated with the vertex ''x''; * : sets the value associated with the vertex ''x'' to ''v''. Structures that associate values to the edges usually also provide: * : returns the value associated with the edge (''x'', ''y''); * : sets the value associated with the edge (''x'', ''y'') to ''v''.


Common data structures for graph representation

;
Adjacency list In graph theory and computer science, an adjacency list is a collection of unordered lists used to represent a finite graph. Each unordered list within an adjacency list describes the set of neighbors of a particular vertex in the graph. This is ...
: Vertices are stored as records or objects, and every vertex stores a
list A ''list'' is any set of items in a row. List or lists may also refer to: People * List (surname) Organizations * List College, an undergraduate division of the Jewish Theological Seminary of America * SC Germania List, German rugby unio ...
of adjacent vertices. This data structure allows the storage of additional data on the vertices. Additional data can be stored if edges are also stored as objects, in which case each vertex stores its incident edges and each edge stores its incident vertices. ;
Adjacency matrix In graph theory and computer science, an adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph. In the special case of a finite simp ...
: A two-dimensional matrix, in which the rows represent source vertices and columns represent destination vertices. Data on edges and vertices must be stored externally. Only the cost for one edge can be stored between each pair of vertices. ; Incidence matrix : A two-dimensional matrix, in which the rows represent the vertices and columns represent the edges. The entries indicate the incidence relation between the vertex at a row and edge at a column. The following table gives the
time complexity In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by t ...
cost of performing various operations on graphs, for each of these representations, with , ''V'', the number of vertices and , ''E'', the number of edges. In the matrix representations, the entries encode the cost of following an edge. The cost of edges that are not present are assumed to be ∞. Adjacency lists are generally preferred for the representation of sparse graphs, while an adjacency matrix is preferred if the graph is dense; that is, the number of edges , ''E'', is close to the number of vertices squared, , ''V'', 2, or if one must be able to quickly look up if there is an edge connecting two vertices.


Parallel representations

The parallelization of graph problems faces significant challenges: Data-driven computations, unstructured problems, poor locality and high data access to computation ratio. The graph representation used for parallel architectures plays a significant role in facing those challenges. Poorly chosen representations may unnecessarily drive up the communication cost of the algorithm, which will decrease its
scalability Scalability is the property of a system to handle a growing amount of work by adding resources to the system. In an economic context, a scalable business model implies that a company can increase sales given increased resources. For example, a ...
. In the following, shared and distributed memory architectures are considered.


Shared memory

In the case of a
shared memory In computer science, shared memory is memory that may be simultaneously accessed by multiple programs with an intent to provide communication among them or avoid redundant copies. Shared memory is an efficient means of passing data between progr ...
model, the graph representations used for parallel processing are the same as in the sequential case, since parallel read-only access to the graph representation (e.g. an
adjacency list In graph theory and computer science, an adjacency list is a collection of unordered lists used to represent a finite graph. Each unordered list within an adjacency list describes the set of neighbors of a particular vertex in the graph. This is ...
) is efficient in shared memory.


Distributed memory

In the
distributed memory In computer science, distributed memory refers to a multiprocessor computer system in which each processor has its own private memory. Computational tasks can only operate on local data, and if remote data are required, the computational task mu ...
model, the usual approach is to partition the vertex set V of the graph into p sets V_0, \dots, V_. Here, p is the amount of available processing elements (PE). The vertex set partitions are then distributed to the PEs with matching index, additionally to the corresponding edges. Every PE has its own subgraph representation, where edges with an endpoint in another partition require special attention. For standard communication interfaces like
MPI MPI or Mpi may refer to: Science and technology Biology and medicine * Magnetic particle imaging, an emerging non-invasive tomographic technique * Myocardial perfusion imaging, a nuclear medicine procedure that illustrates the function of the hear ...
, the ID of the PE owning the other endpoint has to be identifiable. During computation in a distributed graph algorithms, passing information along these edges implies communication. Partitioning the graph needs to be done carefully - there is a trade-off between low communication and even size partitioning But partitioning a graph is a NP-hard problem, so it is not feasible to calculate them. Instead, the following heuristics are used. 1D partitioning: Every processor gets n/p vertices and the corresponding outgoing edges. This can be understood as a row-wise or column-wise decomposition of the adjacency matrix. For algorithms operating on this representation, this requires an All-to-All communication step as well as \mathcal(m) message buffer sizes, as each PE potentially has outgoing edges to every other PE. 2D partitioning: Every processor gets a submatrix of the adjacency matrix. Assume the processors are aligned in a rectangle p = p_r \times p_c, where p_r and p_c are the amount of processing elements in each row and column, respectively. Then each processor gets a submatrix of the adjacency matrix of dimension (n/p_r)\times(n/p_c). This can be visualized as a
checkerboard A checkerboard (American English) or chequerboard (British English; see spelling differences) is a board of checkered pattern on which checkers (also known as English draughts) is played. Most commonly, it consists of 64 squares (8×8) of altern ...
pattern in a matrix. Therefore, each processing unit can only have outgoing edges to PEs in the same row and column. This bounds the amount of communication partners for each PE to p_r + p_c - 1 out of p = p_r \times p_c possible ones.


Compressed representations

Graphs with trillions of edges occur in
machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...
,
social network analysis Social network analysis (SNA) is the process of investigating social structures through the use of networks and graph theory. It characterizes networked structures in terms of ''nodes'' (individual actors, people, or things within the network) ...
, and other areas. Compressed graph representations have been developed to reduce I/O and memory requirements. General techniques such as
Huffman coding In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code proceeds by means of Huffman coding, an algo ...
are applicable, but the adjacency list or adjacency matrix can be processed in specific ways to increase efficiency.


See also

* Graph traversal for graph walking strategies *
Graph database A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the '' graph'' (or ''edge'' or ''relationship''). The graph rel ...
for graph (data structure) persistency *
Graph rewriting In computer science, graph transformation, or graph rewriting, concerns the technique of creating a new graph out of an original graph algorithmically. It has numerous applications, ranging from software engineering (software construction and also ...
for rule based transformations of graphs (graph data structures) *
Graph drawing software Graph drawing is an area of mathematics and computer science combining methods from geometric graph theory and information visualization to derive two-dimensional depictions of graphs arising from applications such as social network analysis ...
for software, systems, and providers of systems for drawing graphs


References


External links


Boost Graph Library: a powerful C++ graph library
s.a.
Boost (C++ libraries) Boost is a set of libraries for the C++ programming language that provides support for tasks and structures such as linear algebra, pseudorandom number generation, multithreading, image processing, regular expressions, and unit testing. It c ...

Networkx: a Python graph libraryGraphMatcher
a java program to align directed/undirected graphs.
GraphBLAS
A specification for a library interface for operations on graphs, with a particular focus on sparse graphs. {{DEFAULTSORT:Graph (Abstract Data Type) Graph theory Abstract data types Graphs Hypergraphs