HOME

TheInfoList



OR:

A method for pruning dense networks to highlight key links


Rationale

Relationships among a set of elements are often represented as a square matrix with entries representing the relations between all pairs of the elements. Relations such as distances, dissimilarities, similarities, relatedness, correlations, co-occurrences, conditional probabilities, etc., can be represented by such matrices. Such data can also be represented as networks with weighted links between the elements. Such matrices and networks are extremely dense and are not easily apprehended without some form of
data reduction Data reduction is the transformation of numerical or alphabetical digital information derived empirically or experimentally into a corrected, ordered, and simplified form. The purpose of data reduction can be two-fold: reduce the number of data rec ...
or pruning. A pathfinder network results from applying a pruning method that removes weaker links from a (usually dense) network according to the lengths of alternative paths (see below). It is used as a
psychometric Psychometrics is a field of study within psychology concerned with the theory and technique of measurement. Psychometrics generally covers specialized fields within psychology and education devoted to testing, measurement, assessment, and rela ...
scaling method based on
graph theory In mathematics and computer science, graph theory is the study of ''graph (discrete mathematics), graphs'', which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of ''Vertex (graph ...
and used in the study of expertise, education,
knowledge acquisition Knowledge acquisition is the process used to define the rules and ontologies required for a knowledge-based system. The phrase was first used in conjunction with expert systems to describe the initial tasks associated with developing an expert s ...
,
mental models A mental model is an internal representation of external reality: that is, a way of representing reality within one's mind. Such models are hypothesized to play a major role in cognition, reasoning and decision-making. The term for this concept wa ...
, and
knowledge engineering Knowledge engineering (KE) refers to all aspects involved in knowledge-based systems. Background Expert systems One of the first examples of an expert system was MYCIN, an application to perform medical diagnosis. In the MYCIN example, the ...
. It is also employed in generating communication networks, software debugging, visualizing scientific
citation A citation is a reference to a source. More precisely, a citation is an abbreviated alphanumeric expression embedded in the body of an intellectual work that denotes an entry in the bibliographic references section of the work for the purpose o ...
patterns,
information retrieval Information retrieval (IR) in computing and information science is the task of identifying and retrieving information system resources that are relevant to an Information needs, information need. The information need can be specified in the form ...
, and other forms of
data visualization Data and information visualization (data viz/vis or info viz/vis) is the practice of designing and creating Graphics, graphic or visual Representation (arts), representations of a large amount of complex quantitative and qualitative data and i ...
. Pathfinder networks are potentially applicable to any problem addressed by
network theory In mathematics, computer science, and network science, network theory is a part of graph theory. It defines networks as Graph (discrete mathematics), graphs where the vertices or edges possess attributes. Network theory analyses these networks ...
.


Overview

Network pruning aims to highlight the more important links between elements represented in a network. It helps to simplify the collection of connections involved which is valuable in data visualization and in comprehending essential relations among the elements represented in the network. Several psychometric scaling methods start from pairwise data and yield structures revealing the underlying organization of the data.
Data clustering Cluster analysis or clustering is the data analyzing technique in which task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some specific sense defined by the analyst) to each o ...
and
multidimensional scaling Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a data set. MDS is used to translate distances between each pair of n objects in a set into a configuration of n points mapped into an ...
are two such methods. Network scaling represents another method based on
graph theory In mathematics and computer science, graph theory is the study of ''graph (discrete mathematics), graphs'', which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of ''Vertex (graph ...
. Pathfinder networks are derived from matrices of data for pairs of entities. Because the algorithm uses distances, similarity data are inverted to yield dissimilarities for the computations. In the pathfinder network, the entities correspond to the nodes of the generated network, and the links in the network are determined by the patterns of proximities. For example, if the proximities are similarities, links will generally connect nodes of high similarity. When proximities are distances or dissimilarities, links will connect the shorter distances. The links in the network will be undirected if the proximities are symmetrical for every pair of entities. Symmetrical proximities mean that the order of the entities is not important, so the proximity of ''i'' and ''j'' is the same as the proximity of ''j'' and ''i'' for all pairs ''i,j''. If the proximities are not symmetrical for every pair, the links will be directed.


Algorithm

The pathfinder algorithm uses two parameters. #The ''q'' parameter constrains the number of indirect proximities examined in generating the network. q is an integer between 2 and n-1, inclusive where n is the number of nodes or items. Shortest paths can have no more than q links. When q = n - 1, all possible paths are included. #The r parameter defines the metric used for computing the distance of paths (cf. the
Minkowski distance The Minkowski distance or Minkowski metric is a metric in a normed vector space which can be considered as a generalization of both the Euclidean distance and the Manhattan distance. It is named after the Polish mathematician Hermann Minkowski. ...
). r is a real number between 1 and \infin, inclusive. Path distance d_p is computed as: d_p = (\sum_^k l_i^r)^, where l_i is the distance of the i th link in the path and 2 \leq k \leq q. For r = 1, d_p is simply the sum of the distances of the links in the path. For r = \infin, d_p is the maximum of the distances of the links in the path because \lim_ d_p = \max_^k l_i. A link is pruned if its distance is greater than the minimum distance of paths between the nodes connected by the link. Efficient methods for finding minimum distances include the
Floyd–Warshall algorithm In computer science, the Floyd–Warshall algorithm (also known as Floyd's algorithm, the Roy–Warshall algorithm, the Roy–Floyd algorithm, or the WFI algorithm) is an algorithm for finding shortest paths in a directed weighted graph with po ...
(for q = n - 1) and
Dijkstra's algorithm Dijkstra's algorithm ( ) is an algorithm for finding the shortest paths between nodes in a weighted graph, which may represent, for example, a road network. It was conceived by computer scientist Edsger W. Dijkstra in 1956 and published three ...
(for any value of q). A network generated with particular values of ''q'' and r is called a PFNet(q, r). Both of the parameters have the effect of decreasing the number of links in the network as their values are increased. The network with the minimum number of links is obtained when q = n - 1 and r = \infin, i.e., PFNet(n - 1, \infin). With ordinal-scale data (see
level of measurement Level of measurement or scale of measure is a classification that describes the nature of information within the values assigned to variables. Psychologist Stanley Smith Stevens developed the best-known classification with four levels, or scale ...
), the r parameter should be \infin because the same PFNet would result from any positive monotonic transformation of the proximity data. Other values of r require data measured on a ratio scale. The ''q'' parameter can be varied to yield the desired number of links in the network or to focus on more local relations with smaller values of ''q''. Essentially, pathfinder networks preserve the shortest possible paths given the data. Therefore, links are eliminated when they are not on shortest paths. The PFNet(n - 1, \infin) will be the
minimum spanning tree A minimum spanning tree (MST) or minimum weight spanning tree is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight. ...
for the links defined by the proximity data if a unique minimum spanning tree exists. In general, the PFNet(n - 1, \infin) includes all of the links in any minimum spanning tree.


Example

Here is an example of an undirected pathfinder network derived from average ratings of a group of biology graduate students. The students rated the relatedness of all pairs of the terms shown, and the mean rating for each pair was computed. The solid blue links are the PFNet(n - 1, \infin) (labeled "both" in the figure). The dotted red links are added in the PFNet(2, \infin). For the added links, there are no 2-link paths shorter than the link distance but there is at least one shorter path with more than two links in the data. A minimal spanning tree would have 24 links so the 26 links in PFNet(n - 1, \infin) implies that there is more than one minimum spanning tree. There are two cycles present so there are tied distances in the set of links in the cycle. Breaking each cycle would require removing one of the tied links in each cycle.


References


Other Information

Further information on pathfinder networks and several examples of the application of PFnets to a variety of problems can be found in the references. * The 1990 book is out of print. A zipped copy of pdf chapters can b
downloaded
* The 1989 chapter in the Bower book is an article summarizing pathfinder networks and some applications
pdf
Three papers describing fast implementations of pathfinder networks: * * * {{cite journal , doi = 10.1002/asi.20904 , last1 = Quirin , first1 = A. , last2 = Cordón , first2 = O. , last3 = Guerrero-Bote , first3 = V. P. , last4 = Vargas-Quesada , first4 = B. , last5 = Moya-Anegón , first5 = F. , year = 2008 , title = A Quick MST-based Algorithm to Obtain Pathfinder Networks , journal = Journal of the American Society for Information Science and Technology , volume = 59 , issue = 12, pages = 1912–1924 , citeseerx = 10.1.1.331.1548 (The two variants by Quirin et al. are significantly faster. While the former can be applied with q = 2 or q = n-1 and any value for r, the latter can only be applied in cases where q = n - 1 and r = \infin.)


External links






Implementation of the original, Binary, Fast and MST variants of the algorithm in C


Psychometrics