Main path analysis is a mathematical tool, first proposed by Hummon and Doreian in 1989, to identify the major paths in a citation network, which is one form of a directed acyclic graph (DAG). It has since become an effective technique for mapping technological trajectories, exploring scientific knowledge flows, and conducting literature reviews.

Global key-route main paths for a citation network

The method begins by measuring the significance of all the links in a citation network through the concept of ‘traversal count’ and then sequentially chains the most significant links into a "main path", which is deemed the most significant historical path in the target citation network. The method is applicable to any human activity that can be organized in the form of a citation network. It is commonly applied to trace the knowledge flow paths or development trajectories of a science or technology field through bibliographic citations or patent citations. It has also been applied to judicial decisions to trace the evolving changes of legal opinions. Main path analysis has attracted growing scholarly attention, and academic research related to it has grown rapidly since 2007. A list of academic articles that introduce, explain, apply, modify, or extend the method originated by Hummon and Doreian is given under External links. Nevertheless, several issues in applying the method are not broadly discussed, including the handling of citation data, the choice of a proper traversal weight scheme, the search options, and the interpretation of the resulting paths.

History

Main path analysis was first proposed in Hummon and Doreian (1989), in which they suggest a different approach for analyzing a citation network "where the connective threads through a network are preserved and the focus is on the links in the network rather than on the nodes." They call the resulting chain of the most used citation links the "main path" and claim that "It is our intuition that the main path, selected on the basis of the most used path will identify the main stream of a literature." The idea was verified using a set of DNA research articles. To make the method more practical, Liu and Lu (2012) extended it to include the key-route search. The most useful feature of the key-route search is that one can view main paths at different levels of detail by adjusting the number of key-routes.

The method

Main path analysis operates in two steps. The first step obtains the traversal count of each link in a citation network. Several types of traversal counts are mentioned in the literature. The second step searches for the main paths by chaining the significant links according to the size of their traversal counts. One needs to prepare a citation network before proceeding with main path analysis.

Preparing a citation network

It is necessary to prepare a citation network before starting main path analysis. In a citation network, the nodes represent documents such as academic articles, patents, or legal cases. These nodes are connected using citation information. Citation networks are by nature directed because the two nodes at opposite ends of a link are not symmetrical in their roles. As regards direction, this article adopts the convention that the cited node points to the citing node, signifying that knowledge in the cited node flows to the citing node. A citation network is also by nature acyclic, which means that a node can never chain back to itself if one moves along the links following their direction.

Several terms related to a citation network are defined here before proceeding further. Heads are the nodes the direction arrow leads to. Tails are the nodes at the other end of the direction arrow. Sources are the nodes that are cited but cite no others. Sinks cite other nodes but are not cited. Ancestors are the nodes that can be traced back to from a target node. Descendants are the nodes that one can reach from a target node by moving along the links following their direction.

Figure 1. SPC values for a citation network.
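The terms defined above can be illustrated with a minimal Python sketch. The edge list is an assumption: it contains only the links named in this article's worked examples, not the full network of the figures, and the helper names (`build`, `reachable`) are hypothetical.

```python
from collections import defaultdict

# Assumed fragment of the sample network: only links named in the worked
# examples. Each pair is (cited, citing), so links point from tail to head.
EDGES = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "H"), ("D", "F"), ("D", "I"),
         ("F", "H"), ("F", "I"), ("H", "K"), ("I", "L"), ("I", "M"), ("M", "N")]

def build(edges):
    """Return the node set plus successor and predecessor maps."""
    succ, pred, nodes = defaultdict(set), defaultdict(set), set()
    for cited, citing in edges:
        succ[cited].add(citing)
        pred[citing].add(cited)
        nodes.update((cited, citing))
    return nodes, succ, pred

def reachable(start, neighbours):
    """All nodes reachable from start by repeatedly following neighbours."""
    seen, stack = set(), [start]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

nodes, succ, pred = build(EDGES)
sources = {n for n in nodes if not pred[n]}   # cited, but cite no others
sinks = {n for n in nodes if not succ[n]}     # cite others, but are not cited
ancestors_of_h = reachable("H", pred)         # follow links backwards
descendants_of_d = reachable("D", succ)       # follow links forwards
```

In this fragment the sources are A and B and the sinks are K, L, and N. The full network in the figures may contain further nodes and links.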

Traversal counts

Traversal counts measure the significance of a link. The literature discusses several types of traversal counts, including search path count (SPC), search path link count (SPLC), search path node pair (SPNP), and other variations. All these traversal counts are denoted collectively as SPX.

Figure 2. SPLC values for a citation network.

Search path count (SPC)

A link’s SPC is the number of times the link is traversed if one runs through all possible paths from all the sources to all the sinks. SPC was first proposed by Vladimir Batagelj. The SPC values for each link in a sample citation network are shown in Figure 1. The SPC value for the link (B, D) is 5 because five paths (B-D-F-H-K, B-D-F-I-L, B-D-F-I-M-N, B-D-I-L, and B-D-I-M-N) traverse it.

Figure 3. SPNP values for a citation network.
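The SPC definition can be sketched without enumerating every path: the count for a link equals the number of paths from the sources to its tail multiplied by the number of paths from its head to the sinks. The sketch below assumes a fragment of the sample network containing only the links named in the worked examples, so values for links other than (B, D) may differ from Figure 1. The function names are hypothetical.

```python
from collections import defaultdict
from functools import lru_cache

# Assumed fragment of the sample network, as (cited, citing) pairs.
EDGES = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "H"), ("D", "F"), ("D", "I"),
         ("F", "H"), ("F", "I"), ("H", "K"), ("I", "L"), ("I", "M"), ("M", "N")]

succ, pred = defaultdict(list), defaultdict(list)
for cited, citing in EDGES:
    succ[cited].append(citing)
    pred[citing].append(cited)

@lru_cache(maxsize=None)
def paths_from_sources(n):
    # A source is reached in exactly one way: by starting there.
    return 1 if not pred[n] else sum(paths_from_sources(p) for p in pred[n])

@lru_cache(maxsize=None)
def paths_to_sinks(n):
    # A sink ends a path in exactly one way: by stopping there.
    return 1 if not succ[n] else sum(paths_to_sinks(s) for s in succ[n])

# SPC of (tail, head) = (paths from sources to tail) * (paths from head to sinks)
spc = {(u, v): paths_from_sources(u) * paths_to_sinks(v) for u, v in EDGES}
```

For the link (B, D) this gives 1 × 5 = 5, matching the five paths listed above.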

Search path link count (SPLC)

A link’s SPLC is the number of times the link is traversed if one runs through all possible paths from all the ancestors of the tail node (including the tail node itself) to all the sinks. SPLC was first proposed by Hummon and Doreian. Figure 2 presents the SPLC values for each link in the same citation network as shown in Figure 1. Six paths traverse the link (D, F), giving it an SPLC value of 6. They are B-D-F-H-K, B-D-F-I-L, B-D-F-I-M-N, D-F-H-K, D-F-I-L, and D-F-I-M-N. Note that all the paths begin either from the ancestor of D, which is B, or from D itself.
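A minimal sketch of the SPLC definition follows. It differs from SPC only on the tail side: paths may start at the tail node or at any of its ancestors, not only at sources. The edge list is an assumed fragment of the sample network (only the links named in the worked examples), and the function names are hypothetical.

```python
from collections import defaultdict
from functools import lru_cache

# Assumed fragment of the sample network, as (cited, citing) pairs.
EDGES = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "H"), ("D", "F"), ("D", "I"),
         ("F", "H"), ("F", "I"), ("H", "K"), ("I", "L"), ("I", "M"), ("M", "N")]

succ, pred = defaultdict(list), defaultdict(list)
for cited, citing in EDGES:
    succ[cited].append(citing)
    pred[citing].append(cited)

@lru_cache(maxsize=None)
def paths_ending_at(n):
    # Paths that end at n and start at n itself (the "1") or at any ancestor.
    return 1 + sum(paths_ending_at(p) for p in pred[n])

@lru_cache(maxsize=None)
def paths_to_sinks(n):
    return 1 if not succ[n] else sum(paths_to_sinks(s) for s in succ[n])

# SPLC of (tail, head) = (paths ending at tail) * (paths from head to sinks)
splc = {(u, v): paths_ending_at(u) * paths_to_sinks(v) for u, v in EDGES}
```

For the link (D, F), two path prefixes end at D (B-D and D alone) and three paths lead from F to the sinks, giving 2 × 3 = 6 as in the text.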

Search path node pair (SPNP)

A link’s SPNP is the number of times the link is traversed if one runs through all possible paths from all the ancestors of the tail node (including the tail node itself) to all the descendants of the head node (including the head node itself). SPNP was first proposed by Hummon and Doreian. The SPNP value of the link (C, H) is 6 because there are six paths that begin from A, B, or C (A and B are C's ancestors) and end at H or K (K is H's descendant). These paths are A-C-H, A-C-H-K, B-C-H, B-C-H-K, C-H, and C-H-K.

Figure 4. Local main paths (SPC).
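The SPNP definition can be sketched in the same style. Here both ends are relaxed: paths may start at the tail or any of its ancestors and may stop at the head or any of its descendants. As before, the edge list is an assumed fragment containing only the links named in the worked examples, and the function names are hypothetical.

```python
from collections import defaultdict
from functools import lru_cache

# Assumed fragment of the sample network, as (cited, citing) pairs.
EDGES = [("A", "C"), ("B", "C"), ("B", "D"), ("C", "H"), ("D", "F"), ("D", "I"),
         ("F", "H"), ("F", "I"), ("H", "K"), ("I", "L"), ("I", "M"), ("M", "N")]

succ, pred = defaultdict(list), defaultdict(list)
for cited, citing in EDGES:
    succ[cited].append(citing)
    pred[citing].append(cited)

@lru_cache(maxsize=None)
def paths_ending_at(n):
    # Paths that end at n and start at n itself or at any ancestor of n.
    return 1 + sum(paths_ending_at(p) for p in pred[n])

@lru_cache(maxsize=None)
def paths_starting_at(n):
    # Paths that start at n and stop at n itself or at any descendant of n.
    return 1 + sum(paths_starting_at(s) for s in succ[n])

# SPNP of (tail, head) = (paths ending at tail) * (paths starting at head)
spnp = {(u, v): paths_ending_at(u) * paths_starting_at(v) for u, v in EDGES}
```

For the link (C, H), three path prefixes end at C (A-C, B-C, and C alone) and two suffixes start at H (H alone and H-K), giving 3 × 2 = 6 as in the text.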

Path search

Based on the traversal counts, one can then search for the most significant path(s). There are several ways of finding them, including local, global, and key-route search.

Figure 5. Global main paths (SPC).

Local search

Local search is mentioned in Hummon and Doreian as "priority first" search. This search process always chooses the next link(s) with the highest SPX as the outgoing link. By always tracking the most traversed link(s), it obtains the main stream among all citation chains. Figure 4 shows the local main paths obtained based on SPC. Note that when the search reaches node I, two outgoing links have the same SPC value, so the search produces two paths from that point on.

Figure 6. Local key-route main paths (SPC).
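A minimal sketch of a forward local search follows. Two things are assumptions: the weights are the SPC values of a fragment of the sample network (only the links named in the worked examples), and the starting rule (begin with the heaviest link or links leaving any source) is one plausible reading of "priority first". The function name `local_main_path` is hypothetical.

```python
# SPC values of an assumed fragment of the sample network.
SPC = {("A", "C"): 1, ("B", "C"): 1, ("B", "D"): 5, ("C", "H"): 2,
       ("D", "F"): 3, ("D", "I"): 2, ("F", "H"): 1, ("F", "I"): 2,
       ("H", "K"): 3, ("I", "L"): 2, ("I", "M"): 2, ("M", "N"): 2}

def local_main_path(weights):
    """Greedy forward search; ties are kept, so the result may branch."""
    out, heads = {}, set()
    for (u, v), w in weights.items():
        out.setdefault(u, []).append((v, w))
        heads.add(v)
    sources = [u for u in out if u not in heads]

    # Assumed start rule: the heaviest link(s) leaving any source.
    start = [(u, v, w) for u in sources for v, w in out[u]]
    best = max(w for _, _, w in start)
    frontier = [(u, v) for u, v, w in start if w == best]

    chosen = set()
    while frontier:
        u, v = frontier.pop()
        if (u, v) in chosen:
            continue
        chosen.add((u, v))
        nxt = out.get(v, [])
        if nxt:  # at each node, follow only the heaviest outgoing link(s)
            top = max(w for _, w in nxt)
            frontier.extend((v, x) for x, w in nxt if w == top)
    return chosen

links = local_main_path(SPC)
```

On this fragment the search follows B-D-F-I and then branches at I into I-L and I-M-N, because the two links leaving I tie.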

Global search

Global search simply suggests the citation chain with the largest overall SPX. The concept of global search is similar to the critical path method in project scheduling. The global main path of the sample citation network based on SPC is presented in Figure 5. The sum of all the SPC values in the path B-D-F-I-M-N is 15, which is the largest among all possible paths.

Figure 7. Global key-route main paths (SPC).
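Global search amounts to finding the heaviest source-to-sink path in a DAG, which can be sketched with memoized recursion. The weights below are the SPC values of an assumed fragment of the sample network (only the links named in the worked examples), so the total differs from the 15 reported for the full network. The name `best_from` is hypothetical, and this sketch returns a single path even when several tie.

```python
from functools import lru_cache

# SPC values of an assumed fragment of the sample network.
SPC = {("A", "C"): 1, ("B", "C"): 1, ("B", "D"): 5, ("C", "H"): 2,
       ("D", "F"): 3, ("D", "I"): 2, ("F", "H"): 1, ("F", "I"): 2,
       ("H", "K"): 3, ("I", "L"): 2, ("I", "M"): 2, ("M", "N"): 2}

succ, heads = {}, set()
for u, v in SPC:
    succ.setdefault(u, []).append(v)
    heads.add(v)

@lru_cache(maxsize=None)
def best_from(n):
    """Return (total weight, path) of the heaviest path from n to a sink."""
    options = [(SPC[n, s] + best_from(s)[0], (n,) + best_from(s)[1])
               for s in succ.get(n, [])]
    return max(options) if options else (0, (n,))

sources = [u for u in succ if u not in heads]
total, path = max(best_from(s) for s in sources)
```

On the fragment the heaviest chain is B-D-F-I-M-N, the same path reported in the text. Its fragment total is 14 rather than 15 because the fragment omits links that raise some SPC values in the full network.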

Key-route search

Key-route search is designed to avoid the problem of missing significant links in both the local and global search. The problem is visible in the local and global main paths shown above, neither of which includes one of the most important links, (H, K). As described in Liu and Lu (2012), the approach searches for main paths starting from specified links (key-routes) and thus guarantees that those links are included. One can also specify multiple links to obtain multiple main paths. An additional advantage of the key-route approach is that one can control the level of detail of the main paths by varying the number of key-routes: the more key-routes are specified, the more detail is revealed. When the number of key-routes increases to a certain point, the search returns the whole citation network. Figures 6 and 7 show the local key-route and global key-route main paths of the sample citation network. In both, the number of key-routes is set to 1, i.e., the search is based on only the top links. Since there are two top links, (B, D) and (H, K), the resulting main paths include both of them.
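A global key-route search can be sketched as follows: for each key link, extend backwards along the heaviest chain to a source and forwards along the heaviest chain to a sink, then take the union of the resulting paths. This is a sketch under assumptions, not the published implementation. The weights are the SPC values of an assumed fragment of the sample network, the key links are passed in explicitly, and all function names are hypothetical.

```python
from functools import lru_cache

# SPC values of an assumed fragment of the sample network.
SPC = {("A", "C"): 1, ("B", "C"): 1, ("B", "D"): 5, ("C", "H"): 2,
       ("D", "F"): 3, ("D", "I"): 2, ("F", "H"): 1, ("F", "I"): 2,
       ("H", "K"): 3, ("I", "L"): 2, ("I", "M"): 2, ("M", "N"): 2}

succ, pred = {}, {}
for u, v in SPC:
    succ.setdefault(u, []).append(v)
    pred.setdefault(v, []).append(u)

@lru_cache(maxsize=None)
def best_forward(n):
    """(weight, path) of the heaviest path from n down to a sink."""
    opts = [(SPC[n, s] + best_forward(s)[0], (n,) + best_forward(s)[1])
            for s in succ.get(n, [])]
    return max(opts) if opts else (0, (n,))

@lru_cache(maxsize=None)
def best_backward(n):
    """(weight, path) of the heaviest path from a source up to n."""
    opts = [(SPC[p, n] + best_backward(p)[0], best_backward(p)[1] + (n,))
            for p in pred.get(n, [])]
    return max(opts) if opts else (0, (n,))

def key_route_path(u, v):
    """Heaviest source-to-sink path forced through the link (u, v)."""
    return best_backward(u)[1] + best_forward(v)[1]

def global_key_route(key_links):
    """Union of the links on the key-route paths of all key links."""
    links = set()
    for u, v in key_links:
        p = key_route_path(u, v)
        links.update(zip(p, p[1:]))
    return links

main_paths = global_key_route([("B", "D"), ("H", "K")])
```

With (B, D) and (H, K) as key links, the result contains B-D-F-I-M-N together with the branch F-H-K, so the link (H, K) that the plain global search missed is now included.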

The Variants

In addition to the key-route search approach, variations of the method include an approach that is aggregative and stochastic and an approach that considers decay in knowledge diffusion, among others.

Applications

The method has been applied to three types of documenting systems that maintain the tradition of making references to previous documents: the academic article, patent, and judicial documenting systems.

Academic article

Academic citation databases such as Web of Science and Scopus include comprehensive digitized citation information. This information makes it possible to apply main path analysis to examine the knowledge structure or trace the knowledge flow of any scientific field. Some early applications explore subjects such as centrality-productivity and conflict resolution. More recent applications include fullerenes, nanotubes, data envelopment analysis, supply chain management, corporate social responsibility, IT outsourcing, and medical tourism, among others.

Patent

Patents commonly reference prior art. For example, each United States patent document includes a "References Cited" section that lists the prior art of the patent. Patent databases such as Clarivate Analytics and Webpat provide digitized patent citation information. Verspagen (2007) and Mina et al. (2007) are two early works that apply main path analysis to patent data. (Mina, A.; Ramlogan, R.; Tampubolon, G.; Metcalfe, J.S. (2007). "Mapping evolutionary trajectories: Applications to the growth and transformation of medical knowledge". Research Policy 36 (5): 789–806. doi:10.1016/j.respol.2006.12.007.)

Judicial document

In the common law system, a court decision document usually references previously published opinions for the purpose of justifying the current decision. These judicial references, or legal citations, can also be used to construct citation networks and then trace the changes of legal opinions. Research opportunities in this area remain wide open. Liu et al. (2014) conducted an exploratory study on this type of application.

Software Implementation

Main path analysis is implemented in Pajek, a widely used social network analysis software package written by Vladimir Batagelj and Andrej Mrvar of the University of Ljubljana, Slovenia. To run main path analysis in Pajek, one first needs to prepare a citation network and have Pajek read it in. Next, from the Pajek main menu, one computes the traversal counts of all links in the network by applying one of the following command sequences (depending on the choice of traversal count):

''Network → Acyclic Network → Create Weighted Network + Vector → Traversal Weights → Search Path Count (SPC)'', or
''Network → Acyclic Network → Create Weighted Network + Vector → Traversal Weights → Search Path Link Count (SPLC)'', or
''Network → Acyclic Network → Create Weighted Network + Vector → Traversal Weights → Search Path Node Pairs (SPNP)''

After the traversal counts are computed, the following command sequences find the main paths.

For local main paths: ''Network → Acyclic Network → Create (Sub)Network → Main Paths → Local Search → Forward''
For global main paths: ''Network → Acyclic Network → Create (Sub)Network → Main Paths → Global Search → Standard''
For local key-route main paths: ''Network → Acyclic Network → Create (Sub)Network → Main Paths → Local Search → Key-Route''
For global key-route main paths: ''Network → Acyclic Network → Create (Sub)Network → Main Paths → Global Search → Key-Route''

In addition to key-route search, a more flexible search feature was added starting from Pajek version 5.03 (January 4, 2018). The new feature allows local and global searches to pass through vertices defined by a cluster. The command sequences are as follows:

''Network → Acyclic Network → Create (Sub)Network → Main Paths → Local Search → Key-Route → Through Vertices in Cluster''
''Network → Acyclic Network → Create (Sub)Network → Main Paths → Global Search → Key-Route → Through Vertices in Cluster''
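Preparing a citation network for Pajek can be sketched as below, assuming the plain-text Pajek network (.net) layout of a `*Vertices` section followed by an `*Arcs` section. The helper name `to_pajek_net` and the sample edges are hypothetical, and the exact file layout should be checked against the Pajek manual before use.

```python
def to_pajek_net(edges):
    """Render (cited, citing) pairs as text in the assumed Pajek .net layout."""
    labels = sorted({node for edge in edges for node in edge})
    # Pajek numbers vertices from 1.
    index = {label: i for i, label in enumerate(labels, start=1)}
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{i} "{label}"' for label, i in index.items()]
    lines.append("*Arcs")
    # Each arc runs from the cited node to the citing node.
    lines += [f"{index[cited]} {index[citing]}" for cited, citing in edges]
    return "\n".join(lines) + "\n"

text = to_pajek_net([("A", "C"), ("B", "C")])
```

The returned string can be written to a file with a `.net` extension and then read into Pajek before running the menu commands listed above.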

References

External links

Pajek, a free social network analysis software package.

A page containing a list of academic articles that introduce, explain, apply, modify, or extend the method originated by Hummon and Doreian.