Network science is an academic field which studies

complex network In the context of network theory, a complex network is a graph (network) with non-trivial topological features—features that do not occur in simple networks such as lattices or random graphs but often occur in networks representing real ...

s such as

telecommunication network A telecommunications network is a group of nodes interconnected by telecommunications links that are used to exchange messages between the nodes. The links may use a variety of technologies based on the methodologies of circuit switching, mes ...

computer network A computer network is a set of computers sharing resources located on or provided by network nodes. The computers use common communication protocols over digital interconnections to communicate with each other. These interconnections ar ...

s, biological networks, cognitive and

semantic network A semantic network, or frame network is a knowledge base that represents semantic relations between concepts in a network. This is often used as a form of knowledge representation. It is a directed or undirected graph consisting of vertices ...

s, and

social network A social network is a social structure made up of a set of social actors (such as individuals or organizations), sets of dyadic ties, and other social interactions between actors. The social network perspective provides a set of methods for ...

s, considering distinct elements or actors represented by ''nodes'' (or ''vertices'') and the connections between the elements or actors as ''links'' (or ''edges''). The field draws on theories and methods including

graph theory In mathematics, graph theory is the study of '' graphs'', which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of '' vertices'' (also called ''nodes'' or ''points'') which are conn ...

from mathematics, statistical mechanics from physics, data mining and

information visualization Information is an abstract concept that refers to that which has the power to inform. At the most fundamental level information pertains to the interpretation of that which may be sensed. Any natural process that is not completely random, ...

from computer science, inferential modeling from statistics, and

social structure In the social sciences, social structure is the aggregate of patterned social arrangements in society that are both emergent from and determinant of the actions of individuals. Likewise, society is believed to be grouped into structurally rela ...

from sociology. The

United States National Research Council The National Academies of Sciences, Engineering, and Medicine (also known as NASEM or the National Academies) are the collective scientific national academy of the United States. The name is used interchangeably in two senses: (1) as an umbrell ...

defines network science as "the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena."

Background and history

The study of networks has emerged in diverse disciplines as a means of analyzing complex relational data. The earliest known paper in this field is the famous

Seven Bridges of Königsberg The Seven Bridges of Königsberg is a historically notable problem in mathematics. Its negative resolution by Leonhard Euler in 1736 laid the foundations of graph theory and prefigured the idea of topology. The city of Königsberg in Prussia (n ...

written by

Leonhard Euler Leonhard Euler ( , ; 15 April 170718 September 1783) was a Swiss mathematician, physicist, astronomer, geographer, logician and engineer who founded the studies of graph theory and topology and made pioneering and influential discoveries in ma ...

in 1736. Euler's mathematical description of vertices and edges was the foundation of

, a branch of mathematics that studies the properties of pairwise relations in a network structure. The field of

continued to develop and found applications in chemistry (Sylvester, 1878). Dénes Kőnig, a Hungarian mathematician and professor, wrote the first book in Graph Theory, entitled "Theory of finite and infinite graphs", in 1936. Moreno Sociogram 1st Grade

In the 1930s

Jacob Moreno Jacob Levy Moreno (born Iacob Levy; May 18, 1889 – May 14, 1974) was a Romanian-American psychiatrist, psychosociologist, and educator, the founder of psychodrama, and the foremost pioneer of group psychotherapy. During his lifetime, he was r ...

, a psychologist in the Gestalt tradition, arrived in the United States. He developed the sociogram and presented it to the public in April 1933 at a convention of medical scholars. Moreno claimed that "before the advent of sociometry no one knew what the interpersonal structure of a group 'precisely' looked like" (Moreno, 1953). The sociogram was a representation of the social structure of a group of elementary school students. The boys were friends of boys and the girls were friends of girls with the exception of one boy who said he liked a single girl. The feeling was not reciprocated. This network representation of social structure was found so intriguing that it was printed in

The New York Times ''The New York Times'' (''the Times'', ''NYT'', or the Gray Lady) is a daily newspaper based in New York City with a worldwide readership reported in 2020 to comprise a declining 840,000 paid print subscribers, and a growing 6 million paid ...

(April 3, 1933, page 17). The sociogram has found many applications and has grown into the field of

social network analysis Social network analysis (SNA) is the process of investigating social structures through the use of networks and graph theory. It characterizes networked structures in terms of ''nodes'' (individual actors, people, or things within the network) ...

. Probabilistic theory in network science developed as an offshoot of

with Paul Erdős and

Alfréd Rényi Alfréd Rényi (20 March 1921 – 1 February 1970) was a Hungarian mathematician known for his work in probability theory, though he also made contributions in combinatorics, graph theory, and number theory. Life Rényi was born in Budapest ...

's eight famous papers on

random graphs In mathematics, random graph is the general term to refer to probability distributions over graphs. Random graphs may be described simply by a probability distribution, or by a random process which generates them. The theory of random graphs ...

. For

social networks A social network is a social structure made up of a set of social actors (such as individuals or organizations), sets of dyadic ties, and other social interactions between actors. The social network perspective provides a set of methods for ...

the

exponential random graph model Exponential family random graph models (ERGMs) are a family of statistical models for analyzing data from social and other networks. Examples of networks examined using ERGM include knowledge networks, organizational networks, colleague networks, ...

or p* is a notational framework used to represent the probability space of a tie occurring in a

. An alternate approach to network probability structures is the

network probability matrix The network probability matrix describes the probability structure of a network based on the historical presence or absence of edges in a network. For example, individuals in a social network are not connected to other individuals with uniform ...

, which models the probability of edges occurring in a network, based on the historic presence or absence of the edge in a sample of networks. In 1998,

David Krackhardt David Krackhardt is Professor of Organizations at Heinz College and the Tepper School of Business, with courtesy appointments in the Department of Social and Decision Sciences (Dietrich College of Humanities and Social Sciences) and the Machine L ...

and

Kathleen Carley Kathleen M. Carley is an American social scientist specializing in dynamic network analysis. She is a professor in the School of Computer Science in the Institute for Software Research at Carnegie Mellon University and also holds appointments in ...

introduced the idea of a meta-network with the PCANS Model. They suggest that "all organizations are structured along these three domains, Individuals, Tasks, and Resources". Their paper introduced the concept that networks occur across multiple domains and that they are interrelated. This field has grown into another sub-discipline of network science called

dynamic network analysis Dynamic network analysis (DNA) is an emergent scientific field that brings together traditional social network analysis (SNA), link analysis (LA), social simulation and multi-agent systems (MAS) within network science and network theory. Dynamic ...

. More recently other network science efforts have focused on mathematically describing different network topologies. Duncan Watts and Steven Strogatz reconciled empirical data on networks with mathematical representation, describing the

small-world network A small-world network is a type of mathematical graph in which most nodes are not neighbors of one another, but the neighbors of any given node are likely to be neighbors of each other and most nodes can be reached from every other node by a sm ...

. Albert-László Barabási and

Reka Albert Reka may refer to: Places * Řeka, a village in the Czech Republic * Reka, Cerkno, a village near Cerkno, Slovenia * Reka, Laško, a village near Laško, Slovenia * Reka (Kladovo), a village near Kladovo, Serbia * Reka, Koprivnica, a village nea ...

discovered

scale-free networks A scale-free network is a network whose degree distribution follows a power law, at least asymptotically. That is, the fraction ''P''(''k'') of nodes in the network having ''k'' connections to other nodes goes for large values of ''k'' as : P( ...

, a property that captures the fact that in real network hubs coexist with many small degree vertices, and offered a dynamical model to explain the origin of this scale-free state.

Department of Defense initiatives

The U.S. military first became interested in

network-centric warfare Network-centric warfare, also called network-centric operations or net-centric warfare, is a military doctrine or theory of war that aims to translate an information advantage, enabled partly by information technology, into a competitive advant ...

as an operational concept based on network science in 1996. John A. Parmentola, the U.S. Army Director for Research and Laboratory Management, proposed to the Army's Board on Science and Technology (BAST) on December 1, 2003 that Network Science become a new Army research area. The BAST, the Division on Engineering and Physical Sciences for the National Research Council (NRC) of the National Academies, serves as a convening authority for the discussion of science and technology issues of importance to the Army and oversees independent Army-related studies conducted by the National Academies. The BAST conducted a study to find out whether identifying and funding a new field of investigation in basic research, Network Science, could help close the gap between what is needed to realize Network-Centric Operations and the current primitive state of fundamental knowledge of networks. As a result, the BAST issued the NRC study in 2005 titled Network Science (referenced above) that defined a new field of basic research in Network Science for the Army. Based on the findings and recommendations of that study and the subsequent 2007 NRC report titled Strategy for an Army Center for Network Science, Technology, and Experimentation, Army basic research resources were redirected to initiate a new basic research program in Network Science. To build a new theoretical foundation for complex networks, some of the key Network Science research efforts now ongoing in Army laboratories address: * Mathematical models of network behavior to predict performance with network size, complexity, and environment * Optimized human performance required for network-enabled warfare * Networking within ecosystems and at the molecular level in cells. As initiated in 2004 by Frederick I. Moxley with support he solicited from David S. Alberts, the Department of Defense helped to establish the first Network Science Center in conjunction with the U.S. Army at the United States Military Academy (USMA). Under the tutelage of Dr. Moxley and the faculty of the USMA, the first interdisciplinary undergraduate courses in Network Science were taught to cadets at West Point. In order to better instill the tenets of network science among its cadre of future leaders, the USMA has also instituted a five-course undergraduate minor in Network Science. In 2006, the U.S. Army and the United Kingdom (UK) formed the Network and Information Science

International Technology Alliance The International Technology Alliance (ITA) refers to a series of research programs that were jointly sponsored by UK Ministry of Defence (United Kingdom) and the US Army Research Laboratory (ARL). One such program focusing on network sciences NIS ...

, a collaborative partnership among the Army Research Laboratory, UK Ministry of Defense and a consortium of industries and universities in the U.S. and UK. The goal of the alliance is to perform basic research in support of Network- Centric Operations across the needs of both nations. In 2009, the U.S. Army formed the

Network Science CTA The Network Science Collaborative Technology Alliance (NS CTA) is a collaborative research alliance funded by the US Army Research Laboratory (ARL) and focused on fundamental research on the critical scientific and technical challenges that emerge ...

, a collaborative research alliance among the Army Research Laboratory,

CERDEC The Combat Capabilities Development Command (CCDC) C5ISR Center, formerly the Communications-Electronics RD&E Center (CERDEC), is the United States Army information technologies and integrated systems center. CCDC C5ISR Center is headquartered ...

, and a consortium of about 30 industrial R&D labs and universities in the U.S. The goal of the alliance is to develop a deep understanding of the underlying commonalities among intertwined social/cognitive, information, and communications networks, and as a result improve our ability to analyze, predict, design, and influence complex systems interweaving many kinds of networks. Subsequently, as a result of these efforts, the U.S. Department of Defense has sponsored numerous research projects that support Network Science.

Network properties

Often, networks have certain attributes that can be calculated to analyze the properties & characteristics of the network. The behavior of these network properties often define

network model The network model is a database model conceived as a flexible way of representing objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, ...

s and can be used to analyze how certain models contrast to each other. Many of the definitions for other terms used in network science can be found in Glossary of graph theory.

Size

The size of a network can refer to the number of nodes

N

or, less commonly, the number of edges

E

which (for connected graphs with no multi-edges) can range from

N-1

(a tree) to

E_

(a complete graph). In the case of a simple graph (a network in which at most one (undirected) edge exists between each pair of vertices, and in which no vertices connect to themselves), we have

E_=\tbinom N2=N(N-1)/2

; for directed graphs (with no self-connected nodes),

E_=N(N-1)

; for directed graphs with self-connections allowed,

E_=N^2

. In the circumstance of a graph within which multiple edges may exist between a pair of vertices,

E_=\infty

Density

The density

D

of a network is defined as a normalized ratio between 0 and 1 of the number of edges

E

to the number of possible edges in a network with

N

nodes. Network density is a measure of the percentage of "optional" edges that exist in the network and can be computed as

D =\frac

where

E_

and

E_

are the minimum and maximum number of edges in a connected network with

N

nodes, respectively. In the case of simple graphs,

E_

is given by the

binomial coefficient In mathematics, the binomial coefficients are the positive integers that occur as coefficients in the binomial theorem. Commonly, a binomial coefficient is indexed by a pair of integers and is written \tbinom. It is the coefficient of the t ...

\tbinom N2

and

E_ = N-1

, giving density

D =\frac = \frac

. Another possible equation is

D = \frac,

whereas the ties

T

are unidirectional (Wasserman & Faust 1994). This gives a better overview over the network density, because unidirectional relationships can be measured.

Planar network density

The density

D

of a network, where there is no intersection between edges, is defined as a ratio of the number of edges

E

to the number of possible edges in a network with

N

nodes, given by a graph with no intersecting edges

(E_=3N-6)

, giving

D = \frac.

Average degree

The

degree Degree may refer to: As a unit of measurement * Degree (angle), a unit of angle measurement ** Degree of geographical latitude ** Degree of geographical longitude * Degree symbol (°), a notation used in science, engineering, and mathemati ...

k

of a node is the number of edges connected to it. Closely related to the density of a network is the average degree,

\langle k\rangle = \tfrac

(or, in the case of directed graphs,

\langle k\rangle = \tfrac

, the former factor of 2 arising from each edge in an undirected graph contributing to the degree of two distinct vertices). In the ER random graph model (

G(N,p)

) we can compute the expected value of

\langle k \rangle

(equal to the expected value of

k

of an arbitrary vertex): a random vertex has

N-1

other vertices in the network available, and with probability

p

, connects to each. Thus,

\mathbb langle k \rangle \mathbb p(N-1)

Average shortest path length (or characteristic path length)

The average shortest path length is calculated by finding the

shortest path In graph theory, the shortest path problem is the problem of finding a path between two vertices (or nodes) in a graph such that the sum of the weights of its constituent edges is minimized. The problem of finding the shortest path between t ...

between all pairs of nodes, and taking the average over all paths of the length thereof (the length being the number of intermediate edges contained in the path, i.e., the distance

d_

between the two vertices

u,v

within the graph). This shows us, on average, the number of steps it takes to get from one member of the network to another. The behavior of the expected average shortest path length (that is, the ensemble average of the average shortest path length) as a function of the number of vertices

N

of a random network model defines whether that model exhibits the small-world effect; if it scales as

O(\ln N)

, the model generates small-world nets. For faster-than-logarithmic growth, the model does not produce small worlds. The special case of

O(\ln\ln N)

is known as ultra-small world effect.

Diameter of a network

As another means of measuring network graphs, we can define the diameter of a network as the longest of all the calculated shortest paths in a network. It is the shortest distance between the two most distant nodes in the network. In other words, once the shortest path length from every node to all other nodes is calculated, the diameter is the longest of all the calculated path lengths. The diameter is representative of the linear size of a network. If node A-B-C-D are connected, going from A->D this would be the diameter of 3 (3-hops, 3-links).

Clustering coefficient

The clustering coefficient is a measure of an "all-my-friends-know-each-other" property. This is sometimes described as the friends of my friends are my friends. More precisely, the clustering coefficient of a node is the ratio of existing links connecting a node's neighbors to each other to the maximum possible number of such links. The clustering coefficient for the entire network is the average of the clustering coefficients of all the nodes. A high clustering coefficient for a network is another indication of a small world. The clustering coefficient of the

i

'th node is :

C_i = \,,

where

k_i

is the number of neighbours of the

i

'th node, and

e_i

is the number of connections between these neighbours. The maximum possible number of connections between neighbors is, then, :

= \,.

From a probabilistic standpoint, the expected local clustering coefficient is the likelihood of a link existing between two arbitrary neighbors of the same node.

Connectedness

The way in which a network is connected plays a large part into how networks are analyzed and interpreted. Networks are classified in four different categories: * ''Clique''/''Complete Graph'': a completely connected network, where all nodes are connected to every other node. These networks are symmetric in that all nodes have in-links and out-links from all others. * ''Giant Component'': A single connected component which contains most of the nodes in the network. * ''Weakly Connected Component'': A collection of nodes in which there exists a path from any node to any other, ignoring directionality of the edges. * ''Strongly Connected Component'': A collection of nodes in which there exists a ''directed'' path from any node to any other.

Node centrality

Centrality indices produce rankings which seek to identify the most important nodes in a network model. Different centrality indices encode different contexts for the word "importance." The

betweenness centrality In graph theory, betweenness centrality (or "betweeness centrality") is a measure of centrality in a graph based on shortest paths. For every pair of vertices in a connected graph, there exists at least one shortest path between the vertices suc ...

, for example, considers a node highly important if it form bridges between many other nodes. The eigenvalue centrality, in contrast, considers a node highly important if many other highly important nodes link to it. Hundreds of such measures have been proposed in the literature. Centrality indices are only accurate for identifying the most important nodes. The measures are seldom, if ever, meaningful for the remainder of network nodes. Also, their indications are only accurate within their assumed context for importance, and tend to "get it wrong" for other contexts. For example, imagine two separate communities whose only link is an edge between the most junior member of each community. Since any transfer from one community to the other must go over this link, the two junior members will have high betweenness centrality. But, since they are junior, (presumably) they have few connections to the "important" nodes in their community, meaning their eigenvalue centrality would be quite low.

Node influence

Limitations to centrality measures have led to the development of more general measures. Two examples are the accessibility, which uses the diversity of random walks to measure how accessible the rest of the network is from a given start node, and the expected force, derived from the expected value of the force of infection generated by a node. Both of these measures can be meaningfully computed from the structure of the network alone.

Community structure

Nodes in a network may be partitioned into groups representing communities. Depending on the context, communities may be distinct or overlapping. Typically, nodes in such communities will be strongly connected to other nodes in the same community, but weakly connected to nodes outside the community. In the absence of a

ground truth Ground truth is information that is known to be real or true, provided by direct observation and measurement (i.e. empirical evidence) as opposed to information provided by inference. Etymology The ''Oxford English Dictionary'' (s.v. "ground t ...

describing the

community structure In the study of complex networks, a network is said to have community structure if the nodes of the network can be easily grouped into (potentially overlapping) sets of nodes such that each set of nodes is densely connected internally. In the par ...

of a specific network, several algorithms have been developed to infer possible community structures using either supervised of unsupervised clustering methods.

Network models

Network models serve as a foundation to understanding interactions within empirical complex networks. Various random graph generation models produce network structures that may be used in comparison to real-world complex networks.

Erdős–Rényi random graph model

The

Erdős–Rényi model In the mathematical field of graph theory, the Erdős–Rényi model is either of two closely related models for generating random graphs or the evolution of a random network. They are named after Hungarian mathematicians Paul Erdős and Alf ...

, named for Paul Erdős and

, is used for generating random graphs in which edges are set between nodes with equal probabilities. It can be used in the probabilistic method to prove the existence of graphs satisfying various properties, or to provide a rigorous definition of what it means for a property to hold for almost all graphs. To generate an Erdős–Rényi model

G(n, p)

two parameters must be specified: the total number of nodes and the probability that a random pair of nodes has an edge. Because the model is generated without bias to particular nodes, the degree distribution is binomial: for a randomly chosen vertex

v

, :

P(\deg(v) = k) =  p^k (1-p)^.

In this model the clustering coefficient is a.s. The behavior of

G(n, p)

can be broken into three regions. ''Subcritical''

n p < 1

: All components are simple and very small, the largest component has size

, C_1,  = O(\log n)

; ''Critical''

n p = 1

, C_1,  = O(n^\frac)

; ''Supercritical''

n p >1

, C_1,  \approx yn

where

y = y(n p)

is the positive solution to the equation

e^=1-y

. The largest connected component has high complexity. All other components are simple and small

, C_2,  = O(\log n)

Configuration model

The configuration model takes a degree sequence or degree distribution (which subsequently is used to generate a degree sequence) as the input, and produces randomly connected graphs in all respects other than the degree sequence. This means that for a given choice of the degree sequence, the graph is chosen uniformly at random from the set of all graphs that comply with this degree sequence. The degree

k

of a randomly chosen vertex is an

independent and identically distributed In probability theory and statistics, a collection of random variables is independent and identically distributed if each random variable has the same probability distribution as the others and all are mutually independent. This property is usua ...

random variable with integer values. When

\mathbb E^2 - 2 \mathbb E 0

, the configuration graph contains the giant connected component, which has infinite size. The rest of the components have finite sizes, which can be quantified with the notion of the size distribution. The probability

w(n)

that a randomly sampled node is connected to a component of size

n

is given by convolution powers of the degree distribution:

w(n)=\begin
\frac u_1^(n-2),& n>1, \\
u(0) & n=1,
\end

where

u(k)

denotes the degree distribution and

u_1(k)=\frac

. The giant component can be destroyed by randomly removing the critical fraction

p_c

of all edges. This process is called percolation on random networks. When the second moment of the degree distribution is finite,

\infty

, this critical edge fraction is given by

p_c=1-\frac

, and the average vertex-vertex distance

l

in the giant component scales logarithmically with the total size of the network,

l = O(\log N)

. In the directed configuration model, the degree of a node is given by two numbers, in-degree

k_\text

and out-degree

k_\text

, and consequently, the degree distribution is two-variate. The expected number of in-edges and out-edges coincides, so that

\mathbb E_\text \mathbb E_\text /math>. The directed configuration model contains the

giant component In network theory, a giant component is a connected component of a given random graph that contains a finite fraction of the entire graph's vertices. Giant component in Erdős–Rényi model Giant components are a prominent feature of the Erdő ...

iff

2\mathbb_\text \mathbb_\textk_\text - \mathbb_\text \mathbb_\text^2 - \mathbb_\text \mathbb_\text^2 +\mathbb_\text^2 \mathbb_\text^2 - \mathbb_\text k_\text 2 >0.

Note that

\mathbb E_\text /math> and \mathbb E_\text /math> are equal and therefore interchangeable in the latter inequality. The probability that a randomly chosen vertex belongs to a component of size n is given by: h_\text(n)=\frac \tilde u_\text^(n-2),  \;n>1, \; \tilde u_\text=\frac\sum\limits_u(k_\text+1,k_\text), for in-components, and

: h_\text(n)=\frac \tilde u_\text^(n-2),  \;n>1, \;\tilde u_\text=\frac\sum\limits_u(k_\text,k_\text+1), for out-components.

Watts–Strogatz small world model

The Watts and Strogatz model is a random graph generation model that produces graphs with

small-world properties The small-world experiment comprised several experiments conducted by Stanley Milgram and other researchers examining the average path length for social networks of people in the United States. The research was groundbreaking in that it suggested ...

. An initial lattice structure is used to generate a Watts–Strogatz model. Each node in the network is initially linked to its

\langle k\rangle

closest neighbors. Another parameter is specified as the rewiring probability. Each edge has a probability

p

that it will be rewired to the graph as a random edge. The expected number of rewired links in the model is

pE = pN\langle k\rangle/2

. As the Watts–Strogatz model begins as non-random lattice structure, it has a very high clustering coefficient along with high average path length. Each rewire is likely to create a shortcut between highly connected clusters. As the rewiring probability increases, the clustering coefficient decreases slower than the average path length. In effect, this allows the average path length of the network to decrease significantly with only slightly decreases in clustering coefficient. Higher values of p force more rewired edges, which in effect makes the Watts–Strogatz model a random network.

Barabási–Albert (BA) preferential attachment model

The Barabási–Albert model is a random network model used to demonstrate a preferential attachment or a "rich-get-richer" effect. In this model, an edge is most likely to attach to nodes with higher degrees. The network begins with an initial network of ''m''₀ nodes. ''m''₀ ≥ 2 and the degree of each node in the initial network should be at least 1, otherwise it will always remain disconnected from the rest of the network. In the BA model, new nodes are added to the network one at a time. Each new node is connected to

m

existing nodes with a probability that is proportional to the number of links that the existing nodes already have. Formally, the probability ''p''_''i'' that the new node is connected to node ''i'' is :

p_i = \frac,

where ''k''_''i'' is the degree of node ''i''. Heavily linked nodes ("hubs") tend to quickly accumulate even more links, while nodes with only a few links are unlikely to be chosen as the destination for a new link. The new nodes have a "preference" to attach themselves to the already heavily linked nodes. Barabasi-albert model degree distribution

Barabasi-albert model degree distribution

The degree distribution resulting from the BA model is scale free, in particular, for large degree it is a power law of the form: :

P(k)\sim k^ \,

Hubs exhibit high betweenness centrality which allows short paths to exist between nodes. As a result, the BA model tends to have very short average path lengths. The clustering coefficient of this model also tends to 0. The Barabási–Albert model was developed for undirected networks but applied to a wide range of different applications. The directed version of this model is the Price model which was applied to just citation networks. Mathematically, the same equations can be used to describe these models and they both show the same features.

Non-linear preferential attachment

In non-linear preferential attachment (NLPA), existing nodes in the network gain new edges proportionally to the node degree raised to a constant positive power,

\alpha

. Formally, this means that the probability that node

i

gains a new edge is given by :

p_i = \frac.

\alpha=1

, NLPA reduces to the BA model and is referred to as "linear". If

0<\alpha<1

, NLPA is referred to as "sub-linear" and the degree distribution of the network tends to a

stretched exponential distribution Stretching is a form of physical exercise in which a specific muscle or tendon (or muscle group) is deliberately flexed or stretched in order to improve the muscle's felt elasticity and achieve comfortable muscle tone. The result is a feeling ...

. If

\alpha>1

, NLPA is referred to as "super-linear" and a small number of nodes connect to almost all other nodes in the network. For both

\alpha<1

and

\alpha>1

, the scale-free property of the network is broken in the limit of infinite system size. However, if

\alpha

is only slightly larger than

1

, NLPA may result in

degree distribution In the study of graphs and networks, the degree of a node in a network is the number of connections it has to other nodes and the degree distribution is the probability distribution of these degrees over the whole network. Definition The degre ...

s which appear to be transiently scale free.

Mediation-driven attachment (MDA) model

In the mediation-driven attachment (MDA) model in which a new node coming with

m

edges picks an existing connected node at random and then connects itself not with that one but with

m

of its neighbors chosen also at random. The probability

\Pi(i)

that the node

i

of the existing node picked is :

\Pi(i) = \frac N \frac.

The factor

\frac

is the inverse of the harmonic mean (IHM) of degrees of the

k_i

neighbors of a node

i

. Extensive numerical investigation suggest that for an approximately

m> 14

the mean IHM value in the large

N

limit becomes a constant which means

\Pi(i) \propto k_i

. It implies that the higher the links (degree) a node has, the higher its chance of gaining more links since they can be reached in a larger number of ways through mediators which essentially embodies the intuitive idea of rich get richer mechanism (or the preferential attachment rule of the Barabasi–Albert model). Therefore, the MDA network can be seen to follow the PA rule but in disguise. However, for

m=1

it describes the winner takes it all mechanism as we find that almost

99\%

of the total nodes have degree one and one is super-rich in degree. As

m

value increases the disparity between the super rich and poor decreases and as

m>14

we find a transition from rich get super richer to rich get richer mechanism.

Fitness model

Another model where the key ingredient is the nature of the vertex has been introduced by Caldarelli et al. Here a link is created between two vertices

i,j

with a probability given by a linking function

f(\eta_i,\eta_j)

of the fitnesses of the vertices involved. The degree of a vertex i is given by :

k(\eta_i)=N\int_0^\infty f(\eta_i,\eta_j) \rho(\eta_j) \, d\eta_j

k(\eta_i)

is an invertible and increasing function of

\eta_i

, then the probability distribution

P(k)

is given by :

P(k)=\rho(\eta(k)) \cdot \eta'(k)

As a result, if the fitnesses

\eta

are distributed as a power law, then also the node degree does. Less intuitively with a fast decaying probability distribution as

\rho(\eta)=e^

together with a linking function of the kind :

f(\eta_i,\eta_j)=\Theta(\eta_i+\eta_j-Z)

with

Z

a constant and

\Theta

the Heavyside function, we also obtain scale-free networks. Such model has been successfully applied to describe trade between nations by using GDP as fitness for the various nodes

i,j

and a linking function of the kind :

\frac.

Network analysis

Social network analysis

Social network A social network is a social structure made up of a set of social actors (such as individuals or organizations), sets of dyadic ties, and other social interactions between actors. The social network perspective provides a set of methods for ...

analysis examines the structure of relationships between social entities. Wasserman, Stanley and Katherine Faust. 1994. ''Social Network Analysis: Methods and Applications.'' Cambridge: Cambridge University Press. These entities are often persons, but may also be groups,

organizations An organization or organisation (Commonwealth English; see spelling differences), is an entity—such as a company, an institution, or an association—comprising one or more people and having a particular purpose. The word is derived fr ...

nation states A nation state is a political unit where the state and nation are congruent. It is a more precise concept than "country", since a country does not need to have a predominant ethnic group. A nation, in the sense of a common ethnicity, may inc ...

web sites A website (also written as a web site) is a collection of web pages and related content that is identified by a common domain name and published on at least one web server. Examples of notable websites are Google, Facebook, Amazon, and Wikipe ...

, scholarly publications. Since the 1970s, the empirical study of networks has played a central role in social science, and many of the

mathematical Mathematics is an area of knowledge that includes the topics of numbers, formulas and related structures, shapes and the spaces in which they are contained, and quantities and their changes. These topics are represented in modern mathematics ...

and

statistical Statistics (from German: '' Statistik'', "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industr ...

tools used for studying networks have been first developed in

sociology Sociology is a social science that focuses on society, human social behavior, patterns of social relationships, social interaction, and aspects of culture associated with everyday life. It uses various methods of empirical investigation and ...

.Newman, M.E.J. ''Networks: An Introduction.'' Oxford University Press. 2010, Amongst many other applications, social network analysis has been used to understand the

diffusion of innovations Diffusion of innovations is a theory that seeks to explain how, why, and at what rate new ideas and technology spread. Everett Rogers, a professor of communication studies, popularized the theory in his book ''Diffusion of Innovations''; the boo ...

, news and rumors. Similarly, it has been used to examine the spread of both diseases and health-related behaviors. It has also been applied to the study of markets, where it has been used to examine the role of trust in exchange relationships and of social mechanisms in setting prices. Similarly, it has been used to study recruitment into political movements and social organizations. It has also been used to conceptualize scientific disagreements as well as academic prestige. In the second language acquisition literature, it has an established history in study abroad research, revealing how peer learner interaction networks influence their language progress. More recently, network analysis (and its close cousin

traffic analysis Traffic analysis is the process of intercepting and examining messages in order to deduce information from patterns in communication, it can be performed even when the messages are encrypted. In general, the greater the number of messages observed ...

) has gained a significant use in military intelligence, for uncovering insurgent networks of both hierarchical and leaderless nature. In criminology, it is being used to identify influential actors in criminal gangs, offender movements, co-offending, predict criminal activities and make policies.

Dynamic network analysis

Dynamic network analysis Dynamic network analysis (DNA) is an emergent scientific field that brings together traditional social network analysis (SNA), link analysis (LA), social simulation and multi-agent systems (MAS) within network science and network theory. Dynamic ...

examines the shifting structure of relationships among different classes of entities in complex socio-technical systems effects, and reflects social stability and changes such as the emergence of new groups, topics, and leaders.Gross, T. and Sayama, H. (Eds.). 2009. ''Adaptive Networks: Theory, Models and Applications.'' Springer.Holme, P. and Saramäki, J. 2013. ''Temporal Networks.'' Springer. Dynamic Network Analysis focuses on meta-networks composed of multiple types of nodes (entities) and multiple types of links. These entities can be highly varied. Examples include people, organizations, topics, resources, tasks, events, locations, and beliefs. Dynamic network techniques are particularly useful for assessing trends and changes in networks over time, identification of emergent leaders, and examining the co-evolution of people and ideas.

Biological network analysis

With the recent explosion of publicly available high throughput biological data, the analysis of molecular networks has gained significant interest. The type of analysis in this content are closely related to social network analysis, but often focusing on local patterns in the network. For example, network motifs are small subgraphs that are over-represented in the network. Activity motifs are similar over-represented patterns in the attributes of nodes and edges in the network that are over represented given the network structure. The analysis of biological networks has led to the development of network medicine, which looks at the effect of diseases in the interactome.

Link analysis

Link analysis is a subset of network analysis, exploring associations between objects. An example may be examining the addresses of suspects and victims, the telephone numbers they have dialed and financial transactions that they have partaken in during a given timeframe, and the familial relationships between these subjects as a part of police investigation. Link analysis here provides the crucial relationships and associations between very many objects of different types that are not apparent from isolated pieces of information. Computer-assisted or fully automatic computer-based link analysis is increasingly employed by

bank A bank is a financial institution that accepts deposits from the public and creates a demand deposit while simultaneously making loans. Lending activities can be directly performed by the bank or indirectly through capital markets. Becau ...

s and

insurance Insurance is a means of protection from financial loss in which, in exchange for a fee, a party agrees to compensate another party in the event of a certain loss, damage, or injury. It is a form of risk management, primarily used to hedge ...

agencies in

fraud In law, fraud is intentional deception to secure unfair or unlawful gain, or to deprive a victim of a legal right. Fraud can violate civil law (e.g., a fraud victim may sue the fraud perpetrator to avoid the fraud or recover monetary compen ...

detection, by telecommunication operators in telecommunication network analysis, by medical sector in

epidemiology Epidemiology is the study and analysis of the distribution (who, when, and where), patterns and determinants of health and disease conditions in a defined population. It is a cornerstone of public health, and shapes policy decisions and evide ...

and

pharmacology Pharmacology is a branch of medicine, biology and pharmaceutical sciences concerned with drug or medication action, where a drug may be defined as any artificial, natural, or endogenous (from within the body) molecule which exerts a biochemi ...

, in law enforcement

investigation Investigation or Investigations may refer to: Law enforcement * Investigation, the work of a detective * Investigation, the work of a private investigator * Criminal investigation, the study of facts, used to identify, locate and prove the guilt ...

s, by

search engine A search engine is a software system designed to carry out web searches. They search the World Wide Web in a systematic way for particular information specified in a textual web search query. The search results are generally presented in a ...

s for

relevance Relevance is the concept of one topic being connected to another topic in a way that makes it useful to consider the second topic when considering the first. The concept of relevance is studied in many different fields, including cognitive sc ...

rating (and conversely by the spammers for

spamdexing Spamdexing (also known as search engine spam, search engine poisoning, black-hat search engine optimization, search spam or web spam) is the deliberate manipulation of search engine indexes. It involves a number of methods, such as link buildin ...

and by business owners for

search engine optimization Search engine optimization (SEO) is the process of improving the quality and quantity of website traffic to a website or a web page from search engines. SEO targets unpaid traffic (known as "natural" or " organic" results) rather than di ...

), and everywhere else where relationships between many objects have to be analyzed.

Pandemic analysis

The

SIR model Compartmental models are a very general modelling technique. They are often applied to the mathematical modelling of infectious diseases. The population is assigned to compartments with labels – for example, S, I, or R, (Susceptible, Infectious ...

is one of the most well known algorithms on predicting the spread of global pandemics within an infectious population.

=Susceptible to infected

= :

S = \beta\left(\frac 1 N \right)

The formula above describes the "force" of infection for each susceptible unit in an infectious population, where is equivalent to the transmission rate of said disease. To track the change of those susceptible in an infectious population: :

\Delta S = \beta \times S  \, \Delta t

=Infected to recovered

= :

\Delta I = \mu I \, \Delta t

Over time, the number of those infected fluctuates by: the specified rate of recovery, represented by

\mu

but deducted to one over the average infectious period

, the numbered of infectious individuals,

I

, and the change in time,

\Delta t

=Infectious period

= Whether a population will be overcome by a pandemic, with regards to the SIR model, is dependent on the value of

R_0

or the "average people infected by an infected individual." :

R_0 = \beta\tau =

Web link analysis

Several Web search

ranking A ranking is a relationship between a set of items such that, for any two items, the first is either "ranked higher than", "ranked lower than" or "ranked equal to" the second. In mathematics, this is known as a weak order or total preorder of o ...

algorithms use link-based centrality metrics, including (in order of appearance) Marchiori's Hyper Search,

Google Google LLC () is an American Multinational corporation, multinational technology company focusing on Search Engine, search engine technology, online advertising, cloud computing, software, computer software, quantum computing, e-commerce, ar ...

PageRank PageRank (PR) is an algorithm used by Google Search to rank web pages in their search engine results. It is named after both the term "web page" and co-founder Larry Page. PageRank is a way of measuring the importance of website pages. Accordi ...

, Kleinberg's

HITS algorithm Hyperlink-Induced Topic Search (HITS; also known as hubs and authorities) is a link analysis algorithm that rates Web pages, developed by Jon Kleinberg. The idea behind Hubs and Authorities stemmed from a particular insight into the creation of we ...

, the CheiRank and TrustRank algorithms. Link analysis is also conducted in information science and communication science in order to understand and extract information from the structure of collections of web pages. For example, the analysis might be of the interlinking between politicians' web sites or blogs.

=PageRank

works by randomly picking "nodes" or websites and then with a certain probability, "randomly jumping" to other nodes. By randomly jumping to these other nodes, it helps PageRank completely traverse the network as some webpages exist on the periphery and would not as readily be assessed. Each node,

x_i

, has a PageRank as defined by the sum of pages

j

that link to

i

times one over the outlinks or "out-degree" of

j

times the "importance" or PageRank of

j

. :

x_i = \sum_x_j^

Random jumping

As explained above, PageRank enlists random jumps in attempts to assign PageRank to every website on the internet. These random jumps find websites that might not be found during the normal search methodologies such as

breadth-first search Breadth-first search (BFS) is an algorithm for searching a tree data structure for a node that satisfies a given property. It starts at the tree root and explores all nodes at the present depth prior to moving on to the nodes at the next d ...

and

depth-first search Depth-first search (DFS) is an algorithm for traversing or searching tree or graph data structures. The algorithm starts at the root node (selecting some arbitrary node as the root node in the case of a graph) and explores as far as possible alo ...

. In an improvement over the aforementioned formula for determining PageRank includes adding these random jump components. Without the random jumps, some pages would receive a PageRank of 0 which would not be good. The first is

\alpha

, or the probability that a random jump will occur. Contrasting is the "damping factor", or

1 - \alpha

. :

R =  + (1 - \alpha) \sum_  x_j^

Another way of looking at it: :

R(A) = \sum  + \cdots +

Centrality measures

Information about the relative importance of nodes and edges in a graph can be obtained through

centrality In graph theory and network analysis, indicators of centrality assign numbers or rankings to nodes within a graph corresponding to their network position. Applications include identifying the most influential person(s) in a social network, k ...

measures, widely used in disciplines like

. Centrality measures are essential when a network analysis has to answer questions such as: "Which nodes in the network should be targeted to ensure that a message or information spreads to all or most nodes in the network?" or conversely, "Which nodes should be targeted to curtail the spread of a disease?". Formally established measures of centrality are degree centrality,

closeness centrality In a connected graph, closeness centrality (or closeness) of a node is a measure of centrality in a network, calculated as the reciprocal of the sum of the length of the shortest paths between the node and all other nodes in the graph. Thus, the ...

, eigenvector centrality, and katz centrality. The objective of network analysis generally determines the type of centrality measure(s) to be used. *Degree centrality of a node in a network is the number of links (vertices) incident on the node. *Closeness centrality determines how "close" a node is to other nodes in a network by measuring the sum of the shortest distances (geodesic paths) between that node and all other nodes in the network. *Betweenness centrality determines the relative importance of a node by measuring the amount of traffic flowing through that node to other nodes in the network. This is done by measuring the fraction of paths connecting all pairs of nodes and containing the node of interest. Group Betweenness centrality measures the amount of traffic flowing through a group of nodes. *Eigenvector centrality is a more sophisticated version of degree centrality where the centrality of a node not only depends on the number of links incident on the node but also the quality of those links. This quality factor is determined by the eigenvectors of the adjacency matrix of the network. *Katz centrality of a node is measured by summing the geodesic paths between that node and all (reachable) nodes in the network. These paths are weighted, paths connecting the node with its immediate neighbors carry higher weights than those which connect with nodes farther away from the immediate neighbors.

Spread of content in networks

Content in a

can spread via two major methods: conserved spread and non-conserved spread. In conserved spread, the total amount of content that enters a complex network remains constant as it passes through. The model of conserved spread can best be represented by a pitcher containing a fixed amount of water being poured into a series of funnels connected by tubes. Here, the pitcher represents the original source and the water is the content being spread. The funnels and connecting tubing represent the nodes and the connections between nodes, respectively. As the water passes from one funnel into another, the water disappears instantly from the funnel that was previously exposed to the water. In non-conserved spread, the amount of content changes as it enters and passes through a complex network. The model of non-conserved spread can best be represented by a continuously running faucet running through a series of funnels connected by tubes. Here, the amount of water from the original source is infinite. Also, any funnels that have been exposed to the water continue to experience the water even as it passes into successive funnels. The non-conserved model is the most suitable for explaining the transmission of most

infectious diseases An infection is the invasion of tissues by pathogens, their multiplication, and the reaction of host tissues to the infectious agent and the toxins they produce. An infectious disease, also known as a transmissible disease or communicable d ...

The SIR model

In 1927, W. O. Kermack and A. G. McKendrick created a model in which they considered a fixed population with only three compartments, susceptible:

S(t)

, infected,

I(t)

, and recovered,

R(t)

. The compartments used for this model consist of three classes: *

S(t)

is used to represent the number of individuals not yet infected with the disease at time t, or those susceptible to the disease *

I(t)

denotes the number of individuals who have been infected with the disease and are capable of spreading the disease to those in the susceptible category *

R(t)

is the compartment used for those individuals who have been infected and then recovered from the disease. Those in this category are not able to be infected again or to transmit the infection to others. The flow of this model may be considered as follows: :

\mathcal \rightarrow \mathcal \rightarrow \mathcal

Using a fixed population,

N = S(t) + I(t) + R(t)

, Kermack and McKendrick derived the following equations: :

\frac & = \gamma I \end

Several assumptions were made in the formulation of these equations: First, an individual in the population must be considered as having an equal probability as every other individual of contracting the disease with a rate of

\beta

, which is considered the contact or infection rate of the disease. Therefore, an infected individual makes contact and is able to transmit the disease with

\beta N

others per unit time and the fraction of contacts by an infected with a susceptible is

S/N

. The number of new infections in unit time per infective then is

\beta N (S/N)

, giving the rate of new infections (or those leaving the susceptible category) as

\beta N (S/N)I = \beta SI

(Brauer & Castillo-Chavez, 2001). For the second and third equations, consider the population leaving the susceptible class as equal to the number entering the infected class. However, infectives are leaving this class per unit time to enter the recovered/removed class at a rate

\gamma

per unit time (where

\gamma

represents the mean recovery rate, or

1/\gamma

the mean infective period). These processes which occur simultaneously are referred to as the

Law of Mass Action In chemistry, the law of mass action is the proposition that the rate of the chemical reaction is directly proportional to the product of the activities or concentrations of the reactants. It explains and predicts behaviors of solutions in dy ...

, a widely accepted idea that the rate of contact between two groups in a population is proportional to the size of each of the groups concerned (Daley & Gani, 2005). Finally, it is assumed that the rate of infection and recovery is much faster than the time scale of births and deaths and therefore, these factors are ignored in this model. More can be read on this model on the Epidemic model page.

The master equation approach

A master equation can express the behaviour of an undirected growing network where, at each time step, a new node is added to the network, linked to an old node (randomly chosen and without preference). The initial network is formed by two nodes and two links between them at time

t = 2

, this configuration is necessary only to simplify further calculations, so at time

t = n

the network have

n

nodes and

n

links. The master equation for this network is: :

p(k,s,t+1) = \frac 1 t p(k-1,s,t) + \left(1 - \frac 1 t \right)p(k,s,t),

where

p(k,s,t)

is the probability to have the node

s

with degree

k

at time

t+1

, and

s

is the time step when this node was added to the network. Note that there are only two ways for an old node

s

to have

k

links at time

t+1

: * The node

s

have degree

k-1

at time

t

and will be linked by the new node with probability

1/t

* Already has degree

k

at time

t

and will not be linked by the new node. After simplifying this model, the degree distribution is

P(k) = 2^.

Based on this growing network, an epidemic model is developed following a simple rule: Each time the new node is added and after choosing the old node to link, a decision is made: whether or not this new node will be infected. The master equation for this epidemic model is: :

p_r(k,s,t) = r_t \frac 1 t p_r(k-1,s,t) + \left(1 - \frac 1 t \right) p_r(k,s,t),

where

r_t

represents the decision to infect (

r_t = 1

) or not (

r_t = 0

). Solving this master equation, the following solution is obtained:

\tilde_r(k) = \left(\frac r 2 \right)^k.

Multilayer networks

Multilayer networks are networks with multiple kinds of relations. Attempts to model real-world systems as multidimensional networks have been used in various fields such as social network analysis, economics, history, urban and international transport, ecology, psychology, medicine, biology, commerce, climatology, physics, computational neuroscience, operations management, and finance.

Network optimization

Network problems that involve finding an optimal way of doing something are studied under the name of

combinatorial optimization Combinatorial optimization is a subfield of mathematical optimization that consists of finding an optimal object from a finite set of objects, where the set of feasible solutions is discrete or can be reduced to a discrete set. Typical combi ...

. Examples include network flow,

shortest path problem In graph theory, the shortest path problem is the problem of finding a path between two vertices (or nodes) in a graph such that the sum of the weights of its constituent edges is minimized. The problem of finding the shortest path between t ...

transport problem In mathematics and economics, transportation theory or transport theory is a name given to the study of optimal transportation and allocation of resources. The problem was formalized by the French mathematician Gaspard Monge in 1781.G. Monge. ''M ...

transshipment problem Transshipment problems form a subgroup of transportation problems, where transshipment is allowed. In transshipment, transportation may or must go through intermediate nodes, possibly changing modes of transport. The Transshipment problem has it ...

, location problem, matching problem,

assignment problem The assignment problem is a fundamental combinatorial optimization problem. In its most general form, the problem is as follows: :The problem instance has a number of ''agents'' and a number of ''tasks''. Any agent can be assigned to perform any t ...

, packing problem, routing problem, critical path analysis and PERT (Program Evaluation & Review Technique).

Background and history

Department of Defense initiatives

Network properties

Size

Density

Planar network density

Average degree

Average shortest path length (or characteristic path length)

Diameter of a network

Clustering coefficient

Connectedness

Node centrality

Node influence

Community structure

Network models

Erdős–Rényi random graph model

Configuration model

Watts–Strogatz small world model

Barabási–Albert (BA) preferential attachment model

Non-linear preferential attachment

Mediation-driven attachment (MDA) model

Fitness model

Network analysis

Social network analysis

Dynamic network analysis

Biological network analysis

Link analysis

Pandemic analysis

=Susceptible to infected

=Infected to recovered

=Infectious period

Web link analysis

=PageRank

Centrality measures

Spread of content in networks

The SIR model

The master equation approach

Multilayer networks

Network optimization

See also

References

Further reading