Maximum-entropy Random Graph Model
   HOME

TheInfoList



OR:

Maximum-entropy random graph models are
random graph In mathematics, random graph is the general term to refer to probability distributions over graphs. Random graphs may be described simply by a probability distribution, or by a random process which generates them. The theory of random graphs l ...
models used to study
complex networks Complex Networks is an American media and entertainment company for youth culture, based in New York City. It was founded as a bi-monthly magazine, ''Complex'', by fashion designer Marc Eckō. Complex Networks reports on popular and emerging ...
subject to the
principle of maximum entropy The principle of maximum entropy states that the probability distribution which best represents the current state of knowledge about a system is the one with largest entropy, in the context of precisely stated prior data (such as a proposition ...
under a set of structural constraints, which may be global, distributional, or local.


Overview

Any random graph model (at a fixed set of parameter values) results in a
probability distribution In probability theory and statistics, a probability distribution is a Function (mathematics), function that gives the probabilities of occurrence of possible events for an Experiment (probability theory), experiment. It is a mathematical descri ...
on
graph Graph may refer to: Mathematics *Graph (discrete mathematics), a structure made of vertices and edges **Graph theory, the study of such graphs and their properties *Graph (topology), a topological space resembling a graph in the sense of discret ...
s, and those that are maximum
entropy Entropy is a scientific concept, most commonly associated with states of disorder, randomness, or uncertainty. The term and the concept are used in diverse fields, from classical thermodynamics, where it was first recognized, to the micros ...
within the considered class of distributions have the special property of being maximally
unbiased Bias is a disproportionate weight ''in favor of'' or ''against'' an idea or thing, usually in a way that is inaccurate, closed-minded, prejudicial, or unfair. Biases can be innate or learned. People may develop biases for or against an individ ...
null model In mathematics, for example in the study of statistical properties of graphs, a null model is a type of random object that matches one specific object in some of its features, or more generally satisfies a collection of constraints, but which is ...
s for network inference (e.g.
biological network inference Biological network inference is the process of making inferences and predictions about biological networks. By using these networks to analyze patterns in biological systems, such as food-webs, we can visualize the nature and strength of these int ...
). Each model defines a family of probability distributions on the set of graphs of size n (for each n>n_0 for some finite n_0), parameterized by a collection of constraints on J observables \_^J defined for each graph G (such as fixed expected average degree,
degree distribution In the study of graphs and networks, the degree of a node in a network is the number of connections it has to other nodes and the degree distribution is the probability distribution of these degrees over the whole network. Definition The degr ...
of a particular form, or specific
degree sequence In graph theory, the degree (or valency) of a vertex of a graph is the number of edges that are incident to the vertex; in a multigraph, a loop contributes 2 to a vertex's degree, for the two ends of the edge. The degree of a vertex v is denot ...
), enforced in the graph distribution alongside entropy maximization by the method of Lagrange multipliers. Note that in this context "maximum entropy" refers not to the entropy of a single graph, but rather the entropy of the whole probabilistic ensemble of random graphs. Several commonly studied random network models are in fact maximum entropy, for example the ER graphs G(n,m) and G(n,p) (which each have one global constraint on the number of edges), as well as the configuration model (CM). and soft configuration model (SCM) (which each have n local constraints, one for each nodewise degree-value). In the two pairs of models mentioned above, an important distinction is in whether the constraint is sharp (i.e. satisfied by every element of the set of size-n graphs with nonzero probability in the ensemble), or soft (i.e. satisfied on average across the whole ensemble). The former (sharp) case corresponds to a
microcanonical ensemble In statistical mechanics, the microcanonical ensemble is a statistical ensemble that represents the possible states of a mechanical system whose total energy is exactly specified. The system is assumed to be isolated in the sense that it canno ...
, the condition of maximum entropy yielding all graphs G satisfying Q_j(G)=q_j\forall j as equiprobable; the latter (soft) case is
canonical The adjective canonical is applied in many contexts to mean 'according to the canon' the standard, rule or primary source that is accepted as authoritative for the body of knowledge or literature in that context. In mathematics, ''canonical exampl ...
, producing an
exponential random graph model Exponential family random graph models (ERGMs) are a set of statistical models used to study the structure and patterns within networks, such as those in social, organizational, or scientific contexts. They analyze how connections ( edges) form betw ...
(ERGM).


Canonical ensemble of graphs (general framework)

Suppose we are building a random graph model consisting of a probability distribution \mathbb(G) on the set \mathcal_n of
simple graph In discrete mathematics, particularly in graph theory, a graph is a structure consisting of a Set (mathematics), set of objects where some pairs of the objects are in some sense "related". The objects are represented by abstractions called ''Ver ...
s with n vertices. The
Gibbs entropy The concept entropy was first developed by German physicist Rudolf Clausius in the mid-nineteenth century as a thermodynamic property that predicts that certain spontaneous processes are irreversible or impossible. In statistical mechanics, entrop ...
S /math> of this ensemble will be given by : S -\sum_\mathbb(G)\log\mathbb(G). We would like the ensemble-averaged values \_^J of observables \_^J (such as average degree, average clustering, or average shortest path length) to be tunable, so we impose J "soft" constraints on the graph distribution: : \langle Q_j \rangle = \sum_\mathbb(G)Q_j(G) = q_j, where j=1,...,J label the constraints. Application of the method of Lagrange multipliers to determine the distribution \mathbb(G) that maximizes S /math> while satisfying \langle Q_j \rangle=q_j, and the normalization condition \sum_\mathbb(G)=1 results in the following: : \mathbb(G) = \frac\exp\left \sum_^J\psi_j Q_j(G)\right where Z is a normalizing constant (the partition function) and \_^J are parameters (Lagrange multipliers) coupled to the correspondingly indexed graph observables, which may be tuned to yield graph samples with desired values of those properties, on average; the result is an exponential family and
canonical ensemble In statistical mechanics, a canonical ensemble is the statistical ensemble that represents the possible states of a mechanical system in thermal equilibrium with a heat bath at a fixed temperature. The system can exchange energy with the hea ...
; specifically yielding an ERGM.


The Erdős–Rényi model G(n,m)

In the canonical framework above, constraints were imposed on ensemble-averaged quantities \langle Q_j \rangle. Although these properties will on average take on values specifiable by appropriate setting of \_^J, each specific instance G may have Q_j(G)\ne q_j, which may be undesirable. Instead, we may impose a much stricter condition: every graph with nonzero probability must satisfy Q_j(G)= q_j exactly. Under these "sharp" constraints, the maximum-entropy distribution is determined. We exemplify this with the Erdős–Rényi model G(n,m). The sharp constraint in G(n,m) is that of a fixed number of edges m, that is , \operatorname E(G), =m, for all graphs G drawn from the ensemble (instantiated with a probability denoted \mathbb_(G)). This restricts the sample space from \mathcal_n (all graphs on n vertices) to the subset \mathcal_=\\subset \mathcal_n. This is in direct analogy to the
microcanonical ensemble In statistical mechanics, the microcanonical ensemble is a statistical ensemble that represents the possible states of a mechanical system whose total energy is exactly specified. The system is assumed to be isolated in the sense that it canno ...
in classical
statistical mechanics In physics, statistical mechanics is a mathematical framework that applies statistical methods and probability theory to large assemblies of microscopic entities. Sometimes called statistical physics or statistical thermodynamics, its applicati ...
, wherein the system is restricted to a thin manifold in the
phase space The phase space of a physical system is the set of all possible physical states of the system when described by a given parameterization. Each possible state corresponds uniquely to a point in the phase space. For mechanical systems, the p ...
of all states of a particular
energy Energy () is the physical quantity, quantitative physical property, property that is transferred to a physical body, body or to a physical system, recognizable in the performance of Work (thermodynamics), work and in the form of heat and l ...
value. Upon restricting our sample space to \mathcal_, we have no external constraints (besides normalization) to satisfy, and thus we'll select \mathbb_(G) to maximize S /math> without making use of Lagrange multipliers. It is well known that the entropy-maximizing distribution in the absence of external constraints is the uniform distribution over the sample space (see
maximum entropy probability distribution In statistics and information theory, a maximum entropy probability distribution has entropy that is at least as great as that of all other members of a specified class of probability distributions. According to the principle of maximum entropy, ...
), from which we obtain: : \mathbb_(G)=\frac=\binom^, where the last expression in terms of
binomial coefficients In mathematics, the binomial coefficients are the positive integers that occur as coefficients in the binomial theorem. Commonly, a binomial coefficient is indexed by a pair of integers and is written \tbinom. It is the coefficient of the te ...
is the number of ways to place m edges among \binom possible edges, and thus is the
cardinality The thumb is the first digit of the hand, next to the index finger. When a person is standing in the medical anatomical position (where the palm is facing to the front), the thumb is the outermost digit. The Medical Latin English noun for thum ...
of \mathcal_.


Generalizations

A variety of maximum-entropy ensembles have been studied on generalizations of simple graphs. These include, for example, ensembles of simplicial complexes, and weighted random graphs with a given expected degree sequence


See also

*
Principle of maximum entropy The principle of maximum entropy states that the probability distribution which best represents the current state of knowledge about a system is the one with largest entropy, in the context of precisely stated prior data (such as a proposition ...
*
Maximum entropy probability distribution In statistics and information theory, a maximum entropy probability distribution has entropy that is at least as great as that of all other members of a specified class of probability distributions. According to the principle of maximum entropy, ...
* Method of Lagrange multipliers *
Null model In mathematics, for example in the study of statistical properties of graphs, a null model is a type of random object that matches one specific object in some of its features, or more generally satisfies a collection of constraints, but which is ...
*
Random graph In mathematics, random graph is the general term to refer to probability distributions over graphs. Random graphs may be described simply by a probability distribution, or by a random process which generates them. The theory of random graphs l ...
*
Exponential random graph model Exponential family random graph models (ERGMs) are a set of statistical models used to study the structure and patterns within networks, such as those in social, organizational, or scientific contexts. They analyze how connections ( edges) form betw ...
*
Canonical ensemble In statistical mechanics, a canonical ensemble is the statistical ensemble that represents the possible states of a mechanical system in thermal equilibrium with a heat bath at a fixed temperature. The system can exchange energy with the hea ...
*
Microcanonical ensemble In statistical mechanics, the microcanonical ensemble is a statistical ensemble that represents the possible states of a mechanical system whose total energy is exactly specified. The system is assumed to be isolated in the sense that it canno ...


References

{{Reflist Random graphs