Community Structure
In the study of complex networks, a network is said to have community structure if the nodes of the network can be easily grouped into (potentially overlapping) sets of nodes such that each set of nodes is densely connected internally. In the particular case of ''non-overlapping'' community finding, this implies that the network divides naturally into groups of nodes with dense connections internally and sparser connections between groups. But ''overlapping'' communities are also allowed. The more general definition is based on the principle that pairs of nodes are more likely to be connected if they are both members of the same community(ies), and less likely to be connected if they do not share communities. A related but different problem is community search, where the goal is to find a community that a certain vertex belongs to. Properties In the study of networks, such as computer and information networks, social networks and biological networks, a number of different chara ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] |
|
![]() |
Hierarchical Clustering
In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two categories: * Agglomerative: Agglomerative: Agglomerative clustering, often referred to as a "bottom-up" approach, begins with each data point as an individual cluster. At each step, the algorithm merges the two most similar clusters based on a chosen distance metric (e.g., Euclidean distance) and linkage criterion (e.g., single-linkage, complete-linkage). This process continues until all data points are combined into a single cluster or a stopping criterion is met. Agglomerative methods are more commonly used due to their simplicity and computational efficiency for small to medium-sized datasets . * Divisive: Divisive clustering, known as a "top-down" approach, starts with all data points in a single cluster and recursively splits the clu ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] |
![]() |
Statistical Inference
Statistical inference is the process of using data analysis to infer properties of an underlying probability distribution.Upton, G., Cook, I. (2008) ''Oxford Dictionary of Statistics'', OUP. . Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates. It is assumed that the observed data set is sampled from a larger population. Inferential statistics can be contrasted with descriptive statistics. Descriptive statistics is solely concerned with properties of the observed data, and it does not rest on the assumption that the data come from a larger population. In machine learning, the term ''inference'' is sometimes used instead to mean "make a prediction, by evaluating an already trained model"; in this context inferring properties of the model is referred to as ''training'' or ''learning'' (rather than ''inference''), and using a model for prediction is referred to as ''inference'' (instead of ''prediction''); se ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] |
![]() |
Louvain Modularity
The Louvain method for community detection is a Greedy algorithm, greedy optimization method intended to extract non-overlapping communities from large Graph (discrete mathematics), networks created by Vincent Blondel, Blondel ''et al''. from the Université catholique de Louvain, University of Louvain (the source of this method's name). Modularity optimization The inspiration for this method of Community structure, community detection is the optimization of Modularity (networks), modularity as the algorithm progresses. Modularity is a scale value between −1 (non-modular clustering) and 1 (fully modular clustering) that measures the relative density of edges inside communities with respect to edges outside communities. Optimizing this value theoretically results in the best possible grouping of the nodes of a given network. But because going through all possible configurations of the nodes into groups is impractical, heuristic algorithms are used. In the Louvain Method of comm ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] |
![]() |
Modularity (networks)
Modularity is a measure of the structure of Complex network, networks or Graph (discrete mathematics), graphs which measures the strength of division of a network into modules (also called groups, clusters or communities). Networks with high modularity have dense connections between the nodes within modules but sparse connections between nodes in different modules. Modularity is often used in optimization methods for detecting community structure in networks. Biological networks, including animal brains, exhibit a high degree of modularity. However, modularity maximization is not statistically consistent, and finds communities in its own null model, i.e. fully random graphs, and therefore it cannot be used to find statistically significant community structures in empirical networks. Furthermore, it has been shown that modularity suffers a resolution limit and, therefore, it is unable to detect small communities. Motivation Many scientifically important problems can be represente ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] |
Betweenness Centrality
In graph theory, betweenness centrality is a measure of centrality in a graph based on shortest paths. For every pair of vertices in a connected graph, there exists at least one shortest path between the vertices, that is, there exists at least one path such that either the number of edges that the path passes through (for unweighted graphs) or the sum of the weights of the edges (for weighted graphs) is minimized. Betweenness centrality was devised as a general measure of centrality: it applies to a wide range of problems in network theory, including problems related to social networks, biology, transport and scientific cooperation. Although earlier authors have intuitively described centrality as based on betweenness, gave the first formal definition of betweenness centrality. Betweenness centrality finds wide application in network theory; it represents the degree to which nodes stand between each other. For example, in a telecommunications network, a node with higher b ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] |
|
Girvan–Newman Algorithm
The Girvan–Newman algorithm (named after Michelle Girvan and Mark Newman) is a hierarchical method used to detect communities in complex systems.Girvan M. and Newman M. E. J.Community structure in social and biological networks Proc. Natl. Acad. Sci. USA 99, 7821–7826 (2002) Edge betweenness and community structure The Girvan–Newman algorithm detects communities by progressively removing edges from the original network. The connected components of the remaining network are the communities. Instead of trying to construct a measure that tells us which edges are the most central to communities, the Girvan–Newman algorithm focuses on edges that are most likely "between" communities. Vertex betweenness is an indicator of highly central nodes in networks. For any node i, vertex betweenness is defined as the fraction of shortest paths between pairs of nodes that run through it. It is relevant to models where the network modulates transfer of goods between known start and end ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] |
|
![]() |
Convex Combination
In convex geometry and Vector space, vector algebra, a convex combination is a linear combination of point (geometry), points (which can be vector (geometric), vectors, scalar (mathematics), scalars, or more generally points in an affine space) where all coefficients are non-negative and sum to 1. In other words, the operation is equivalent to a standard weighted average, but whose weights are expressed as a percent of the total weight, instead of as a fraction of the ''count'' of the weights as in a standard weighted average. Formal definition More formally, given a finite number of points x_1, x_2, \dots, x_n in a real vector space, a convex combination of these points is a point of the form : \alpha_1x_1+\alpha_2x_2+\cdots+\alpha_nx_n where the real numbers \alpha_i satisfy \alpha_i\ge 0 and \alpha_1+\alpha_2+\cdots+\alpha_n=1. As a particular example, every convex combination of two points lies on the line segment between the points. A set is convex set, convex if it ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] |
Complete Linkage Clustering
Complete-linkage clustering is one of several methods of agglomerative hierarchical clustering. At the beginning of the process, each element is in a cluster of its own. The clusters are then sequentially combined into larger clusters until all elements end up being in the same cluster. The method is also known as farthest neighbour clustering. The result of the clustering can be visualized as a dendrogram, which shows the sequence of cluster fusion and the distance at which each fusion took place. Clustering procedure At each step, the two clusters separated by the shortest distance are combined. The definition of 'shortest distance' is what differentiates between the different agglomerative clustering methods. In complete-linkage clustering, the link between two clusters contains all element pairs, and the distance between clusters equals the distance between those two elements (one in each cluster) that are farthest away from each other. The shortest of these links that rem ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] |
|
Single-linkage Clustering
In statistics, single-linkage clustering is one of several methods of hierarchical clustering. It is based on grouping clusters in bottom-up fashion (agglomerative clustering), at each step combining two clusters that contain the closest pair of elements not yet belonging to the same cluster as each other. This method tends to produce long thin clusters in which nearby elements of the same cluster have small distances, but elements at opposite ends of a cluster may be much farther from each other than two elements of other clusters. For some classes of data, this may lead to difficulties in defining classes that could usefully subdivide the data. However, it is popular in astronomy for analyzing galaxy clusters, which may often involve long strings of matter; in this application, it is also known as the friends-of-friends algorithm. Overview of agglomerative clustering methods In the beginning of the agglomerative clustering process, each element is in a cluster of its own. The ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] |
|
Adjacency Matrix
In graph theory and computer science, an adjacency matrix is a square matrix used to represent a finite graph (discrete mathematics), graph. The elements of the matrix (mathematics), matrix indicate whether pairs of Vertex (graph theory), vertices are Neighbourhood (graph theory), adjacent or not in the graph. In the special case of a finite simple graph, the adjacency matrix is a (0,1)-matrix with zeros on its diagonal. If the graph is Glossary of graph theory terms#undirected, undirected (i.e. all of its Glossary of graph theory terms#edge, edges are bidirectional), the adjacency matrix is symmetric matrix, symmetric. The relationship between a graph and the eigenvalues and eigenvectors of its adjacency matrix is studied in spectral graph theory. The adjacency matrix of a graph should be distinguished from its incidence matrix, a different matrix representation whose elements indicate whether vertex–edge pairs are Incidence (graph), incident or not, and its degree matrix, whic ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] |
|
Hamming Distance
In information theory, the Hamming distance between two String (computer science), strings or vectors of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of ''substitutions'' required to change one string into the other, or equivalently, the minimum number of ''errors'' that could have transformed one string into the other. In a more general context, the Hamming distance is one of several string metrics for measuring the edit distance between two sequences. It is named after the American mathematician Richard Hamming. A major application is in coding theory, more specifically to block codes, in which the equal-length strings are Vector space, vectors over a finite field. Definition The Hamming distance between two equal-length strings of symbols is the number of positions at which the corresponding symbols are different. Examples The symbols may be letters, bits, or decimal digits, am ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu] |