''struc2vec'' is a framework for generating vector representations of the nodes of a
graph that preserve their
structural identity. In contrast to ''node2vec'', which optimizes node embeddings so that nearby nodes in the graph have similar embeddings, ''struc2vec'' captures the roles of nodes in a graph, even if structurally similar nodes are far apart in the graph. It learns low-dimensional representations for nodes in a graph by generating
random walks through a constructed
multi-layer graph starting at each graph node. It is useful for
machine learning applications where the downstream task depends more on the structural equivalence of the nodes (e.g., detecting nodes with similar functions in a network, such as interns in the social network of a corporation). ''struc2vec'' identifies nodes that play a similar role based solely on the structure of the graph, for example computing the structural identity of individuals in
social networks. In particular, ''struc2vec'' employs a degree-based method to measure pairwise structural role similarity, which is then used to build the multi-layer graph. Moreover, the distance between the latent representations of two nodes is strongly correlated with their structural similarity. The framework includes three optimizations: reducing the length of the degree sequences considered, reducing the number of pairwise similarity calculations, and reducing the number of layers in the generated graph.
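The degree-based similarity can be illustrated with a minimal sketch. It assumes, following the description in the original ''struc2vec'' paper rather than anything stated above, that two nodes are compared hop by hop through the sorted degrees of the nodes at each distance, that the sequences are aligned with dynamic time warping, and that an edge weight in each layer decays exponentially with the accumulated distance. The function names, the toy graph and the guard for isolated nodes are hypothetical and chosen only for illustration.

```python
import math
from collections import deque


def ring_degree_sequences(adj, source, max_hops):
    """Sorted degrees of the nodes exactly k hops from `source`, for k = 0..max_hops."""
    dist = {source: 0}
    queue = deque([source])
    rings = [[] for _ in range(max_hops + 1)]
    while queue:
        node = queue.popleft()
        hops = dist[node]
        rings[hops].append(len(adj[node]))
        if hops == max_hops:
            continue  # do not expand past the last ring
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = hops + 1
                queue.append(neighbour)
    return [sorted(ring) for ring in rings]


def dtw_cost(seq_a, seq_b):
    """Dynamic time warping cost between two sorted degree sequences."""
    inf = float("inf")
    table = [[inf] * (len(seq_b) + 1) for _ in range(len(seq_a) + 1)]
    table[0][0] = 0.0
    for i, a in enumerate(seq_a, start=1):
        for j, b in enumerate(seq_b, start=1):
            # Relative degree gap; clamping to 1 for isolated nodes is an assumption.
            step = max(a, b, 1) / max(min(a, b), 1) - 1.0
            table[i][j] = step + min(
                table[i - 1][j], table[i][j - 1], table[i - 1][j - 1]
            )
    return table[-1][-1]


def structural_distance(adj, u, v, max_hops):
    """Accumulate ring-by-ring DTW costs; stop once either node has no ring left."""
    rings_u = ring_degree_sequences(adj, u, max_hops)
    rings_v = ring_degree_sequences(adj, v, max_hops)
    total = 0.0
    for ring_u, ring_v in zip(rings_u, rings_v):
        if not ring_u or not ring_v:
            break
        total += dtw_cost(ring_u, ring_v)
    return total


def layer_weight(distance):
    """Edge weight in a layer of the multi-layer graph (assumed form: exp(-distance))."""
    return math.exp(-distance)


# Toy graph: two identical stars joined through a bridge node "p".
example_graph = {
    "A": ["a1", "a2", "a3", "p"],
    "B": ["b1", "b2", "b3", "p"],
    "p": ["A", "B"],
    "a1": ["A"], "a2": ["A"], "a3": ["A"],
    "b1": ["B"], "b2": ["B"], "b3": ["B"],
}

hub_to_hub = structural_distance(example_graph, "A", "B", max_hops=2)
hub_to_leaf = structural_distance(example_graph, "A", "a1", max_hops=2)
```

In this toy graph the two hubs are not adjacent, yet their distance is zero because their neighbourhoods have identical degree profiles, whereas a hub and one of its own leaves are structurally far apart despite being neighbours.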
''struc2vec'' follows the intuition that
random walks through a graph can be treated as sentences in a corpus. Each node in a graph is treated as an individual word, and each short random walk is treated as a sentence. In its final phase, the algorithm employs
Gensim's ''word2vec'' algorithm to learn embeddings based on biased random walks. Sequences of nodes are fed into a
skip-gram or continuous bag-of-words model, and traditional machine-learning techniques for classification can then be applied to the learned embeddings.
It is considered a useful framework to learn node embeddings based on structural equivalence.
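The "walks as sentences" idea can be sketched in a few lines. The example below is a simplified illustration, not the ''struc2vec'' implementation: it draws uniform random walks on a plain adjacency dictionary, whereas ''struc2vec'' draws biased walks over its multi-layer graph. It then lists the (centre, context) pairs that a skip-gram model would be trained on. The function names and parameters are hypothetical. In practice the resulting list of walks could be passed to a ''word2vec'' implementation such as Gensim's <code>Word2Vec</code> class in place of the hand-written pair extraction.

```python
import random


def random_walks(adj, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks; each walk is a list of node names ("words")."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in adj:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adj[walk[-1]]
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            # Node identifiers become string tokens, like words in a sentence.
            walks.append([str(node) for node in walk])
    return walks


def skipgram_pairs(walk, window):
    """List the (centre, context) pairs within `window` positions of each node."""
    pairs = []
    for i, centre in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, walk[j]))
    return pairs


# Toy path graph 0 - 1 - 2, stored as adjacency lists.
path_graph = {0: [1], 1: [0, 2], 2: [1]}
corpus = random_walks(path_graph, walk_length=4, walks_per_node=2)
training_pairs = [pair for walk in corpus for pair in skipgram_pairs(walk, window=1)]
```

Each walk plays the role of a sentence, so nodes that tend to appear in similar walk contexts end up with similar embeddings once a skip-gram or continuous bag-of-words model is trained on the corpus.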