In
network theory
Network theory is the study of graphs as a representation of either symmetric relations or asymmetric relations between discrete objects. In computer science and network science, network theory is a part of graph theory: a network can be defi ...
, link prediction is the problem of predicting the existence of a link between two entities in a network. Examples of link prediction include predicting friendship links among users in a
social network
A social network is a social structure made up of a set of social actors (such as individuals or organizations), sets of dyadic ties, and other social interactions between actors. The social network perspective provides a set of methods for an ...
, predicting co-authorship links in a
citation network
A citation graph (or citation network), in information science and bibliometrics, is a directed graph that describes the citations within a collection of documents.
Each Vertex (graph theory), vertex (or Vertex (graph theory), node) in the gra ...
, and predicting interactions between genes and proteins in a
biological network
A biological network is a method of representing systems as complex sets of binary interactions or relations between various biological entities. In general, networks or graphs are used to capture relationships between entities or objects. A typi ...
. Link prediction can also have a temporal aspect, where, given a snapshot of the set of links at time
, the goal is to predict the links at time
.
Link prediction is widely applicable. In e-commerce, link prediction is often a subtask for recommending items to users. In the curation of citation databases, it can be used for record deduplication. In bioinformatics, it has been used to predict
protein-protein interactions (PPI). It is also used to identify hidden groups of terrorists and criminals in security related applications.
[
]
Problem definition
Consider a network , where represents the entity nodes in the network and x represents the set of "true" links across entities in the network.
We are given the set of entities and a subset of true links which are referred to as ''observed links''.
The goal of link prediction is to identify the unobserved true links.
In the temporal formulation of link prediction the observed links correspond to true links at a time , and the goal is to infer the set of true links at time
Usually, we are also given a subset of unobserved links called ''potential links'' , and we need to identify true links among these potential links.
In the binary classification formulation of the link prediction task the potential links are classified as either true links or false links.
Link prediction approaches for this setting learn a classifier that maps links in to positive and negative labels i.e. .
In the probability estimation formulation, potential links are associated with existence probabilities.
Link prediction approaches for this setting learn a model that maps links in to a probability i.e.