SALSA Algorithm
   HOME

TheInfoList



OR:

Stochastic Approach for Link-Structure Analysis (SALSA) is a web page ranking
algorithm In mathematics and computer science, an algorithm () is a finite sequence of Rigour#Mathematics, mathematically rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algo ...
designed by R. Lempel and S. Moran to assign high scores to hub and authority web pages based on the quantity of hyperlinks among them.


Origins

SALSA is inspired by two other link-based ranking algorithms, namely
HITS Hits or H.I.T.S. may refer to: Arts, entertainment, and media Music * '' H.I.T.S.'', 1991 album by New Kids on the Block * ''...Hits'' (Phil Collins album), 1998 * ''Hits'' (compilation series), 1984–2006; 2014, a British compilation album s ...
and
PageRank PageRank (PR) is an algorithm used by Google Search to rank web pages in their search engine results. It is named after both the term "web page" and co-founder Larry Page. PageRank is a way of measuring the importance of website pages. Accordin ...
, in the following ways: * like HITS, the algorithm assigns two scores to each web page: a hub score and an authority score. An authority is a page which is significantly more relevant to a given topic than other pages, whereas a hub is a page which contains many links to authorities; * like HITS, SALSA also works on a ''focused subgraph'' which is topic-dependent. This focused subgraph is obtained by first finding a set of pages most relevant to a given topic (e.g. take the top-n pages returned by a text-based search algorithm) and then augmenting this set with web pages that link directly to it and with pages that are linked directly from it. Because of this selection process, the hub and authority scores are topic-dependent; * like PageRank, the algorithm computes the scores by simulating a random walk through a
Markov chain In probability theory and statistics, a Markov chain or Markov process is a stochastic process describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. Informally ...
that represents the graph of web pages. SALSA however works with two different Markov chains: a chain of hubs and a chain of authorities. This is a departure from HITS's notions of hubs and authorities based on a mutually reinforcing relationship.


Properties

SALSA can be seen as an improvement of HITS. It is computationally lighter since its ranking is equivalent to a weighted in/out degree ranking. The computational cost of the algorithm is a crucial factor since HITS and SALSA are computed at query time and can therefore significantly affect the response time of a search engine. This should be contrasted with query-independent algorithms like PageRank that can be computed off-line. SALSA is less vulnerable to the Tightly Knit Community (TKC) effect than HITS. A TKC is a topological structure within the Web that consists of a small set of highly interconnected pages. The presence of TKCs in a focused subgraph is known to negatively affect the detection of meaningful authorities by HITS. The
Twitter Twitter, officially known as X since 2023, is an American microblogging and social networking service. It is one of the world's largest social media platforms and one of the most-visited websites. Users can share short text messages, image ...
Social network uses a SALSA style algorithm to suggest accounts to follow.Pankaj Gupta, Ashish Goel, Jimmy Lin, Aneesh Sharma, Dong Wang, and Reza Bosagh Zade
WTF: The who-to-follow system at Twitter
Proceedings of the 22nd international conference on World Wide Web


References

*{{cite journal , last=Lempel , first=R. , author2=Moran S. , date=April 2001 , title=SALSA: The Stochastic Approach for Link-Structure Analysis , journal=ACM Transactions on Information Systems , volume=19 , issue=2 , pages=131–160 , doi=10.1145/382979.383041, citeseerx=10.1.1.38.5859 , s2cid=9607841 Internet search algorithms Link analysis