An Experimental Evaluation of SimRank-based Similarity Search Algorithms

@article{Zhang2017AnEE,
  title={An Experimental Evaluation of SimRank-based Similarity Search Algorithms},
  author={Zhipeng Zhang and Yingxia Shao and Bin Cui and Ce Zhang},
  journal={Proc. VLDB Endow.},
  year={2017},
  volume={10},
  pages={601--612}
}
Given a graph, SimRank is one of the most popular measures of the similarity between two vertices. We focus on efficiently calculating SimRank, which has been studied intensively over the last decade. This has led to many algorithms that efficiently calculate or approximate SimRank being proposed by researchers. Despite these abundant research efforts, there is no systematic comparison of these algorithms. In this paper, we conduct a study to compare these algorithms to understand their pros… 
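For orientation, the measure these algorithms compute can be sketched as a naive fixed-point iteration over Jeh and Widom's recursive definition: two vertices are similar when their in-neighbors are similar. This is only an illustrative baseline, not one of the algorithms evaluated in the paper. The function name `simrank`, the decay factor of 0.6, the iteration count, and the toy graph are all assumptions made for this example.

```python
from itertools import product


def simrank(in_neighbors, decay=0.6, iterations=10):
    """Naive all-pairs SimRank by fixed-point iteration.

    in_neighbors maps every vertex to the list of its in-neighbors;
    every vertex that appears as a neighbor must also be a key.
    decay and iterations are illustrative defaults (assumptions).
    """
    nodes = list(in_neighbors)
    # Start from the identity: each vertex is fully similar to itself only.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A vertex without in-neighbors has similarity 0 to all others.
                new[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, damped by decay.
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new
    return sim


# Hypothetical toy graph: u points to a and b, and a points to c.
graph = {"u": [], "a": ["u"], "b": ["u"], "c": ["a"]}
scores = simrank(graph)
```

Here `a` and `b` share their only in-neighbor `u`, so their score equals the decay factor, while `a` and `c` score 0 because the only in-neighbor pair to compare is `(u, a)` and `u` has no in-neighbors. Each iteration of this sketch visits every vertex pair and every pair of their in-neighbors, which is impractical on large graphs.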
SimRank*: effective and scalable pairwise similarity search based on graph topology
TLDR
This paper proposes an effective and scalable similarity model, SimRank*, which resolves the “zero-similarity” problem that exists in Jeh and Widom’s SimRank model; it empirically verifies the richer semantics of SimRank* and validates its high computational efficiency and scalability on large graphs with billions of edges.
Dynamical SimRank search on time-varying networks
TLDR
The efficient dynamical computation of all-pairs SimRanks on time-varying graphs is studied and it is shown that the SimRank update in response to every link update is expressible as a rank-one Sylvester matrix equation.
ProbeSim: Scalable Single-Source and Top-k SimRank Computations on Dynamic Graphs
TLDR
ProbeSim is presented, an index-free algorithm for single-source and top-k SimRank queries that provides a non-trivial theoretical guarantee in the absolute error of query results and offers satisfying practical efficiency and effectiveness.
PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs
TLDR
PRSim is proposed, an algorithm that exploits the structure of graphs to efficiently answer single-source SimRank queries and runs in sub-linear time if the degree distribution of the input graph follows the power-law distribution, a property possessed by many real-world graphs.
Fast top-k similarity search in large dynamic attributed networks
READS: A Random Walk Approach for Efficient and Accurate Dynamic SimRank
TLDR
A random walk based indexing scheme to compute SimRank efficiently and accurately over large dynamic graphs is proposed and it is shown that the algorithm outperforms the state-of-the-art static and dynamic SimRank algorithms.
Boosting SimRank with Semantics
TLDR
This work revisits SimRank, a popular and well studied similarity measure for information networks, that quantifies the similarity of two nodes based on the similarities of their neighbors, and asks can SimRank be enriched with semantics while preserving its semantics.
UniWalk: Unidirectional Random Walk Based Scalable SimRank Computation over Large Graph
TLDR
A Monte Carlo based method to enable the fast top-k SimRank computation over large undirected graphs, which outperforms the state-of-the-art methods by orders of magnitude and is extended to existing distributed graph processing frameworks to improve its scalability.
Memory-Aware Framework for Efficient Second-Order Random Walk on Large Graphs
TLDR
A cost model is designed, and a new node sampling method following the acceptance-rejection paradigm is proposed to achieve a better balance between memory and time cost and a memory-aware framework is proposed on the basis of the cost model.
Computing User Similarity by Combining SimRank++ and Cosine Similarities to Improve Collaborative Filtering
TLDR
The experimental results indicate that the proposed aggregated similarity measure overall outperforms the other three similarity measures in terms of both Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE), especially in the cases of 30-100 nearest neighbors.

References

SHOWING 1-10 OF 28 REFERENCES
Scalable similarity search for SimRank
TLDR
This paper addresses the problem of very fast and scalable SimRank-based similarity search, and establishes a Monte-Carlo based algorithm to compute a single-pair SimRank score s(u,v), based on the random-walk interpretation of the linear recursive formula.
An Efficient Similarity Search Framework for SimRank over Large Dynamic Graphs
TLDR
This paper proposes a novel two-stage random-walk sampling framework (TSF) for SimRank-based similarity search (e.g., top-k search) and demonstrates that TSF can handle dynamic billion-edge graphs with high performance.
Fast Single-Pair SimRank Computation
TLDR
This paper proposes a Single-Pair SimRank approach that performs an iterative computation to obtain the similarity of a single node-pair and confirms the accuracy and efficiency of this approach in extensive experimental studies over synthetic and real datasets.
Efficient Partial-Pairs SimRank Search for Large Networks
TLDR
A novel "seed germination" model is proposed that computes partial-pairs SimRank in O(k|E| min{|A|, |B|}) time and O(|E| + k|V|) memory for k iterations on a graph of |V| nodes and |E| edges, allowing scores to be assessed accurately on graphs with tens of millions of links.
Efficient SimRank-based Similarity Join Over Large Graphs
TLDR
This paper adopts "SimRank" to evaluate the similarity of two vertices in a large graph because of its generality, and extends the technique to the partition-based framework.
Efficient search algorithm for SimRank
TLDR
The solution, SimMat, is based on two ideas: it computes the approximate similarity of a selected node pair efficiently in non-iterative style based on the Sylvester equation, and it prunes unnecessary approximate similarity computations when searching for the high similarity nodes by exploiting estimations based on the Cauchy-Schwarz inequality.
Scaling link-based similarity search
TLDR
The experimental results suggest that the hyperlink structure of vertices within four to five steps provides more adequate information for similarity search than single-step neighborhoods.
SimRank: a measure of structural-context similarity
TLDR
A complementary approach, applicable in any domain with object-to-object relationships, that measures similarity of the structural context in which objects occur, based on their relationships with other objects is proposed.
A recommender system based on local random walks and spectral methods
TLDR
This paper designs recommender systems for weblogs based on the link structure among them and designs a similarity metric among nodes of a social network using the eigenvalues and eigenvectors of a normalized adjacency matrix of the social network graph.
SimFusion: measuring similarity using unified relationship matrix
TLDR
It is claimed that iterative computations over the URM can help overcome the data sparseness problem and detect latent relationships among heterogeneous data objects, and can thus improve the quality of information applications that require combination of information from heterogeneous sources.