# An Efficient Similarity Search Framework for SimRank over Large Dynamic Graphs

@article{Shao2015AnES, title={An Efficient Similarity Search Framework for SimRank over Large Dynamic Graphs}, author={Yingxia Shao and Bin Cui and Lei Chen and Mingming Liu and Xing Xie}, journal={Proc. VLDB Endow.}, year={2015}, volume={8}, pages={838-849} }

SimRank is an important measure of vertex-pair similarity according to the structure of graphs. The similarity search based on SimRank is an important operation for identifying similar vertices in a graph and has been employed in many data analysis applications. Nowadays, graphs in the real world become much larger and more dynamic. The existing solutions for similarity search are expensive in terms of time and space cost. None of them can efficiently support similarity search over large…
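For orientation, here is a minimal sketch of the classic iterative SimRank definition (Jeh and Widom), in which two vertices are similar if their in-neighbors are similar. It is not the framework proposed in this paper. The function name `simrank`, the decay factor `c=0.6`, the iteration count, and the dictionary-of-in-neighbors graph layout are all assumptions made for illustration.

```python
from itertools import product


def simrank(in_neighbors, c=0.6, iterations=10):
    """Naive all-pairs SimRank; in_neighbors maps each vertex to a list of its in-neighbors.

    Illustrative only: quadratic in the number of vertices, so unsuitable for large graphs.
    """
    nodes = list(in_neighbors)
    # Initial scores: a vertex is fully similar to itself, and to nothing else.
    sim = {(u, v): 1.0 if u == v else 0.0 for u, v in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for u, v in product(nodes, repeat=2):
            if u == v:
                new_sim[(u, v)] = 1.0
                continue
            in_u, in_v = in_neighbors[u], in_neighbors[v]
            if not in_u or not in_v:
                # By definition the score is 0 when either vertex has no in-neighbors.
                new_sim[(u, v)] = 0.0
                continue
            # Average similarity over all in-neighbor pairs, damped by c.
            total = sum(sim[(a, b)] for a in in_u for b in in_v)
            new_sim[(u, v)] = c * total / (len(in_u) * len(in_v))
        sim = new_sim
    return sim


# Hypothetical toy graph: a -> b and a -> c, so b and c share their only in-neighbor.
graph = {"a": [], "b": ["a"], "c": ["a"]}
scores = simrank(graph)
print(scores[("b", "c")])  # 0.6
```

The quadratic number of vertex pairs in this naive formulation is what makes all-pairs computation costly in time and space on large graphs.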

## 51 Citations

UniWalk: Unidirectional Random Walk Based Scalable SimRank Computation over Large Graph

- Computer Science
- 2017 IEEE 33rd International Conference on Data Engineering (ICDE)
- 2017

A Monte Carlo based method, UniWalk, is designed to enable the fast top-k SimRank computation over large undirected graphs without indexing, and outperforms the state-of-the-art methods by orders of magnitude.

Efficient SimRank-Based Similarity Join

- Computer Science
- ACM Trans. Database Syst.
- 2017

This article adopts “SimRank” to evaluate the similarity between two vertices in a large graph because of its generality, and proposes an efficient method without building the vertex-pair graph to find the h-go cover + vertex pairs.

Accelerating pairwise SimRank estimation over static and dynamic graphs

- Computer Science
- The VLDB Journal
- 2018

Three algorithms are proposed to query pairwise SimRank over static and dynamic graphs efficiently by using different sample reduction strategies, and it is shown that these algorithms outperform the state-of-the-art static and dynamic solutions for pairwise SimRank estimation.

Efficient Similarity Search for Sets over Graphs

- Computer Science
- IEEE Transactions on Knowledge and Data Engineering
- 2021

Carmo is presented, an efficient algorithm for retrieving the top-k similarities from an arbitrary set of pairs, and two types of indexes are introduced to boost the efficiency of Carmo.

Dynamical SimRank search on time-varying networks

- Computer Science
- The VLDB Journal
- 2017

The efficient dynamical computation of all-pairs SimRanks on time-varying graphs is studied and it is shown that the SimRank update in response to every link update is expressible as a rank-one Sylvester matrix equation.

SimRank*: effective and scalable pairwise similarity search based on graph topology

- Computer Science
- The VLDB Journal
- 2018

This paper proposes an effective and scalable similarity model, SimRank*, which can resolve the “zero-similarity” problem that exists in Jeh and Widom’s SimRank model, empirically verifies the richer semantics of SimRank*, and validates its high computational efficiency and scalability on large graphs with billions of edges.

P-Simrank: Extending Simrank to Scale-Free Bipartite Networks

- Computer Science
- WWW
- 2020

P-Simrank is introduced, which extends the idea of Simrank to scale-free bipartite networks, that is, bipartite graphs whose vertex degree distribution follows a power law and on which Simrank produces sub-optimal similarity scores.

Efficient similarity join for certain graphs

- Computer Science
- Microsystem Technologies
- 2019

This paper proposes an efficient similarity join method in which locality-sensitive hashing (LSH) and MinHash are used to sharply reduce the time needed to compare candidate graph pairs, and a vertex degree matrix associated with each graph is used to improve the quality of similarity matching.

An Experimental Evaluation of SimRank-based Similarity Search Algorithms

- Computer Science
- Proc. VLDB Endow.
- 2017

Depending on the requirements of different applications, the optimal choice of algorithms differs, and this paper provides an empirical guideline for making such choices.

Efficient graph similarity join for information integration on graphs

- Computer Science
- Frontiers of Computer Science
- 2015

A preprocessing strategy is proposed to remove mismatching graph pairs with significant differences, together with a novel k-hop tree based indexing method, which builds an index for each graph by grouping the nodes reachable within k hops of each key node while conserving structure.

## References

SHOWING 1-10 OF 32 REFERENCES

Efficient SimRank-based Similarity Join Over Large Graphs

- Computer Science
- Proc. VLDB Endow.
- 2013

This paper adopts "SimRank" to evaluate the similarity of two vertices in a large graph because of its generality, and extends the technique to the partition-based framework.

Efficient search algorithm for SimRank

- Computer Science
- 2013 IEEE 29th International Conference on Data Engineering (ICDE)
- 2013

The solution, SimMat, is based on two ideas: it computes the approximate similarity of a selected node pair efficiently in a non-iterative style based on the Sylvester equation, and it prunes unnecessary approximate similarity computations when searching for high-similarity nodes by exploiting estimations based on the Cauchy-Schwarz inequality.

Scalable similarity search for SimRank

- Computer Science
- SIGMOD Conference
- 2014

This paper proposes a very fast and scalable approach to the SimRank-based similarity search problem, and establishes a Monte-Carlo based algorithm to compute a single-pair SimRank score s(u,v), which is based on the random-walk interpretation of the linear recursive formula.
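To make the idea of Monte-Carlo single-pair estimation concrete, the sketch below uses the classic first-meeting-time interpretation of SimRank: s(u,v) is the expected value of c^τ, where τ is the first step at which two reverse random walks started at u and v meet. This is a deliberately simpler stand-in, not the linearized formulation used in the cited paper. The function name `mc_simrank`, its parameters, and the toy graph are hypothetical.

```python
import random


def mc_simrank(in_neighbors, u, v, c=0.6, walks=10000, max_steps=10, seed=0):
    """Estimate s(u, v) by sampling pairs of reverse random walks.

    in_neighbors maps each vertex to a list of its in-neighbors.
    Walks are truncated at max_steps, so the estimate is biased slightly low.
    """
    if u == v:
        return 1.0
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total = 0.0
    for _ in range(walks):
        a, b = u, v
        for step in range(1, max_steps + 1):
            # A walk that reaches a vertex with no in-neighbors can never meet the other.
            if not in_neighbors[a] or not in_neighbors[b]:
                break
            # Both walks move one step backwards along a random incoming edge.
            a = rng.choice(in_neighbors[a])
            b = rng.choice(in_neighbors[b])
            if a == b:
                # First meeting at this step contributes c ** step.
                total += c ** step
                break
    return total / walks


# Hypothetical toy graph: a -> b and a -> c.
graph = {"a": [], "b": ["a"], "c": ["a"]}
print(mc_simrank(graph, "b", "c"))  # close to 0.6, since both walks meet at "a" after one step
```

Because each query samples walks only from the two query vertices, no all-pairs score matrix has to be stored, which is the main appeal of Monte-Carlo approaches on large graphs.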

Exploiting the Block Structure of Link Graph for Efficient Similarity Computation

- Computer Science
- PAKDD
- 2009

An algorithm called BlockSimRank is proposed, which partitions the link graph into blocks, and obtains similarity of each node-pair in the graph efficiently, based on random walk on two-layer model with time complexity as low as O(n^(4/3)) and less memory need.

Taming Computational Complexity: Efficient and Parallel SimRank Optimizations on Undirected Graphs

- Computer Science
- WAIM
- 2010

This paper presents a novel algorithm to estimate the SimRank between vertices in O(n^3 + K · n^2) time, where n is the number of vertices and K is the number of iterations.

Parallel SimRank computation on large graphs with iterative aggregation

- Computer Science
- KDD
- 2010

This paper exploits the inherent parallelism and high memory bandwidth of graphics processing units (GPU) to accelerate the computation of SimRank on large graphs and proposes to utilize the iterative aggregation techniques for uncoupling Markov chains to compute SimRank scores in parallel for large graphs.

Towards efficient SimRank computation on large networks

- Computer Science, Mathematics
- 2013 IEEE 29th International Conference on Data Engineering (ICDE)
- 2013

An adaptive clustering strategy to eliminate partial sums redundancy (i.e., duplicate computations occurring in partial sums), and an efficient algorithm for speeding up the computation of SimRank to O(Kd'n^2) time, where d' is typically much smaller than the average in-degree of a graph.

Fast computation of SimRank for static and dynamic information networks

- Computer Science
- EDBT '10
- 2010

A family of novel approximate SimRank computation algorithms for static and dynamic information networks are developed and their corresponding theoretical justification and analysis are given.

On Top-k Structural Similarity Search

- Computer Science
- 2012 IEEE 28th International Conference on Data Engineering
- 2012

An algorithmic framework called TopSim is proposed, based on transforming the top-k SimRank problem on a graph G into one of finding the top-k nodes with the highest authority on the product graph G × G; TopSim is further accelerated by merging similarity paths, yielding a more efficient algorithm called TopSim-SM.

Scaling link-based similarity search

- Computer Science
- WWW '05
- 2005

The experimental results suggest that the hyperlink structure of vertices within four to five steps provides more adequate information for similarity search than single-step neighborhoods.