An Efficient Similarity Search Framework for SimRank over Large Dynamic Graphs

@article{Shao2015AnES,
  title={An Efficient Similarity Search Framework for SimRank over Large Dynamic Graphs},
  author={Yingxia Shao and Bin Cui and Lei Chen and Mingming Liu and Xing Xie},
  journal={Proc. VLDB Endow.},
  year={2015},
  volume={8},
  pages={838-849}
}
SimRank is an important measure of vertex-pair similarity according to the structure of graphs. The similarity search based on SimRank is an important operation for identifying similar vertices in a graph and has been employed in many data analysis applications. Nowadays, graphs in the real world become much larger and more dynamic. The existing solutions for similarity search are expensive in terms of time and space cost. None of them can efficiently support similarity search over large… 
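For readers unfamiliar with the measure, here is a minimal sketch of SimRank as originally defined by Jeh and Widom: two vertices are similar if their in-neighbors are similar. It is the naive all-pairs iteration followed by a top-k lookup, not the framework proposed in this paper. The function names, the decay factor of 0.6, the iteration count, and the toy graph are all assumptions chosen for illustration.

```python
from itertools import product


def simrank(in_neighbors, decay=0.6, iterations=10):
    """Naive iterative SimRank over a small graph (illustrative only).

    in_neighbors maps every vertex to the list of its in-neighbors;
    every vertex must appear as a key, even with an empty list.
    """
    nodes = list(in_neighbors)
    # Initial scores: a vertex is fully similar to itself, 0 otherwise.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # No in-neighbors on one side means no structural evidence.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all in-neighbor pairs, scaled by decay.
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


def top_k_similar(sim, query, k):
    """Return the k vertices most similar to `query`, best first."""
    scores = [(v, s) for (q, v), s in sim.items() if q == query and v != query]
    scores.sort(key=lambda pair: -pair[1])
    return scores[:k]


# Hypothetical toy graph: u points to both a and b.
graph = {"u": [], "a": ["u"], "b": ["u"]}
scores = simrank(graph)
nearest = top_k_similar(scores, "a", 1)
```

The all-pairs iteration needs quadratic space and is only practical on tiny graphs, which is the cost that motivates the search-oriented methods listed on this page.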
UniWalk: Unidirectional Random Walk Based Scalable SimRank Computation over Large Graph
TLDR
A Monte Carlo based method, UniWalk, is designed to enable the fast top-k SimRank computation over large undirected graphs without indexing, and outperforms the state-of-the-art methods by orders of magnitude.
Efficient SimRank-Based Similarity Join
TLDR
This article adopts “SimRank” to evaluate the similarity between two vertices in a large graph because of its generality, and proposes an efficient method without building the vertex-pair graph to find the h-go cover + vertex pairs.
Accelerating pairwise SimRank estimation over static and dynamic graphs
TLDR
Three algorithms are proposed to query pairwise SimRank over static and dynamic graphs efficiently by using different sample reduction strategies, and it is shown that these algorithms outperform the state-of-the-art static and dynamic solutions for pairwise SimRank estimation.
Efficient Similarity Search for Sets over Graphs
TLDR
Carmo is presented, an efficient algorithm for retrieving the top-k similarities from an arbitrary set of pairs, and two types of indexes are introduced to boost the efficiency of Carmo.
Dynamical SimRank search on time-varying networks
TLDR
The efficient dynamical computation of all-pairs SimRanks on time-varying graphs is studied and it is shown that the SimRank update in response to every link update is expressible as a rank-one Sylvester matrix equation.
SimRank*: effective and scalable pairwise similarity search based on graph topology
TLDR
This paper proposes an effective and scalable similarity model, SimRank*, which can resolve the “zero-similarity” problem that exists in Jeh and Widom’s SimRank model, and empirically verifies the richer semantics of SimRank* and validates its high computational efficiency and scalability on large graphs with billions of edges.
P-Simrank: Extending Simrank to Scale-Free Bipartite Networks
TLDR
P-Simrank is introduced which extends the idea of Simrank to scale-free bipartite networks and produces sub-optimal similarity scores in case of bipartite graphs where degree distribution of vertices follow power-law.
Efficient similarity join for certain graphs
TLDR
This paper proposes an efficient similarity join method, where local sensitive hash (LSH) and Minhash are used to sharply reduce the time needed to compare candidate graph pairs as well as improve the quality of similarity matching through graph associated vertex degree matrix.
An Experimental Evaluation of SimRank-based Similarity Search Algorithms
TLDR
Depending on the requirements of different applications, the optimal choice of algorithms differs, and this paper provides an empirical guideline for making such choices.
Efficient graph similarity join for information integration on graphs
TLDR
A preprocessing strategy to remove the mismatching graph pairs with significant differences and a novel method of building indexes for each graph is proposed by grouping the nodes which can be reached in k hops for each key node with structure conservation, which is the k-hop tree based indexing method.

References

Efficient SimRank-based Similarity Join Over Large Graphs
TLDR
This paper adopts "SimRank" to evaluate the similarity of two vertices in a large graph because of its generality, and extends the technique to the partition-based framework.
Efficient search algorithm for SimRank
TLDR
The solution, SimMat, is based on two ideas: It computes the approximate similarity of a selected node pair efficiently in non-iterative style based on the Sylvester equation, and it prunes unnecessary approximate similarity computations when searching for the high similarity nodes by exploiting estimations based on the Cauchy-Schwarz inequality.
Scalable similarity search for SimRank
TLDR
This paper proposes a very fast and scalable SimRank-based similarity search problem, and establishes a Monte-Carlo based algorithm to compute a single pair SimRank score s(u,v), which is based on the random-walk interpretation of the linear recursive formula.
Exploiting the Block Structure of Link Graph for Efficient Similarity Computation
TLDR
An algorithm called BlockSimRank is proposed, which partitions the link graph into blocks, and obtains similarity of each node-pair in the graph efficiently, based on random walk on two-layer model with time complexity as low as O(n^(4/3)) and less memory need.
Taming Computational Complexity: Efficient and Parallel SimRank Optimizations on Undirected Graphs
TLDR
This paper presents a novel algorithm to estimate the SimRank between vertices in O(n^3 + K · n^2) time, where n is the number of vertices, and K is the number of iterations.
Parallel SimRank computation on large graphs with iterative aggregation
TLDR
This paper exploits the inherent parallelism and high memory bandwidth of graphics processing units (GPU) to accelerate the computation of SimRank on large graphs and proposes to utilize the iterative aggregation techniques for uncoupling Markov chains to compute SimRank scores in parallel for large graphs.
Towards efficient SimRank computation on large networks
TLDR
An adaptive clustering strategy to eliminate partial sums redundancy (i.e., duplicate computations occurring in partial sums), and an efficient algorithm for speeding up the computation of SimRank to O(Kd'n^2) time, where d' is typically much smaller than the average in-degree of a graph.
Fast computation of SimRank for static and dynamic information networks
TLDR
A family of novel approximate SimRank computation algorithms for static and dynamic information networks are developed and their corresponding theoretical justification and analysis are given.
On Top-k Structural Similarity Search
TLDR
An algorithmic framework called TopSim is proposed based on transforming the top-k SimRank problem on a graph G to one of finding the top-k nodes with highest authority on the product graph G × G; the authors further accelerate TopSim by merging similarity paths and develop a more efficient algorithm called TopSim-SM.
Scaling link-based similarity search
TLDR
The experimental results suggest that the hyperlink structure of vertices within four to five steps provide more adequate information for similarity search than single-step neighborhoods.