• Corpus ID: 2102067

Large-Scale Spectral Clustering on Graphs

Jialu Liu, Chi Wang, Marina Danilevsky, Jiawei Han
International Joint Conference on Artificial Intelligence

Graph clustering has received growing attention in recent years as an important analytical technique, owing both to the prevalence of graph data and to the usefulness of graph structures for exploiting intrinsic data characteristics. However, as graph data grows in scale, it becomes increasingly challenging to identify clusters. In this paper we propose an efficient clustering algorithm for large-scale graph data using spectral methods. The key idea is to repeatedly generate a small number of…

Developing an efficient spectral clustering algorithm on large scale graphs in spark

An Efficient Spectral Clustering Algorithm on Large Scale Graphs in Spark (ESCALG) is proposed. It uses map and reduce functions with shuffling phases in Dijkstra's algorithm and a sparse matrix as its data structure, which takes less time in execution.

Large-scale spectral clustering using diffusion coordinates on landmark-based bipartite graphs

This work proposes a landmark-based scalable spectral clustering approach in which the selected landmark set and the given data are used to form a bipartite graph and then run a diffusion process on it to obtain a family of diffusion coordinates for clustering.

A Dynamic Programming Framework for Large-Scale Online Clustering on Graphs

DPOCG, a dynamic programming framework for large-scale online clustering on graphs, is presented. It improves the scalability of a wide range of graph clustering algorithms and is analyzed theoretically in terms of supernode generation, clusters on the reduced graph, and computational complexity.

gSparsify: Graph Motif Based Sparsification for Graph Clustering

This paper proposes gSparsify, a graph sparsification method, to preferentially retain a small subset of edges from a graph which are more likely to be within clusters, while eliminating others with less or no structure correlation to clusters, enabling faster graph clustering without a compromise to clustering quality.

Orthogonal and Nonnegative Graph Reconstruction for Large Scale Clustering

A novel approach, orthogonal and nonnegative graph reconstruction (ONGR), is proposed. It scales linearly with the data size and offers interpretability, in that the final cluster labels can be obtained directly without post-processing.

Structured Graph Reconstruction for Scalable Clustering

The proposed method has linear time complexity with respect to the data size, since it mainly needs to implicitly construct a graph and iteratively perform an economical singular value decomposition of a small matrix. The nonnegative constraint makes the indicator matrix interpretable, so the method can provide the cluster labels with no post-processing.

A Divide and Conquer Framework for Distributed Graph Clustering

A novel divide-and-conquer framework for graph clustering, which can identify small clusters, is proposed, and theoretical guarantees of exact recovery of the clusters are established.

Improved spectral clustering based on Nyström method

A spectral clustering algorithm for massive data analysis is proposed, based on a Nyström sampling method, and experiments show that the method is both feasible and effective.

Large-Scale Spectral Clustering Based on Representative Points

The proposed RPSC method first generates two-layer representative points successively by BKHK, then it constructs the hierarchical bipartite graph and performs spectral analysis on the graph using the parameter-free neighbor assignment method, which avoids the need to tune the extra parameters.

Scalable Spectral Clustering Using Random Binning Features

This paper presents a novel scalable spectral clustering method using Random Binning features (RB) to simultaneously accelerate both similarity graph construction and the eigendecomposition and introduces a state-of-the-art SVD solver to effectively compute eigenvectors of a large sparse feature matrix generated by RB.



A Spectral Clustering Approach To Finding Communities in Graph

This paper shows how optimizing the Q function can be reformulated as a spectral relaxation problem, proposes two new spectral clustering algorithms that seek to maximize Q, and indicates that the new algorithms are efficient and effective at finding both good clusterings and the appropriate number of clusters across a variety of real-world graph data sets.

Fast Spectral Clustering of Data Using Sequential Matrix Compression

This paper proposes a very fast and scalable spectral clustering algorithm called the sequential matrix compression (SMC) method, which scales down the computational complexity of spectral clustering by sequentially reducing the dimension of the Laplacian matrix in the iteration steps with very little loss of accuracy.

Experiments on Graph Clustering Algorithms

An experimental evaluation of graph clustering approaches is conducted and by combining proven techniques from graph partitioning and geometric clustering, a new approach is introduced that compares favorably.

Large Scale Spectral Clustering with Landmark-Based Representation

This paper proposes a novel approach, called Landmark-based Spectral Clustering (LSC), for large scale clustering problems, where the original data points are represented as the linear combinations of landmarks and the spectral embedding of the data can be efficiently computed with the landmark-based representation.

Large Scale Spectral Clustering Using Resistance Distance and Spielman-Teng Solvers

This paper bypasses the eigen-decomposition of the original Laplacian matrix by leveraging the recently introduced Spielman and Teng near-linear time solver for systems of linear equations and random projection.

Parallel Spectral Clustering in Distributed Systems

This work investigates two representative ways of approximating the dense similarity matrix and picks the strategy of sparsifying the matrix via retaining nearest neighbors and investigates its parallelization, which can effectively handle large problems.

Community detection in graphs

Parallel Spectral Clustering Algorithm for Large-Scale Community Data Mining

Empirical study on a large community dataset obtained from Orkut demonstrates the scalability of the parallel spectral clustering algorithm, which is parallelized by dividing both memory use and computation on distributed machines.

Fast approximate spectral clustering

This work develops a general framework for fast approximate spectral clustering in which a distortion-minimizing local transformation is first applied to the data, and develops two concrete instances of this framework, one based on local k-means clustering (KASP) and one based on random projection trees (RASP).

Co-clustering documents and words using bipartite spectral graph partitioning

A new spectral co-clustering algorithm is proposed that uses the second left and right singular vectors of an appropriately scaled word-document matrix to yield good bipartitionings, and it can be shown that the singular vectors solve a real relaxation of the NP-complete graph bipartitioning problem.
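
The summary above describes bipartitioning from the second singular vectors of a scaled word-document matrix. A minimal sketch of that idea in Python with NumPy follows. It rests on several assumptions that are not stated in the summary: the scaling is taken to be the degree normalization D1^{-1/2} A D2^{-1/2}, the split is made by the sign of the rescaled second singular vectors (a simplification; clustering the vector entries, for example with k-means, is a common alternative), and the function name `spectral_cobipartition` is hypothetical.

```python
import numpy as np


def spectral_cobipartition(A):
    """Split the rows and columns of a nonnegative matrix into two groups each.

    Sketch only. Assumes every row and column of A has a positive sum,
    and that the "appropriate scaling" is D1^{-1/2} A D2^{-1/2}.
    """
    A = np.asarray(A, dtype=float)
    d1 = A.sum(axis=1)  # row (word) degrees
    d2 = A.sum(axis=0)  # column (document) degrees

    # Degree-normalized matrix (assumed form of the scaling).
    An = A / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]

    # Singular values come back in descending order, so index 1
    # selects the second left and right singular vectors.
    U, s, Vt = np.linalg.svd(An, full_matrices=False)
    u2 = U[:, 1] / np.sqrt(d1)
    v2 = Vt[1, :] / np.sqrt(d2)

    # Simplified split: assign each row and column by sign.
    row_labels = (u2 > 0).astype(int)
    col_labels = (v2 > 0).astype(int)
    return row_labels, col_labels


if __name__ == "__main__":
    # Two dense blocks joined by weak links (made-up toy data).
    A = np.array([
        [5.0, 4.0, 0.0, 0.0],
        [4.0, 5.0, 0.0, 0.0],
        [0.0, 0.0, 3.0, 4.0],
        [0.0, 0.0, 4.0, 3.0],
    ]) + 0.1
    rows, cols = spectral_cobipartition(A)
    print(rows, cols)
```

Which group receives label 0 and which receives label 1 depends on the sign convention of the SVD routine, so only the grouping itself is meaningful.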